У меня возникли проблемы с попыткой ускорить запрос, который занимает около 11 секунд всего на 2 миллиона строк. Here is a link to my sqlfiddle. И вот инструкция, которую я пытаюсь запустить, и оператор EXPLAIN.Как ускорить оператор Group By с несколькими соединениями?
Запросов:
SELECT crawl.pk Pk,domains.domain Domain,
CONCAT(schemes.scheme, "://", domains.domain, remainders.remainder) Uri,
crawl.redirect Redirect FROM crawl
LEFT JOIN dates ON crawl.date_crawled=dates.pk
LEFT JOIN schemes ON crawl.scheme=schemes.pk
LEFT JOIN domains ON crawl.domain=domains.pk
LEFT JOIN remainders ON crawl.remainder=remainders.pk
WHERE (dates.date < CURDATE() - INTERVAL 30 DAY)
AND crawl.redirect=0
GROUP BY crawl.domain
ORDER BY crawl.date_crawled ASC
LIMIT 50
EXPLAIN:
+----+-------------+------------+--------+-----------------------+-----------------------+---------+----------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+-----------------------+-----------------------+---------+----------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | dates | ALL | PRIMARY,date | NULL | NULL | NULL | 7 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | crawl | ref | date_crawled_redirect | date_crawled_redirect | 8 | mytable.dates.pk,const | 408644 | |
| 1 | SIMPLE | schemes | eq_ref | PRIMARY | PRIMARY | 4 | mytable.crawl.scheme | 1 | |
| 1 | SIMPLE | domains | eq_ref | PRIMARY | PRIMARY | 4 | mytable.crawl.domain | 1 | |
| 1 | SIMPLE | remainders | eq_ref | PRIMARY | PRIMARY | 4 | mytable.crawl.remainder | 1 | |
+----+-------------+------------+--------+-----------------------+-----------------------+---------+----------------------------+--------+----------------------------------------------+
5 rows in set (2.26 sec)
EDIT # 1: Согласно комментариям я заменил левый Присоединяется ж/Соединения и перенесли дату фильтр объединения , К сожалению, это не сократило время запроса.
SELECT crawl.pk Pk,domains.domain Domain, CONCAT(schemes.scheme, "://", domains.domain, remainders.remainder) Uri, crawl.redirect Redirect
FROM crawl
JOIN schemes ON crawl.scheme=schemes.pk
JOIN domains ON crawl.domain=domains.pk
JOIN remainders ON crawl.remainder=remainders.pk
JOIN dates ON crawl.date_crawled=dates.pk AND dates.date < CURDATE() - INTERVAL 30 DAY
WHERE crawl.redirect=0
GROUP BY crawl.domain
ORDER BY crawl.date_crawled ASC
LIMIT 50
EDIT # 2: Мой обновленный Объясняю:
+----+-------------+------------+--------+---------------------------------------------------------+-----------------------+---------+----------------------------+--------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------------------------------------------------+-----------------------+---------+----------------------------+--------+-----------------------------------------------------------+
| 1 | SIMPLE | dates | range | PRIMARY,date,date_pk,dateBtreeIdx,pk | date_pk | 3 | NULL | 4 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | crawl | ref | domain_remainder,remainder,scheme,date_crawled_redirect | date_crawled_redirect | 8 | mytable.dates.pk,const | 408644 | |
| 1 | SIMPLE | schemes | ALL | PRIMARY | NULL | NULL | NULL | 2 | Using where; Using join buffer |
| 1 | SIMPLE | domains | eq_ref | PRIMARY | PRIMARY | 4 | mytable.crawl.domain | 1 | |
| 1 | SIMPLE | remainders | eq_ref | PRIMARY | PRIMARY | 4 | mytable.crawl.remainder | 1 | |
+----+-------------+------------+--------+---------------------------------------------------------+-----------------------+---------+----------------------------+--------+-----------------------------------------------------------+
EDIT # 3
+----+--------------------+------------+-----------------+------------------------------------------+---------+---------+----------------------------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+-----------------+------------------------------------------+---------+---------+----------------------------+---------+---------------------------------+
| 1 | PRIMARY | schemes | ALL | PRIMARY | NULL | NULL | NULL | 2 | Using temporary; Using filesort |
| 1 | PRIMARY | crawl | ref | domain_remainder,remainder,scheme,domain | scheme | 4 | mytable.schemes.pk | 1448223 | Using where |
| 1 | PRIMARY | domains | eq_ref | PRIMARY | PRIMARY | 4 | mytable.crawl.domain | 1 | |
| 1 | PRIMARY | remainders | eq_ref | PRIMARY | PRIMARY | 4 | mytable.crawl.remainder | 1 | |
| 2 | DEPENDENT SUBQUERY | dates | unique_subquery | PRIMARY,date,date_pk,dateBtreeIdx,pk | PRIMARY | 4 | func | 1 | Using where |
+----+--------------------+------------+-----------------+------------------------------------------+---------+---------+----------------------------+---------+---------------------------------+
5 rows in set (0.04 sec)
EDIT # 4:
+----+-------------+------------+--------+--------------------------------------+-------------------------+---------+----------------------------+---------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+--------------------------------------+-------------------------+---------+----------------------------+---------+-----------------------------------------------------------+
| 1 | SIMPLE | dates | range | PRIMARY,date,date_pk,dateBtreeIdx,pk | date_pk | 3 | NULL | 4 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | schemes | ALL | PRIMARY | NULL | NULL | NULL | 2 | Using join buffer |
| 1 | SIMPLE | crawl | ref | scheme_domain_remainder | scheme_domain_remainder | 4 | mytable.schemes.pk | 1455517 | Using where |
| 1 | SIMPLE | domains | eq_ref | PRIMARY | PRIMARY | 4 | mytable.crawl.domain | 1 | |
| 1 | SIMPLE | remainders | eq_ref | PRIMARY | PRIMARY | 4 | mytable.crawl.remainder | 1 | |
+----+-------------+------------+--------+--------------------------------------+-------------------------+---------+----------------------------+---------+-----------------------------------------------------------+
5 rows in set (0.04 sec)
EDIT # 5
SELECT urls.pk PK, domains.domain Domain, CONCAT(schemes.scheme, "://", domains.domain, remainders.remainder) Uri, urls.redirect Redirect, urls.date_crawled DC FROM
(SELECT * FROM (
SELECT * FROM crawl as urls ORDER BY date_crawled ASC
) AS tmp GROUP BY tmp.domain) as urls
JOIN schemes ON urls.scheme=schemes.pk
JOIN domains ON urls.domain=domains.pk
JOIN remainders ON urls.remainder=remainders.pk
JOIN dates ON urls.date_crawled=dates.pk AND dates.date < CURDATE() - INTERVAL 30 DAY
WHERE urls.redirect=0
ORDER BY urls.date_crawled ASC
LIMIT 50
Просто, чтобы подтвердить .. Вы хотите, чтобы все записи более 30 дней ??? или вы имели в виду>, чтобы получить все записи в течение последних 30 дней.Ваши индексы в противном случае выглядят нормально – DRapp
50 записей, которые старше 30 дней. –
Вам нужны левые соединения для всех этих? и попробуйте поставить условия в ГДЕ, чтобы быть частью ваших объединений. –