
Performance of the original DELETE query

DELETE B FROM
  TABLE_BASE B,
  TABLE_INC I
WHERE B.ID = I.ID AND B.NUM = I.NUM;

Performance statistics for the above query

+-----------------+--------+-----------+
| Response Time   | SumCPU | ImpactCPU |
+-----------------+--------+-----------+
| 00:05:29.190000 | 2852   | 319672    |
+-----------------+--------+-----------+
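For comparison with the two plans further below, the plan for this original statement can be captured the same way, by prefixing it with EXPLAIN (standard Teradata; its plan was not included here):

EXPLAIN
DELETE B FROM
  TABLE_BASE B,
  TABLE_INC I
WHERE B.ID = I.ID AND B.NUM = I.NUM;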

Optimized Query 1

DEL FROM TABLE_BASE WHERE (ID, NUM) IN 
(SELECT ID, NUM FROM TABLE_INC); 

Statistics for the above query

+-----------------+--------+-----------+ 
| QryRespTime | SumCPU | ImpactCPU | 
+-----------------+--------+-----------+ 
| 00:00:00.570000 | 15.42 |  49.92 | 
+-----------------+--------+-----------+ 

Optimized Query 2

DELETE FROM TABLE_BASE B WHERE EXISTS 
(SELECT * FROM TABLE_INC I WHERE B.ID = I.ID AND B.NUM = I.NUM); 

Statistics for the above query

+-----------------+--------+-----------+ 
| QryRespTime | SumCPU | ImpactCPU | 
+-----------------+--------+-----------+ 
| 00:00:00.400000 | 11.96 |  44.93 | 
+-----------------+--------+-----------+ 

My questions:

  • How/why do Optimized Query 1 and Query 2 improve performance so dramatically?
  • What is the best practice for DELETE queries like this?
  • Should I choose Query 1 or Query 2? Which one is ideal/better/more reliable? I expected Query 1 to be the better one, because it selects only the two columns ID and NUM instead of SELECT *, yet Query 2 shows better numbers.

QUERY 1 EXPLAIN PLAN

This query is optimized using type 2 profile T2_Linux64, profileid 21. 
    1) First, we lock TEMP_DB.TABLE_BASE for write on a 
    reserved RowHash to prevent global deadlock. 
    2) Next, we lock TEMP_DB_T.TABLE_INC for access, and we 
    lock TEMP_DB.TABLE_BASE for write. 
    3) We execute the following steps in parallel. 
     1) We do an all-AMPs RETRIEVE step from 
      TEMP_DB.TABLE_BASE by way of an all-rows scan 
      with no residual conditions into Spool 2 (all_amps), which is 
      redistributed by the hash code of (
      TEMP_DB.TABLE_BASE.NUM, 
      TEMP_DB.TABLE_BASE.ID) to all AMPs. Then 
      we do a SORT to order Spool 2 by row hash. The size of Spool 
      2 is estimated with low confidence to be 168,480 rows (
      5,054,400 bytes). The estimated time for this step is 0.03 
      seconds. 
     2) We do an all-AMPs RETRIEVE step from 
      TEMP_DB_T.TABLE_INC by way of an all-rows scan 
      with no residual conditions into Spool 3 (all_amps), which is 
      redistributed by the hash code of (
      TEMP_DB_T.TABLE_INC.NUM, 
      TEMP_DB_T.TABLE_INC.ID) to all AMPs. Then 
      we do a SORT to order Spool 3 by row hash and the sort key in 
      spool field1 eliminating duplicate rows. The size of Spool 3 
      is estimated with high confidence to be 5,640 rows (310,200 
      bytes). The estimated time for this step is 0.03 seconds. 
    4) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an 
    all-rows scan, which is joined to Spool 3 (Last Use) by way of an 
    all-rows scan. Spool 2 and Spool 3 are joined using an inclusion 
    merge join, with a join condition of ("(ID = ID) AND 
    (NUM = NUM)"). The result goes into Spool 1 (all_amps), 
    which is redistributed by the hash code of (
    TEMP_DB.TABLE_BASE.ROWID) to all AMPs. Then we do 
    a SORT to order Spool 1 by row hash and the sort key in spool 
    field1 eliminating duplicate rows. The size of Spool 1 is 
    estimated with no confidence to be 168,480 rows (3,032,640 bytes). 
    The estimated time for this step is 1.32 seconds. 
    5) We do an all-AMPs MERGE DELETE to 
    TEMP_DB.TABLE_BASE from Spool 1 (Last Use) via the 
    row id. The size is estimated with no confidence to be 168,480 
    rows. The estimated time for this step is 42.95 seconds. 
    6) We spoil the parser's dictionary cache for the table. 
    7) Finally, we send out an END TRANSACTION step to all AMPs involved 
    in processing the request. 
    -> No rows are returned to the user as the result of statement 1. 

QUERY 2 EXPLAIN PLAN 

This query is optimized using type 2 profile T2_Linux64, profileid 21. 
    1) First, we lock TEMP_DB.TABLE_BASE for write on a reserved RowHash to 
    prevent global deadlock. 
    2) Next, we lock TEMP_DB_T.TABLE_INC for access, and we 
    lock TEMP_DB.TABLE_BASE for write. 
    3) We execute the following steps in parallel. 
     1) We do an all-AMPs RETRIEVE step from TEMP_DB.TABLE_BASE by way of 
      an all-rows scan with no residual conditions into Spool 2 
      (all_amps), which is redistributed by the hash code of (
      TEMP_DB.TABLE_BASE.NUM, TEMP_DB.TABLE_BASE.ID) to all AMPs. 
      Then we do a SORT to order Spool 2 by row hash. The size of 
      Spool 2 is estimated with low confidence to be 168,480 rows (
      5,054,400 bytes). The estimated time for this step is 0.03 
      seconds. 
     2) We do an all-AMPs RETRIEVE step from 
      TEMP_DB_T.TABLE_INC by way of an all-rows scan 
      with no residual conditions into Spool 3 (all_amps), which is 
      redistributed by the hash code of (
      TEMP_DB_T.TABLE_INC.NUM, 
      TEMP_DB_T.TABLE_INC.ID) to all AMPs. Then 
      we do a SORT to order Spool 3 by row hash and the sort key in 
      spool field1 eliminating duplicate rows. The size of Spool 3 
      is estimated with high confidence to be 5,640 rows (310,200 
      bytes). The estimated time for this step is 0.03 seconds. 
    4) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an 
    all-rows scan, which is joined to Spool 3 (Last Use) by way of an 
    all-rows scan. Spool 2 and Spool 3 are joined using an inclusion 
    merge join, with a join condition of ("(NUM = NUM) AND 
    (ID = ID)"). The result goes into Spool 1 (all_amps), which 
    is redistributed by the hash code of (TEMP_DB.TABLE_BASE.ROWID) to all 
    AMPs. Then we do a SORT to order Spool 1 by row hash and the sort 
    key in spool field1 eliminating duplicate rows. The size of Spool 
    1 is estimated with no confidence to be 168,480 rows (3,032,640 
    bytes). The estimated time for this step is 1.32 seconds. 
    5) We do an all-AMPs MERGE DELETE to TEMP_DB.TABLE_BASE from Spool 1 (Last 
    Use) via the row id. The size is estimated with no confidence to 
    be 168,480 rows. The estimated time for this step is 42.95 
    seconds. 
    6) We spoil the parser's dictionary cache for the table. 
    7) Finally, we send out an END TRANSACTION step to all AMPs involved 
    in processing the request. 
    -> No rows are returned to the user as the result of statement 1. 
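Note that in both plans the large spool and the final delete are estimated "with low confidence" or "no confidence", which in Teradata usually points to missing statistics on the join columns. A hedged sketch of the standard remedy (that statistics were actually missing here is an assumption; database and table names are taken from the plans above):

-- Multi-column statistics on the join key typically upgrade the
-- optimizer's row-count estimates from "no confidence" toward
-- "high confidence".
COLLECT STATISTICS ON TEMP_DB.TABLE_BASE COLUMN (ID, NUM);
COLLECT STATISTICS ON TEMP_DB_T.TABLE_INC COLUMN (ID, NUM);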

For TABLE_BASE

+----------------+----------+ 
| table_bytes | skewness | 
+----------------+----------+ 
| 16842085888.00 | 22.78 | 
+----------------+----------+ 

For TABLE_INC

+-------------+----------+ 
| table_bytes | skewness | 
+-------------+----------+ 
| 5317120.00 | 44.52 | 
+-------------+----------+ 

(1) TD version? (2) Table volumes and skew? (3) Please add all the execution plans –


'... WHERE (ID, NUM) IN ...' In which SQL version is this possible? –


@ChristophStaudinger: That's a so-called *multi-column subquery*; some DBMSs have implemented it. – dnoeth

Answer


What is the relationship between TABLE_BASE and TABLE_INC?

If it is one-to-many, Q1 might create a huge spool, while Q2 & 3 can apply a DISTINCT before the join.

Regarding IN vs. EXISTS there should be no difference. Have you checked dbc.QryLogStepsV?
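A sketch of what to look at in that view (row count, CPU, and I/O per step, as mentioned in the comments below). The column names are the usual DBQL step-log columns, so verify them against your DBC.QryLogStepsV definition, and the QueryID literal is purely hypothetical:

SELECT QueryID,
       StepLev1Num,
       StepName,
       RowCount,   -- rows produced by the step
       CPUTime,    -- CPU seconds consumed by the step
       IOCount     -- logical I/Os for the step
FROM   DBC.QryLogStepsV
WHERE  QueryID = 123456789012345678  -- hypothetical QueryID of the DELETE
ORDER  BY StepLev1Num;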

Edit:

If (ID, Num) is the PI of the target table, a rewrite to MERGE DELETE should give the best performance:

MERGE INTO TABLE_BASE AS tgt
USING TABLE_INC AS src
  ON src.ID = tgt.ID
 AND src.Num = tgt.Num
WHEN MATCHED THEN DELETE;
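If (ID, Num) really is the PI, matching base-table rows can be located by row hash directly, avoiding the full redistribution and ROWID spool of TABLE_BASE that dominates the plans above. One way to check (this is an expectation, not a verified plan) is to EXPLAIN the rewrite:

EXPLAIN
MERGE INTO TABLE_BASE AS tgt
USING TABLE_INC AS src
  ON src.ID = tgt.ID
 AND src.Num = tgt.Num
WHEN MATCHED THEN DELETE;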

The relationship is one-to-one. I have not checked 'dbc.QryLogStepsV'. What should I be looking for? –


@PirateX: row count/CPU/IO per step. Btw, if the relationship is 1:1, the actual number of deleted rows should be 5,640. – dnoeth


@dnoeth Can you explain the difference between the OP's original query (delete with a join) and the merge? Why is the merge faster? –
