PI column issue

I have an issue. In the table below, six columns were taken as the PI. However, the table is usually accessed by LXSTATE_ID. The problem is that LXSTATE_ID has about 8 million duplicates, and I don't see any other column that is unique enough to use in the PI. The table holds about 215 million records, and I do a MINUS between the stage table and the base table to capture changed records. This throws a spool space error. What can be done here?
SHOW TABLE GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1;
CREATE MULTISET TABLE GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
LXSTATE_ID VARCHAR(4000) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'LXSTATE_ID' NOT NULL,
BUS_OBJ_OID INTEGER TITLE 'BUS_OBJ_OID',
MXSTATEREQ_OID INTEGER TITLE 'MXSTATEREQ_OID',
ACTUAL_DT_GMT TIMESTAMP(0) TITLE 'ACTUAL_DT_GMT',
START_DT_GMT TIMESTAMP(0) TITLE 'START_DT_GMT',
END_DT_GMT TIMESTAMP(0) TITLE 'END_DT_GMT',
DW_LOAD_DATE TIMESTAMP(0) TITLE 'DW_LOAD_DATE',
DW_CREATED_BY VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'DW_CREATED_BY',
DW_UPDATED_DATE TIMESTAMP(0) TITLE 'DW_UPDATED_DATE',
DW_UPDATED_BY VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'DW_UPDATED_BY')
PRIMARY INDEX CDR_ODS_LXSTATE_398850F1_S_PK (LXSTATE_ID ,BUS_OBJ_OID ,
MXSTATEREQ_OID ,ACTUAL_DT_GMT ,START_DT_GMT ,END_DT_GMT);
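As a first diagnostic (a sketch against the table above; adjust the database name to your environment), you can check how evenly LXSTATE_ID alone would distribute rows across AMPs using Teradata's hash functions:

```sql
-- Sketch: row count per AMP if LXSTATE_ID were the sole PI column.
-- A few AMPs with far more rows than the rest indicates skew.
SELECT HASHAMP(HASHBUCKET(HASHROW(LXSTATE_ID))) AS amp_no,
       COUNT(*) AS row_cnt
FROM GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1
GROUP BY 1
ORDER BY 2 DESC;
```

The same query can be run with any candidate column combination inside HASHROW to compare distributions before changing the PI.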
Here is the MINUS query. VT_LXSTATE_398850F1 is a volatile table into which the changed records are captured.
INSERT INTO VT_LXSTATE_398850F1
(
LXSTATE_ID,
BUS_OBJ_OID,
MXSTATEREQ_OID,
ACTUAL_DT_GMT,
START_DT_GMT,
END_DT_GMT
)
SELECT
LXSTATE_ID,
BUS_OBJ_OID,
MXSTATEREQ_OID,
ACTUAL_DT_GMT,
START_DT_GMT,
END_DT_GMT
FROM GEEDW_PLP_S.CDR_ODS_LXSTATE_398850F1_S
MINUS
SELECT
LXSTATE_ID,
BUS_OBJ_OID,
MXSTATEREQ_OID,
ACTUAL_DT_GMT,
START_DT_GMT,
END_DT_GMT
FROM GEEDW_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1;
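Since the EXPLAIN further down shows the MINUS result being redistributed again (on LXSTATE_ID, MXSTATEREQ_OID) before the MERGE into the volatile table, one thing to verify is whether the volatile table's PI covers the same six columns that the MINUS spools are hashed on. A sketch of such a definition, with column types copied from the DDL above (ON COMMIT PRESERVE ROWS is an assumption):

```sql
-- Sketch: volatile table whose PI matches the full MINUS column set,
-- so the result spool can land locally instead of being redistributed.
CREATE VOLATILE MULTISET TABLE VT_LXSTATE_398850F1
(
    LXSTATE_ID     VARCHAR(4000) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
    BUS_OBJ_OID    INTEGER,
    MXSTATEREQ_OID INTEGER,
    ACTUAL_DT_GMT  TIMESTAMP(0),
    START_DT_GMT   TIMESTAMP(0),
    END_DT_GMT     TIMESTAMP(0)
)
PRIMARY INDEX (LXSTATE_ID, BUS_OBJ_OID, MXSTATEREQ_OID,
               ACTUAL_DT_GMT, START_DT_GMT, END_DT_GMT)
ON COMMIT PRESERVE ROWS;
```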
Below is the EXPLAIN plan for the INSERT:
Explain INSERT INTO VT_LXSTATE_398850F1
(
LXSTATE_ID,
BUS_OBJ_OID,
MXSTATEREQ_OID,
ACTUAL_DT_GMT,
START_DT_GMT,
END_DT_GMT
)
SELECT
LXSTATE_ID,
BUS_OBJ_OID,
MXSTATEREQ_OID,
ACTUAL_DT_GMT,
START_DT_GMT,
END_DT_GMT
FROM GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP
MINUS
SELECT
LXSTATE_ID,
BUS_OBJ_OID,
MXSTATEREQ_OID,
ACTUAL_DT_GMT,
START_DT_GMT,
END_DT_GMT
FROM GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1_BKP;
1) First, we lock a distinct GEEDW_D_PLP_S."pseudo table" for read on
a RowHash to prevent global deadlock for
GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP.
2) Next, we lock a distinct GEEDW_D_PLM_ODS_BULK_T."pseudo table" for
read on a RowHash to prevent global deadlock for
GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1_BKP.
3) We lock GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP for read, and
we lock GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1_BKP for
read.
4) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from
GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP by way of an
all-rows scan with no residual conditions into Spool 2
(all_amps), which is redistributed by the hash code of (
GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP.END_DT_GMT,
GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP.START_DT_GMT,
GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP.ACTUAL_DT_GMT,
GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP.MXSTATEREQ_OID,
GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP.BUS_OBJ_OID,
GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP.LXSTATE_ID) to
all AMPs. Then we do a SORT to order Spool 2 by row hash and
the sort key in spool field1 eliminating duplicate rows. The
input table will not be cached in memory, but it is eligible
for synchronized scanning. The result spool file will not be
cached in memory. The size of Spool 2 is estimated with no
confidence to be 322,724,040 rows (1,755,618,777,600 bytes).
The estimated time for this step is 1 hour and 55 minutes.
2) We do an all-AMPs RETRIEVE step from
GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1_BKP by way of
an all-rows scan with no residual conditions into Spool 3
(all_amps), which is redistributed by the hash code of (
GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1_BKP.END_DT_GMT,
GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1_BKP.START_DT_GMT,
GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1_BKP.ACTUAL_DT_GMT,
GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1_BKP.MXSTATEREQ_OID,
GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1_BKP.BUS_OBJ_OID,
GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1_BKP.LXSTATE_ID)
to all AMPs. Then we do a SORT to order Spool 3 by row hash
and the sort key in spool field1 eliminating duplicate rows.
The input table will not be cached in memory, but it is
eligible for synchronized scanning. The result spool file
will not be cached in memory. The size of Spool 3 is
estimated with no confidence to be 161,362,020 rows (
877,809,388,800 bytes). The estimated time for this step is
56 minutes and 33 seconds.
5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to Spool 3 (Last Use) by way of an
all-rows scan. Spool 2 and Spool 3 are joined using an exclusion
merge join, with a join condition of ("Field_1 = Field_1"). The
result goes into Spool 1 (all_amps), which is built locally on the
AMPs. The size of Spool 1 is estimated with no confidence to be
242,043,030 rows (1,316,714,083,200 bytes). The estimated time
for this step is 9 minutes and 11 seconds.
6) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of
an all-rows scan into Spool 4 (all_amps), which is redistributed
by the hash code of (
GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP.LXSTATE_ID,
GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP.MXSTATEREQ_OID) to
all AMPs. Then we do a SORT to order Spool 4 by row hash. The
result spool file will not be cached in memory. The size of Spool
4 is estimated with no confidence to be 242,043,030 rows (
331,114,865,040 bytes). The estimated time for this step is 59
minutes and 11 seconds.
7) We do an all-AMPs MERGE into "502332938".VT_LXSTATE_398850F1 from
Spool 4 (Last Use). The size is estimated with no confidence to
be 242,043,030 rows. **The estimated time for this step is 19 hours
and 53 minutes.**
8) We spoil the parser's dictionary cache for the table.
9) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> No rows are returned to the user as the result of statement 1.
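If the full-row sort that feeds the exclusion merge join is what exhausts spool, one direction worth testing is rewriting the MINUS as NOT EXISTS, which lets the optimizer consider a join driven by LXSTATE_ID rather than sorting all six columns of both tables. A hedged sketch (semantics are close to, but not identical with, MINUS; see the comments):

```sql
-- Sketch only: NOT EXISTS instead of MINUS. Note that MINUS also
-- de-duplicates the left side, so add DISTINCT if the staging table
-- can contain duplicate rows. The COALESCE sentinels (-1 and
-- TIMESTAMP '1900-01-01 00:00:00') are illustrative placeholders
-- used to make the comparisons NULL-safe.
INSERT INTO VT_LXSTATE_398850F1
    (LXSTATE_ID, BUS_OBJ_OID, MXSTATEREQ_OID,
     ACTUAL_DT_GMT, START_DT_GMT, END_DT_GMT)
SELECT s.LXSTATE_ID, s.BUS_OBJ_OID, s.MXSTATEREQ_OID,
       s.ACTUAL_DT_GMT, s.START_DT_GMT, s.END_DT_GMT
FROM GEEDW_PLP_S.CDR_ODS_LXSTATE_398850F1_S s
WHERE NOT EXISTS (
    SELECT 1
    FROM GEEDW_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1 t
    WHERE t.LXSTATE_ID = s.LXSTATE_ID
      AND COALESCE(t.BUS_OBJ_OID, -1)    = COALESCE(s.BUS_OBJ_OID, -1)
      AND COALESCE(t.MXSTATEREQ_OID, -1) = COALESCE(s.MXSTATEREQ_OID, -1)
      AND COALESCE(t.ACTUAL_DT_GMT, TIMESTAMP '1900-01-01 00:00:00')
        = COALESCE(s.ACTUAL_DT_GMT, TIMESTAMP '1900-01-01 00:00:00')
      AND COALESCE(t.START_DT_GMT, TIMESTAMP '1900-01-01 00:00:00')
        = COALESCE(s.START_DT_GMT, TIMESTAMP '1900-01-01 00:00:00')
      AND COALESCE(t.END_DT_GMT, TIMESTAMP '1900-01-01 00:00:00')
        = COALESCE(s.END_DT_GMT, TIMESTAMP '1900-01-01 00:00:00')
);
```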
I tried using other columns in the PI; the distribution looks good, but the query still throws a spool space error on the insert into the volatile table. – user3901666
Does the volatile table have a PI matching the tables involved in the MINUS? Which step in the EXPLAIN for the INSERT statement is the one that throws the spool error? Can you post the EXPLAIN output for the INSERT statement? –
Yes, the volatile table has a PI matching the tables involved in the MINUS. – user3901666