Discussions
Categories
- 17.9K All Categories
- 3.4K Industry Applications
- 3.3K Intelligent Advisor
- 63 Insurance
- 535.7K On-Premises Infrastructure
- 138.1K Analytics Software
- 38.6K Application Development Software
- 5.6K Cloud Platform
- 109.3K Database Software
- 17.5K Enterprise Manager
- 8.8K Hardware
- 71K Infrastructure Software
- 105.2K Integration
- 41.5K Security Software
SCD 1 slow performance

ScoobySi
Member Posts: 423
I have a SCD Type 1 dimension, the initial load was 5million+ rows. I'm now running a daily update which should insert/update 68 rows, however this is taking over 1hr. The 68 rows are in a staging table so there are no complexities with the source query, it's a straight map to the dimension.
The dimension has 3 hierarchies (5 levels, 3 levels and 4 levels) 2 of the levels are common across 2 of the hierarchies.
What indexing strategies do people follow for dimension tables, at the moment I only have the default PK index on the Dimension_Key? It looks as though a Merge is generated for each level, basically a HASH join between source and target.
Cheers
Si
The dimension has 3 hierarchies (5 levels, 3 levels and 4 levels) 2 of the levels are common across 2 of the hierarchies.
What indexing strategies do people follow for dimension tables, at the moment I only have the default PK index on the Dimension_Key? It looks as though a Merge is generated for each level, basically a HASH join between source and target.
Cheers
Si
Tagged:
Answers
-
Hi Si,
Did you load all levels of dimension in single mapping?
If yes how is loading time distributed across levels ()?
What type of storage you use for dimension - Star of Snowflake? -
Hi Oleg
Yes, I load the dimension in a single mapping, it's a straight source table to dimension mapping without any further transformation.
Loading time seems similar across levels e.g. County, Town, Postcode etc. although I got bored and killed the job after about 1hr 15mins.
I'm using a Star i.e. a single table for the dimension.
Out of interest, I rebuilt my dimension so that it has one level (Household) and no hierarchy, the insert/update of the 68 rows then took approx. 1.5 mins.
I was hoping to make use of the Oracle metadata relating to levels, hierarchies in Discoverer or BIEE but maybe I'll just have to live without it?
Regards
Si -
I guess the number of selected/merged rows for each level (mapping stage) very close to number of rows in your dimension. Am I right?
Oleg -
No I don't think so (if I've understod what you mean), I'm loading 68 rows, so for instance the maximum number of Towns/Postcodes etc. in the source is 68 out of potentially thousands in the dimension.
From what I can see, I think the performance is being hit by the number of FULL SCANs done during the HASH joins, although I can't seem to influence the execution path with any indexes.
Si -
Si,
for example, you have in your source table 2 distinct values for country level, but in your target dimension to these countries are linked 1 million rows.
In this case OWB updates ALL 1 million rows. I think it may be a reason for bad performance.
I looked at generated query and it seems that indexes on Surrogate and Business identifiers may be helpful.
Don't forget to gather statistics on target table.
As a variant try to use Snowflake variant of dimension.
Can you show generated query for any level (not the lowest) and execution plan for this query?
Regards,
Oleg -
Oleg
Looks as though you are correct, the MERGE statement for e.g. TOWN level will update every row in D_HOUSEHOLD for that Town. I've played around with Indexes but generally they're ignored but this is probably to be expected if a significant proportion of the rows are to be updated.
This appears to be highly inefficient, not sure how anyone manages to process volumes if they use OWBs method for defining dimension hierarchies and levels?!?
Cheer
Si
MERGE
/*+ APPEND PARALLEL("PAF_TOWN") */
INTO
"D_HOUSEHOLD"
USING
(SELECT
"INGRP1"."PAF_TOWN_ID" "PAF_TOWN_ID",
"INGRP1"."PAF_TOWN_NAME" "PAF_TOWN_NAME",
"D_HOUSEHOLD_PAF_COUNTY"."PAF_COUNTY_SK" "PAF_COUNTY_SK$2",
"INGRP2"."TOTAL_TOWN_SK" "TOTAL_TOWN_SK$2",
"INGRP2"."TOTAL_TOWN_DSC" "TOTAL_TOWN_DSC$3",
"D_HOUSEHOLD_PAF_COUNTY"."PAF_COUNTY_DSC" "PAF_COUNTY_DSC$3",
"D_HOUSEHOLD_PAF_COUNTY"."TOTAL_HOUSEHOLD_SK" "TOTAL_HOUSEHOLD_SK$4",
"D_HOUSEHOLD_PAF_COUNTY"."TOTAL_HOUSEHOLD_DSC" "TOTAL_HOUSEHOLD_DSC$5",
"INGRP2"."TOTAL_TOWN_ID" "TOTAL_TOWN_ID$3",
"D_HOUSEHOLD_PAF_COUNTY"."PAF_COUNTY_ID" "PAF_COUNTY_ID$3",
"D_HOUSEHOLD_PAF_COUNTY"."LOOKUP$$$_1_TOTAL_HOUSEHOLD_I" "LOOKUP$$$_1_TOTAL_HOUSEHOLD$1"
FROM
(SELECT
NULL "PAF_TOWN_SK",
"LOOKUP_INPUT_SUBQUERY$5"."PAF_TOWN_ID$1" "PAF_TOWN_ID",
"LOOKUP_INPUT_SUBQUERY$5"."PAF_TOWN_NAME$1" "PAF_TOWN_NAME",
"LOOKUP_INPUT_SUBQUERY$5"."LOOKUP$$$_1_PAF_COUNTY_ID" "LOOKUP$$$_1_PAF_COUNTY_ID",
"LOOKUP_INPUT_SUBQUERY$5"."LOOKUP$$$_2_TOTAL_TOWN_ID" "LOOKUP$$$_2_TOTAL_TOWN_ID"
FROM
(SELECT
"DEDUP_SRC_4"."PAF_TOWN_SK$1" "PAF_TOWN_SK",
"DEDUP_SRC_4"."PAF_TOWN_ID$2" "PAF_TOWN_ID$1",
"DEDUP_SRC_4"."PAF_TOWN_NAME$2" "PAF_TOWN_NAME$1",
"DEDUP_SRC_4"."LOOKUP$$$_1_PAF_COUNTY_ID$1" "LOOKUP$$$_1_PAF_COUNTY_ID",
"DEDUP_SRC_4"."LOOKUP$$$_2_TOTAL_TOWN_ID$1" "LOOKUP$$$_2_TOTAL_TOWN_ID"
FROM
(SELECT
NULL/* DEDUP_SRC_4.OUTGRP1.PAF_TOWN_SK */ "PAF_TOWN_SK$1",
"STG_D_HOUSEHOLD"."PAF_TOWN"/* DEDUP_SRC_4.OUTGRP1.PAF_TOWN_ID */ "PAF_TOWN_ID$2",
MIN("STG_D_HOUSEHOLD"."PAF_TOWN") KEEP (DENSE_RANK FIRST ORDER BY NULL)/* DEDUP_SRC_4.OUTGRP1.PAF_TOWN_NAME */ "PAF_TOWN_NAME$2",
MIN("STG_D_HOUSEHOLD"."PAF_COUNTY") KEEP (DENSE_RANK FIRST ORDER BY NULL)/* DEDUP_SRC_4.OUTGRP1.LOOKUP$$$_1_PAF_COUNTY_ID */ "LOOKUP$$$_1_PAF_COUNTY_ID$1",
MIN("STG_D_HOUSEHOLD"."TOTAL_TOWN_ID") KEEP (DENSE_RANK FIRST ORDER BY NULL)/* DEDUP_SRC_4.OUTGRP1.LOOKUP$$$_2_TOTAL_TOWN_ID */ "LOOKUP$$$_2_TOTAL_TOWN_ID$1"
FROM
"STG_D_HOUSEHOLD" "STG_D_HOUSEHOLD"
GROUP BY
"STG_D_HOUSEHOLD"."PAF_TOWN",NULL,NULL/* D_HOUSEHOLD.DEDUP_SRC_4 */) "DEDUP_SRC_4") "LOOKUP_INPUT_SUBQUERY$5"
WHERE
( NOT ( "LOOKUP_INPUT_SUBQUERY$5"."PAF_TOWN_ID$1" IS NULL ) )) "INGRP1",
(SELECT
"DEDUP_INPUT_SUBQUERY2$2"."TOTAL_TOWN_SK$3" "TOTAL_TOWN_SK",
"DEDUP_INPUT_SUBQUERY2$2"."TOTAL_TOWN_ID$4" "TOTAL_TOWN_ID",
"DEDUP_INPUT_SUBQUERY2$2"."TOTAL_TOWN_DSC$4" "TOTAL_TOWN_DSC"
FROM
(SELECT
DISTINCT
"D_HOUSEHOLD"."TOTAL_TOWN_SK" "TOTAL_TOWN_SK$3",
"D_HOUSEHOLD"."TOTAL_TOWN_ID" "TOTAL_TOWN_ID$4",
"D_HOUSEHOLD"."TOTAL_TOWN_DSC" "TOTAL_TOWN_DSC$4",
"D_HOUSEHOLD"."DIMENSION_KEY" "DIMENSION_KEY$2"
FROM
"D_HOUSEHOLD" "D_HOUSEHOLD"
WHERE
( "D_HOUSEHOLD"."DIMENSION_KEY" = "D_HOUSEHOLD"."TOTAL_TOWN_SK" ) AND
( "D_HOUSEHOLD"."TOTAL_TOWN_SK" IS NOT NULL )) "DEDUP_INPUT_SUBQUERY2$2") "INGRP2",
(SELECT
"DEDUP_INPUT_SUBQUERY2$3"."PAF_COUNTY_SK$3" "PAF_COUNTY_SK",
"DEDUP_INPUT_SUBQUERY2$3"."PAF_COUNTY_ID$4" "PAF_COUNTY_ID",
"DEDUP_INPUT_SUBQUERY2$3"."PAF_COUNTY_DSC$4" "PAF_COUNTY_DSC",
"DEDUP_INPUT_SUBQUERY2$3"."LOOKUP$$$_1_TOTAL_HOUSEHOLD$2" "LOOKUP$$$_1_TOTAL_HOUSEHOLD_I",
"DEDUP_INPUT_SUBQUERY2$3"."TOTAL_HOUSEHOLD_SK$5" "TOTAL_HOUSEHOLD_SK",
"DEDUP_INPUT_SUBQUERY2$3"."TOTAL_HOUSEHOLD_DSC$6" "TOTAL_HOUSEHOLD_DSC"
FROM
(SELECT
DISTINCT
"D_HOUSEHOLD"."PAF_COUNTY_SK" "PAF_COUNTY_SK$3",
"D_HOUSEHOLD"."PAF_COUNTY_ID" "PAF_COUNTY_ID$4",
"D_HOUSEHOLD"."PAF_COUNTY_DSC" "PAF_COUNTY_DSC$4",
"D_HOUSEHOLD"."TOTAL_HOUSEHOLD_ID" "LOOKUP$$$_1_TOTAL_HOUSEHOLD$2",
"D_HOUSEHOLD"."TOTAL_HOUSEHOLD_SK" "TOTAL_HOUSEHOLD_SK$5",
"D_HOUSEHOLD"."DIMENSION_KEY" "DIMENSION_KEY$3",
"D_HOUSEHOLD"."TOTAL_HOUSEHOLD_DSC" "TOTAL_HOUSEHOLD_DSC$6"
FROM
"D_HOUSEHOLD" "D_HOUSEHOLD"
WHERE
( "D_HOUSEHOLD"."DIMENSION_KEY" = "D_HOUSEHOLD"."PAF_COUNTY_SK" ) AND
( "D_HOUSEHOLD"."PAF_COUNTY_SK" IS NOT NULL )) "DEDUP_INPUT_SUBQUERY2$3") "D_HOUSEHOLD_PAF_COUNTY"
WHERE
( "D_HOUSEHOLD_PAF_COUNTY"."PAF_COUNTY_ID" (+) = "INGRP1"."LOOKUP$$$_1_PAF_COUNTY_ID" ) AND
( "INGRP2"."TOTAL_TOWN_ID" (+) = "INGRP1"."LOOKUP$$$_2_TOTAL_TOWN_ID" )
)
MERGE_SUBQUERY$5
ON (
"D_HOUSEHOLD"."PAF_TOWN_ID" = "MERGE_SUBQUERY$5"."PAF_TOWN_ID"
)
WHEN MATCHED THEN
UPDATE
SET
"PAF_TOWN_NAME" = "MERGE_SUBQUERY$5"."PAF_TOWN_NAME",
"PAF_COUNTY_SK" = "MERGE_SUBQUERY$5"."PAF_COUNTY_SK$2",
"TOTAL_TOWN_SK" = "MERGE_SUBQUERY$5"."TOTAL_TOWN_SK$2",
"TOTAL_TOWN_DSC" = "MERGE_SUBQUERY$5"."TOTAL_TOWN_DSC$3",
"PAF_COUNTY_DSC" = "MERGE_SUBQUERY$5"."PAF_COUNTY_DSC$3",
"TOTAL_HOUSEHOLD_SK" = "MERGE_SUBQUERY$5"."TOTAL_HOUSEHOLD_SK$4",
"TOTAL_HOUSEHOLD_DSC" = "MERGE_SUBQUERY$5"."TOTAL_HOUSEHOLD_DSC$5",
"TOTAL_TOWN_ID" = "MERGE_SUBQUERY$5"."TOTAL_TOWN_ID$3",
"PAF_COUNTY_ID" = "MERGE_SUBQUERY$5"."PAF_COUNTY_ID$3",
"TOTAL_HOUSEHOLD_ID" = "MERGE_SUBQUERY$5"."LOOKUP$$$_1_TOTAL_HOUSEHOLD$1"
WHEN NOT MATCHED THEN
INSERT
("D_HOUSEHOLD"."PAF_TOWN_SK",
"D_HOUSEHOLD"."PAF_TOWN_ID",
"D_HOUSEHOLD"."PAF_TOWN_NAME",
"D_HOUSEHOLD"."PAF_COUNTY_SK",
"D_HOUSEHOLD"."TOTAL_TOWN_SK",
"D_HOUSEHOLD"."TOTAL_TOWN_DSC",
"D_HOUSEHOLD"."PAF_COUNTY_DSC",
"D_HOUSEHOLD"."TOTAL_HOUSEHOLD_SK",
"D_HOUSEHOLD"."TOTAL_HOUSEHOLD_DSC",
"D_HOUSEHOLD"."TOTAL_TOWN_ID",
"D_HOUSEHOLD"."PAF_COUNTY_ID",
"D_HOUSEHOLD"."TOTAL_HOUSEHOLD_ID",
"D_HOUSEHOLD"."DIMENSION_KEY")
VALUES
(-1 * ("D_HOUSEHOLD_SEQ".NEXTVAL)/* MULT_BY_NEG1_5.OUTGRP1.NEG_NEXTVAL */,
"MERGE_SUBQUERY$5"."PAF_TOWN_ID",
"MERGE_SUBQUERY$5"."PAF_TOWN_NAME",
"MERGE_SUBQUERY$5"."PAF_COUNTY_SK$2",
"MERGE_SUBQUERY$5"."TOTAL_TOWN_SK$2",
"MERGE_SUBQUERY$5"."TOTAL_TOWN_DSC$3",
"MERGE_SUBQUERY$5"."PAF_COUNTY_DSC$3",
"MERGE_SUBQUERY$5"."TOTAL_HOUSEHOLD_SK$4",
"MERGE_SUBQUERY$5"."TOTAL_HOUSEHOLD_DSC$5",
"MERGE_SUBQUERY$5"."TOTAL_TOWN_ID$3",
"MERGE_SUBQUERY$5"."PAF_COUNTY_ID$3",
"MERGE_SUBQUERY$5"."LOOKUP$$$_1_TOTAL_HOUSEHOLD$1",
-1 * ("D_HOUSEHOLD_SEQ".CURRVAL)/* MULT_BY_NEG1_5.OUTGRP1.NEG_CURRVAL */)
This discussion has been closed.