Forum Stats

  • 3,855,433 Users
  • 2,264,502 Discussions
  • 7,905,993 Comments

Discussions

SCD 1 slow performance

ScoobySi
ScoobySi Member Posts: 423
edited Oct 22, 2008 4:56AM in Warehouse Builder
I have a SCD Type 1 dimension, the initial load was 5million+ rows. I'm now running a daily update which should insert/update 68 rows, however this is taking over 1hr. The 68 rows are in a staging table so there are no complexities with the source query, it's a straight map to the dimension.

The dimension has 3 hierarchies (5 levels, 3 levels and 4 levels) 2 of the levels are common across 2 of the hierarchies.

What indexing strategies do people follow for dimension tables, at the moment I only have the default PK index on the Dimension_Key? It looks as though a Merge is generated for each level, basically a HASH join between source and target.

Cheers
Si
Tagged:

Answers

  • Oleg
    Oleg Member Posts: 1,463
    Hi Si,

    Did you load all levels of dimension in single mapping?
    If yes how is loading time distributed across levels ()?
    What type of storage you use for dimension - Star of Snowflake?
  • ScoobySi
    ScoobySi Member Posts: 423
    edited Oct 21, 2008 10:27AM
    Hi Oleg

    Yes, I load the dimension in a single mapping, it's a straight source table to dimension mapping without any further transformation.

    Loading time seems similar across levels e.g. County, Town, Postcode etc. although I got bored and killed the job after about 1hr 15mins.

    I'm using a Star i.e. a single table for the dimension.

    Out of interest, I rebuilt my dimension so that it has one level (Household) and no hierarchy, the insert/update of the 68 rows then took approx. 1.5 mins.

    I was hoping to make use of the Oracle metadata relating to levels, hierarchies in Discoverer or BIEE but maybe I'll just have to live without it?

    Regards

    Si
  • Oleg
    Oleg Member Posts: 1,463
    I guess the number of selected/merged rows for each level (mapping stage) very close to number of rows in your dimension. Am I right?

    Oleg
  • ScoobySi
    ScoobySi Member Posts: 423
    edited Oct 21, 2008 11:00AM
    No I don't think so (if I've understod what you mean), I'm loading 68 rows, so for instance the maximum number of Towns/Postcodes etc. in the source is 68 out of potentially thousands in the dimension.

    From what I can see, I think the performance is being hit by the number of FULL SCANs done during the HASH joins, although I can't seem to influence the execution path with any indexes.

    Si
  • Oleg
    Oleg Member Posts: 1,463
    Si,
    for example, you have in your source table 2 distinct values for country level, but in your target dimension to these countries are linked 1 million rows.
    In this case OWB updates ALL 1 million rows. I think it may be a reason for bad performance.

    I looked at generated query and it seems that indexes on Surrogate and Business identifiers may be helpful.
    Don't forget to gather statistics on target table.
    As a variant try to use Snowflake variant of dimension.

    Can you show generated query for any level (not the lowest) and execution plan for this query?

    Regards,
    Oleg
    Oleg
  • ScoobySi
    ScoobySi Member Posts: 423
    edited Oct 22, 2008 4:56AM
    Oleg

    Looks as though you are correct, the MERGE statement for e.g. TOWN level will update every row in D_HOUSEHOLD for that Town. I've played around with Indexes but generally they're ignored but this is probably to be expected if a significant proportion of the rows are to be updated.

    This appears to be highly inefficient, not sure how anyone manages to process volumes if they use OWBs method for defining dimension hierarchies and levels?!?

    Cheer

    Si

    MERGE
    /*+ APPEND PARALLEL("PAF_TOWN") */
    INTO
    "D_HOUSEHOLD"
    USING
    (SELECT
    "INGRP1"."PAF_TOWN_ID" "PAF_TOWN_ID",
    "INGRP1"."PAF_TOWN_NAME" "PAF_TOWN_NAME",
    "D_HOUSEHOLD_PAF_COUNTY"."PAF_COUNTY_SK" "PAF_COUNTY_SK$2",
    "INGRP2"."TOTAL_TOWN_SK" "TOTAL_TOWN_SK$2",
    "INGRP2"."TOTAL_TOWN_DSC" "TOTAL_TOWN_DSC$3",
    "D_HOUSEHOLD_PAF_COUNTY"."PAF_COUNTY_DSC" "PAF_COUNTY_DSC$3",
    "D_HOUSEHOLD_PAF_COUNTY"."TOTAL_HOUSEHOLD_SK" "TOTAL_HOUSEHOLD_SK$4",
    "D_HOUSEHOLD_PAF_COUNTY"."TOTAL_HOUSEHOLD_DSC" "TOTAL_HOUSEHOLD_DSC$5",
    "INGRP2"."TOTAL_TOWN_ID" "TOTAL_TOWN_ID$3",
    "D_HOUSEHOLD_PAF_COUNTY"."PAF_COUNTY_ID" "PAF_COUNTY_ID$3",
    "D_HOUSEHOLD_PAF_COUNTY"."LOOKUP$$$_1_TOTAL_HOUSEHOLD_I" "LOOKUP$$$_1_TOTAL_HOUSEHOLD$1"
    FROM
    (SELECT
    NULL "PAF_TOWN_SK",
    "LOOKUP_INPUT_SUBQUERY$5"."PAF_TOWN_ID$1" "PAF_TOWN_ID",
    "LOOKUP_INPUT_SUBQUERY$5"."PAF_TOWN_NAME$1" "PAF_TOWN_NAME",
    "LOOKUP_INPUT_SUBQUERY$5"."LOOKUP$$$_1_PAF_COUNTY_ID" "LOOKUP$$$_1_PAF_COUNTY_ID",
    "LOOKUP_INPUT_SUBQUERY$5"."LOOKUP$$$_2_TOTAL_TOWN_ID" "LOOKUP$$$_2_TOTAL_TOWN_ID"
    FROM
    (SELECT
    "DEDUP_SRC_4"."PAF_TOWN_SK$1" "PAF_TOWN_SK",
    "DEDUP_SRC_4"."PAF_TOWN_ID$2" "PAF_TOWN_ID$1",
    "DEDUP_SRC_4"."PAF_TOWN_NAME$2" "PAF_TOWN_NAME$1",
    "DEDUP_SRC_4"."LOOKUP$$$_1_PAF_COUNTY_ID$1" "LOOKUP$$$_1_PAF_COUNTY_ID",
    "DEDUP_SRC_4"."LOOKUP$$$_2_TOTAL_TOWN_ID$1" "LOOKUP$$$_2_TOTAL_TOWN_ID"
    FROM
    (SELECT
    NULL/* DEDUP_SRC_4.OUTGRP1.PAF_TOWN_SK */ "PAF_TOWN_SK$1",
    "STG_D_HOUSEHOLD"."PAF_TOWN"/* DEDUP_SRC_4.OUTGRP1.PAF_TOWN_ID */ "PAF_TOWN_ID$2",
    MIN("STG_D_HOUSEHOLD"."PAF_TOWN") KEEP (DENSE_RANK FIRST ORDER BY NULL)/* DEDUP_SRC_4.OUTGRP1.PAF_TOWN_NAME */ "PAF_TOWN_NAME$2",
    MIN("STG_D_HOUSEHOLD"."PAF_COUNTY") KEEP (DENSE_RANK FIRST ORDER BY NULL)/* DEDUP_SRC_4.OUTGRP1.LOOKUP$$$_1_PAF_COUNTY_ID */ "LOOKUP$$$_1_PAF_COUNTY_ID$1",
    MIN("STG_D_HOUSEHOLD"."TOTAL_TOWN_ID") KEEP (DENSE_RANK FIRST ORDER BY NULL)/* DEDUP_SRC_4.OUTGRP1.LOOKUP$$$_2_TOTAL_TOWN_ID */ "LOOKUP$$$_2_TOTAL_TOWN_ID$1"
    FROM
    "STG_D_HOUSEHOLD" "STG_D_HOUSEHOLD"
    GROUP BY
    "STG_D_HOUSEHOLD"."PAF_TOWN",NULL,NULL/* D_HOUSEHOLD.DEDUP_SRC_4 */) "DEDUP_SRC_4") "LOOKUP_INPUT_SUBQUERY$5"
    WHERE
    ( NOT ( "LOOKUP_INPUT_SUBQUERY$5"."PAF_TOWN_ID$1" IS NULL ) )) "INGRP1",
    (SELECT
    "DEDUP_INPUT_SUBQUERY2$2"."TOTAL_TOWN_SK$3" "TOTAL_TOWN_SK",
    "DEDUP_INPUT_SUBQUERY2$2"."TOTAL_TOWN_ID$4" "TOTAL_TOWN_ID",
    "DEDUP_INPUT_SUBQUERY2$2"."TOTAL_TOWN_DSC$4" "TOTAL_TOWN_DSC"
    FROM
    (SELECT
    DISTINCT
    "D_HOUSEHOLD"."TOTAL_TOWN_SK" "TOTAL_TOWN_SK$3",
    "D_HOUSEHOLD"."TOTAL_TOWN_ID" "TOTAL_TOWN_ID$4",
    "D_HOUSEHOLD"."TOTAL_TOWN_DSC" "TOTAL_TOWN_DSC$4",
    "D_HOUSEHOLD"."DIMENSION_KEY" "DIMENSION_KEY$2"
    FROM
    "D_HOUSEHOLD" "D_HOUSEHOLD"
    WHERE
    ( "D_HOUSEHOLD"."DIMENSION_KEY" = "D_HOUSEHOLD"."TOTAL_TOWN_SK" ) AND
    ( "D_HOUSEHOLD"."TOTAL_TOWN_SK" IS NOT NULL )) "DEDUP_INPUT_SUBQUERY2$2") "INGRP2",
    (SELECT
    "DEDUP_INPUT_SUBQUERY2$3"."PAF_COUNTY_SK$3" "PAF_COUNTY_SK",
    "DEDUP_INPUT_SUBQUERY2$3"."PAF_COUNTY_ID$4" "PAF_COUNTY_ID",
    "DEDUP_INPUT_SUBQUERY2$3"."PAF_COUNTY_DSC$4" "PAF_COUNTY_DSC",
    "DEDUP_INPUT_SUBQUERY2$3"."LOOKUP$$$_1_TOTAL_HOUSEHOLD$2" "LOOKUP$$$_1_TOTAL_HOUSEHOLD_I",
    "DEDUP_INPUT_SUBQUERY2$3"."TOTAL_HOUSEHOLD_SK$5" "TOTAL_HOUSEHOLD_SK",
    "DEDUP_INPUT_SUBQUERY2$3"."TOTAL_HOUSEHOLD_DSC$6" "TOTAL_HOUSEHOLD_DSC"
    FROM
    (SELECT
    DISTINCT
    "D_HOUSEHOLD"."PAF_COUNTY_SK" "PAF_COUNTY_SK$3",
    "D_HOUSEHOLD"."PAF_COUNTY_ID" "PAF_COUNTY_ID$4",
    "D_HOUSEHOLD"."PAF_COUNTY_DSC" "PAF_COUNTY_DSC$4",
    "D_HOUSEHOLD"."TOTAL_HOUSEHOLD_ID" "LOOKUP$$$_1_TOTAL_HOUSEHOLD$2",
    "D_HOUSEHOLD"."TOTAL_HOUSEHOLD_SK" "TOTAL_HOUSEHOLD_SK$5",
    "D_HOUSEHOLD"."DIMENSION_KEY" "DIMENSION_KEY$3",
    "D_HOUSEHOLD"."TOTAL_HOUSEHOLD_DSC" "TOTAL_HOUSEHOLD_DSC$6"
    FROM
    "D_HOUSEHOLD" "D_HOUSEHOLD"
    WHERE
    ( "D_HOUSEHOLD"."DIMENSION_KEY" = "D_HOUSEHOLD"."PAF_COUNTY_SK" ) AND
    ( "D_HOUSEHOLD"."PAF_COUNTY_SK" IS NOT NULL )) "DEDUP_INPUT_SUBQUERY2$3") "D_HOUSEHOLD_PAF_COUNTY"
    WHERE
    ( "D_HOUSEHOLD_PAF_COUNTY"."PAF_COUNTY_ID" (+) = "INGRP1"."LOOKUP$$$_1_PAF_COUNTY_ID" ) AND
    ( "INGRP2"."TOTAL_TOWN_ID" (+) = "INGRP1"."LOOKUP$$$_2_TOTAL_TOWN_ID" )
    )
    MERGE_SUBQUERY$5
    ON (
    "D_HOUSEHOLD"."PAF_TOWN_ID" = "MERGE_SUBQUERY$5"."PAF_TOWN_ID"
    )

    WHEN MATCHED THEN
    UPDATE
    SET
    "PAF_TOWN_NAME" = "MERGE_SUBQUERY$5"."PAF_TOWN_NAME",
    "PAF_COUNTY_SK" = "MERGE_SUBQUERY$5"."PAF_COUNTY_SK$2",
    "TOTAL_TOWN_SK" = "MERGE_SUBQUERY$5"."TOTAL_TOWN_SK$2",
    "TOTAL_TOWN_DSC" = "MERGE_SUBQUERY$5"."TOTAL_TOWN_DSC$3",
    "PAF_COUNTY_DSC" = "MERGE_SUBQUERY$5"."PAF_COUNTY_DSC$3",
    "TOTAL_HOUSEHOLD_SK" = "MERGE_SUBQUERY$5"."TOTAL_HOUSEHOLD_SK$4",
    "TOTAL_HOUSEHOLD_DSC" = "MERGE_SUBQUERY$5"."TOTAL_HOUSEHOLD_DSC$5",
    "TOTAL_TOWN_ID" = "MERGE_SUBQUERY$5"."TOTAL_TOWN_ID$3",
    "PAF_COUNTY_ID" = "MERGE_SUBQUERY$5"."PAF_COUNTY_ID$3",
    "TOTAL_HOUSEHOLD_ID" = "MERGE_SUBQUERY$5"."LOOKUP$$$_1_TOTAL_HOUSEHOLD$1"

    WHEN NOT MATCHED THEN
    INSERT
    ("D_HOUSEHOLD"."PAF_TOWN_SK",
    "D_HOUSEHOLD"."PAF_TOWN_ID",
    "D_HOUSEHOLD"."PAF_TOWN_NAME",
    "D_HOUSEHOLD"."PAF_COUNTY_SK",
    "D_HOUSEHOLD"."TOTAL_TOWN_SK",
    "D_HOUSEHOLD"."TOTAL_TOWN_DSC",
    "D_HOUSEHOLD"."PAF_COUNTY_DSC",
    "D_HOUSEHOLD"."TOTAL_HOUSEHOLD_SK",
    "D_HOUSEHOLD"."TOTAL_HOUSEHOLD_DSC",
    "D_HOUSEHOLD"."TOTAL_TOWN_ID",
    "D_HOUSEHOLD"."PAF_COUNTY_ID",
    "D_HOUSEHOLD"."TOTAL_HOUSEHOLD_ID",
    "D_HOUSEHOLD"."DIMENSION_KEY")
    VALUES
    (-1 * ("D_HOUSEHOLD_SEQ".NEXTVAL)/* MULT_BY_NEG1_5.OUTGRP1.NEG_NEXTVAL */,
    "MERGE_SUBQUERY$5"."PAF_TOWN_ID",
    "MERGE_SUBQUERY$5"."PAF_TOWN_NAME",
    "MERGE_SUBQUERY$5"."PAF_COUNTY_SK$2",
    "MERGE_SUBQUERY$5"."TOTAL_TOWN_SK$2",
    "MERGE_SUBQUERY$5"."TOTAL_TOWN_DSC$3",
    "MERGE_SUBQUERY$5"."PAF_COUNTY_DSC$3",
    "MERGE_SUBQUERY$5"."TOTAL_HOUSEHOLD_SK$4",
    "MERGE_SUBQUERY$5"."TOTAL_HOUSEHOLD_DSC$5",
    "MERGE_SUBQUERY$5"."TOTAL_TOWN_ID$3",
    "MERGE_SUBQUERY$5"."PAF_COUNTY_ID$3",
    "MERGE_SUBQUERY$5"."LOOKUP$$$_1_TOTAL_HOUSEHOLD$1",
    -1 * ("D_HOUSEHOLD_SEQ".CURRVAL)/* MULT_BY_NEG1_5.OUTGRP1.NEG_CURRVAL */)
This discussion has been closed.