
select's MODEL clause to add columns on the resultset, not just new rows?

Ruslan D.
Ruslan D. Member Posts: 25
edited Jul 27, 2016 8:14PM in General Database Discussions

PIVOT can only pivot one value into its own column.

Also, PIVOT is not helpful if, for example, we want to pivot and group by intersecting ranges of values.

So we're planning to use the MODEL clause to overcome these limits.

We have to pivot into columns:

- sum(), count() of data over separate quarters for the last 2 years, presenting each quarter's result in its own set of columns;

- sum(), count() of data over separate years for the last 2 years, presenting each year's result in its own set of columns;

- sum(), count() of data over separate months for the last 3 months, presenting each month's result in its own set of columns.

The list of quarters / years / months in the bucketing is fixed and can be hard-coded in the MODEL clause (i.e. it will always look back 2 years).

I have a working prototype that does the above using three separate pivots, but it is very inefficient because each pivot has to pass over our huge dataset again.

We expect MODEL might require just one pass over the dataset.

Questions:

- Is it possible for the MODEL clause to create new columns (i.e. we group by month and product category in a subquery, but MODEL should add columns for the quarter/year/month buckets above)?

- I can't think of the best way to define DIMENSION BY and MEASURES so that we can create new columns.

Any ideas are highly appreciated.

Answers

  • AndrewSayer
    AndrewSayer Member Posts: 13,007 Gold Crown
    edited Jul 26, 2016 3:25PM
    Ruslan D. wrote:
    
    [...] I have a working prototype doing the above using three separate pivots, but it is very inefficient because each pivot has to pass our huge dataset again. We expect MODEL might require just one pass over the dataset. [...]

    I wouldn't abandon the PIVOT query; it is made EXACTLY for your situation.

    Please post the real query, with the real execution plan. I would guess it can be written so that only one pass of the table is required.
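    For what it's worth, the "one pass" version is usually written with conditional aggregation rather than two PIVOT operators. A minimal sketch, assuming hypothetical bucket columns like the ones described in the question (quarter_bucket_id, year_bucket_id tagged onto each row):

    ```sql
    -- Sketch only: q2 stands for a subquery that tags each row with its
    -- (possibly overlapping) bucket ids. Both bucket sets are aggregated
    -- in a single scan; the CASE expressions return NULL for rows outside
    -- a bucket, and SUM/COUNT(DISTINCT ...) ignore NULLs.
    select sum(case when quarter_bucket_id = 1 then dollars end)         as q1_sum_dollars,
           count(case when quarter_bucket_id = 1 then 1 end)             as q1_count_orders,
           count(distinct case when quarter_bucket_id = 1 then hhid end) as q1_dcnt_hhid,
           sum(case when year_bucket_id = 1 then dollars end)            as y1_sum_dollars,
           count(case when year_bucket_id = 1 then 1 end)                as y1_count_orders,
           count(distinct case when year_bucket_id = 1 then hhid end)    as y1_dcnt_hhid
    from   q2;
    ```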

  • AndrewSayer
    AndrewSayer Member Posts: 13,007 Gold Crown
    edited Jul 27, 2016 1:46AM

    Please post the real query that you have run with separate pivots. Include the real execution plan and all DDL required.

    I stand by my previous statement; you haven't given any information that proves it's not possible.

  • GregV
    GregV Member Posts: 3,106 Gold Crown
    edited Jul 27, 2016 5:46AM

    Hi,

    With a MODEL clause you can define new columns, defined most likely as MEASURES. But I'm not sure a MODEL clause is required here. Can you provide the necessary sample data so that we can offer suggestions?
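    For reference, a minimal sketch of what "new columns as MEASURES" could look like (all table and column names here are hypothetical, not from the thread):

    ```sql
    -- monthly_totals: assumed to hold one row per (product_category, mth).
    -- The extra measures start at 0 and are filled by rules that aggregate
    -- dollars over a range of the mth dimension, within each partition.
    select product_category, mth, dollars, q1_dollars, y1_dollars
    from   monthly_totals
    model
      partition by (product_category)
      dimension by (mth)
      measures     (dollars, 0 as q1_dollars, 0 as y1_dollars)
      rules (
        q1_dollars[any] = sum(dollars)[mth between 1 and 3],
        y1_dollars[any] = sum(dollars)[mth between 1 and 12]
      );
    ```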

  • Ruslan D.
    Ruslan D. Member Posts: 25
    edited Jul 27, 2016 6:18PM

    I tried to create a small, simple test case that includes data and quickly shows that two pivots will do two full scans:

    with q as
    (
        select 1 as d, 10 as hhid, 5 as dollars from dual union all
        select 2     , 11        , 6            from dual union all
        select 3     , 12        , 7            from dual union all
        select 4     , 13        , 8            from dual union all
        select 5     , 14        , 9            from dual union all
        select 6     , 15        , 0            from dual union all
        select 7     , 10        , 1            from dual union all
        select 7     , 11        , 2            from dual union all
        select 7     , 12        , 3            from dual union all
        select 7     , 13        , 4            from dual
    )
    , q2 as   -- adding overlapping bucket ids:
    (
      SELECT q.*
           , case when d between 1 and 3 then 1
                  when d=4               then 2
                  when d between 5 and 6 then 3
                  when d=7               then 2
             end quarter_bucket_id
           , case when d between 1 and 4 then 1
                  when d between 5 and 6 then 2
             end year_bucket_id
      FROM q
    )
    , pivot_by_quarter as
    (
      SELECT * FROM q2
      PIVOT
      (  SUM(dollars) as sum_dollars, COUNT(*) AS count_orders, COUNT(DISTINCT hhid) as dcnt_hhid
         FOR (quarter_bucket_id) IN (1 AS q1, 2 AS q2, 3 AS q3, 4 AS q4)
      )
    )
    , pivot_by_year as
    (
      SELECT * FROM q2
      PIVOT
      (  SUM(dollars) as sum_dollars, COUNT(*) AS count_orders, COUNT(DISTINCT hhid) as dcnt_hhid
         FOR (year_bucket_id) IN (1 AS y1, 2 AS y2)
      )
    )
    SELECT q.*, y.*
    FROM pivot_by_quarter q JOIN pivot_by_year y ON (q.d = y.d)
    ;

    Explain plan:

    ----------------------------------------------------------------------------------------------------------
    | Id  | Operation                  | Name                        | Rows  | Bytes | Cost (%CPU)| Time     |
    ----------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT           |                             |     1 |   246 |    26   (8)| 00:00:01 |
    |   1 |  TEMP TABLE TRANSFORMATION |                             |       |       |            |          |
    |   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D7872_57BE4CB2 |       |       |            |          |
    |   3 |    VIEW                    |                             |    10 |    90 |    20   (0)| 00:00:01 |
    |   4 |     UNION-ALL              |                             |       |       |            |          |
    |   5 |      FAST DUAL             |                             |     1 |       |     2   (0)| 00:00:01 |
    |   6 |      FAST DUAL             |                             |     1 |       |     2   (0)| 00:00:01 |
    |   7 |      FAST DUAL             |                             |     1 |       |     2   (0)| 00:00:01 |
    |   8 |      FAST DUAL             |                             |     1 |       |     2   (0)| 00:00:01 |
    |   9 |      FAST DUAL             |                             |     1 |       |     2   (0)| 00:00:01 |
    |  10 |      FAST DUAL             |                             |     1 |       |     2   (0)| 00:00:01 |
    |  11 |      FAST DUAL             |                             |     1 |       |     2   (0)| 00:00:01 |
    |  12 |      FAST DUAL             |                             |     1 |       |     2   (0)| 00:00:01 |
    |  13 |      FAST DUAL             |                             |     1 |       |     2   (0)| 00:00:01 |
    |  14 |      FAST DUAL             |                             |     1 |       |     2   (0)| 00:00:01 |
    |* 15 |   HASH JOIN                |                             |     1 |   246 |     6  (34)| 00:00:01 |
    |  16 |    VIEW                    |                             |     1 |   162 |     3  (34)| 00:00:01 |
    |  17 |     SORT GROUP BY PIVOT    |                             |     1 |    15 |     3  (34)| 00:00:01 |
    |  18 |      VIEW                  |                             |    10 |   150 |     2   (0)| 00:00:01 |
    |  19 |       TABLE ACCESS FULL    | SYS_TEMP_0FD9D7872_57BE4CB2 |    10 |    90 |     2   (0)| 00:00:01 |
    |  20 |    VIEW                    |                             |     1 |    84 |     3  (34)| 00:00:01 |
    |  21 |     SORT GROUP BY PIVOT    |                             |     1 |    15 |     3  (34)| 00:00:01 |
    |  22 |      VIEW                  |                             |    10 |   150 |     2   (0)| 00:00:01 |
    |  23 |       TABLE ACCESS FULL    | SYS_TEMP_0FD9D7872_57BE4CB2 |    10 |    90 |     2   (0)| 00:00:01 |
    ----------------------------------------------------------------------------------------------------------

  • AndrewSayer
    AndrewSayer Member Posts: 13,007 Gold Crown
    edited Jul 27, 2016 6:59PM

    Reading the SYS_TEMP.. table is not nearly as intensive as a normal table read: only the data that you put in there will exist, so it could be small. Most importantly, it will already be in memory, so reading it more than once doesn't actually add much overhead.

    That's assuming you put your normal table inside a WITH clause when you run your real query (I don't see why you wouldn't).

  • Ruslan D.
    Ruslan D. Member Posts: 25
    edited Jul 27, 2016 7:11PM

    > Reading the SYS_TEMP.. table is not nearly as intensive as a normal table read


    What makes you think that? We have tested with and without the TEMP TABLE TRANSFORMATION and the performance is almost the same.

    What I posted is just a simple test case.

    Our actual query is a 5-table join with tables adding up to 0.5 TB of data and billions of rows.

    "The SYS_TEMP.. table is not nearly as intensive as a normal table" is not correct in our case; it does not filter much from the actual tables.

    The only reason Oracle does the TEMP TABLE TRANSFORMATION, I guess, is that we have star_transformation_enabled=TRUE.

    > most importantly it will be in memory already


    Miss again. We see a lot of temp read/write wait events. So no, it's not in-memory.

    Although that might be partially because of the two "SORT GROUP BY PIVOT" steps in the execution plan.

  • AndrewSayer
    AndrewSayer Member Posts: 13,007 Gold Crown
    edited Jul 27, 2016 7:26PM
    Ruslan D. wrote:
    
    [...] Our actual query is a 5-table join with tables adding up to 0.5 TB of data and billions of rows. [...] We see a lot of temp read/write wait events. So no, it's not in-memory. [...]

    I explained why the materialized CTE would be less intensive.

    If the actual result set from your 5-table join is 0.5 TB then sure, that won't fit in memory, but is that really the size of the result set or just the size of your tables?

    Even if it doesn't fit in memory, it still doesn't have to do that 5-table join for each pivot: the join is done already, and you already have the result set as small as Oracle can make it.

    star_transformation_enabled = true will only give you the temp table transformations that get used on bitmap merges - is this happening? It's not the case I am describing here. I am describing the use of a WITH clause (aka CTE) which gets referenced more than once, causing the results to be materialized.
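    To illustrate the shape I mean (table names hypothetical; /*+ materialize */ is an undocumented hint, used here only to make the behavior explicit):

    ```sql
    -- Because "joined" is referenced twice below, Oracle typically
    -- materializes it once (TEMP TABLE TRANSFORMATION); both pivots then
    -- read the temp result instead of re-running the expensive join.
    with joined as (
      select /*+ materialize */
             o.d, o.hhid, o.dollars, o.quarter_bucket_id, o.year_bucket_id
      from   orders o   -- stands in for the real 5-table join
    )
    select q.d, q.q1_sum_dollars, y.y1_sum_dollars
    from  (select * from joined
           pivot (sum(dollars) as sum_dollars
                  for quarter_bucket_id in (1 as q1, 2 as q2, 3 as q3, 4 as q4))) q
    join  (select * from joined
           pivot (sum(dollars) as sum_dollars
                  for year_bucket_id in (1 as y1, 2 as y2))) y
      on  q.d = y.d;
    ```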

    If you looked at the real execution plan with row source level statistics, you could see which operations had to spill to temp and how many times. If I demonstrate on the sample case:

    alter session set statistics_level=all;
    set serverout off
    with q as
    (...
    ;
    select * from table(dbms_xplan.display_cursor(format=>'typical allstats'));

    Plan hash value: 251130963

    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | Id  | Operation                  | Name                      | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem |  O/1/M   |
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT           |                           |      1 |        |       |    26 (100)|          |      7 |00:00:00.01 |      19 |      1 |      1 |       |       |          |
    |   1 |  TEMP TABLE TRANSFORMATION |                           |      1 |        |       |            |          |      7 |00:00:00.01 |      19 |      1 |      1 |       |       |          |
    |   2 |   LOAD AS SELECT           |                           |      1 |        |       |            |          |      0 |00:00:00.01 |       4 |      0 |      1 |  1036K|  1036K|          |
    |   3 |    VIEW                    |                           |      1 |     10 |    90 |    20   (0)| 00:00:01 |     10 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
    |   4 |     UNION-ALL              |                           |      1 |        |       |            |          |     10 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
    |   5 |      FAST DUAL             |                           |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
    |   6 |      FAST DUAL             |                           |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
    |   7 |      FAST DUAL             |                           |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
    |   8 |      FAST DUAL             |                           |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
    |   9 |      FAST DUAL             |                           |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
    |  10 |      FAST DUAL             |                           |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
    |  11 |      FAST DUAL             |                           |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
    |  12 |      FAST DUAL             |                           |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
    |  13 |      FAST DUAL             |                           |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
    |  14 |      FAST DUAL             |                           |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
    |* 15 |   HASH JOIN                |                           |      1 |      1 |   246 |     6  (34)| 00:00:01 |      7 |00:00:00.01 |       9 |      1 |      0 |  1055K|  1055K|     1/0/0|
    |  16 |    VIEW                    |                           |      1 |      1 |   162 |     3  (34)| 00:00:01 |      7 |00:00:00.01 |       6 |      1 |      0 |       |       |          |
    |  17 |     SORT GROUP BY PIVOT    |                           |      1 |      1 |    15 |     3  (34)| 00:00:01 |      7 |00:00:00.01 |       6 |      1 |      0 |  4096 |  4096 |     1/0/0|
    |  18 |      VIEW                  |                           |      1 |     10 |   150 |     2   (0)| 00:00:01 |     10 |00:00:00.01 |       6 |      1 |      0 |       |       |          |
    |  19 |       TABLE ACCESS FULL    | SYS_TEMP_0FD9D6662_A23270 |      1 |     10 |    90 |     2   (0)| 00:00:01 |     10 |00:00:00.01 |       6 |      1 |      0 |       |       |          |
    |  20 |    VIEW                    |                           |      1 |      1 |    84 |     3  (34)| 00:00:01 |      7 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
    |  21 |     SORT GROUP BY PIVOT    |                           |      1 |      1 |    15 |     3  (34)| 00:00:01 |      7 |00:00:00.01 |       3 |      0 |      0 |  2048 |  2048 |     1/0/0|
    |  22 |      VIEW                  |                           |      1 |     10 |   150 |     2   (0)| 00:00:01 |     10 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
    |  23 |       TABLE ACCESS FULL    | SYS_TEMP_0FD9D6662_A23270 |      1 |     10 |    90 |     2   (0)| 00:00:01 |     10 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

      15 - access("Q"."D"="Y"."D")

    The O/1/M column tells us that each of the sorts on lines 17 and 21 was done completely in memory.

    How does the same output look for your real query?

  • Unknown
    edited Jul 27, 2016 7:39PM
    We have to pivot into columns:
    - sum(), count() of data over separate quarters for the last 2 years, presenting each quarter's result in its own set of columns;
    - sum(), count() of data over separate years for the last 2 years, presenting each year's result in its own set of columns;
    - sum(), count() of data over separate months for the last 3 months, presenting each month's result in its own set of columns.
    The list of quarters / years / months in the bucketing is fixed and can be hard-coded in the MODEL clause (i.e. it will always look back 2 years).

    Ok

    1. there are 12 months per year

    2. there are 3 months per quarter

    So aren't there AT MOST only 24 base data points needed? That is, 24 months of numbers?

    Sums and counts are 'cumulative' - you can ADD monthly values to obtain 3-month values, quarterly values and yearly values.

    So just write one simple query to produce those 24 rows. That requires ONE PASS over your 'huge' data set.

    Then write a query based on the first query to produce the result set you need. That requires a pass over 24 rows.

    Of course, if you display a row at a granularity of MONTH, what do you expect to have in the QUARTER and YEAR columns?
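    A sketch of that two-step shape, with hypothetical table and column names (note this only works for additive aggregates like SUM and COUNT(*)):

    ```sql
    -- Pass 1: one scan over the big table, producing at most 24 rows.
    with monthly as (
      select trunc(order_date, 'MM') as month_start,
             sum(dollars)            as sum_dollars,
             count(*)                as cnt_orders
      from   orders
      where  order_date >= add_months(trunc(sysdate, 'MM'), -24)
      group  by trunc(order_date, 'MM')
    )
    -- Pass 2: roll the monthly rows up into the wider buckets.
    select sum(case when month_start >= add_months(trunc(sysdate, 'MM'), -3)
                    then sum_dollars end) as last_3m_dollars,
           sum(case when month_start >= add_months(trunc(sysdate, 'MM'), -12)
                    then sum_dollars end) as last_1y_dollars,
           sum(sum_dollars)               as last_2y_dollars
    from   monthly;
    ```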

  • Ruslan D.
    Ruslan D. Member Posts: 25
    edited Jul 27, 2016 8:14PM

    > Sums and counts are 'cumulative' - you can ADD monthly values to obtain 3 month values, quarterly values and yearly values.

    We have COUNT DISTINCT, and that's not an additive operation.

    Thanks.
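    A toy query makes that non-additivity concrete: the same household can appear in more than one month, so monthly distinct counts cannot simply be summed into a quarterly one.

    ```sql
    -- hhid 11 is active in both months: each month has 2 distinct
    -- households, but the two months together have 3, not 4.
    with t as (
      select 1 as mth, 10 as hhid from dual union all
      select 1        , 11        from dual union all
      select 2        , 11        from dual union all
      select 2        , 12        from dual
    )
    select count(distinct case when mth = 1 then hhid end) as m1_hhids,   -- 2
           count(distinct case when mth = 2 then hhid end) as m2_hhids,   -- 2
           count(distinct hhid)                            as total_hhids -- 3
    from   t;
    ```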

This discussion has been closed.