SELECT's MODEL clause to add columns to the result set, not just new rows?

PIVOT can only pivot one value into its own column.
Also, PIVOT does not help if, for example, we want to pivot (and group by) intersecting ranges of values.
So we are planning to use the MODEL clause to overcome these limits.
We have to pivot into columns:
- sum() and count() of data over separate quarters for the last 2 years, presenting each quarter's result in its own set of columns;
- sum() and count() of data over separate years for the last 2 years, presenting each year's result in its own set of columns;
- sum() and count() of data over separate months for the last 3 months, presenting each month's result in its own set of columns.
The list of quarters / years / months in the bucketing is fixed and can be hard-coded in the MODEL clause (i.e. it will always look back 2 years).
I have a working prototype doing the above with three separate pivots, but it is very inefficient because each pivot makes another pass over our huge dataset.
We expect MODEL might need just one pass over the dataset.
Questions:
- Is it possible for the MODEL clause to create new columns (i.e. we group by month and product category in a subquery, but MODEL should add columns for the quarter/year/month buckets above)?
- I can't think of the best way to define DIMENSION BY and MEASURES so that we could create new columns.
Any ideas are highly appreciated.
Answers
-
Ruslan D. wrote: PIVOT can only pivot one value into its own column. […]
I wouldn't abandon the PIVOT query; it is made EXACTLY for your situation.
Please post the real query, with the real execution plan. I would guess it can be written so that only one pass of the table is required.
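For what it's worth, one single-pass formulation uses plain conditional aggregation rather than PIVOT. This is only a minimal sketch: the orders table and its columns (order_date, hhid, dollars, product_category) are hypothetical stand-ins for the real schema.
-- Hypothetical table: orders(order_date, hhid, dollars, product_category).
-- Each CASE defines one bucket; buckets may overlap freely, and the table
-- is scanned exactly once no matter how many buckets there are.
select product_category
     , sum(case when order_date >= add_months(trunc(sysdate, 'MM'), -3) then dollars end)            as m3_sum_dollars
     , count(case when order_date >= add_months(trunc(sysdate, 'MM'), -3) then 1 end)                as m3_cnt_orders
     , sum(case when order_date >= add_months(trunc(sysdate, 'YYYY'), -24) then dollars end)         as y2_sum_dollars
     , count(distinct case when order_date >= add_months(trunc(sysdate, 'YYYY'), -24) then hhid end) as y2_dcnt_hhid
from   orders
group  by product_category;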
-
Nope, PIVOT is not flexible enough for all cases.
- see my comment there.
-
Please post the real query that you have run with separate pivots. Include the real execution plan and all DDL required.
I stand by my previous statement; you haven't given any information that proves it's not possible.
-
Hi,
With a MODEL clause you can define new columns, most likely as MEASURES. But I'm not sure a MODEL clause is required here. Can you provide the necessary sample data so that we can offer suggestions?
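To illustrate the idea, here is a minimal sketch on made-up data (generated from dual; all names are invented). The "new" columns have to exist as NULL placeholders in the input and be declared as MEASURES; a rule then fills every cell with an aggregate over a hard-coded bucket of the dimension, and the input itself is only read once.
select d, dollars, q1_sum
from  (
        select level as d, level * 10 as dollars,
               cast(null as number) as q1_sum  -- placeholder for the "new" column
        from   dual connect by level <= 6
      )
model
  dimension by (d)
  measures     (dollars, q1_sum)
  rules
  (
    -- one hard-coded bucket: every row gets the sum of dollars for d in 1..3
    q1_sum[any] = sum(dollars)[d between 1 and 3]
  )
order by d;
The same pattern extends to one placeholder column per quarter/year/month bucket, with one rule per bucket.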
-
I tried to create a small, simple test case that includes data and quickly shows that two pivots will do two full scans:
with q as
(
  select 1 as d, 10 as hhid, 5 as dollars from dual union all
  select 2,      11,         6            from dual union all
  select 3,      12,         7            from dual union all
  select 4,      13,         8            from dual union all
  select 5,      14,         9            from dual union all
  select 6,      15,         0            from dual union all
  select 7,      10,         1            from dual union all
  select 7,      11,         2            from dual union all
  select 7,      12,         3            from dual union all
  select 7,      13,         4            from dual
)
, q2 as -- adding overlapping bucket ids:
(
  select q.*
       , case when d between 1 and 3 then 1
              when d = 4             then 2
              when d between 5 and 6 then 3
              when d = 7             then 2
         end as quarter_bucket_id
       , case when d between 1 and 4 then 1
              when d between 5 and 6 then 2
         end as year_bucket_id
  from q
)
, pivot_by_quarter as
(
  select *
  from   q2
  pivot  ( sum(dollars) as sum_dollars, count(*) as count_orders, count(distinct hhid) as dcnt_hhid
           for (quarter_bucket_id) in (1 as q1, 2 as q2, 3 as q3, 4 as q4) )
)
, pivot_by_year as
(
  select *
  from   q2
  pivot  ( sum(dollars) as sum_dollars, count(*) as count_orders, count(distinct hhid) as dcnt_hhid
           for (year_bucket_id) in (1 as y1, 2 as y2) )
)
select q.*, y.*
from   pivot_by_quarter q
join   pivot_by_year    y on (q.d = y.d)
;
Explain plan:
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 246 | 26 (8)| 00:00:01 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D7872_57BE4CB2 | | | | |
| 3 | VIEW | | 10 | 90 | 20 (0)| 00:00:01 |
| 4 | UNION-ALL | | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 6 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 7 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 8 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 9 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 10 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 11 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 12 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 13 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 14 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 15 | HASH JOIN | | 1 | 246 | 6 (34)| 00:00:01 |
| 16 | VIEW | | 1 | 162 | 3 (34)| 00:00:01 |
| 17 | SORT GROUP BY PIVOT | | 1 | 15 | 3 (34)| 00:00:01 |
| 18 | VIEW | | 10 | 150 | 2 (0)| 00:00:01 |
| 19 | TABLE ACCESS FULL | SYS_TEMP_0FD9D7872_57BE4CB2 | 10 | 90 | 2 (0)| 00:00:01 |
| 20 | VIEW | | 1 | 84 | 3 (34)| 00:00:01 |
| 21 | SORT GROUP BY PIVOT | | 1 | 15 | 3 (34)| 00:00:01 |
| 22 | VIEW | | 10 | 150 | 2 (0)| 00:00:01 |
| 23 | TABLE ACCESS FULL | SYS_TEMP_0FD9D7872_57BE4CB2 | 10 | 90 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
-
Reading the SYS_TEMP.. table is not nearly as intensive as a normal table read - only the data that you put in there will exist, so it can be small; most importantly, it will already be in memory, so reading it more than once doesn't actually add much overhead.
That's assuming you put your normal table inside a WITH clause when you run your real query (I don't see why you wouldn't).
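As a sketch of what I mean, using dummy generated data (the MATERIALIZE hint is undocumented but widely used to force this behaviour):
with src as
(
  -- referenced twice below, so Oracle materializes it once into a SYS_TEMP table;
  -- the hint just makes that choice explicit
  select /*+ materialize */ level as d, mod(level, 5) as dollars
  from   dual connect by level <= 10
)
select a.d, a.sum_dollars, b.cnt_rows
from   (select d, sum(dollars) as sum_dollars from src group by d) a
join   (select d, count(*)     as cnt_rows    from src group by d) b
       on (a.d = b.d);
Both aggregations re-read the small materialized result, not the query that produced it.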
-
> Reading the SYS_TEMP.. table is not nearly as intensive as a normal table read
What makes you think that? We have tested with and without the TEMP TABLE TRANSFORMATION and the performance is almost the same.
What I posted is just a simple test case.
Our actual query is a 5-table join over tables adding up to 0.5 TB of data and billions of rows.
"SYS_TEMP.. table is not nearly as intensive as a normal" is not correct in our case; the query does not filter much from the actual tables.
I guess the only reason Oracle does the TEMP TABLE TRANSFORMATION is that we have star_transformation_enabled=TRUE.
> most importantly it will be in memory already
Miss again. We see a lot of temp read/write wait events, so no, it's not in memory.
Although that might be partially because of the two "SORT GROUP BY PIVOT" steps in the execution plan.
-
Ruslan D. wrote: What makes you think that? We have tested with and without the TEMP TABLE TRANSFORMATION and the performance is almost the same. […]
I explained why the materialized CTE would be less intensive.
If the actual result set from your 5-table join is 0.5 TB then sure, that won't fit in memory - but is that really the size of the result set, or just the size of your tables?
Even if it doesn't fit in memory, it still doesn't have to do that 5-table join for each pivot - that work is done already, and you have the result set as small as Oracle can make it.
star_transformation_enabled = true will only give you temp table transformations that get used on bitmap merges - is that happening? It's not the case I am describing here: I am describing the use of a WITH clause (aka CTE) that gets referenced more than once, causing the results to be materialized.
If you looked at the real execution plan with row-source-level statistics, you could see which operations had to spill to disk and how many times. Demonstrating on the sample case:
alter session set statistics_level=all;
set serverout off
with q as
(...
;
select * from table(dbms_xplan.display_cursor(format=>'typical allstats'));
Plan hash value: 251130963
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | O/1/M |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 26 (100)| | 7 |00:00:00.01 | 19 | 1 | 1 | | | |
| 1 | TEMP TABLE TRANSFORMATION | | 1 | | | | | 7 |00:00:00.01 | 19 | 1 | 1 | | | |
| 2 | LOAD AS SELECT | | 1 | | | | | 0 |00:00:00.01 | 4 | 0 | 1 | 1036K| 1036K| |
| 3 | VIEW | | 1 | 10 | 90 | 20 (0)| 00:00:01 | 10 |00:00:00.01 | 0 | 0 | 0 | | | |
| 4 | UNION-ALL | | 1 | | | | | 10 |00:00:00.01 | 0 | 0 | 0 | | | |
| 5 | FAST DUAL | | 1 | 1 | | 2 (0)| 00:00:01 | 1 |00:00:00.01 | 0 | 0 | 0 | | | |
| 6 | FAST DUAL | | 1 | 1 | | 2 (0)| 00:00:01 | 1 |00:00:00.01 | 0 | 0 | 0 | | | |
| 7 | FAST DUAL | | 1 | 1 | | 2 (0)| 00:00:01 | 1 |00:00:00.01 | 0 | 0 | 0 | | | |
| 8 | FAST DUAL | | 1 | 1 | | 2 (0)| 00:00:01 | 1 |00:00:00.01 | 0 | 0 | 0 | | | |
| 9 | FAST DUAL | | 1 | 1 | | 2 (0)| 00:00:01 | 1 |00:00:00.01 | 0 | 0 | 0 | | | |
| 10 | FAST DUAL | | 1 | 1 | | 2 (0)| 00:00:01 | 1 |00:00:00.01 | 0 | 0 | 0 | | | |
| 11 | FAST DUAL | | 1 | 1 | | 2 (0)| 00:00:01 | 1 |00:00:00.01 | 0 | 0 | 0 | | | |
| 12 | FAST DUAL | | 1 | 1 | | 2 (0)| 00:00:01 | 1 |00:00:00.01 | 0 | 0 | 0 | | | |
| 13 | FAST DUAL | | 1 | 1 | | 2 (0)| 00:00:01 | 1 |00:00:00.01 | 0 | 0 | 0 | | | |
| 14 | FAST DUAL | | 1 | 1 | | 2 (0)| 00:00:01 | 1 |00:00:00.01 | 0 | 0 | 0 | | | |
|* 15 | HASH JOIN | | 1 | 1 | 246 | 6 (34)| 00:00:01 | 7 |00:00:00.01 | 9 | 1 | 0 | 1055K| 1055K| 1/0/0|
| 16 | VIEW | | 1 | 1 | 162 | 3 (34)| 00:00:01 | 7 |00:00:00.01 | 6 | 1 | 0 | | | |
| 17 | SORT GROUP BY PIVOT | | 1 | 1 | 15 | 3 (34)| 00:00:01 | 7 |00:00:00.01 | 6 | 1 | 0 | 4096 | 4096 | 1/0/0|
| 18 | VIEW | | 1 | 10 | 150 | 2 (0)| 00:00:01 | 10 |00:00:00.01 | 6 | 1 | 0 | | | |
| 19 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6662_A23270 | 1 | 10 | 90 | 2 (0)| 00:00:01 | 10 |00:00:00.01 | 6 | 1 | 0 | | | |
| 20 | VIEW | | 1 | 1 | 84 | 3 (34)| 00:00:01 | 7 |00:00:00.01 | 3 | 0 | 0 | | | |
| 21 | SORT GROUP BY PIVOT | | 1 | 1 | 15 | 3 (34)| 00:00:01 | 7 |00:00:00.01 | 3 | 0 | 0 | 2048 | 2048 | 1/0/0|
| 22 | VIEW | | 1 | 10 | 150 | 2 (0)| 00:00:01 | 10 |00:00:00.01 | 3 | 0 | 0 | | | |
| 23 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6662_A23270 | 1 | 10 | 90 | 2 (0)| 00:00:01 | 10 |00:00:00.01 | 3 | 0 | 0 | | | |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
15 - access("Q"."D"="Y"."D")
The O/1/M column tells us that each of the sorts on plan lines 17 and 21 completed entirely in memory (one optimal execution, no one-pass or multi-pass executions).
How does the same output look for your real query?
-
Ruslan D. wrote: We have to pivot into columns: […] The list of quarters / years / months in the bucketing is fixed and can be hard-coded in the MODEL clause (i.e. it will always look back 2 years).
Ok:
1. there are 12 months per year
2. there are 3 months per quarter
So aren't there AT MOST 24 base data points needed? That is, 24 months of numbers?
Sums and counts are 'cumulative': you can ADD monthly values to obtain 3-month values, quarterly values and yearly values.
So just write one simple query to produce those 24 rows. That requires ONE PASS over your 'huge' data set.
Then write a query based on the first one to produce the result set you need. That requires a pass over only 24 rows; see the sketch below.
Of course, if you display a row at the granularity of MONTH, what do you expect to have in the QUARTER and YEAR columns?
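A sketch of the two-step idea (the orders table and its columns are hypothetical stand-ins for the real schema; note that the rollup step only works because SUM and COUNT are additive):
with monthly as
(
  -- step 1: the single pass over the big table, reduced to ~24 monthly rows
  select trunc(order_date, 'MM') as order_month
       , sum(dollars)            as sum_dollars
       , count(*)                as cnt_orders
  from   orders
  where  order_date >= add_months(trunc(sysdate, 'MM'), -24)
  group  by trunc(order_date, 'MM')
)
-- step 2: roll the handful of monthly rows up into the overlapping buckets
select sum(case when order_month >= add_months(trunc(sysdate, 'MM'),  -3) then sum_dollars end) as last_3m_dollars
     , sum(case when order_month >= add_months(trunc(sysdate, 'MM'), -12) then sum_dollars end) as last_12m_dollars
     , sum(sum_dollars)                                                                         as last_24m_dollars
from   monthly;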
-
> Sums and counts are 'cumulative': you can ADD monthly values to obtain 3-month values, quarterly values and yearly values.
We have COUNT DISTINCT, and that is not an additive operation: a household that buys in two different months of a quarter is counted once in each month but only once in the quarter, so monthly distinct counts cannot simply be summed.
Thanks.