      • 15. Re: How to take advantage of data already being "grouped" in rows
        mathguy

        Stew Ashton wrote:

         

         

        This thread includes some dissing of UNPIVOT. mathguy says "[as] demonstrated recently in another thread by Solomon Yakobson ... UNPIVOT, unfortunately, does a UNION ALL behind the scenes - in this case, the table is read five times."

         

        Perhaps Solomon had another test case in the thread mathguy mentioned. I would appreciate a link.

         

        Best regards, Stew Ashton

         

         

        I feel that this topic (how UNPIVOT is implemented behind the scenes) deserves its own discussion. I just don't feel qualified to be the one who moderates it.

         

        Anyway, here is a link to that discussion: split a single row into more rows. See Reply 3.

         

        Here is a shameless piece of plagiarizing from Solomon's example - you can read that other thread, or you can read this example; they illustrate the issue in the same way.

         

        Consider the simple query:

         

        select  name

        from    hr.employees

        unpivot ( name for descr in (first_name, last_name) )

        ;

         

        The plan looks good enough:

         

        ---------------------------------------------------------------------------------

        | Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |

        ---------------------------------------------------------------------------------

        |  0  | SELECT STATEMENT  |             |  214  |  2996 |    2  (0)  | 00:00:01 |

        |*  1 |  VIEW             |             |  214  |  2996 |    2  (0)  | 00:00:01 |

        |  2  |  UNPIVOT          |             |       |       |            |          |

        |  3  |    INDEX FULL SCAN| EMP_NAME_IX |  107  |  1605 |    1  (0)  | 00:00:01 |

        ---------------------------------------------------------------------------------

         

        However, if you look under the hood:

         

        SQL> declare
          2    c clob;
          3  begin
          4    dbms_utility.expand_sql_text(
          5        q'[
          6              select  name
          7              from    hr.employees
          8              unpivot ( name for descr in (first_name, last_name) )
          9          ]'
        10      , c
        11    );
        12    dbms_output.put_line(c);
        13  end;
        14  /


        SELECT "A1"."NAME" "NAME" FROM  (SELECT "A2"."EMPLOYEE_ID" "EMPLOYEE_ID","A2"."EMAIL" "EMAIL","A2"."PHONE_NUMBER" "PHONE
        _NUMBER","A2"."HIRE_DATE" "HIRE_DATE","A2"."JOB_ID" "JOB_ID","A2"."SALARY" "SALARY","A2"."COMMISSION_PCT" "COMMISSION_PC
        T","A2"."MANAGER_ID" "MANAGER_ID","A2"."DEPARTMENT_ID" "DEPARTMENT_ID","A2"."DESCR" "DESCR","A2"."NAME" "NAME" FROM  ( (
        SELECT "A4"."EMPLOYEE_ID" "EMPLOYEE_ID","A4"."EMAIL" "EMAIL","A4"."PHONE_NUMBER" "PHONE_NUMBER","A4"."HIRE_DATE" "HIRE_D
        ATE","A4"."JOB_ID" "JOB_ID","A4"."SALARY" "SALARY","A4"."COMMISSION_PCT" "COMMISSION_PCT","A4"."MANAGER_ID" "MANAGER_ID"
        ,"A4"."DEPARTMENT_ID" "DEPARTMENT_ID",'FIRST_NAME' "DESCR","A4"."FIRST_NAME" "NAME" FROM "HR"."EMPLOYEES" "A4") UNION AL
        L (SELECT "A5"."EMPLOYEE_ID" "EMPLOYEE_ID","A5"."EMAIL" "EMAIL","A5"."PHONE_NUMBER" "PHONE_NUMBER","A5"."HIRE_DATE" "HI
        RE_DATE","A5"."JOB_ID" "JOB_ID","A5"."SALARY" "SALARY","A5"."COMMISSION_PCT" "COMMISSION_PCT","A5"."MANAGER_ID" "MANAGER
        _ID","A5"."DEPARTMENT_ID" "DEPARTMENT_ID",'LAST_NAME' "DESCR","A5"."LAST_NAME" "NAME" FROM "HR"."EMPLOYEES" "A5")) "A2"
        WHERE "A2"."NAME" IS NOT NULL) "A1"

         

        PL/SQL procedure successfully completed.

         

        Formatting the output:

         

        SELECT "A1"."NAME" "NAME"

        FROM  (SELECT "A2"."EMPLOYEE_ID" "EMPLOYEE_ID","A2"."EMAIL" "EMAIL","A2"."PHONE_NUMBER" "PHONE_NUMBER",

                      "A2"."HIRE_DATE" "HIRE_DATE","A2"."JOB_ID" "JOB_ID","A2"."SALARY" "SALARY","A2"."COMMISSION_PCT" "COMMISSION_PCT",

                      "A2"."MANAGER_ID" "MANAGER_ID","A2"."DEPARTMENT_ID" "DEPARTMENT_ID","A2"."DESCR" "DESCR","A2"."NAME" "NAME"

               FROM  ( (SELECT "A4"."EMPLOYEE_ID" "EMPLOYEE_ID","A4"."EMAIL" "EMAIL","A4"."PHONE_NUMBER" "PHONE_NUMBER",

                               "A4"."HIRE_DATE" "HIRE_DATE","A4"."JOB_ID" "JOB_ID","A4"."SALARY" "SALARY","A4"."COMMISSION_PCT" "COMMISSION_PCT",

                               "A4"."MANAGER_ID" "MANAGER_ID","A4"."DEPARTMENT_ID" "DEPARTMENT_ID",

                               'FIRST_NAME' "DESCR","A4"."FIRST_NAME" "NAME"

                          FROM "HR"."EMPLOYEES" "A4")

                       UNION ALL

                       (SELECT "A5"."EMPLOYEE_ID" "EMPLOYEE_ID","A5"."EMAIL" "EMAIL","A5"."PHONE_NUMBER" "PHONE_NUMBER",

                               "A5"."HIRE_DATE" "HIRE_DATE","A5"."JOB_ID" "JOB_ID","A5"."SALARY" "SALARY","A5"."COMMISSION_PCT" "COMMISSION_PCT",

                               "A5"."MANAGER_ID" "MANAGER_ID","A5"."DEPARTMENT_ID" "DEPARTMENT_ID",

                               'LAST_NAME' "DESCR","A5"."LAST_NAME" "NAME"

                          FROM "HR"."EMPLOYEES" "A5")) "A2"

               WHERE "A2"."NAME" IS NOT NULL) "A1"   --  because I did not request INCLUDE NULLS in UNPIVOT

        • 16. Re: How to take advantage of data already being "grouped" in rows
          mathguy

          Duh!

           

          I know what Solomon said, I know his approach using a Cartesian join, and I still didn't think to use it...    One way is to use it with LATERAL or CROSS JOIN, as you suggested. Another way is to replace UNPIVOT with a Cartesian join in the "global" unpivot solution, where we then need to group by (a sketch follows below).
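
 

          For concreteness, here is a minimal sketch of what I mean by the "global" version, assuming the (id, col1 .. col5) test data used elsewhere in this thread: a Cartesian join against a five-row generator does the unpivoting, and the GROUP BY collapses the five copies back into one row per id. Since MAX ignores NULLs, an all-NULL row comes back as NULL, as required.

 

          select t.id,
                 max( case n.rn when 1 then t.col1
                                when 2 then t.col2
                                when 3 then t.col3
                                when 4 then t.col4
                                when 5 then t.col5
                      end ) as max_data
          from   test_data t
          cross join ( select level as rn from dual connect by level <= 5 ) n
          group by t.id;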

           

          Surprisingly, using the Cartesian join instead of UNPIVOT in the CROSS JOIN approach results in approximately the same execution time as UNPIVOT with LATERAL.  The bigger surprise, though, was when I used the Cartesian join instead of UNPIVOT globally: the time is consistently over 20 seconds, 2.5 to 3 times longer than all the other solutions. (And 7 times longer than the GREATEST solution with adjustments for NULL.)

           

          Cheers,   -   mathguy

          • 17. Re: How to take advantage of data already being "grouped" in rows
            rp0428

            That can be done easily, but may lead to wrong results.

            Let's say just for the sake of the argument that -1000 is the least number (read "negative number of greatest absolute value") possible in the NUMBER data type. (We all know it's not -1000, but let's just say it is.) Then I could do NVL(COLn, -1000) and take the GREATEST over the results. Now if the result is -1000, all I have learnt is that all values are EITHER NULL OR -1000, and I don't know more than that. The correct answer is either NULL or -1000 and I can't tell from what I've done so far (although I can, indeed, check if at least one value is non-NULL).

            Huh? I guess I assumed that a 'math guy' would understand the rather simple concept! Even Frank understood it.

            The point of replies #1 and #7 is, when you know a lower bound to the possible values, use GREATEST and NVL with some value lower than (not equal to) that lower bound.

             

            This is what I said:

            Then if the 'greatest' value is the right multiple of that large negative value it means ALL values were null so replace that with a null result.

            1. you have possible values that can NOT be as small as -1000

            2. you have FIVE columns any, or all of which can be null

            3. you replace NULL with -1000 for any, or each, of the five

            4. if the sum of the five values is 5 times that 'large negative number' then ALL FIVE values were null and replaced

            5. otherwise you get the true largest value of the five (a sketch of steps 3-5 follows)
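
 

            A minimal sketch of steps 3 to 5, assuming the (id, col1 .. col5) test data used elsewhere in this thread and -1000 as the impossible sentinel:

 

            select id,
                   case
                     when   nvl(col1, -1000) + nvl(col2, -1000) + nvl(col3, -1000)
                          + nvl(col4, -1000) + nvl(col5, -1000) = 5 * (-1000)
                     then null   -- step 4: all five values were null and replaced
                     else greatest( nvl(col1, -1000), nvl(col2, -1000), nvl(col3, -1000),
                                    nvl(col4, -1000), nvl(col5, -1000) )   -- step 5
                   end as max_val
            from   test_data;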

             

            My question, though, wasn't so much "how do we solve this problem using GREATEST and NVL", but rather "what other techniques/features are available for this type of situation", which may prove helpful in broader classes of similar problems.

            This was also part of what you said:

            This is something that we can use to our advantage - for example if a function like GREATEST IGNORE NULLS (COL1, ... , COL5) existed. Unfortunately it doesn't.

            That 'technique' emulates what that 'GREATEST IGNORE NULLS' would do if it existed. And then you complain that you don't want to use 'greatest' or even test it! Somewhat hypocritical, don't you think? If you didn't want to use GREATEST you should have just said so. But saying "Unfortunately it doesn't" suggests to me it is more than valid as a possible solution.

             

            That 'technique' of introducing artificial 'cannot really exist' values is in fact a common one when quantizing and normalizing data. It allows ALL data values to be used (no need to filter them out) while shifting unusable values into a bucket (or buckets) to keep them from affecting the calculations on the valid, in-range data.
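
 

            For instance, Oracle's own WIDTH_BUCKET follows this pattern: out-of-range values are parked in underflow/overflow buckets rather than discarded. A small illustration:

 

            -- values below 0 land in bucket 0, values above 100 in bucket 11 (n+1),
            -- so nothing is thrown away, yet outliers cannot distort the 10 real buckets
            select val, width_bucket(val, 0, 100, 10) as bucket
            from  ( select  -5 as val from dual union all
                    select  42        from dual union all
                    select 999        from dual );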

             

            As a mathematician you should be able to appreciate why there are use cases where you can't just throw values away without screwing up things like averages and counts. In scientific work you have to account for ALL of the results somehow - even if you just throw them into a 'not sure what to do with these' bucket.

            • 18. Re: How to take advantage of data already being "grouped" in rows
              mathguy

              Frank Kulash wrote:

               

               

              The point of replies #1 and #7 is, when you know a lower bound to the possible values, use GREATEST and NVL with some value lower than (not equal to) that lower bound. So, if you know that the columns might contain values as low as -1000 (but no lower), then use something lower than -1000 (like -1001 or, as Manik did, -1E66) as the 2nd argument to NVL, like this:

               

              Sorry if my point in Reply 3 wasn't very clear. What I meant was, suppose -1000 is the least value allowed for the NUMBER data type - NOT the lowest of the values (or a lower bound for the values) in my table.

               

              Suppose -1E66 is the "most negative number" that can be represented in the NUMBER data type in Oracle. If it's not -1000, and if it's not -1E66, there is some such number, because the number of numbers that can be represented is necessarily finite, and every finite set has a smallest element. Now how will this strategy work if that "least possible number" is already present in my data?

               

              The answer is a symbolic value, like -infinity; as Stew pointed out though, that doesn't seem possible in Oracle SQL, at least not without some loss of information.

               

              On the other hand, as I suggested in my reply to rp0428, that can be fixed by treating the result as NULL if it equals -1000 (or -1E66), but then coalescing this null with the column values: if they are all null the result remains null, but if at least one of them was non-null, the result reverts to -1000 (or -1E66) again, as it should. (Spelled out below.)
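
 

              As a formula (the same expression reappears in Reply 22 below):

 

              COALESCE( NULLIF( GREATEST( NVL(COL1, -1E66), NVL(COL2, -1E66), NVL(COL3, -1E66),
                                          NVL(COL4, -1E66), NVL(COL5, -1E66) ), -1E66 ),
                        COL1, COL2, COL3, COL4, COL5 )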

               

              All of this discussion, though, tries to solve this very specific (and somewhat contrived) problem; my purpose was to see what other general techniques for manipulating data that is already "grouped", in a sense, by being presented in rows, may exist. Things like LATERAL/CROSS APPLY, UNPIVOT and Cartesian joins, or perhaps writing one's function in a WITH clause (Oracle 12.1 and higher)... what else?

               

              Cheers,    -    mathguy

              • 19. Re: How to take advantage of data already being "grouped" in rows
                mathguy

                LOL - I see we were writing at the same time.  Please see my Reply 18 and see if you understand what I meant.     Cheers,   -   mathguy

                 

                (As an aside, you also mentioned "multiple" and "sum" - that doesn't change anything in the discussion, but please note that I am talking about GREATEST and MAX, rather than sum. What you said can be transferred easily to this framework.)

                • 20. Re: How to take advantage of data already being "grouped" in rows
                  mathguy

                  A couple more points...

                   

                  I did not say I don't want to use GREATEST.  If you read my original post carefully, you will find that there is, in fact, a solution that uses GREATEST (with COALESCE) already. It is probably just slightly faster than wrapping each column within NVL and using a "lower bound" and then un-NVL'ing at the end (and coalescing again to make sure the result is correct). So - I have nothing against GREATEST, I was just pointing out that the solution you proposed requires more work. I have now explained in more detail what I meant, in Reply 18.

                   

                  I see you are having an issue with me being a mathguy, too, since you keep bringing it up. Rest assured that my math is still fine; let's leave it at that.  

                  • 21. Re: How to take advantage of data already being "grouped" in rows
                    Sven W.

                    In general I believe all the discussions about pivot/unpivot or other ways of transposing columns into rows and back (lateral joins, etc.) will lead to suboptimal performance. It could make sense if this is already done during some previous step, e.g. when the data is first created.

                     

                    Some other points to think about:

                     

                    1) How is the data distribution?

                     

                    Are the column values usually very different or is there a distinct set of repeating values?

                    If we have a lot of similar values, then a result cached function with 5 input parameters could speed up performance in spite of the obvious PL/SQL context switch overhead (a sketch follows below).
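
 

                    A minimal sketch of that idea - the function name is made up, and the body borrows the rotated-COALESCE trick from later in this thread to get "GREATEST ignoring NULLs":

 

                    create or replace function max_of_five (
                      p1 number, p2 number, p3 number, p4 number, p5 number
                    ) return number
                      result_cache   -- repeated input combinations are served from the result cache
                    is
                    begin
                      -- each COALESCE starts at a different parameter, so the GREATEST
                      -- of the five results is the maximum of the non-NULL inputs
                      return greatest( coalesce(p1, p2, p3, p4, p5),
                                       coalesce(p2, p3, p4, p5, p1),
                                       coalesce(p3, p4, p5, p1, p2),
                                       coalesce(p4, p5, p1, p2, p3),
                                       coalesce(p5, p1, p2, p3, p4) );
                    end max_of_five;
                    /

                    select id, max_of_five(col1, col2, col3, col4, col5) as max_data
                    from   test_data;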

                     

                    2) Unfortunately we are not able to write our own GREATEST function with a flexible number of input parameters (Ada has an ellipsis parameter construct, which some of the PL/SQL internal functions can use, but this is not exposed to us).

                    As an alternative, we can consider using nested tables and varrays. An example is given below.

                     

                    3) I think the version using NULLIF and NVL should be very hard to beat

                    btw: there is no noticeable performance difference between NVL and COALESCE if the second parameter is a constant. COALESCE is better if the second parameter does not always need to be evaluated; however, if it is a constant then it is already evaluated at parse time, and since NVL is faster to type it is therefore better (a short illustration follows).
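
 

                    To illustrate the evaluation difference (slow_expr stands in for some hypothetical expensive function):

 

                    -- NVL always evaluates its second argument, even for rows where col1 is NOT NULL
                    select nvl(col1, slow_expr(id)) from test_data;

                    -- COALESCE stops at the first non-NULL argument, so slow_expr
                    -- is only evaluated for rows where col1 is NULL
                    select coalesce(col1, slow_expr(id)) from test_data;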

                     

                    Having said that, here is a special version of a lateral join that is very compact. It uses the predefined VARRAY of NUMBER sys.odciNumberList.

                    It seems reasonably fast, although I didn't test it for a larger dataset.

                     

                    with
                        test_data ( id, col1, col2, col3, col4, col5 ) as (
                          select 1, 100 , 200 , 300 , 400 , 500  from dual union all
                          select 2, 300 , 100 , 100 , 300 , 300  from dual union all
                          select 3, null, 300 , 100 , null, 400  from dual union all
                          select 4, null, null, null, null, null from dual union all
                          select 5, null, 300 , null, null, 300  from dual
                        )
                    select td.id,
                           ( select max(column_value)
                               from table( sys.ODCINumberList(td.col1, td.col2, td.col3, td.col4, td.col5) )
                           ) as max_data
                    from   test_data td;
                    

                    Works in an 11gR2 database too.
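
 

                    For the test data above it returns 500, 300, 400, NULL and 300 for ids 1 through 5 - MAX over the collection ignores NULLs, so only the all-NULL row (id 4) yields NULL.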

                    • 22. Re: How to take advantage of data already being "grouped" in rows
                      mathguy

                      My choice of COALESCE over NVL cannot be easily avoided, because I use things like COALESCE(COL1, ... , COL5). NVL only takes two arguments. Of course, I could do NVL(COL1, NVL(.....) ...) but that would be four function calls instead of one.

                       

                      The alternative, I believe, is (using -1E66 as a placeholder for the most negative number that can be represented in the NUMBER data type):

                       

                      COALESCE( NULLIF (GREATEST (NVL(COL1, -1E66), ... , NVL(COL5, -1E66)), -1E66),   COL1, ... , COL5)

                       

                      We have five NVL calls, one call to GREATEST, then one call to NULLIF, and then still one more call to COALESCE for the reason I explained in an earlier Reply.

                       

                      My solution (which may still not be the most efficient) has GREATEST (COALESCE (C1, ..., C5), COALESCE(C2, ..., C5, C1), .....)   - five calls to COALESCE and one to GREATEST, which seems to be fewer function calls. (The rotation is written out in full below.)
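
 

                      Written out in full: each COALESCE starts at a different column, so every non-null column heads one of the lists and the GREATEST picks the true maximum of the non-null values:

 

                      GREATEST( COALESCE(C1, C2, C3, C4, C5),
                                COALESCE(C2, C3, C4, C5, C1),
                                COALESCE(C3, C4, C5, C1, C2),
                                COALESCE(C4, C5, C1, C2, C3),
                                COALESCE(C5, C1, C2, C3, C4) )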

                       

                      Cheers,    -    mathguy

                       

                      ADDED:  I just tried the solution you proposed - it runs in about 4 seconds, not much worse than the "best" solutions (based on GREATEST). And the table function approach is indeed a general concept that may be used for a variety of similar problems.

                      • 23. Re: How to take advantage of data already being "grouped" in rows
                        rp0428

                        All of this discussion, though, tries to solve this very specific (and somewhat contrived) problem; my purpose was to see what other general techniques for manipulating data that is already "grouped", in a sense, by being presented in rows, may exist.

                        And one such 'technique' is the one I mentioned - shift/quantize/normalize data values or outliers.

                         

                        In a data warehouse that technique would be used DURING the rollup when the data is being denormalized. So you might even add a new column that indicates the combined status/max/min/sum/other of all five columns.

                         

                        So if you have 12 date columns in a denormalized table (e.g. yearly sales - one column for Jan, one for Feb, etc.) you might also add other raw and precomputed columns for:

                         

                        1. JAN_COUNT - count of values for Jan, etc

                        2. AVG_SALES

                        3. MAX_SALES

                         

                        That 'technique' being to wring the most information possible out of the data at the time the rollup is done, since you can often do additional things (max, min, counts, sums, avg) for little cost once you have to incur the cost of doing even one of them.

                         

                        That type of rollup is an exception to the general rule about denormalizing data.

                        • 24. Re: How to take advantage of data already being "grouped" in rows
                          Sven W.

                          mathguy wrote:

                           

                          My choice of COALESCE over NVL cannot be easily avoided, because ...

                          Yes. I was specifically talking about cases where the second parameter is a constant. Your case is not covered by that and I never meant to cover it. But in this thread there are a lot of versions using NVL or COALESCE with a constant second parameter. I just wanted to remark that in this specific case there is no relevant performance difference between those two.

                           

                          Edit: I think it was Stew who specifically mentioned that he preferred COALESCE because of the short-circuit evaluation.

                          • 25. Re: How to take advantage of data already being "grouped" in rows
                            AndrewSayer

                            mathguy wrote:

 

                            Here is a shameless piece of plagiarizing from Solomon's example [... Reply 15 quoted in full: the UNPIVOT query against hr.employees, its execution plan, and the DBMS_UTILITY.EXPAND_SQL_TEXT output showing the UNION ALL rewrite - snipped ...]

                            I know you don't want to discuss it here, but the expand_sql_text output doesn't necessarily show how the unpivot operation is actually executed. If you do an extended SQL trace you'd get something like this:

                             

                            create table t( id, col1, col2, col3, col4, col5 ) cache as  

                                   select 1, 100 , 200 , 300 , 400 , 500  from dual union all 

                                   select 2, 300 , 100 , 100 , 300 , 300  from dual union all 

                                   select 3, null, 300 , 100 , null, 400  from dual union all 

                                   select 4, null, null, null, null, null from dual union all 

                                   select 4, null, null, null, null, null from dual union all 

                                   select 4, null, null, null, null, null from dual union all 

                                   select 4, null, null, null, null, null from dual union all 

                                   select 4, null, null, null, null, null from dual union all 

                                   select 4, null, null, null, null, null from dual union all 

                                   select 4, null, null, null, null, null from dual union all 

                                   select 4, null, null, null, null, null from dual union all 

                                   select 4, null, null, null, null, null from dual union all 

                                   select 4, null, null, null, null, null from dual union all 

                                   select 5, null, 300 , null, null, 300  from dual 

                                 ; 

                            alter session set sql_Trace=true;

                            select id, max(val) as high_val 

                            from t 

                            unpivot include nulls ( val for source in (col1, col2, col3, col4, col5) ) 

                            group by id 

                             order by id;

                            exit;

                             

                            PARSING IN CURSOR #931155056 len=138 dep=0 uid=25949 oct=3 lid=25949 tim=440375510114 hv=666938644 ad='1bdb50030' sqlid='71apy8hmw1b8n'

                            select id, max(val) as high_val

                            from t

                            unpivot include nulls ( val for source in (col1, col2, col3, col4, col5) )

                            group by id

                            order by id

                            END OF STMT

                            PARSE #931155056:c=15625,e=3394,p=0,cr=2,cu=0,mis=1,r=0,dep=0,og=1,plh=4079384166,tim=440375510113

                            EXEC #931155056:c=0,e=35,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4079384166,tim=440375510270

                            FETCH #931155056:c=0,e=1054,p=1,cr=3,cu=0,mis=0,r=1,dep=0,og=1,plh=4079384166,tim=440375511382

                            FETCH #931155056:c=0,e=46,p=0,cr=0,cu=0,mis=0,r=4,dep=0,og=1,plh=4079384166,tim=440375525810

                            STAT #931155056 id=1 cnt=5 pid=0 pos=1 obj=0 op='SORT GROUP BY (cr=3 pr=1 pw=0 time=1058 us cost=16 size=80 card=5)'

                            STAT #931155056 id=2 cnt=70 pid=1 pos=1 obj=0 op='VIEW  (cr=3 pr=1 pw=0 time=1115 us cost=15 size=1120 card=70)'

                            STAT #931155056 id=3 cnt=70 pid=2 pos=1 obj=0 op='UNPIVOT  (cr=3 pr=1 pw=0 time=1043 us)'

                            STAT #931155056 id=4 cnt=14 pid=3 pos=1 obj=928822126 op='TABLE ACCESS FULL T (cr=3 pr=1 pw=0 time=981 us cost=3 size=140 card=14)'

                            PARSE #664473640:c=0,e=67,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=440375554246

                            EXEC #664473640:c=0,e=292,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=440375554723

                             

                            The line with id=4 tells us that we do 3 consistent reads for that full table scan; if it were done once per column you would see 5*n, where n is the number of consistent reads needed to read the table once.

                             

                            Does that settle it?

                            • 26. Re: How to take advantage of data already being "grouped" in rows
                              mathguy

                              The reason I didn't want this discussed in the current thread is that it's a bigger and more important issue than my modest question.

                               

                              If I felt qualified I would have started a different thread myself, about that question (how does UNPIVOT work behind the scenes?)

                               

                              I am not so qualified, and your post proves my point perfectly. I have no clue: do I believe the output from DBMS_UTILITY.EXPAND_SQL_TEXT? My understanding (perhaps not entirely correct) is that the output shows the actual query the server receives and processes; the server has no idea what query I wrote, it only knows this transformed text that has been transmitted to it for execution. Or do I believe the extended SQL trace (whatever that means)? And, if they contradict each other, WHY do they contradict each other, and when do we believe one and when do we believe the other, and why?

                               

                              I don't even know how to verify with certainty (or... if such a thing is even possible!) that the UNPIVOT query and the output of DBMS_UTILITY.EXPAND_SQL_TEXT, entered as a stand-alone query in its own right, are (or are not) executed exactly the same way by the server. That would go a long way towards persuading me one way or the other, but I don't even know if it's doable, much less how to do it.

                               

                              I hope you understand why I would rather have that discussion done in its own thread, and not be buried here!     Best,    mathguy

                              • 27. Re: How to take advantage of data already being "grouped" in rows
                                Stew Ashton

                                mathguy,

                                 

                                 You know I respect you and enjoy what you write, but I can't let you repeat something I think is inaccurate without rebuttal, simply because you don't want to discuss it anymore. If you didn't want to discuss it, you shouldn't have joined the discussion in the first place.

                                 

                                 Oracle allows itself to rewrite queries as long as the result is the same. In this specific case, DBMS_UTILITY.EXPAND_SQL_TEXT rewrote the query to do a UNION ALL, yes. You can see this from the execution plan of the expanded text, run as a query in its own right:

                                 

                                -------------------------------------------------------------------------------------------
                                | Id  | Operation         | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
                                -------------------------------------------------------------------------------------------
                                |   0 | SELECT STATEMENT  |             |      1 |        |    214 |00:00:00.01 |       4 |
                                |   1 |  VIEW             |             |      1 |    214 |    214 |00:00:00.01 |       4 |
                                |   2 |   UNION-ALL       |             |      1 |        |    214 |00:00:00.01 |       4 |
                                |*  3 |    INDEX FULL SCAN| EMP_NAME_IX |      1 |    107 |    107 |00:00:00.01 |       2 |
                                |   4 |    INDEX FULL SCAN| EMP_NAME_IX |      1 |    107 |    107 |00:00:00.01 |       2 |
                                -------------------------------------------------------------------------------------------
                                

                                 

                                When I execute the original query, I get this:

                                 

                                -------------------------------------------------------------------------------------------
                                | Id  | Operation           | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
                                -------------------------------------------------------------------------------------------
                                |   0 | SELECT STATEMENT    |           |      1 |        |    214 |00:00:00.01 |      12 |
                                |*  1 |  VIEW               |           |      1 |    214 |    214 |00:00:00.01 |      12 |
                                |   2 |   UNPIVOT           |           |      1 |        |    214 |00:00:00.01 |      12 |
                                |   3 |    TABLE ACCESS FULL| EMPLOYEES |      1 |    107 |    107 |00:00:00.01 |      12 |
                                -------------------------------------------------------------------------------------------
                                

                                 

                                So DBMS_UTILITY.EXPAND_SQL_TEXT rewrote the query into something equivalent, for reasons unknown to me. The original query is executed without the help of a "union-all" step.

                                 

                                I understand you are simply quoting Solomon here, but to me the "who" doesn't matter, just the "what".

                                 

                                To find out how a query is executed, add the /*+ gather_plan_statistics */ hint, then execute the query, then immediately execute

                                 

                                select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

                                 

                                You may have to issue the command "set serveroutput off" beforehand in order to avoid an intermediate hidden call to dbms_output.
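
 

                                Putting those steps together for the UNPIVOT query from Reply 15 (a sketch, assuming access to hr.employees):

 

                                set serveroutput off

                                select /*+ gather_plan_statistics */ name
                                from   hr.employees
                                unpivot ( name for descr in (first_name, last_name) );

                                select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));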

                                 

                                Best regards, Stew

                                • 28. Re: How to take advantage of data already being "grouped" in rows
                                  mathguy

                                  Completely agreed re: rebutting incorrect statements. I do it myself, as often as I can. I sometimes even rebut correct statements!    

                                   

                                  I didn't mean that the discussion shouldn't be had, I just think it deserves more respect than being hidden here.

                                   

                                  With all of that said:  I did see the Explain Plan, and the output of EXPAND_SQL_TEXT. As you admit yourself, you don't know why they are different (and don't expect ME to know why!) So - why do we choose to believe one and not the other?

                                   

                                  I hope you can appreciate that this discussion is off-topic to my original question - yet it seems to be capturing the thread. In any case, it seems that neither UNPIVOT nor Cartesian join nor LATERAL/CROSS APPLY are nearly as efficient as tweaks to GREATEST (obviously), but also, for example, a table function - which is closer in concept to what I was reaching for. I wish there was a way to "fork" a sub-thread for how UNPIVOT is implemented, if that was the appropriate resolution - perhaps there is one?

                                   

                                  Regards,    -     mathguy

                                  • 29. Re: How to take advantage of data already being "grouped" in rows
                                    Jonathan Lewis

                                    Stew,

                                     

                                    This is an interesting one.

                                    With the sample query the plan shows a single full table scan - with 1 start if rowsource execution stats are gathered; however if the OUTLINE formatting option is used on the in-memory plan it shows:

                                     

                                      /*+

                                        BEGIN_OUTLINE_DATA

                                          IGNORE_OPTIM_EMBEDDED_HINTS

                                          OPTIMIZER_FEATURES_ENABLE('12.1.0.2')

                                          DB_VERSION('12.1.0.2')

                                          ALL_ROWS

                                          OUTLINE_LEAF(@"SEL$1")

                                          OUTLINE_LEAF(@"SEL$2")

                                          OUTLINE_LEAF(@"SEL$3")

                                          OUTLINE_LEAF(@"SEL$4")

                                          OUTLINE_LEAF(@"SEL$5")

                                          OUTLINE_LEAF(@"SET$1")

                                          OUTLINE_LEAF(@"SEL$6")

                                          NO_ACCESS(@"SEL$6" "from$_subquery$_002"@"SEL$6")

                                          FULL(@"SEL$5" "T"@"SEL$5")

                                          FULL(@"SEL$4" "T"@"SEL$4")

                                          FULL(@"SEL$3" "T"@"SEL$3")

                                          FULL(@"SEL$2" "T"@"SEL$2")

                                          FULL(@"SEL$1" "T"@"SEL$1")

                                        END_OUTLINE_DATA

                                      */

                                     

                                    Moreover, the T shown in the execution plan reveals itself as t@sel$5 if the ALIAS format option is used.
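
 

                                    (Both details come from dbms_xplan format options; something like

 

                                    select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST OUTLINE ALIAS'));

 

                                    reports the outline data and the query block aliases in one pass.)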

                                    The session stats, though, show that only a single tablescan of the right number of blocks takes place, and there is no "buffer is pinned count" statistic to suggest that multiple tablescans of the same blocks occur.

                                     

                                    I'm guessing that what's happening is that a particular set of tablescan calls is invoked which means a single tablescan is reported, but each multiblock read is pinned and examined as many times as is necessary - so we report one tablescan while effectively doing 5 (in this case) tablescans as efficiently as possible. As far as workload is concerned I would expect the CPU usage to be a little more than a single tablescan, in a time that is still likely to be dominated by disk I/O.

                                     

                                    It would be interesting to see how this plays out on Exadata with the possible inclusion of smart scan mechanisms, though.

                                     

                                    Regards

                                    Jonathan Lewis