Follow-up to getting a summation from results — oracle-tech

    Forum Stats

  • 3,714,815 Users
  • 2,242,633 Discussions
  • 7,845,076 Comments

Discussions

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Follow-up to getting a summation from results

User_OMEF8
User_OMEF8 Member Posts: 126 Blue Ribbon
edited January 14 in SQL & PL/SQL

This follow-up question is directly related to the question that @Frank Kulash answered.

Getting a summation of results — oracle-tech

Is it possible to get a summation for each column? The previous question was for each row. My current results look like this.

DOC_TYPE | '2021-01-01' | '2021-01-02' | '2021-01-03' | SUM_TOTAL
TYPE_1   |           1  |           3  |           2  | 6
TYPE_2   |           4  |           4  |           1  | 9
TYPE_3   |           1  |           1  |           3  | 5

Desired results would be like...

DOC_TYPE | '2021-01-01' | '2021-01-02' | '2021-01-03' | SUM_TOTAL
TYPE_1   |           1  |           3  |           2  | 6
TYPE_2   |           4  |           4  |           1  | 9
TYPE_3   |           1  |           1  |           3  | 5
TOTAL    |           6  |           8  |           6  | 20

Since the data has to be calculated, I think I have to create another query and nest the previous query inside of it. Any thoughts?


EDIT:

I tried to use something called grouping set, but I must have the wrong syntax since it is not working.

Best Answer

Answers

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 39,950 Red Diamond

    Hi,

    Thanks for posting the CREATE TABLE and INSERT statements. Remember why you're posting them: to allow the people who want to help you re-create the problem so they can test their ideas. The people who want to help you won't always be using the same schema- and tablespace names that you use, so it's better if you don't include them in your CREATE TABLE statements. (In fact, you can leave out all the storage information; just include the columns and constraints, if any.)

    Desired results would be like...

    DOC_TYPE | '2021-01-01' | '2021-01-02' | '2021-01-03' | SUM_TOTAL
    TYPE_1   |           1  |           3  |           2  | 6
    TYPE_2   |           10 |           20 |           5  | 35
    TYPE_3   |          100 |          500 |          60  | 660
    TOTAL    |          111 |          523 |          67  | 701
    

    Post the exact results you want from the given sample data. If, given the sample data, the total for 2021-01-01 should be 6, then don't say you want 111. If you actually do want 111, then explain how you get that number from the given data.

    User_OMEF8
  • User_OMEF8
    User_OMEF8 Member Posts: 126 Blue Ribbon

    I edited my table of results. I didn't update the attachments, but I will leave the other information off next time. Thanks for your reply!

  • Paulzip
    Paulzip Member Posts: 8,076 Gold Crown

    Yeah, just select from your existing results and union all a total line...

    union all
    select 'TOTAL', sum("2021-01-01"), SUM("2021-01-02"), SUM("2021-01-03"), SUM(sum_total)
    from your_results
    
    User_OMEF8
  • User_OMEF8
    User_OMEF8 Member Posts: 126 Blue Ribbon

    Thanks! I did try the "union all", but it kept giving me an error message. ORA-01722: invalid number and I suspect that it has to do with the PIVOT that is used.

  • User_OMEF8
    User_OMEF8 Member Posts: 126 Blue Ribbon
    edited January 14

    Thanks Frank!! This query did the trick. It is more than what I know so I will have to dissect it and read up on some of the functions you used. I appreciate the tips about the INSERT too. Would it be correct to assume that the "grouping set" doesn't work because of the PIVOT?

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 39,950 Red Diamond

    Hi, @User_OMEF8

    Would it be correct to assume that the "grouping set" doesn't work because of the PIVOT?

    Sorry, I don't understand the question. What do you mean by "grouping set"? (There is an option for GROUP BY that uses the keyword GROUPING SETS, but I don't see where anyone mentioned it before you did.)

    Pivoting is a specific kind of aggregation. GROUP BY is the most general way to do aggregation, including pivoting. SELECT ... PIVOT is a very handy, concise way do do some specific things that you could do with an explicit GROUP BY clause. It is NOT a handy, concise way to do everything that you could do with an explicit GROUP BY clause. In this problem, you want to do some things that SELECT ... PIVOT was designed to do, and some things that are better done with an explicit GROUP BY clause. I showed one way of using SELECT ... PIVOT and GROUP BY wor together, applying SELECT ... PIVOT to the results of GROUP BY.  Paulzip suggested another way of making them work together, applying GROUP BY to the results of SELECT ... PIVOT.

    User_OMEF8
  • User_OMEF8
    User_OMEF8 Member Posts: 126 Blue Ribbon

    I was referring to the "group by grouping sets ( (), () )". I had tried to use that, but it did not work the way I had wanted and I suspected that it was because of the PIVOT. Either way, thanks again for the extra information and the help!

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 39,950 Red Diamond

    Hi,

    I was referring to the "group by grouping sets ( (), () )". I had tried to use that, ...

    If you want to talk about a query you tried, then post the query. It's hard to say what you did wrong without knowing what you did.

    User_OMEF8
  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 39,950 Red Diamond

    Hi, @User_OMEF8

    Here's another way to do what you requested, doing the SELECT ... PIVOT first and then GROUP BY, without using a separate sub-queries for those two:

       WITH data_to_pivot AS
    (
       SELECT UPPER (a.doc_type)                  AS doc_type
       ,      TO_CHAR (a.date_time, 'yyyy-mm-dd') AS dates
       FROM   tablea a
       WHERE  EXISTS ( 
                             SELECT 1
    			 FROM  tableb b 
               	         WHERE  b.filename = 'ABCXYZ'
    			 AND   b.id        = TO_CHAR (a.id)
               ) 
    )
    SELECT    NVL (doc_type, 'TOTAL') AS doc_type
    ,	 SUM (d_1)                AS d_2021_01_01
    ,	 SUM (d_2)                AS d_2021_01_02
    ,	 SUM (d_3)                AS d_2021_01_03
    ,        SUM (d_1 + d_2 + d_3)    AS total
    FROM   data_to_pivot
    PIVOT   (  COUNT (*)
            FOR (dates)  IN  ( '2021-01-01' AS d_1
    	   	   	 , '2021-01-02' AS d_2
    			 , '2021-01-03' AS d_3
    			 )
         )
    GROUP BY ROLLUP (doc_type)
    ORDER BY NULLIF (doc_type, 'TOTAL')
    ;
    

    You could do the same thing with GROUPING SETS instead of ROLLUP. (Anything that ROLLUP can do, GROUPING SETS can do also. Likewise, anything that CUBE can do, GROUPING SETS can do. also. ROLLUP and CUBE are just handy, simpler ways of solving special cases of GROUPING SETS problems.)

    User_OMEF8
  • User_OMEF8
    User_OMEF8 Member Posts: 126 Blue Ribbon

    Thanks Frank! Sorry for the late reply. I had run this query and it returned the same results as well. I ran both against my production data and the first query is significantly faster than this one.

    When I tried to use the GROUPING SETS a few days ago, it gave me some strange results. (I know you spent a lot of time already, so no need to spend anymore unless you really want to.)

    WITH  data_to_pivot  AS
    (
          select NVL(upper (a.doc_type), 'TOTAL')   as DOC_TYPE
          ,      to_char(a.date_time, 'yyyy-mm-dd') as DATES
          ,      count(*)
          from   tableA a
          where  exists ( 
                          select 1 from tableB b 
                          where b.filename = 'ABCXYZ' and to_char(a.id) = b.id
                        ) 
          group by grouping sets ( (), a.doc_type, a.date_time)
    )
    SELECT    p.*
    ,         "'2021-01-01'" + "'2021-01-02'" + "'2021-01-03'" AS total
    FROM     data_to_pivot
    PIVOT    (   COUNT (*)
             FOR (dates) IN ( '2021-01-01', '2021-01-02', '2021-01-03' )
             ) p
    ORDER BY doc_type;
    
  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 39,950 Red Diamond

    HI,

    One problem is the COUNT in the sub-query data_to_pivot. Instead of

       ,   count(*)             
    

    assign it an alias so you can use that value later on, like this

       ,   count(*)              as CNT
    

    In the query you posted, that value isn't being used anywhere.


    Then there's the GROUP BY clause. Let's run the sub-query data_to_pivot all by itself, to make sure it's doing what we want. for example

    WITH data_to_pivot AS
    (
       select NVL(upper (a.doc_type), 'TOTAL')  as DOC_TYPE
       ,   to_char(a.date_time, 'yyyy-mm-dd') as DATES
       ,   count(*)              as CNT
       from  tableA a
       where exists ( 
               select 1 from tableB b 
               where b.filename = 'ABCXYZ' and to_char(a.id) = b.id
              ) 
       group by grouping sets ( ()           -- total for entire result set, regardless of doc_type or date_time
       	    		   , a.doc_type  -- total for doc_type, regardless of date_time
    			   , a.date_time -- total for date_time, regardless of doc_type
    			   )
    )
    SELECT   *
    FROM	 data_to_pivot
    ORDER BY doc_type, dates
    ;
    

    I find it helpful to format the GROUPING SETS so that each type of aggregation is specified on a separate line of code. You can even add comments to make it clear what each type is doing.

    The results from the query above are:

    We asked for a grand total, regardless of doc_type or date_time. We got what we asked for; the row in the middle with cnt=20.

    We asked for a total by doc_doc_type, regardless of date_time. We got what we asked for; the three rows at the end, with no date_time. (One row for each doc_type.)

    We asked for a total by date_time, regardless of doc_type. We got what we asked for; the three rows at the top, with the dtes shown. (One row for each value of date_time.)

    Did you want to get results like this:

    ? Why didn't we get the rows with totals for the distinct combinations of doc_type and date_time? We didn't get what we didn't ask for. To get the 16-row result set above, change the GROUP BY clause to:

       group by grouping sets ( ()     -- total for entire result set, regardless of doc_type or date_time
       	    		   , a.doc_type -- total for doc_type, regardless of date_time
    			   , a.date_time -- total for date_time, regardless of doc_type
    			   , (a.doc_type, a.date_time) -- total for each individual combination
    			   )
    

    which, by the way, is equivalent to

      GROUP BY CUBE ( a.date_time
      	   	, a.doc_type
    	        )  
    


    User_OMEF8
  • User_OMEF8
    User_OMEF8 Member Posts: 126 Blue Ribbon

    Thanks @Frank Kulash ! This is what it gave me when I ran my query.

    I saw that there were some correct values, but not in the right format of course.

    Did you want to get results like this:

    No, I still wanted to get the desired results as this.

    When I was researching some more (after my original post), I discovered the GROUPING SETS, so I was trying to get that to work with what you had given me from the last thread.

  • User_OMEF8
    User_OMEF8 Member Posts: 126 Blue Ribbon

    @Frank Kulash I fiddled with the query based on your suggestions and I was pleasantly pleased to have actually gotten the results. I am not sure if this was just a fluke, but it did get me the results with only a minor distinction.

    The TOTAL is at the top of the row now. I don't mind it though, but I suspect that this is probably sorting it alphabetically...

    WITH  data_to_pivot  AS
    (
          select NVL(upper (a.doc_type), 'TOTAL')   as DOC_TYPE
          ,      to_char(a.date_time, 'yyyy-mm-dd') as DATES
          ,      count(*)                           as CNT -- alias 
          from   tableA a
          where  exists ( 
                          select 1 from tableB b 
                          where b.filename = 'ABCXYZ' and to_char(a.id) = b.id
                        ) 
          group by grouping sets ( (), 
                                   a.doc_type, 
                                   a.date_time, 
                                   (a.doc_type, a.date_time) -- added this one
                                 )
    )
    SELECT    p.*
    ,         "'2021-01-01'" + "'2021-01-02'" + "'2021-01-03'" AS total
    FROM     data_to_pivot
    PIVOT    (  SUM (CNT) -- changed count to sum and referenced alias
             FOR (dates) IN ( '2021-01-01', '2021-01-02', '2021-01-03' )
             ) p
    ORDER BY doc_type;
    
  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 39,950 Red Diamond

    Hi,

    No, I still wanted to get the desired results as this.

    That looks lie the results you want from the main query. The question was: What results do you want from running the sub-query (data_to_pivot) alone?

    User_OMEF8
  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 39,950 Red Diamond

    Hi,

    The TOTAL is at the top of the row now. I don't mind it though, but I suspect that this is probably sorting it alphabetically...

    Right; 'TO...' comes before 'TY...' in alphabetic order.

    The queries above (using GROUP BY CUDE and GROUP BY ROLLUP) showed one way to put the total at the end.

    User_OMEF8
  • User_OMEF8
    User_OMEF8 Member Posts: 126 Blue Ribbon

    It looks like I missed a line. I changed my ORDER BY and it fixed it.

    ORDER BY NULLIF (doc_type, 'TOTAL');
    

    I will still use the query in the answer though because it completes in a matter of seconds, whereas the other queries we experimented take several minutes.

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 39,950 Red Diamond

    Hi,

    Make sure you test with some sample data where some doc_type/date_time combinations are absent. (The query you posted will report NULL totals whenever that happens.)

    You're computing the total column two different ways (called "Method A" and "Method B" in the comments below):

                        WITH data_to_pivot AS
    (
       select NVL(upper (a.doc_type), 'TOTAL')  as DOC_TYPE
       ,   to_char(a.date_time, 'yyyy-mm-dd') as DATES
       ,   count(*)              as CNT -- alias 
       from  tableA a
       where exists ( 
               select 1 from tableB b 
               where b.filename = 'ABCXYZ' and to_char(a.id) = b.id
              ) 
       group by grouping sets ( (),                                 -- Method A
                    a.doc_type,                                     -- Method A
                    a.date_time, 
                    (a.doc_type, a.date_time)
                   )
    )
    SELECT  p.*
    ,     "'2021-01-01'" + "'2021-01-02'" + "'2021-01-03'" AS total -- Method B
    FROM   data_to_pivot
    PIVOT  ( SUM (CNT) -- changed count to sum and referenced alias
         FOR (dates) IN ( '2021-01-01', '2021-01-02', '2021-01-03' )
         ) p
    ORDER BY NULLIF (doc_type, 'TOTAL');
    

    That's redundant and inefficient. If you're going to compute the total column in the super-query, then you don't need to compute the totals that don't depend on date_time in the sub-query. On the other hand, if you compute the totals in the sub-query (Method A), then you don't need the + operation in the super-query (Method B), which carries the risk of getting a NULL total.

    I would use the CUBE solution posted earlier. You can make it more efficient by changing the outer join to an inner join. (I used an outer join to get 0 instead of NULL when a doc_type/date_time combination was absent. That was not needed to get the correct totals.) You could also try getting rid of the first sub-query (date_times_wanted) altogether, and hard-coding the desired dates in data_to_pivot. Exactly how you do that depends on exactly what you want. I hesitate to post an example, because I suspect you don't really want separate columns for individual TIMESAMP values. I'm guessing that you want separate columns for different ranges, like calendar days.

Sign In or Register to comment.