Forum Stats

  • 3,874,184 Users
  • 2,266,680 Discussions
  • 7,911,761 Comments

Discussions

How to dynamically pivot on unknown column.

User_5F8TP
User_5F8TP Member Posts: 93 Bronze Badge
edited Nov 18, 2022 10:22PM in SQL & PL/SQL

In the Oracle SQL query, how can I pivot without hardcoding the values of leave_reason?

CREATE TABLE Persons (

  group_id int,

  leave_reason varchar(40),

  person_id int,

  number_of_days int

);

INSERT INTO Persons (group_id, leave_reason, person_id, number_of_days)

VALUES (1, 'Employee''s PTO', 44, 1);

INSERT INTO Persons (group_id, leave_reason, person_id, number_of_days)

VALUES (2, 'EMPLOYEE''S SERIOUS CONDITION', 55, 5);

INSERT INTO Persons (group_id, leave_reason, person_id, number_of_days)

VALUES (3, 'EMPLOYEE''S SERIOUS CONDITION', 66, 7);

INSERT INTO Persons (group_id, leave_reason, person_id, number_of_days)

VALUES (3, 'Family Vacation', 77, 2);

I want the output as follows;

Please kindly let me know how can I write a pivot query to obtain this output without hardcoding the values of LEAVE_REASON field?

I appreciate your help and thank you in advance!

Best Answer

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 42,746 Red Diamond
    Answer ✓

    Hi, @User_5F8TP

    If you really, really need dynamic pivoting, here's one way to do it. (I used SQL*Plus features, such as substitution variables, but you can use other features instead.)

    The first step in writing dynamic SQL is to write a static SQL statement for a typical case. Fr this problem, you might start with:

    SELECT    *
    FROM	  persons
    PIVOT	  ( COUNT (person_id)     AS cnt
    	  , SUM (number_of_days)  AS days
    	  FOR  leave_reason  IN ( Q'{Employee's PTO}'		     AS Employees_PTO
    	       		     	, Q'{EMPLOYEE'S SERIOUS CONDITION}'  AS EMPLOYEES_SERIOUS_CONDITION
    				, Q'{Family Vacation}'    	     AS Family_Vacation
    				)
    	  )
    ORDER BY  group_id
    ;
    

    This works for 3 leave_reasons, whose values you know. You want to make this query dynamic, so it can have however many leave_reasons happen to be in the table, with whatever values they happen to have. In this case, you just need to re-write the FOR leave_reason IN (...) clause, like this:

    -- ==========  Preliminary Query  ==========
    --SET	TERMOUT  OFF
    
    
    COLUMN  pivot_in_clause_col	NEW_VALUE pivot_in_clause
    
    
    WITH    leave_reasons_wanted    AS
    (
        SELECT DISTINCT  leave_reason
        ,	   	     TRANSLATE ( leave_reason
        		     	       , ' '''
    			       , '_'
    			       )  AS col_alias
        FROM    persons
    --  WHERE   ...		-- if needed
    )
    SELECT  LISTAGG ( 'Q''{' || leave_reason
    		       	 || '}''	AS '
    		       	 || col_alias
    		, CHR (10) || ', '
    		) WITHIN GROUP (ORDER BY leave_reason)  as pivot_in_clause_col
    FROM    leave_reasons_wanted
    ;
    
    
    SET	TERMOUT  ON
    
    
    -- ==========  Main Quary  ==========
    SELECT    *
    FROM	  persons
    PIVOT	  ( COUNT (person_id)     AS cnt
    	  , SUM (number_of_days)  AS days
    	  FOR  leave_reason  IN ( &pivot_in_clause
    				)
    	  )
    ORDER BY  group_id
    ;
    
    
    

    Output from your sample data (with SET TERMOUT OFF commented out):

    PIVOT_IN_CLAUSE_COL
    ----------------------------------------------------------------------------------------------------
    Q'{EMPLOYEE'S SERIOUS CONDITION}'       AS EMPLOYEES_SERIOUS_CONDITION
    , Q'{Employee's PTO}'   AS Employees_PTO
    , Q'{Family Vacation}'  AS Family_Vacation
    
    1 row selected.
    
              EMPLOYEES  EMPLOYEESadv
               _SERIOUS   _SERIOUS Employees Employees    Family    Family
             _CONDITION _CONDITION      _PTO      _PTO _Vacation _Vacation
    GROUP_ID       _CNT      _DAYS      _CNT     _DAYS      _CNT     _DAYS
    -------- ---------- ---------- --------- --------- --------- ---------
           1          0                    1         1         0
           2          1          5         0                   0
           3          1          7         0                   1         2
    
    3 rows selected.
    

    Now that you know how to do it, you can try improving it, for example

    • Generate column names like Employees_PTO instead of Employees_PTO_CNT
    • Use NVL to display 0 instead of NULL in the _DAYS columns
    • Generate multi-line column headings (like I did)
    • Use @file_name instead of substitution variables


    User_5F8TP

Answers

  • mathguy
    mathguy Member Posts: 10,895 Black Diamond

    In Oracle SQL you can't. You would need to use dynamic pivoting, which can't be done in plain SQL.

    Before you go too far in that direction, are you sure you need the output in that format? This type of question is asked quite frequently, and there are exceptionally few posters who ever explain why they need that particular format (and I have never seen an explanation that made sense).

    Suppose in the data you find 38 distinct values for LEAVE_REASON. Even leaving aside the issue of printing 38 columns (or showing 38 columns on screen) - they are very likely to fit - what human is ever able to read and understand information presented in 38 columns? And please keep in mind that human consumption is the only plausible reason for such a request - pivoted data is useless if it's needed for further processing.

    If you are absolutely certain that you do need dynamic pivoting, check to see if your reporting software doesn't already offer that capability. Writing correct dynamic pivoting by hand is only for advanced programmers - are you one of them?

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 42,746 Red Diamond
    Answer ✓

    Hi, @User_5F8TP

    If you really, really need dynamic pivoting, here's one way to do it. (I used SQL*Plus features, such as substitution variables, but you can use other features instead.)

    The first step in writing dynamic SQL is to write a static SQL statement for a typical case. Fr this problem, you might start with:

    SELECT    *
    FROM	  persons
    PIVOT	  ( COUNT (person_id)     AS cnt
    	  , SUM (number_of_days)  AS days
    	  FOR  leave_reason  IN ( Q'{Employee's PTO}'		     AS Employees_PTO
    	       		     	, Q'{EMPLOYEE'S SERIOUS CONDITION}'  AS EMPLOYEES_SERIOUS_CONDITION
    				, Q'{Family Vacation}'    	     AS Family_Vacation
    				)
    	  )
    ORDER BY  group_id
    ;
    

    This works for 3 leave_reasons, whose values you know. You want to make this query dynamic, so it can have however many leave_reasons happen to be in the table, with whatever values they happen to have. In this case, you just need to re-write the FOR leave_reason IN (...) clause, like this:

    -- ==========  Preliminary Query  ==========
    --SET	TERMOUT  OFF
    
    
    COLUMN  pivot_in_clause_col	NEW_VALUE pivot_in_clause
    
    
    WITH    leave_reasons_wanted    AS
    (
        SELECT DISTINCT  leave_reason
        ,	   	     TRANSLATE ( leave_reason
        		     	       , ' '''
    			       , '_'
    			       )  AS col_alias
        FROM    persons
    --  WHERE   ...		-- if needed
    )
    SELECT  LISTAGG ( 'Q''{' || leave_reason
    		       	 || '}''	AS '
    		       	 || col_alias
    		, CHR (10) || ', '
    		) WITHIN GROUP (ORDER BY leave_reason)  as pivot_in_clause_col
    FROM    leave_reasons_wanted
    ;
    
    
    SET	TERMOUT  ON
    
    
    -- ==========  Main Quary  ==========
    SELECT    *
    FROM	  persons
    PIVOT	  ( COUNT (person_id)     AS cnt
    	  , SUM (number_of_days)  AS days
    	  FOR  leave_reason  IN ( &pivot_in_clause
    				)
    	  )
    ORDER BY  group_id
    ;
    
    
    

    Output from your sample data (with SET TERMOUT OFF commented out):

    PIVOT_IN_CLAUSE_COL
    ----------------------------------------------------------------------------------------------------
    Q'{EMPLOYEE'S SERIOUS CONDITION}'       AS EMPLOYEES_SERIOUS_CONDITION
    , Q'{Employee's PTO}'   AS Employees_PTO
    , Q'{Family Vacation}'  AS Family_Vacation
    
    1 row selected.
    
              EMPLOYEES  EMPLOYEESadv
               _SERIOUS   _SERIOUS Employees Employees    Family    Family
             _CONDITION _CONDITION      _PTO      _PTO _Vacation _Vacation
    GROUP_ID       _CNT      _DAYS      _CNT     _DAYS      _CNT     _DAYS
    -------- ---------- ---------- --------- --------- --------- ---------
           1          0                    1         1         0
           2          1          5         0                   0
           3          1          7         0                   1         2
    
    3 rows selected.
    

    Now that you know how to do it, you can try improving it, for example

    • Generate column names like Employees_PTO instead of Employees_PTO_CNT
    • Use NVL to display 0 instead of NULL in the _DAYS columns
    • Generate multi-line column headings (like I did)
    • Use @file_name instead of substitution variables


    User_5F8TP
  • mathguy
    mathguy Member Posts: 10,895 Black Diamond

    The code Mr. Kulash proposed above illustrates my point. He is the person with the highest reputation in this forum; and even his code has some defects. I will mention a few in a moment. My point is that you are very unlikely to write better code than Mr. Kulash; do you really have to try?

    Here are some of the issues with that code.

    If the leave reasons aren't written consistently, you may have the same reason written in different capitalization on different rows: perhaps Sick in one row but SICK in another. In the queries suggested above, this will result in two columns by the same name in the output of PIVOT. This is invalid; Oracle will throw ORA-00918, column ambiguously defined.

    The proposed code replaces spaces with underscore and removes apostrophes. There may be other characters that are illegal in column names; it should remove those too. Moreover, the code does not deal with all the other ways in which the text from the LEAVE_REASON column may result in invalid column names in the output of PIVOT; for example 108 Degree Fever (column names aren't allowed to begin with a digit), or AUDIT (this is a reserved keyword, so it is illegal as a column name), etc.

    Worst: little Bobby Tables, on his last day of work at your company, will write a LEAVE_REASON that will drop the PERSONS table, or do other, nastier things. This is called SQL injection; I won't demonstrate how it can be done, but it's quite easy.

    Mr. Kulash can very easily fix all these problems, and perhaps think of a few more and fix those too. But do you really, really need all this?

  • User_5F8TP
    User_5F8TP Member Posts: 93 Bronze Badge

    @Frank Kulash

    Hi Frank,

    I really appreciate your response. I was wondering if there is an alternative for substitution variables or file.

    Please see my code below. Can I create an inline view and then call pivot_in_clause_col in the pivot?

    Thank you in advance!

    Unfortunately, I'm getting an error with my code. It doesn't like "select pivot_in_clause from leave_reason_data" within the pivot.

    WITH leave_reasons_wanted  AS

    (

      SELECT DISTINCT leave_reason

      ,       TRANSLATE ( leave_reason

               , ' '''

        , '_'

        ) AS col_alias

      FROM  persons

    )


    ,leave_reason_data as

    (

    SELECT LISTAGG ( 'Q''{' || leave_reason

         || '}'' AS '

         || col_alias

    , CHR (10) || ', '

    ) WITHIN GROUP (ORDER BY leave_reason) as pivot_in_clause_col

    FROM  leave_reasons_wanted

    )


    -- ========== Main Quary ==========

    SELECT  *

    FROM  persons

    PIVOT  ( COUNT (person_id)   AS cnt

     , SUM (number_of_days) AS days

     FOR leave_reason IN (select pivot_in_clause from leave_reason_data

    )

     )

    ORDER BY group_id

    ;

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 42,746 Red Diamond

    Hi, @User_5F8TP

    I was wondering if there is an alternative for substitution variables or file.

    In PL/SQL you can use EXECUTE IMMEDIATE.

    Please see my code below. Can I create an inline view and then call pivot_in_clause_col in the pivot?

    Sorry: no. Once again, it is impossible to write a SQL query where the number of columns in the result set depends on the data in the table. SQL statements are compiled first, and if no errors are found in the compilation, then the query is executed. The number of columns is set when the statement is compiled, and data is found only when the statement is executed, so there is no way the number of columns can depend on the data found.

    Instead of getting a variable number of columns, you can get a large fixed number of columns. For example, you could write a query that always produced 11 columns of output, enough for group_id plus 2 columns for each of 5 leave_reasons. If you have fewer than 5 reasons, then some of the columns would just be NULL, but if you have more than 5 reasons. some of them can't be displayed. Of course, there's nothing special about the number 5; you could make it 10, or 50, but who can read that many columns? Also, the column names would have to be generic, like REASON_1, REASON_2, ..... If you wanted to have the actual reasons displayed, they would have to included in the result set, perhaps with a UNION.

    A variation on that approach is to have a fixed number (say 5) reasons on one row, and if there are more than 5 reasons, use a second (and third, fourth, ... if necessary) output row. The number of columns has to be hard-coded into the query, but the number of rows doesn't need to be. Again, the column names would have to be generic.

    A completely different approach is to output a huge string column that is formatted to look like multiple columns. That is, you could get output that looks like this:

              EMPLOYEES  EMPLOYEES
               _SERIOUS   _SERIOUS Employees Employees    Family    Family
             _CONDITION _CONDITION      _PTO      _PTO _Vacation _Vacation
    GROUP_ID       _CNT      _DAYS      _CNT     _DAYS      _CNT     _DAYS
    -------- ---------- ---------- --------- --------- --------- ---------
           1          0          0         1         1         0         0
           2          1          5         0         0         0         0
           3          1          7         0         0         1         2
    

    A person reading this might think it was 3 rows and 7 columns (all NUMBERs), but it's actually 8 rows (including 5 that look like a header) and 2 columns (GROUP_ID and a big VARCHAR2 column).

    Would any of these options be good enough for you?

    User_5F8TP
  • BluShadow
    BluShadow Member, Moderator Posts: 42,555 Red Diamond

    As Frank says, this is about when the query is parsed to get the definition of what the output is going to look like, and when the data is actually fetched, which comes after the parsing.

    An old community document around this:

    https://community.oracle.com/tech/developers/discussion/2311172/pl-sql-101-cursors-and-sql-projection

    (please excuse the "jive" forum formatting that shows on there, it didn't convert well when Oracle moved the forum platform and trying to edit it seems to completely mess it all up so I've left it as it is)

  • User_5F8TP
    User_5F8TP Member Posts: 93 Bronze Badge

    @Frank Kulash

    Hi Frank,

    Thank you for your response. Would you please kindly show me your first option in a query?

    I really appreciate it. Thank you in advance!

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 42,746 Red Diamond
    edited Nov 23, 2022 2:13AM

    hI, @User_5F8TP

    Would you please kindly show me your first option in a query?

    You mean the fixed number of columns, right?

    Here's one way, with 5 pairs of columns. (That means if you have 6 or more distinct leave_reasons, you'll have to ignore some of them.)

    WITH    got_r_num    AS
    (
    	SELECT DISTINCT  leave_reason
    	,      		 DENSE_RANK () OVER (ORDER BY leave_reason)  AS r_num
    	,		 COUNT (DISTINCT leave_reason) OVER ()	     AS max_r_num
    	FROM		 persons
    )
    ,    data_to_pivot    AS
    (
    	SELECT  p.group_id, p.person_id, p.number_of_days
    	,	r.r_num, r.max_r_num
    	FROM	persons    p
    	JOIN	got_r_num  r  ON  r.leave_reason = p.leave_reason
    )
    SELECT    group_id
    ,	  CASE WHEN max_r_num >= 1 THEN r1_c END	   AS r1_cnt
    ,	  CASE WHEN max_r_num >= 1 THEN NVL (r1_d, 0) END  AS r1_days
    ,	  CASE WHEN max_r_num >= 2 THEN r2_c END	   AS r2_cnt
    ,	  CASE WHEN max_r_num >= 2 THEN NVL (r2_d, 0) END  AS r2_days
    ,	  CASE WHEN max_r_num >= 3 THEN r3_c END	   AS r3_cnt
    ,	  CASE WHEN max_r_num >= 3 THEN NVL (r3_d, 0) END  AS r3_days
    ,	  CASE WHEN max_r_num >= 4 THEN r4_c END	   AS r4_cnt
    ,	  CASE WHEN max_r_num >= 4 THEN NVL (r4_d, 0) END  AS r4_days
    ,	  CASE WHEN max_r_num >= 5 THEN r5_c END	   AS r5_cnt
    ,	  CASE WHEN max_r_num >= 5 THEN NVL (r5_d, 0) END  AS r5_days
    FROM	  data_to_pivot
    PIVOT	  ( COUNT (person_id)	  AS c
    	  , SUM (number_of_days)  AS d
    	  FOR  r_num  IN ( 1  AS r1
    	       	      	 , 2  AS r2
    			 , 3  AS r3
    			 , 4  AS r4
    			 , 5  AS r5
    			 )
    	  )
    ORDER BY  group_id
    ;
    
    
    

    Output:

    GROUP_ID  R1_CNT R1_DAYS  R2_CNT R2_DAYS  R3_CNT R3_DAYS  R4_CNT R4_DAYS  R5_CNT R5_DAYS
    -------- ------- ------- ------- ------- ------- ------- ------- ------- ------- -------
           1       0       0       1       1       0       0
           2       1       5       0       0       0       0
           3       1       7       0       0       1       2
    

    As you can see, the columns have hard-coded generic names. To make this practical, you need to include something that explains what 'R1', 'R2', ..., 'R5' mean. The simplest way to do that is to run a separate query, but that runs the risk of getting wrong results if the table is changing while you run the queries. Another way is to add an decoding key after the r5days column, like this:

    GROUP_ID  R1_CNT R1_DAYS  R2_CNT R2_DAYS  R3_CNT R3_DAYS  R4_CNT R4_DAYS  R5_CNT R5_DAYS KEY
    -------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ---------------------------------
           1       0       0       1       1       0       0                                 R1 = EMPLOYEE'S SERIOUS CONDITION
           2       1       5       0       0       0       0                                 R2 = Employee's PTO
           3       1       7       0       0       1       2                                 R3 = Family Vacation
    

    Here's how you can modify the query above to do that:

    WITH    got_r_num    AS
    (
    	SELECT DISTINCT  leave_reason
    	,      		 DENSE_RANK () OVER (ORDER BY leave_reason)  AS r_num
    	,		 COUNT (DISTINCT leave_reason) OVER ()	     AS max_r_num
    	FROM		 persons
    )
    ,    data_to_pivot    AS
    (
    	SELECT  p.group_id, p.person_id, p.number_of_days
    	,	r.r_num, r.max_r_num
    	FROM	persons    p
    	JOIN	got_r_num  r  ON  r.leave_reason = p.leave_reason
    )
    ,    pivoted_data    AS
    (
    	SELECT    group_id
    	,	  ROW_NUMBER () OVER (ORDER BY group_id)	   AS s_num
    	,	  CASE WHEN max_r_num >= 1 THEN r1_c END	   AS r1_cnt
    	,	  CASE WHEN max_r_num >= 1 THEN NVL (r1_d, 0) END  AS r1_days
    	,	  CASE WHEN max_r_num >= 2 THEN r2_c END	   AS r2_cnt
    	,	  CASE WHEN max_r_num >= 2 THEN NVL (r2_d, 0) END  AS r2_days
    	,	  CASE WHEN max_r_num >= 3 THEN r3_c END	   AS r3_cnt
    	,	  CASE WHEN max_r_num >= 3 THEN NVL (r3_d, 0) END  AS r3_days
    	,	  CASE WHEN max_r_num >= 4 THEN r4_c END	   AS r4_cnt
    	,	  CASE WHEN max_r_num >= 4 THEN NVL (r4_d, 0) END  AS r4_days
    	,	  CASE WHEN max_r_num >= 5 THEN r5_c END	   AS r5_cnt
    	,	  CASE WHEN max_r_num >= 5 THEN NVL (r5_d, 0) END  AS r5_days
    	FROM	  data_to_pivot
    	PIVOT	  ( COUNT (person_id)	  AS c
    	  	  , SUM (number_of_days)  AS d
    	  	  FOR  r_num  IN ( 1  AS r1
    	       	      	      	 , 2  AS r2
    			 	 , 3  AS r3
    			 	 , 4  AS r4
    			 	 , 5  AS r5
    			 	 )
    	          )
    )
    SELECT    p.group_id, p.r1_cnt, p.r1_days, p.r2_cnt, p.r2_days, p.r3_cnt, p.r3_days
    , 	  p.r4_cnt, p.r4_days, p.r5_cnt, p.r5_days
    ,	  'R' || TO_CHAR (g.r_num) || ' = ' || g.leave_reason	AS key
    FROM	  pivoted_data  p
    FULL JOIN got_r_num     g  ON  g.r_num  = p.s_num
    ORDER BY  p.group_id, g.r_num
    ;
    


    User_5F8TP
  • Marwim
    Marwim Member Posts: 3,654 Gold Trophy

    Do you need it as a result of a SQL?

    Otherwise you could select it as rows and then "manually" write it as columns in a sheet (e.g. with as_xlsx)