PL/SQL 101 : Cursors and SQL Projection

Version 5

    PL/SQL 101 : Cursors and SQL Projection
    (update from the original thread: PL/SQL 101 : Cursors and SQL Projection)

     

    Author: BluShadow

    Last Updated: 28th May 2015

    Introduction

     

    There are two integral parts to an SQL Select statement that relate to what data is selected. One is Projection and the other is Selection:-

    Selection is the one that most people will recognise as it forms the WHERE clause of the select statement, and hence selects which rows of data are queried.
    The other, SQL Projection is the one that is less understood, even though we also use this all the time, and it's this term that this article will help to explain.

     

    1. What is SQL Projection


    In short, SQL Projection is the collective name for the columns that are Selected and returned from a query.

    "So what?" "Big deal eh?" "Why do we need to know this?" I hear you say

    The reason for knowing this is that many people are not aware of when SQL projection comes into play when you issue a select statement.  And this can sometimes catch people out when they try and do something that relies on it, as you'll see...

     

    To demonstrate, let's first create some test data...

     

    create table proj_test as

      select 1 as id, 1 as rn, 'Fred' as nm from dual union all

      select 1,2,'Bloggs' from dual union all

      select 2,1,'Scott' from dual union all

      select 2,2,'Smith' from dual union all

      select 3,1,'Jim' from dual union all

      select 3,2,'Jones' from dual

    /

    ... and now query that data...

    SQL> select * from proj_test;

             ID         RN NM

    ---------- ---------- ------

             1          1 Fred

             1          2 Bloggs

             2          1 Scott

             2          2 Smith

             3          1 Jim

             3          2 Jones

    6 rows selected.

     

    OK, so what is that query actually doing?

    To know that we need to consider that all queries are cursors and all cursors are processed in a set manner, roughly speaking...


    1. The cursor is opened
    2. The query is parsed
    3. The query is described to know the projection (what columns are going to be returned, names, datatypes etc.)
    4. Bind variables are bound in
    5. The query is executed to apply the selection and identify the data to be retrieved
    6. A row of data is fetched
    7. The data values from the columns within that row are extracted into the known projection
    8. Step 6 and 7 are repeated until there is no more data or another condition ceases the fetching
    9. The cursor is closed


    The purpose of the projection being determined is so that the internal processing of the cursor can allocate memory etc. ready to fetch the data into.

    We won't get to see that memory allocation happening easily, but we can see the same query being executed in these steps if we do it programatically using the dbms_sql package...

     

    CREATE OR REPLACE PROCEDURE process_cursor (p_query in varchar2) IS
      v_sql       varchar2(32767) := p_query;
      v_cursor    number;            -- A cursor is a handle (numeric identifier) to the query
      --
      col_cnt     integer;
      v_n_val     number;            -- numeric type to fetch data into
      v_v_val     varchar2(20);      -- varchar type to fetch data into
      v_d_val     date;              -- date type to fetch data into
      --
      rec_tab     dbms_sql.desc_tab; -- table structure to hold sql projection info
      dummy       number;
      v_ret       number;            -- number of rows returned
      v_finaltxt  varchar2(100);
      col_num     number;
    BEGIN
      --
      -- 1. Open the cursor
      --
      dbms_output.put_line('1 - Opening Cursor');
      v_cursor := dbms_sql.open_cursor;
      --
      -- 2. Parse the cursor
      --
      dbms_output.put_line('2 - Parsing the query');
      dbms_sql.parse(v_cursor, v_sql, dbms_sql.NATIVE);
      --
      -- 3. Describe the query
      --
      -- Note: The query has been described internally when it was parsed, but we can look at
      --       that description...
      --
      -- Fetch the description into a structure we can read, returning the count of columns that has been projected
      dbms_output.put_line('3 - Describing the query');
      dbms_sql.describe_columns(v_cursor, col_cnt, rec_tab);
      --
      -- Use that description to define local datatypes into which we want to fetch our values
      -- Note: This only defines the types, it doesn't fetch any data and whilst we can also
      --       determine the size of the columns we'll just use some fixed sizes for this example
      dbms_output.put_line(chr(10)||'3a - SQL Projection:-');
      for j in 1..col_cnt
      loop
        v_finaltxt := 'Column Name: '||rpad(upper(rec_tab(j).col_name),30,' ');
        case rec_tab(j).col_type
          -- if the type of column is varchar2, bind that to our varchar2 variable
          when 1 then
            dbms_sql.define_column(v_cursor,j,v_v_val,20);
            v_finaltxt := v_finaltxt||' Datatype: Varchar2';
          -- if the type of the column is number, bind that to our number variable
          when 2 then
            dbms_sql.define_column(v_cursor,j,v_n_val);
            v_finaltxt := v_finaltxt||' Datatype: Number';
          -- if the type of the column is date, bind that to our date variable
          when 12 then
            dbms_sql.define_column(v_cursor,j,v_d_val);
            v_finaltxt := v_finaltxt||' Datatype: Date';
          -- ...Other types can be added as necessary...
        else
          -- All other types we'll assume are varchar2 compatible (implicitly converted)
          dbms_sql.DEFINE_COLUMN(v_cursor,j,v_v_val,2000);
          v_finaltxt := v_finaltxt||' Datatype: Varchar2 (implicit)';
        end case;
        dbms_output.put_line(v_finaltxt);
      end loop;
      --
      -- 4. Bind variables
      --
      dbms_output.put_line(chr(10)||'4 - Binding in values');
      null; -- we have no values to bind in for our test
      --
      -- 5. Execute the query to make it identify the data on the database (Selection)
      -- Note: This doesn't fetch any data, it just identifies what data is required.
      --
      dbms_output.put_line('5 - Executing the query');
      dummy := dbms_sql.execute(v_cursor);
      --
      -- 6.,7.,8. Fetch the rows of data...
      --
      dbms_output.put_line(chr(10)||'6,7 and 8 Fetching Data:-');
      loop
        -- 6. Fetch next row of data
        v_ret := dbms_sql.fetch_rows(v_cursor);
        -- If the fetch returned no row then exit the loop
        exit when v_ret = 0;
        --
        -- 7. Extract the values from the row
        v_finaltxt := null;
        -- loop through each of the Projected columns
        for j in 1..col_cnt
        loop
          case rec_tab(j).col_type
            -- if it's a varchar2 column
            when 1 then
              -- read the value into our varchar2 variable
              dbms_sql.column_value(v_cursor,j,v_v_val);
              v_finaltxt := ltrim(v_finaltxt||','||rpad(v_v_val,20,' '),',');
            -- if it's a number column
            when 2 then
              -- read the value into our number variable
              dbms_sql.column_value(v_cursor,j,v_n_val);
              v_finaltxt := ltrim(v_finaltxt||','||to_char(v_n_val,'fm999999'),',');
            -- if it's a date column
            when 12 then
              -- read the value into our date variable
              dbms_sql.column_value(v_cursor,j,v_d_val);
              v_finaltxt := ltrim(v_finaltxt||','||to_char(v_d_val,'DD/MM/YYYY HH24:MI:SS'),',');
          else
            -- read the value into our varchar2 variable (assumes it can be implicitly converted)
            dbms_sql.column_value(v_cursor,j,v_v_val);
            v_finaltxt := ltrim(v_finaltxt||',"'||rpad(v_v_val,20,' ')||'"',',');
          end case;
        end loop;
        dbms_output.put_line(v_finaltxt);
        -- 8. Loop to fetch next row
      end loop;
      -- 9. Close the cursor
      dbms_output.put_line(chr(10)||'9 - Closing the cursor');
      dbms_sql.close_cursor(v_cursor);
    END;
    /

    SQL> exec process_cursor('select * from proj_test');
    1 - Opening Cursor
    2 - Parsing the query
    3 - Describing the query

    3a - SQL Projection:-
    Column Name: ID                             Datatype: Number
    Column Name: RN                             Datatype: Number
    Column Name: NM                             Datatype: Varchar2

    4 - Binding in values
    5 - Executing the query

    6,7 and 8 Fetching Data:-
    1     ,1     ,Fred
    1     ,2     ,Bloggs
    2     ,1     ,Scott
    2     ,2     ,Smith
    3     ,1     ,Jim
    3     ,2     ,Jones
    1     ,3     ,Freddy
    1     ,4     ,Fud

    9 - Closing the cursor

    PL/SQL procedure successfully completed.

     

    As you can see from the output, the SQL Projection describing what columns were going to be returned by the query, and what their datatypes are etc. has been determined before any data has been fetched.

    These processing steps are essentially the same for any query you run against the database, regardless of whether it's a static SQL statement, an implicit or explicit cursor loop in PL, a ref cursor supplied from a 3rd party application, a dynamic query executed with the execute immediate statement, or a query processed using the DBMS_SQL package like above.  All queries are cursors and all go through these steps even if, as in most cases, the steps are invisible to you.

     

     

    2. What's the point in knowing when SQL Projection occurs in a query?


    Well, the community gets many questions asking things like "How do I convert rows to columns?" (otherwise known as a Pivot) or "How can I get the data back from a dynamic query with different columns"

    Let's look at a regular pivot. We can do something like...

    SQL> select id
      2        ,max(decode(rn,1,nm)) as nm_1
      3        ,max(decode(rn,2,nm)) as nm_2
      4  from proj_test
      5  group by id
      6  /

            ID NM_1   NM_2
    ---------- ------ ------
             1 Fred   Bloggs
             2 Scott  Smith
             3 Jim    Jones

    (or, in 11g onwards, use the new PIVOT statement)

     

    As you can see, the two names for each ID have been pivoted to exist on the same row of data, with the RN value determining which of the columns they should go into.
    However, many of these questioners don't understand it when their issue is that they have an unknown number of rows and therefore don't know how many columns it will produce, and experts tell them that you can't do that in a single SQL statement.

     

    For example, what happens if we now have a 3rd name for one of our ID groups...

    SQL> insert into proj_test (id, rn, nm) values (1,3,'Freddy');


    1 row created.


    SQL> select id
      2        ,max(decode(rn,1,nm)) as nm_1
      3        ,max(decode(rn,2,nm)) as nm_2
      4  from proj_test
      5  group by id
      6  /

            ID NM_1   NM_2
    ---------- ------ ------
             1 Fred   Bloggs
             2 Scott  Smith
             3 Jim    Jones

     

    ... it's not giving us this 3rd entry as a new column.

    We can only get that if we write the expected columns into the query, but then what would happen if even more rows were added later.  If we don't know how many rows there are going to be (per grouping) we can't know when we write the query how many columns we need to include in our query.  Sure, we could write the query to allow for 'up to' a predetermined number of columns, but then are we sure that's our real upper limit?  It's certainly not dynamic, and it's giving us columns where we perhaps don't need them.

    If we look back at the steps of a cursor we see again that the description and projection of what columns are returned by a query happens before any data is fetched back.
    Because of this, it's not possible to have the query return back a number of columns that are based on the data itself, as no data has been fetched at the point the projection is required.  The cursor processing needs to know what columns are going to be produced before the data is fetched, so the columns cannot be based on what data may or may not be there.

     

    3. What is the answer to getting an unknown number of columns in the output?


    1) The most obvious answer is Don't.  Don't use SQL to try and pivot your data.  Pivoting of data relates to how data is presented to a user, rather than how it is queried from the database.  It is more of a reporting requirement and most reporting tools include the ability to pivot data either as part of the initial report generation or on-the-fly at the users request. The main point about using the reporting tools is that they query the data first and then the pivoting is simply a case of manipulating the display of those results, which can be dynamically and determined by the reporting tool based on what data there is.

    2) The other answer is to write dynamic SQL. Because you're not going to know the number of columns, this isn't just a simple case of building up a SQL query as a string and passing it to the EXECUTE IMMEDIATE command within PL/SQL.  In that case you won't have a suitable structure to read the results back into as those structures must have a known number of variables for each of the columns at design time, before the data is known.  As such, inside PL/SQL code, you would have to use the DBMS_SQL package, just like in the code above that showed the workings of a cursor, then reference the columns by position rather than name, and deal with each column separately.  What you do with each column is up to you... store them in an array/collection, process them as you get them, or whatever.  The key thing with doing this is that, just like the reporting tools, you would need to process the data first to determine what your SQL projection is, before you execute the query to fetch the data in the format you want.  That means querying your data twice, once to create your dynamic query, and once to actually fetch the data using that query (Note: In general this is considered bad practice!)

     

    For the sake of demonstration, let's look at an example of that...

     

    We'll write a procedure that dynamically generates our query, and passes that to our existing "process_query" procedure...

     

    create or replace procedure dyn_pivot is
      v_sql varchar2(32767);
      --
      -- cursor to find out the maximum number of projected columns required
      -- by looking at the data
      --
      cursor cur_proj_test is
        select distinct rn
        from   proj_test
        order by rn;
    begin
      v_sql := 'select id';
      for i in cur_proj_test
      loop
        -- dynamically add to the projection for the query
        v_sql := v_sql||',max(decode(rn,'||i.rn||',nm)) as nm_'||i.rn;
      end loop;
      v_sql := v_sql||' from proj_test group by id order by id';
      dbms_output.put_line('Dynamic SQL Statement:-'||chr(10)||v_sql||chr(10)||chr(10));
      -- call our DBMS_SQL procedure to process the query with it's dynamic projection
      process_cursor(v_sql);
    end;
    /

    SQL> exec dyn_pivot;
    Dynamic SQL Statement:-
    select id,max(decode(rn,1,nm)) as nm_1,max(decode(rn,2,nm)) as nm_2,max(decode(rn,3,nm)) as nm_3 from proj_test group by id order by id


    1 - Opening Cursor
    2 - Parsing the query
    3 - Describing the query

    3a - SQL Projection:-
    Column Name: ID                             Datatype: Number
    Column Name: NM_1                           Datatype: Varchar2
    Column Name: NM_2                           Datatype: Varchar2
    Column Name: NM_3                           Datatype: Varchar2

    4 - Binding in values
    5 - Executing the query

    6,7 and 8 Fetching Data:-
    1     ,Fred                ,Bloggs              ,Freddy
    2     ,Scott               ,Smith               ,
    3     ,Jim                 ,Jones               ,

    9 - Closing the cursor

    PL/SQL procedure successfully completed.

     

    So, our query has been dynamically generated, based on the data, and dynamically processed by using the DBMS_SQL package.

     

    And if more data is added ...

    SQL> insert into proj_test (id, rn, nm) values (1,4,'Fud');


    1 row created.


    SQL> exec dyn_pivot;
    Dynamic SQL Statement:-
    select id,max(decode(rn,1,nm)) as nm_1,max(decode(rn,2,nm)) as nm_2,max(decode(rn,3,nm)) as nm_3,max(decode(rn,4,nm)) as nm_4 from proj_test group by id order by id


    1 - Opening Cursor
    2 - Parsing the query
    3 - Describing the query

    3a - SQL Projection:-
    Column Name: ID                             Datatype: Number
    Column Name: NM_1                           Datatype: Varchar2
    Column Name: NM_2                           Datatype: Varchar2
    Column Name: NM_3                           Datatype: Varchar2
    Column Name: NM_4                           Datatype: Varchar2

    4 - Binding in values
    5 - Executing the query

    6,7 and 8 Fetching Data:-
    1     ,Fred                ,Bloggs              ,Freddy              ,Fud
    2     ,Scott               ,Smith               ,                    ,
    3     ,Jim                 ,Jones               ,                    ,

    9 - Closing the cursor

     


    PL/SQL procedure successfully completed.

     

    ... the procedure picks up the new row of data, dynamically generates another query to account for that, and dynamically processes it.

     

    Of course there are other methods, using dynamically generated scripts etc., but the above simply demonstrates that...


    a) having a dynamic projection requires two passes of the data; one to dynamically generate the query and another to actually query the data,
    b) it is not a good idea in most cases as it requires code to handle the results dynamically rather than being able to simply query directly into a known structure or variables, and
    c) a simple SQL statement cannot have a dynamic projection.


    Most importantly, dynamic queries prevent validation of your queries at the time your code is compiled. The compiler can't check that the column names are correct or the tables names, or that the actual syntax of the generated query is correct. This only happens at run-time, so, depending upon the complexity of your dynamic query, some problems may only be experienced under certain conditions (high risk if it only appears in a live environment!).  In effect you are writing queries that are harder to validate and could potentially have bugs in them that would are not apparent until they get to a run time environment.  Dynamic queries can also introduce the possibility of SQL injection (a potential security risk), especially if a user is supplying a string value into the query from an interface.

     

     

    Summary


    The projection of an SQL statement must be known by the SQL engine before any data is fetched, so don't expect SQL to magically create columns on-the-fly based on the data it's retrieving back; and, if you find yourself thinking of using dynamic SQL to get around it, just take a step back and see if what you are trying to achieve may be better done elsewhere, such as in a reporting tool or the user interface.  Generally, it's not good to consider writing Dynamic queries.