Forum Stats

  • 3,759,958 Users
  • 2,251,621 Discussions
  • 7,870,883 Comments

Discussions

Count nested table column in SQL

JackK
JackK Member Posts: 678 Bronze Badge
edited Sep 22, 2021 5:21AM in SQL & PL/SQL

Hi All,

I am using Oracle 19c.

create or replace type t_numbers is table of number;

create table test_tab (
  id    number(22),
  nums  t_numbers
) nested table nums store as nt_tt_nums;

INSERT INTO test_tab VALUES (1, t_numbers(1,2,3,4));
INSERT INTO test_tab VALUES (2, t_numbers(1,2,3,4,5));
INSERT INTO test_tab VALUES (3, t_numbers(1,2,3,4,5,6));

How may I count elements in nums column?

Below statement does not work because of ORA-00904: "NUMS"."COUNT": invalid identifier

SELECT id, nums.count AS has_items
 FROM test_tab;

Best regards,

Jacek


Cleaning:

drop table test_tab;
drop type t_numbers;

Best Answers

  • mathguy
    mathguy Member Posts: 10,086 Blue Diamond
    Accepted Answer
    select t.id, n.num_count
    from   test_tab t outer apply (select count(*) as num_count from table(nums)) n;
    
            ID  NUM_COUNT
    ---------- ----------
             1          4
             2          5
             3          6
    
  • Paulzip
    Paulzip Member Posts: 8,460 Blue Diamond
    edited Sep 22, 2021 2:00PM Accepted Answer

    I'm late to the party, but much easier and I suspect quicker to just use the CARDINALITY function

    SELECT id, cardinality(nums) AS has_items
    FROM test_tab;
    
            ID  HAS_ITEMS
    ---------- ----------
             1          4
             2          5
             3          6
    
    
    3 rows selected.
    
    BluShadow
«13

Answers

  • mathguy
    mathguy Member Posts: 10,086 Blue Diamond
    Accepted Answer
    select t.id, n.num_count
    from   test_tab t outer apply (select count(*) as num_count from table(nums)) n;
    
            ID  NUM_COUNT
    ---------- ----------
             1          4
             2          5
             3          6
    
  • mathguy
    mathguy Member Posts: 10,086 Blue Diamond

    Note - The query, as written, will return 0 if the nested table value is null, and if it is not, it will return a count of all members (elements) of the table, even those that are null.

    If you desire different handling of null in these situations, you need to explain what that is.

    For concreteness, consider these additional rows in the table - discuss what should be returned in each case.

    insert into test_tab values (4, t_numbers());
    insert into test_tab values (5, null);
    insert into test_tab values (6, t_numbers(1,null,3));
    

    The solution I gave above will return 0, 0, 3 for these rows. It is possible that you need 0, null, 2 instead. (The count for a table with zero elements almost surely should be 0; but if the nested table is null perhaps you need to return null in the output rather than 0, and in the last case you may want to return 2 rather than 3 - counting the non-null values.) Any combinations of requirements can be accommodated easily along the lines of the solution in my previous reply, you just need to state the exact requirements.

    JackK
  • JackK
    JackK Member Posts: 678 Bronze Badge
  • BluShadow
    BluShadow Member, Moderator Posts: 41,383 Red Diamond

    Generally though, using nested columns just causes headaches. Personally, I'd have the data split out into a 1:M relationship on two tables. Then, accessing the data and counting it, would be trivial.

    JackK
  • Paulzip
    Paulzip Member Posts: 8,460 Blue Diamond
    edited Sep 22, 2021 2:00PM Accepted Answer

    I'm late to the party, but much easier and I suspect quicker to just use the CARDINALITY function

    SELECT id, cardinality(nums) AS has_items
    FROM test_tab;
    
            ID  HAS_ITEMS
    ---------- ----------
             1          4
             2          5
             3          6
    
    
    3 rows selected.
    
    BluShadow
  • mathguy
    mathguy Member Posts: 10,086 Blue Diamond

    @Paulzip

    Good contribution. I wasn't aware of that function.

    I just checked explain plans for the two queries. Curiously, on this trivial amount of data (including the three rows I added), the "cardinality" query has higher cost (4 vs 3); meaningless of course, but this shows that the plans are indeed different - one query is not transformed into the other.

    It's an interesting question - which may be faster. So I'll play with that a bit and report back here.

    Note that the "cardinality" approach is not as flexible; for example, if the OP wanted to count non-null values in the nested table, he wouldn't be able to do that with "cardinality", which just reports the number of rows in the nested table.

    Also unless I am mistaken the documentation is wrong. It says cardinality() returns null if the nested table is empty or null. It does return null when the table is null, but on the example of id=4 in the data I added, where the nested table is empty, cardinality returns 0, not null. (I say "documentation is wrong" because the behavior is actually meaningful - an empty table has a well-determined number of rows, namely 0. So "not a bug.")

  • mathguy
    mathguy Member Posts: 10,086 Blue Diamond
    edited Sep 22, 2021 5:36PM

    @Paulzip


    I ran the test described below. On my system, the solution I proposed (using the table operator and standard count(*) ) runs four times faster than the cardinality() solution (0.08 seconds on average vs. 0.33 seconds).

    Table data:

    truncate table test_tab;
    
    insert into test_tab (id, nums)
      with
        p (id) as (
          select level from dual connect by level <= 2000
        )
      select p.id, l.nums
      from   p cross join lateral (
                            select  cast(collect(level) as t_numbers) as nums
                            from    dual
                            connect by level <= p.id
                          ) l
    ;
    


    Queries used for testing:

    select sum(n.num_count)       total_card       -- AVG EXEC TIME: 0.08 s ON MY SYSTEM
    from   test_tab t outer apply (select count(*) as num_count from table(nums)) n;
    
    select sum(cardinality(nums)) total_card       -- AVG EXEC TIME: 0.33 s ON MY SYSTEM
    from   test_tab;
    
  • Paulzip
    Paulzip Member Posts: 8,460 Blue Diamond
    edited Sep 22, 2021 5:48PM

    @mathguy

    Interesting, on my home system (18c) on a new session, after flushing the usual stuff, using your data setup, cardinality was on average twice as fast as outer apply. Perhaps there's version specific optimisation at play.

    The docs look incorrect on Cardinality. Empty and null instance collections will return 0. I sometimes use cardinality to check pl/sql collections on whether to process or not. E.g. If you have a minimum number...

     -- snip
    begin
     -- snip
      if vCollection is not null and vCollection.Count > 5 then
        -- snip
      end if;
    end;
    

    Can be rewritten as...

     -- snip
    begin
     -- snip
      if cardinality(vCollection) > 5 then
        -- snip
      end if;
    end;
    

    As for the flexibility, that's a moot point. If OP wanted to count non nulls only I wouldn't have offered cardinality as a solution.

    JackK
  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,756 Gold Crown

    @mathguy & @Paulzip

    The variation in timing is interesting. I would have expected the cardinality to be slower for the simple reason that although both plans shows "for each test_tab row execute index driven query into nested table", the cardinality() function accesses the table by rowid while the lateral() approach doesn't access the table. I can't think of any good reason why the cardinality() function does this, possibly it's just an example of Oracle using a generic code path that could eventually become a special case.

    Another variation on counting (which produces 2 zeros for the small data set thanks to the outer join) is:

    select id, count(n.nested_table_id) from test_tab tt outer apply tt.nums group n by id;
    

    This is slower than the lateral() solution for the small scale test, but significantly faster for the large scale test because the optimizer uses "group by placement" when the data set is large to aggregate the nested table before joining to the test tab.

    -----------------------------------------------------------------------------------------------
    | Id  | Operation           | Name                    | Starts | E-Rows | Cost (%CPU)| A-Rows |
    -----------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT    |                         |      1 |        |     3 (100)|      7 |
    |   1 |  HASH GROUP BY      |                         |      1 |      7 |     3  (34)|      7 |
    |   2 |   NESTED LOOPS OUTER|                         |      1 |     21 |     2   (0)|     21 |
    |   3 |    TABLE ACCESS FULL| TEST_TAB                |      1 |      7 |     2   (0)|      7 |
    |*  4 |    INDEX RANGE SCAN | SYS_FK0000074396N00002$ |      7 |      3 |     0   (0)|     19 |
    -----------------------------------------------------------------------------------------------
    
    
    ------------------------------------------------------------------------------------
    | Id  | Operation             | Name       | Starts | E-Rows | Cost (%CPU)| A-Rows |   
    ------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT      |            |      1 |        |  1815 (100)|   2000 |
    |   1 |  HASH GROUP BY        |            |      1 |   2000 |  1815  (51)|   2000 |
    |*  2 |   HASH JOIN OUTER     |            |      1 |   2000 |  1814  (51)|   2000 |
    |   3 |    TABLE ACCESS FULL  | TEST_TAB   |      1 |   2000 |     3   (0)|   2000 |
    |   4 |    VIEW               | VW_GBC_5   |      1 |   2000 |  1810  (51)|   2000 |
    |   5 |     HASH GROUP BY     |            |      1 |   2000 |  1810  (51)|   2000 |
    |   6 |      TABLE ACCESS FULL| NT_TT_NUMS |      1 |   2001K|  1055  (15)|   2001K|
    ------------------------------------------------------------------------------------
    
    

    Regarding 18c vs. 19c - I couldn't spot any changes in (optmizer) strategy between the two, but it's always possible that there's some memory allocation or data pipeline mechanism that changes between the two.


    Regards

    Jonathan Lewis

    Paulzip
  • mathguy
    mathguy Member Posts: 10,086 Blue Diamond

    @Jonathan Lewis

    Versions: Mine is still 12.2.0.1. It would be exceptionally easy for me to change it (since I don't actually work in the field, this is just my sandbox at home), but I didn't find enough reasons to go through the hassle. By comparison, the changes from 11 to 12 were many and exceptionally relevant IMO.

    Timings: Not sure which is the "large scale test", nor how you measured the timings. I learned not to rely on estimated optimizer cost as an approximation; did you actually run the tests?

    If the "large scale test" is the one I posted (2000 rows, nested tables with counts 1, 2, 3, ... , 2000), on my system and using Oracle 12, the LATERAL/CROSS APPLY/OUTER APPLY solution (using SUM(...) just to simplify) takes 0.07 seconds on average (tested again this morning); the new solution you proposed (after fixing the syntax: the alias n should appear after the table it aliases) takes 0.14 seconds. That is half as fast, not "significantly faster" - at least on my machine and on my version of Oracle.

    = = = = = = = =

    The discussion made me think a bit about how Oracle implements nested tables (and perhaps other things), in PL/SQL and in SQL. Let's just look at nested tables for concreteness.

    In PL/SQL they come with methods, including COUNT(). I always assumed, but perhaps incorrectly, that - in the PL/SQL environment - nested tables are objects with some attributes besides just the raw data in the collection; for example the "count" (the same way in SQL it appears that Oracle stores the length of strings; when we call the LENGTH function, I believe the length is simply read from some place, not counted from scratch).

    If indeed in PL/SQL Oracle keeps track of "count" in a separate attribute, then calling CARDINALITY() on a nested table should be trivial - just get the value of that attribute.

    Since that doesn't seem to be the case in SQL, I can think of a few plausible explanations; I don't know which (if indeed any) of these is the correct one.

    Either my speculation is entirely wrong, and even in PL/SQL a call to COUNT() causes an actual counting operation. Or in fact there is a count attribute in PL/SQL, but a nested table as a column in a SQL table (unlike a nested table in PL/SQL) does not have that attribute. Or it does, but the SQL implementation of CARDINALITY() doesn't use it; this last reason would be dumb, but still possible.


    Cheers - mathguy