Forum Stats

  • 3,816,062 Users
  • 2,259,135 Discussions
  • 7,893,378 Comments

Discussions

LISTAGG with DISTINCT option

245

Comments

  • Sven W.
    Sven W. Member Posts: 10,534 Gold Crown

    You don't need two listagg functions, you just need to do a distinct on the underlying table(s). So instead of this:

    select deptno,listagg (sal,', ') within group (order by sal)
    from scott.emp
    group by deptno;
    

    do this

    select deptno,listagg (sal,', ') within group (order by sal)
    from (select distinct deptno,sal from scott.emp)
    group by deptno;
    

    An example:

    Here I have some test data for projects and teams.

    I want to have a list of all team leaders and a second list of all team members. "Paul" is a team leader in several teams, but I want him only to be listed once. This can not be easily solved useing the current logic. (it is possible). Bines suggestion would be simliar to the COUNT(distinct column) logic and would solve such scenarios very easily.

    with projects as (select 1 project_id, 'IoT4.0' project_name from dual union all
                      select 2 project_id, 'VeryBigData' project_name from dual union all
                      select 3 project_id, 'Cloudify' project_name from dual )
        ,teams as (select 1 team_id, 1 project_id, 'John' membername, 'Y' is_leader from dual union all 
                   select 2 team_id, 1 project_id, 'Paul' membername, 'Y' is_leader from dual union all 
                   select 3 team_id, 1 project_id, 'George' membername, 'N' is_leader from dual union all 
                   select 4 team_id, 1 project_id, 'Ringo' membername, 'N' is_leader from dual union all 
                   select 5 team_id, 2 project_id, 'John' membername, 'N' is_leader from dual union all 
                   select 6 team_id, 2 project_id, 'Wayne' membername, 'N' is_leader from dual union all 
                   select 7 team_id, 2 project_id, 'Paul' membername, 'Y' is_leader from dual union all 
                   select 8 team_id, 3 project_id, 'Paul' membername, 'Y' is_leader from dual union all 
                   select 9 team_id, 3 project_id, 'Simon' membername, 'N' is_leader from dual union all 
                   select 10 team_id, 3 project_id, 'Art' membername, 'N' is_leader from dual union all 
                   select 11 team_id, 3 project_id, 'Garfunkel' membername, 'N' is_leader from dual 
                   )
    select count(distinct p.project_id) project_count
          ,listagg(case when is_leader = 'Y' then membername end,',') within group (order by membername) leaders
          ,listagg(case when is_leader = 'N' then membername end,',') within group (order by membername) members
    from projects p
    left join teams t on t.project_id = p.project_id
    group by ();
    

    Result:

    PROJECT_COUNTLEADERSMEMBERS
    3John,Paul,Paul,PaulArt,Garfunkel,George,John,Ringo,Simon,Wayne
  • Sven W.
    Sven W. Member Posts: 10,534 Gold Crown

    An example:

    Here I have some test data for projects and teams.

    I want to have a list of all team leaders and a second list of all team members. "Paul" is a team leader in several teams, but I want him only to be listed once. This can not be easily solved useing the current logic. (it is possible). Bines suggestion would be simliar to the COUNT(distinct column) logic and would solve such scenarios very easily.

    with projects as (select 1 project_id, 'IoT4.0' project_name from dual union all
                      select 2 project_id, 'VeryBigData' project_name from dual union all
                      select 3 project_id, 'Cloudify' project_name from dual )
        ,teams as (select 1 team_id, 1 project_id, 'John' membername, 'Y' is_leader from dual union all 
                   select 2 team_id, 1 project_id, 'Paul' membername, 'Y' is_leader from dual union all 
                   select 3 team_id, 1 project_id, 'George' membername, 'N' is_leader from dual union all 
                   select 4 team_id, 1 project_id, 'Ringo' membername, 'N' is_leader from dual union all 
                   select 5 team_id, 2 project_id, 'John' membername, 'N' is_leader from dual union all 
                   select 6 team_id, 2 project_id, 'Wayne' membername, 'N' is_leader from dual union all 
                   select 7 team_id, 2 project_id, 'Paul' membername, 'Y' is_leader from dual union all 
                   select 8 team_id, 3 project_id, 'Paul' membername, 'Y' is_leader from dual union all 
                   select 9 team_id, 3 project_id, 'Simon' membername, 'N' is_leader from dual union all 
                   select 10 team_id, 3 project_id, 'Art' membername, 'N' is_leader from dual union all 
                   select 11 team_id, 3 project_id, 'Garfunkel' membername, 'N' is_leader from dual 
                   )
    select count(distinct p.project_id) project_count
          ,listagg(case when is_leader = 'Y' then membername end,',') within group (order by membername) leaders
          ,listagg(case when is_leader = 'N' then membername end,',') within group (order by membername) members
    from projects p
    left join teams t on t.project_id = p.project_id
    group by ();
    

    Result:

    PROJECT_COUNTLEADERSMEMBERS
    3John,Paul,Paul,PaulArt,Garfunkel,George,John,Ringo,Simon,Wayne

    And here is a solution that works using a double group by and the analytic version of listagg. It is a bit tricky to understand and therefore almost impossible to maintain.

    select  min(leaders) as leaders 
           ,min(members) as members 
    from (
      select is_leader, membername
            ,listagg(case when is_leader = 'Y' then membername end,',') within group (order by membername) 
                over (partition by is_leader) 
                as leaders
            ,listagg(case when is_leader = 'N' then membername end,',') within group (order by membername) 
                over (partition by is_leader) 
                as members
      from projects p
      left join teams t on t.project_id = p.project_id
      group by is_leader, membername
      );
    

    However we lost the project count on the way...

    ApexBineMKJ10930279
  • NSK2KSN
    NSK2KSN Member Posts: 600 Bronze Badge

    This option will be useful and will help in avoiding additional inline views.

    ApexBineGbenga Ajakaye
  • ApexBine
    ApexBine Member Posts: 153 Silver Badge
    edited Oct 10, 2016 8:03AM

    And here is a solution that works using a double group by and the analytic version of listagg. It is a bit tricky to understand and therefore almost impossible to maintain.

    select  min(leaders) as leaders 
           ,min(members) as members 
    from (
      select is_leader, membername
            ,listagg(case when is_leader = 'Y' then membername end,',') within group (order by membername) 
                over (partition by is_leader) 
                as leaders
            ,listagg(case when is_leader = 'N' then membername end,',') within group (order by membername) 
                over (partition by is_leader) 
                as members
      from projects p
      left join teams t on t.project_id = p.project_id
      group by is_leader, membername
      );
    

    However we lost the project count on the way...

    Thank you, Sven! I think this demonstrates nicely how much easier the suggested syntax (or something similar) would be

    Sven W.
  • gkb
    gkb Member Posts: 1

    Hi @Chris Hunt,

    this is fine as long as I want to concatenate only one column.

    I'm still looking for a solution for concatenating two columns (separately)

    or for using listagg with "over (partition by ...)"

    Hi,

    another option (as long as there is no DISTINCT available) would be to remove the duplicates after the fact using Oracle's regular expression features like so:

    select name, signature, type, regexp_replace(listagg(usage,', '), '<pattern>') within group (order by usage) usages
    from sys.all_identifiers
    where owner     = 'ME'
    and object_name = 'PKG_EXAMPLE'
    and object_type = 'PACKAGE BODY'
    group by name, signature, type
    

    The following pattern works if the separator is just one character (a comma for example). If haven't tried your combination of comma plus blank. Should be a bit more tricky, though.

    with testdata as (
        select 'a' as original from dual union all
        select 'a,b' as original from dual union all
        select 'a,b,c' as original from dual union all
        select 'a,b,b' as original from dual union all
        select 'a,b,b,b' as original from dual union all
        select 'a,b,b,b,b' as original from dual union all
        select 'a,b,b,b,b,c' as original from dual union all
        select 'a,b,b,b,b,c,c' as original from dual
    ) 
    select 
        original,
         regexp_replace(original, '([^,]+)(,\1)+', '\1') as dedupped
    from testdata
    

    Bjoern

    Sven W.MKJ10930279d074c3a1-ea3c-481e-a044-da95be9c1ef3
  • Sven W.
    Sven W. Member Posts: 10,534 Gold Crown

    Hi,

    another option (as long as there is no DISTINCT available) would be to remove the duplicates after the fact using Oracle's regular expression features like so:

    select name, signature, type, regexp_replace(listagg(usage,', '), '<pattern>') within group (order by usage) usages
    from sys.all_identifiers
    where owner     = 'ME'
    and object_name = 'PKG_EXAMPLE'
    and object_type = 'PACKAGE BODY'
    group by name, signature, type
    

    The following pattern works if the separator is just one character (a comma for example). If haven't tried your combination of comma plus blank. Should be a bit more tricky, though.

    with testdata as (
        select 'a' as original from dual union all
        select 'a,b' as original from dual union all
        select 'a,b,c' as original from dual union all
        select 'a,b,b' as original from dual union all
        select 'a,b,b,b' as original from dual union all
        select 'a,b,b,b,b' as original from dual union all
        select 'a,b,b,b,b,c' as original from dual union all
        select 'a,b,b,b,b,c,c' as original from dual
    ) 
    select 
        original,
         regexp_replace(original, '([^,]+)(,\1)+', '\1') as dedupped
    from testdata
    

    Bjoern

    Nice usage of a backreference there!

    And since listagg returnes ordered output it should work for all normal cases.

  • Scott Wesley
    Scott Wesley Member Posts: 6,187 Gold Crown

    Hi,

    another option (as long as there is no DISTINCT available) would be to remove the duplicates after the fact using Oracle's regular expression features like so:

    select name, signature, type, regexp_replace(listagg(usage,', '), '<pattern>') within group (order by usage) usages
    from sys.all_identifiers
    where owner     = 'ME'
    and object_name = 'PKG_EXAMPLE'
    and object_type = 'PACKAGE BODY'
    group by name, signature, type
    

    The following pattern works if the separator is just one character (a comma for example). If haven't tried your combination of comma plus blank. Should be a bit more tricky, though.

    with testdata as (
        select 'a' as original from dual union all
        select 'a,b' as original from dual union all
        select 'a,b,c' as original from dual union all
        select 'a,b,b' as original from dual union all
        select 'a,b,b,b' as original from dual union all
        select 'a,b,b,b,b' as original from dual union all
        select 'a,b,b,b,b,c' as original from dual union all
        select 'a,b,b,b,b,c,c' as original from dual
    ) 
    select 
        original,
         regexp_replace(original, '([^,]+)(,\1)+', '\1') as dedupped
    from testdata
    

    Bjoern

    And this only works for 1 character length data, right?

    I have this from my notes regarding distinct options with listagg

     rtrim(xmltype('<r><n>' || 
                   replace(REGEXP_REPLACE( comma_delimited_column, '[A-Za-z]' , '' ), ',', ',</n><n>')||',</n></r>'
                  ).extract('//n[not(preceding::n = .)]/text()').getstringval(), ',')

    A native feature would be awesome. I use listagg a lot, and often want distinct output. I'd also like to handle more than 4000 characters...

    ApexBine
  • Sven W.
    Sven W. Member Posts: 10,534 Gold Crown

    And this only works for 1 character length data, right?

    I have this from my notes regarding distinct options with listagg

     rtrim(xmltype('<r><n>' || 
                   replace(REGEXP_REPLACE( comma_delimited_column, '[A-Za-z]' , '' ), ',', ',</n><n>')||',</n></r>'
                  ).extract('//n[not(preceding::n = .)]/text()').getstringval(), ',')

    A native feature would be awesome. I use listagg a lot, and often want distinct output. I'd also like to handle more than 4000 characters...

    swesley_perth wrote:
    
    And this only works for 1 character length data, right?
    ...
    

    It depends. It can be adapted for multi character delimiters as well, but only as long as none of the characters is part of the main string.

  • FatMartinR
    FatMartinR Member Posts: 190 Blue Ribbon

    This would definitely make queries easier and avoid an additional step to subquery distinct values before doing the LISTAGG.

    But I'd propose a slightly different format for the function. Since we already have DISTINCT functionality in some aggregate functions like COUNT, that should be consistent with LISTAGG. So I think the syntax should ideally look like -

    LISTAGG ( [DISTINCT] <column_name>  [ , <delimiter> ] )  OVER ...

    This makes it clearer that the DISTINCT is applied on the column values.

    I think this is a great idea too.   Though I'd like the DISTINCT here:

          LISTAGG (<column_name>  [ , <delimiter> ] )  DISTINCT WITHIN GROUP.....       ;-)

    Although Yes, you can do a DISTINCT sub-query, what happens if you want to mix them in the same statement?    That's why this would be really useful.

    I wonder though, are there any other functions that would benefit from a similar change....?

    ApexBine