Forum Stats

  • 3,815,606 Users
  • 2,259,059 Discussions
  • 7,893,185 Comments

Discussions

LISTAGG with DISTINCT option

124

Comments

  • FatMartinR
    FatMartinR Member Posts: 190 Blue Ribbon

    Agreed, Sort also should be there within the LISTAGG function and then only this can be used effectively ?

    Sort is there by default.       WITHIN GROUP (ORDER BY ....)

    ApexBinejaramill
  • Dinidu Hewage
    Dinidu Hewage Member Posts: 11 Blue Ribbon

    My suggestion is to introduce a function called

    Distinct()

    So then it could be used in other places also, such as varrays, table type variables......

  • William Robertson
    William Robertson Member Posts: 9,567 Bronze Crown

    My suggestion is to introduce a function called

    Distinct()

    So then it could be used in other places also, such as varrays, table type variables......

    Sounds a bit like SET.

  • Hi,

    another option (as long as there is no DISTINCT available) would be to remove the duplicates after the fact using Oracle's regular expression features like so:

    select name, signature, type, regexp_replace(listagg(usage,', '), '<pattern>') within group (order by usage) usages
    from sys.all_identifiers
    where owner     = 'ME'
    and object_name = 'PKG_EXAMPLE'
    and object_type = 'PACKAGE BODY'
    group by name, signature, type
    

    The following pattern works if the separator is just one character (a comma for example). If haven't tried your combination of comma plus blank. Should be a bit more tricky, though.

    with testdata as (
        select 'a' as original from dual union all
        select 'a,b' as original from dual union all
        select 'a,b,c' as original from dual union all
        select 'a,b,b' as original from dual union all
        select 'a,b,b,b' as original from dual union all
        select 'a,b,b,b,b' as original from dual union all
        select 'a,b,b,b,b,c' as original from dual union all
        select 'a,b,b,b,b,c,c' as original from dual
    ) 
    select 
        original,
         regexp_replace(original, '([^,]+)(,\1)+', '\1') as dedupped
    from testdata
    

    Bjoern

    hi Bjoern

    I've used your regexp proposal for my own needs and have changed it slightly to cover other use cases (related to substring match). I use now '(^|,)([^,]+)(,\2)+($|,)', '\1\2\4'. I'm providing new test cases below, which are managed correctly by the new version:

    with testdata as (

        select 'a,' as original from dual union all

        select 'a,b' as original from dual union all

        select 'a,b,c' as original from dual union all

        select 'a,b,b' as original from dual union all

        select 'a,b,b,b' as original from dual union all

        select 'a,b,b,b,b' as original from dual union all

        select 'a,b,b,b,b,c' as original from dual union all

        select 'a,b,b,b,b,c,c' as original from dual union all

        select 'a,ab' as original from dual union all

        select 'a,b,bc,c' as original from dual union all

        select 'ab,b' as original from dual

    )

    select

        original,

        regexp_replace(original, '([^,]+)(,\1)+', '\1') as dedupped1,

        regexp_replace(original, '(^|,)([^,]+)(,\2)+($|,)', '\1\2\4') as dedupped2

        

    from testdata;

  • Stew Ashton
    Stew Ashton Member Posts: 2,894 Bronze Crown

    hi Bjoern

    I've used your regexp proposal for my own needs and have changed it slightly to cover other use cases (related to substring match). I use now '(^|,)([^,]+)(,\2)+($|,)', '\1\2\4'. I'm providing new test cases below, which are managed correctly by the new version:

    with testdata as (

        select 'a,' as original from dual union all

        select 'a,b' as original from dual union all

        select 'a,b,c' as original from dual union all

        select 'a,b,b' as original from dual union all

        select 'a,b,b,b' as original from dual union all

        select 'a,b,b,b,b' as original from dual union all

        select 'a,b,b,b,b,c' as original from dual union all

        select 'a,b,b,b,b,c,c' as original from dual union all

        select 'a,ab' as original from dual union all

        select 'a,b,bc,c' as original from dual union all

        select 'ab,b' as original from dual

    )

    select

        original,

        regexp_replace(original, '([^,]+)(,\1)+', '\1') as dedupped1,

        regexp_replace(original, '(^|,)([^,]+)(,\2)+($|,)', '\1\2\4') as dedupped2

        

    from testdata;

    Hello,

    I think there is a bug in your regexp_replace. I find it easier to put a comma in front of the source, then remove it after the rexexp_replace. This will make more sense when I address some of the other requirements.

    with testdata as (    select 'a,a,b,b' as original from dual)select    original,    regexp_replace(original, '(^|,)([^,]+)(,\2)+($|,)', '\1\2\4') as dedupped2,    substr(regexp_replace(','||original, '(,[^,]+)(\1)+', '\1'),2) as dedupped3    from testdata;
    ORIGINALDEDUPPED2DEDUPPED3
    a,a,b,ba,b,ba,b

    Best regards,

    Stew

  • Stew Ashton
    Stew Ashton Member Posts: 2,894 Bronze Crown
    edited Jun 2, 2018 6:38AM

    Having read everything up to now, I have several comments:

    1. I am for DISTINCT
    2. The SQL standard has already stated where the DISTINCT should go: LISTAGG(DISTINCT ...)
    3. I am for LISTAGG being able to return a CLOB.
      I don't know if the SQL standard mandates syntax for this, but there are already other functions that specify the returning datatype using AS.
    4. Below is my best effort at a workaround that provides the equivalent of LISTAGG(DISTINCT ...) AS CLOB or AS VARCHAR2.
    select trim(',' from  regexp_replace(    xmlcast(xmlagg(xmlelement(E,','||USAGE) order by USAGE) AS CLOB),    '(,[^,]*)(\1)+', '\1'  )) deduppedfrom sys.all_identifiers;DEDUPPED------------------------------------------------ASSIGNMENT,CALL,DECLARATION,DEFINITION,REFERENCE

    This will return a CLOB, but it can be CAST as a VARCHAR2. We can even return a VARCHAR2 from XMLCAST if the data allows it.

    The TRIM at the end is to remove a trailing comma if there is one or more NULL values in USAGE.

    Best regards,

    Stew

    ApexBine
  • Stew Ashton
    Stew Ashton Member Posts: 2,894 Bronze Crown
    edited Jun 2, 2018 6:36AM

    And here is a solution that works using a double group by and the analytic version of listagg. It is a bit tricky to understand and therefore almost impossible to maintain.

    select  min(leaders) as leaders 
           ,min(members) as members 
    from (
      select is_leader, membername
            ,listagg(case when is_leader = 'Y' then membername end,',') within group (order by membername) 
                over (partition by is_leader) 
                as leaders
            ,listagg(case when is_leader = 'N' then membername end,',') within group (order by membername) 
                over (partition by is_leader) 
                as members
      from projects p
      left join teams t on t.project_id = p.project_id
      group by is_leader, membername
      );
    

    However we lost the project count on the way...

    Sven,

    Applying my alternative to your problem:

    with projects as (  select 1 project_id, 'IoT4.0' project_name from dual union all  select 2 project_id, 'VeryBigData' project_name from dual union all  select 3 project_id, 'Cloudify' project_name from dual),teams as (  select 1 team_id, 1 project_id, 'John' membername, 'Y' is_leader from dual union all  select 2 team_id, 1 project_id, 'Paul' membername, 'Y' is_leader from dual union all  select 3 team_id, 1 project_id, 'George' membername, 'N' is_leader from dual union all  select 4 team_id, 1 project_id, 'Ringo' membername, 'N' is_leader from dual union all  select 5 team_id, 2 project_id, 'John' membername, 'N' is_leader from dual union all  select 6 team_id, 2 project_id, 'Wayne' membername, 'N' is_leader from dual union all  select 7 team_id, 2 project_id, 'Paul' membername, 'Y' is_leader from dual union all  select 8 team_id, 3 project_id, 'Paul' membername, 'Y' is_leader from dual union all  select 9 team_id, 3 project_id, 'Simon' membername, 'N' is_leader from dual union all  select 10 team_id, 3 project_id, 'Art' membername, 'N' is_leader from dual union all  select 11 team_id, 3 project_id, null membername, 'N' is_leader from dual)select count(distinct p.project_id) project_count,trim(',' from  regexp_replace(    xmlcast(xmlagg(        case when is_leader = 'Y' then xmlelement(E, ','||membername) end        order by membername      )as varchar2(4000)),    '(,[^,]+)(\1)+', '\1'  )) leaders,trim(',' from  regexp_replace(    xmlcast(xmlagg(        case when is_leader = 'N' then xmlelement(E, ','||membername) end        order by membername      ) as varchar2(4000)),    '(,[^,]*)(\1)+', '\1'  )) membersfrom projects pleft join teams t on t.project_id = p.project_id;

    Best regards,

    Stew

    P.S. Why join projects and teams?

    Sven W.
  • dirkvanhaute
    dirkvanhaute Member Posts: 27 Bronze Badge

    Having read everything up to now, I have several comments:

    1. I am for DISTINCT
    2. The SQL standard has already stated where the DISTINCT should go: LISTAGG(DISTINCT ...)
    3. I am for LISTAGG being able to return a CLOB.
      I don't know if the SQL standard mandates syntax for this, but there are already other functions that specify the returning datatype using AS.
    4. Below is my best effort at a workaround that provides the equivalent of LISTAGG(DISTINCT ...) AS CLOB or AS VARCHAR2.
    select trim(',' from  regexp_replace(    xmlcast(xmlagg(xmlelement(E,','||USAGE) order by USAGE) AS CLOB),    '(,[^,]*)(\1)+', '\1'  )) deduppedfrom sys.all_identifiers;DEDUPPED------------------------------------------------ASSIGNMENT,CALL,DECLARATION,DEFINITION,REFERENCE

    This will return a CLOB, but it can be CAST as a VARCHAR2. We can even return a VARCHAR2 from XMLCAST if the data allows it.

    The TRIM at the end is to remove a trailing comma if there is one or more NULL values in USAGE.

    Best regards,

    Stew

    Hi Stew

    Nice solution by using xmlagg, but why do you prefer regexp_replace over the row_number solution ?

    I do not notice much difference in execution time, but I would guess deduplicating before creating the text would be more efficient.

    Moreover you must be sure that usage does never contain spaces.

    with t as ( 
       select decode( row_number() over (partition by usage order by 1) ,1,usage,null) as usage   from sys.all_identifiers)select trim(',' from    xmlcast(xmlagg(xmlelement(E,','||USAGE) order by USAGE) AS CLOB))as deduppedfrom t;

    Regards,
    Dirk

    Stew Ashton