Skip to Main Content

Database Software

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Interested in getting your voice heard by members of the Developer Marketing team at Oracle? Check out this post for AppDev or this post for AI focus group information.

LISTAGG with DISTINCT option

ApexBineApr 4 2016 — edited Jul 31 2019

I love the LISTAGG function which makes my life a lot easier.

And I would love it even more if it was able to omit duplicates!

So instead of writing

select name, signature, type, listagg(usage,', ') within group (order by usage) usages

from sys.all_identifiers

where owner     = 'ME'

and object_name = 'PKG_EXAMPLE'

and object_type = 'PACKAGE BODY'

group by name, signature, type

which would return something like

'DECLARATION, REFERENCE, REFERENCE, REFERENCE, REFERENCE, REFERENCE'

I would write

select name, signature, type, listagg(usage,', ' DISTINCT) within group (order by usage) usages

from sys.all_identifiers

where owner     = 'ME'

and object_name = 'PKG_EXAMPLE'

and object_type = 'PACKAGE BODY'

group by name, signature, type

and expect

'DECLARATION, REFERENCE'.

What do you think of it? Would it be a useful amendment?

This feature is coming with 19c:

SQL> select deptno, listagg (sal,', ') within group (order by sal)

  2  from scott.emp

  3  group by deptno;

    DEPTNO LISTAGG(SAL,',')WITHINGROUP(ORDERBYSAL)

---------- ---------------------------------------

        10 1300, 2450, 5000

        20 800, 1100, 2975, 3000, 3000

        30 950, 1250, 1250, 1500, 1600, 2850

SQL> select deptno, listagg (distinct sal,', ') within group (order by sal)

  2  from scott.emp

  3  group by deptno;

    DEPTNO LISTAGG(DISTINCTSAL,',')WITHINGROUP(ORDERBYSAL)

---------- -----------------------------------------------

        10 1300, 2450, 5000

        20 800, 1100, 2975, 3000

        30 950, 1250, 1500, 1600, 2850

Comments

MKJ10930279

NOT WORKING IN 11.2.0.3.

even the columns mentioned in WHERE clause is not present.

William Robertson

Perhaps that's because Oracle hasn't taken up AB's suggestion and backported it to 11.2.0.3 yet.

ApexBine

NOT WORKING IN 11.2.0.3.

even the columns mentioned in WHERE clause is not present.

MKJ, are we talking about the first statement? Actually, I would expect that to work in any 11.2 version.

Is the view sys.all_identifiers there at all?

Anyway, this example should work (and show some duplicates) if you have the HR schema:

select d.department_name,

  listagg(e.salary ,', ') within group (order by e.salary) salaries

from employees e

join departments d

on d.department_id = e.department_id

group by d.department_name

MKJ10930279

MKJ, are we talking about the first statement? Actually, I would expect that to work in any 11.2 version.

Is the view sys.all_identifiers there at all?

Anyway, this example should work (and show some duplicates) if you have the HR schema:

select d.department_name,

  listagg(e.salary ,', ') within group (order by e.salary) salaries

from employees e

join departments d

on d.department_id = e.department_id

group by d.department_name

Hi,

The second statement is throwing error, on DISTINCT, ORA-00907; missing right parenthesis.

Structure is fine.

ApexBine

Hi,

The second statement is throwing error, on DISTINCT, ORA-00907; missing right parenthesis.

Structure is fine.

Well, the second example is just a possble syntax for my suggestion. It won't work in a current database.

MKJ10930279

Well, the second example is just a possble syntax for my suggestion. It won't work in a current database.

If that is the case, then its a good idea.

Rafiq D

Somehow, I think you can use an outer select using the WITH clause to group the result before using the listagg. This will do the trick

Sven W.

Somehow, I think you can use an outer select using the WITH clause to group the result before using the listagg. This will do the trick

Yes in general it is possible to do such a distinct by stacking up two listagg functions. But it is clumsy to develop and not so easy to implement when more then 1 column needs to be aggregated.

Chris Hunt

Yes in general it is possible to do such a distinct by stacking up two listagg functions. But it is clumsy to develop and not so easy to implement when more then 1 column needs to be aggregated.

You don't need two listagg functions, you just need to do a distinct on the underlying table(s). So instead of this:

select deptno,listagg (sal,', ') within group (order by sal)

from scott.emp

group by deptno;

do this

select deptno,listagg (sal,', ') within group (order by sal)

from (select distinct deptno,sal from scott.emp)

group by deptno;

ApexBine

You don't need two listagg functions, you just need to do a distinct on the underlying table(s). So instead of this:

select deptno,listagg (sal,', ') within group (order by sal)

from scott.emp

group by deptno;

do this

select deptno,listagg (sal,', ') within group (order by sal)

from (select distinct deptno,sal from scott.emp)

group by deptno;

Hi @"Chris Hunt",

this is fine as long as I want to concatenate only one column.

I'm still looking for a solution for concatenating two columns (separately)

or for using listagg with "over (partition by ...)"

Sven W.

You don't need two listagg functions, you just need to do a distinct on the underlying table(s). So instead of this:

select deptno,listagg (sal,', ') within group (order by sal)

from scott.emp

group by deptno;

do this

select deptno,listagg (sal,', ') within group (order by sal)

from (select distinct deptno,sal from scott.emp)

group by deptno;

An example:

Here I have some test data for projects and teams.

I want to have a list of all team leaders and a second list of all team members. "Paul" is a team leader in several teams, but I want him only to be listed once. This can not be easily solved useing the current logic. (it is possible). Bines suggestion would be simliar to the COUNT(distinct column) logic and would solve such scenarios very easily.

with projects as (select 1 project_id, 'IoT4.0' project_name from dual union all

                  select 2 project_id, 'VeryBigData' project_name from dual union all

                  select 3 project_id, 'Cloudify' project_name from dual )

    ,teams as (select 1 team_id, 1 project_id, 'John' membername, 'Y' is_leader from dual union all

               select 2 team_id, 1 project_id, 'Paul' membername, 'Y' is_leader from dual union all

               select 3 team_id, 1 project_id, 'George' membername, 'N' is_leader from dual union all

               select 4 team_id, 1 project_id, 'Ringo' membername, 'N' is_leader from dual union all

               select 5 team_id, 2 project_id, 'John' membername, 'N' is_leader from dual union all

               select 6 team_id, 2 project_id, 'Wayne' membername, 'N' is_leader from dual union all

               select 7 team_id, 2 project_id, 'Paul' membername, 'Y' is_leader from dual union all

               select 8 team_id, 3 project_id, 'Paul' membername, 'Y' is_leader from dual union all

               select 9 team_id, 3 project_id, 'Simon' membername, 'N' is_leader from dual union all

               select 10 team_id, 3 project_id, 'Art' membername, 'N' is_leader from dual union all

               select 11 team_id, 3 project_id, 'Garfunkel' membername, 'N' is_leader from dual

               )

select count(distinct p.project_id) project_count

      ,listagg(case when is_leader = 'Y' then membername end,',') within group (order by membername) leaders

      ,listagg(case when is_leader = 'N' then membername end,',') within group (order by membername) members

from projects p

left join teams t on t.project_id = p.project_id

group by ();

Result:

PROJECT_COUNTLEADERSMEMBERS
3John,Paul,Paul,PaulArt,Garfunkel,George,John,Ringo,Simon,Wayne
Sven W.

An example:

Here I have some test data for projects and teams.

I want to have a list of all team leaders and a second list of all team members. "Paul" is a team leader in several teams, but I want him only to be listed once. This can not be easily solved useing the current logic. (it is possible). Bines suggestion would be simliar to the COUNT(distinct column) logic and would solve such scenarios very easily.

with projects as (select 1 project_id, 'IoT4.0' project_name from dual union all

                  select 2 project_id, 'VeryBigData' project_name from dual union all

                  select 3 project_id, 'Cloudify' project_name from dual )

    ,teams as (select 1 team_id, 1 project_id, 'John' membername, 'Y' is_leader from dual union all

               select 2 team_id, 1 project_id, 'Paul' membername, 'Y' is_leader from dual union all

               select 3 team_id, 1 project_id, 'George' membername, 'N' is_leader from dual union all

               select 4 team_id, 1 project_id, 'Ringo' membername, 'N' is_leader from dual union all

               select 5 team_id, 2 project_id, 'John' membername, 'N' is_leader from dual union all

               select 6 team_id, 2 project_id, 'Wayne' membername, 'N' is_leader from dual union all

               select 7 team_id, 2 project_id, 'Paul' membername, 'Y' is_leader from dual union all

               select 8 team_id, 3 project_id, 'Paul' membername, 'Y' is_leader from dual union all

               select 9 team_id, 3 project_id, 'Simon' membername, 'N' is_leader from dual union all

               select 10 team_id, 3 project_id, 'Art' membername, 'N' is_leader from dual union all

               select 11 team_id, 3 project_id, 'Garfunkel' membername, 'N' is_leader from dual

               )

select count(distinct p.project_id) project_count

      ,listagg(case when is_leader = 'Y' then membername end,',') within group (order by membername) leaders

      ,listagg(case when is_leader = 'N' then membername end,',') within group (order by membername) members

from projects p

left join teams t on t.project_id = p.project_id

group by ();

Result:

PROJECT_COUNTLEADERSMEMBERS
3John,Paul,Paul,PaulArt,Garfunkel,George,John,Ringo,Simon,Wayne

And here is a solution that works using a double group by and the analytic version of listagg. It is a bit tricky to understand and therefore almost impossible to maintain.

select  min(leaders) as leaders

       ,min(members) as members

from (

  select is_leader, membername

        ,listagg(case when is_leader = 'Y' then membername end,',') within group (order by membername)

            over (partition by is_leader)

            as leaders

        ,listagg(case when is_leader = 'N' then membername end,',') within group (order by membername)

            over (partition by is_leader)

            as members

  from projects p

  left join teams t on t.project_id = p.project_id

  group by is_leader, membername

  );

However we lost the project count on the way...

NSK2KSN

This option will be useful and will help in avoiding additional inline views.

ApexBine

And here is a solution that works using a double group by and the analytic version of listagg. It is a bit tricky to understand and therefore almost impossible to maintain.

select  min(leaders) as leaders

       ,min(members) as members

from (

  select is_leader, membername

        ,listagg(case when is_leader = 'Y' then membername end,',') within group (order by membername)

            over (partition by is_leader)

            as leaders

        ,listagg(case when is_leader = 'N' then membername end,',') within group (order by membername)

            over (partition by is_leader)

            as members

  from projects p

  left join teams t on t.project_id = p.project_id

  group by is_leader, membername

  );

However we lost the project count on the way...

Thank you, Sven! I think this demonstrates nicely how much easier the suggested syntax (or something similar) would be

gkb

Hi @"Chris Hunt",

this is fine as long as I want to concatenate only one column.

I'm still looking for a solution for concatenating two columns (separately)

or for using listagg with "over (partition by ...)"

Hi,

another option (as long as there is no DISTINCT available) would be to remove the duplicates after the fact using Oracle's regular expression features like so:

select name, signature, type, regexp_replace(listagg(usage,', '), '<pattern>') within group (order by usage) usages
from sys.all_identifiers
where owner     = 'ME'
and object_name = 'PKG_EXAMPLE'
and object_type = 'PACKAGE BODY'
group by name, signature, type

The following pattern works if the separator is just one character (a comma for example). If haven't tried your combination of comma plus blank. Should be a bit more tricky, though.

with testdata as (
    select 'a' as original from dual union all
    select 'a,b' as original from dual union all
    select 'a,b,c' as original from dual union all
    select 'a,b,b' as original from dual union all
    select 'a,b,b,b' as original from dual union all
    select 'a,b,b,b,b' as original from dual union all
    select 'a,b,b,b,b,c' as original from dual union all
    select 'a,b,b,b,b,c,c' as original from dual
) 
select 
    original,
     regexp_replace(original, '([^,]+)(,\1)+', '\1') as dedupped
from testdata

Bjoern

Sven W.

Hi,

another option (as long as there is no DISTINCT available) would be to remove the duplicates after the fact using Oracle's regular expression features like so:

select name, signature, type, regexp_replace(listagg(usage,', '), '<pattern>') within group (order by usage) usages
from sys.all_identifiers
where owner     = 'ME'
and object_name = 'PKG_EXAMPLE'
and object_type = 'PACKAGE BODY'
group by name, signature, type

The following pattern works if the separator is just one character (a comma for example). If haven't tried your combination of comma plus blank. Should be a bit more tricky, though.

with testdata as (
    select 'a' as original from dual union all
    select 'a,b' as original from dual union all
    select 'a,b,c' as original from dual union all
    select 'a,b,b' as original from dual union all
    select 'a,b,b,b' as original from dual union all
    select 'a,b,b,b,b' as original from dual union all
    select 'a,b,b,b,b,c' as original from dual union all
    select 'a,b,b,b,b,c,c' as original from dual
) 
select 
    original,
     regexp_replace(original, '([^,]+)(,\1)+', '\1') as dedupped
from testdata

Bjoern

Nice usage of a backreference there!

And since listagg returnes ordered output it should work for all normal cases.

Scott Wesley

Hi,

another option (as long as there is no DISTINCT available) would be to remove the duplicates after the fact using Oracle's regular expression features like so:

select name, signature, type, regexp_replace(listagg(usage,', '), '<pattern>') within group (order by usage) usages
from sys.all_identifiers
where owner     = 'ME'
and object_name = 'PKG_EXAMPLE'
and object_type = 'PACKAGE BODY'
group by name, signature, type

The following pattern works if the separator is just one character (a comma for example). If haven't tried your combination of comma plus blank. Should be a bit more tricky, though.

with testdata as (
    select 'a' as original from dual union all
    select 'a,b' as original from dual union all
    select 'a,b,c' as original from dual union all
    select 'a,b,b' as original from dual union all
    select 'a,b,b,b' as original from dual union all
    select 'a,b,b,b,b' as original from dual union all
    select 'a,b,b,b,b,c' as original from dual union all
    select 'a,b,b,b,b,c,c' as original from dual
) 
select 
    original,
     regexp_replace(original, '([^,]+)(,\1)+', '\1') as dedupped
from testdata

Bjoern

And this only works for 1 character length data, right?

I have this from my notes regarding distinct options with listagg

 rtrim(xmltype('<r><n>' || 
               replace(REGEXP_REPLACE( comma_delimited_column, '[A-Za-z]' , '' ), ',', ',</n><n>')||',</n></r>'
              ).extract('//n[not(preceding::n = .)]/text()').getstringval(), ',')

A native feature would be awesome. I use listagg a lot, and often want distinct output. I'd also like to handle more than 4000 characters...

Sven W.

And this only works for 1 character length data, right?

I have this from my notes regarding distinct options with listagg

 rtrim(xmltype('<r><n>' || 
               replace(REGEXP_REPLACE( comma_delimited_column, '[A-Za-z]' , '' ), ',', ',</n><n>')||',</n></r>'
              ).extract('//n[not(preceding::n = .)]/text()').getstringval(), ',')

A native feature would be awesome. I use listagg a lot, and often want distinct output. I'd also like to handle more than 4000 characters...

swesley_perth wrote:

And this only works for 1 character length data, right?

...

It depends. It can be adapted for multi character delimiters as well, but only as long as none of the characters is part of the main string.

Abhinav B.

This would definitely make queries easier and avoid an additional step to subquery distinct values before doing the LISTAGG.

But I'd propose a slightly different format for the function. Since we already have DISTINCT functionality in some aggregate functions like COUNT, that should be consistent with LISTAGG. So I think the syntax should ideally look like -

LISTAGG ( [DISTINCT] <column_name>  [ , <delimiter> ] )  OVER ...

This makes it clearer that the DISTINCT is applied on the column values.

FatMartinR

This would definitely make queries easier and avoid an additional step to subquery distinct values before doing the LISTAGG.

But I'd propose a slightly different format for the function. Since we already have DISTINCT functionality in some aggregate functions like COUNT, that should be consistent with LISTAGG. So I think the syntax should ideally look like -

LISTAGG ( [DISTINCT] <column_name>  [ , <delimiter> ] )  OVER ...

This makes it clearer that the DISTINCT is applied on the column values.

I think this is a great idea too.   Though I'd like the DISTINCT here:

      LISTAGG (<column_name>  [ , <delimiter> ] )  DISTINCT WITHIN GROUP.....       ;-)

Although Yes, you can do a DISTINCT sub-query, what happens if you want to mix them in the same statement?    That's why this would be really useful.

I wonder though, are there any other functions that would benefit from a similar change....?

Gbenga Ajakaye

This would be useful. I can already thing of different ways that I can use this. + for me.

jaramill

Oh wow....I thought I was the only one that ran into this issue.  Yes using CTE (common-table-expressions) via sub-query factoring (aka WITH clause) would help solve the OP's problem but having the option added to the grammar of the LISTAGG function would help IMMENSELY.


Once I've found out about this future I use it when I need to concatenate a list of values comma-delimited.

There is another method or sister function of the XML variety (XMLAGG(XMLELEMENT .....

I definitely vote for this.

Nikolaus Thiel

I would love to see listagg(distinct,_,_) to be implemented in the next release.

Please note that first using listagg and then regexp_replace works for small data sets only, because listagg returns VARCHAR2 which allows 4000 characters only.

Here an example:

SQL> select * from v$version;BANNER--------------------------------------------------------------------------------Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit ProductionPL/SQL Release 11.2.0.4.0 - ProductionCORE    11.2.0.4.0      ProductionTNS for Linux: Version 11.2.0.4.0 - ProductionNLSRTL Version 11.2.0.4.0 - ProductionSQL> with data as  (select mod(level,2) ID from dual connect by level <10),data_list as (select listagg(ID,',') within group (order by ID) ID_LIST from data)select regexp_replace(ID_LIST,'([^,]+)(,\1)+', '\1') from data_list; REGEXP_REPLACE(ID_LIST,'([^,]+)(,\1)+','\1')--------------------------------------------0,1SQL> with data as  (select mod(level,2) ID from dual connect by level <10000),data_list as (select listagg(ID,',') within group (order by ID) ID_LIST from data)select regexp_replace(ID_LIST,'([^,]+)(,\1)+', '\1') from data_list;ERROR at line 1: ORA-01489: result of string concatenation is too long
Sven W.

I would love to see listagg(distinct,_,_) to be implemented in the next release.

Please note that first using listagg and then regexp_replace works for small data sets only, because listagg returns VARCHAR2 which allows 4000 characters only.

Here an example:

SQL> select * from v$version;BANNER--------------------------------------------------------------------------------Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit ProductionPL/SQL Release 11.2.0.4.0 - ProductionCORE    11.2.0.4.0      ProductionTNS for Linux: Version 11.2.0.4.0 - ProductionNLSRTL Version 11.2.0.4.0 - ProductionSQL> with data as  (select mod(level,2) ID from dual connect by level <10),data_list as (select listagg(ID,',') within group (order by ID) ID_LIST from data)select regexp_replace(ID_LIST,'([^,]+)(,\1)+', '\1') from data_list; REGEXP_REPLACE(ID_LIST,'([^,]+)(,\1)+','\1')--------------------------------------------0,1SQL> with data as  (select mod(level,2) ID from dual connect by level <10000),data_list as (select listagg(ID,',') within group (order by ID) ID_LIST from data)select regexp_replace(ID_LIST,'([^,]+)(,\1)+', '\1') from data_list;ERROR at line 1: ORA-01489: result of string concatenation is too long

Please note that in 12.2 listagg was enhanced slightly. It still only supports up to 4000 chars, however you can now specify what should happen when the list gets to long.

From the doc: Oracle Database 12.2 New Features

"Enhanced LISTAGG Functionality

LISTAGG aggregates the values of a column by concatenating them into a single string. New functionality has been added for managing situations where the length of the concatenated string is too long.

Developers can now control the process for managing overflowing LISTAGG aggregates. This increases the productivity and flexibility of this aggregation function."

The new syntax looks like this:

ON OVERFLOW TRUNCATE WITH COUNT

Nikolaus Thiel

Please note that in 12.2 listagg was enhanced slightly. It still only supports up to 4000 chars, however you can now specify what should happen when the list gets to long.

From the doc: Oracle Database 12.2 New Features

"Enhanced LISTAGG Functionality

LISTAGG aggregates the values of a column by concatenating them into a single string. New functionality has been added for managing situations where the length of the concatenated string is too long.

Developers can now control the process for managing overflowing LISTAGG aggregates. This increases the productivity and flexibility of this aggregation function."

The new syntax looks like this:

ON OVERFLOW TRUNCATE WITH COUNT

Sven W. wrote:

The new syntax looks like this:

ON OVERFLOW TRUNCATE WITH COUNT

Truncating and thus modifying the result isntead of an error sounds scary. Though there might be occasion where this could be usefull.

Better would be to not add duplicates to the output (DISTINCT) and to increase the varchar2 length to varchar2(32767) as in PL/SQL.

Raj Jamadagni

Sven W. wrote:

The new syntax looks like this:

ON OVERFLOW TRUNCATE WITH COUNT

Truncating and thus modifying the result isntead of an error sounds scary. Though there might be occasion where this could be usefull.

Better would be to not add duplicates to the output (DISTINCT) and to increase the varchar2 length to varchar2(32767) as in PL/SQL.

If it needs to be extended to varchar2(32767) why not provide an overloaded function to return a CLOB? I'd surely vote that that.

User_5AM5X

Here's a much better solution all-round.      Right-click and 'Translate to English'.

"Oracle SQL und PL/SQL" von Carsten Czarski: SQL LISTAGG mit CLOB-Ausgabe - kein VARCHAR2-Limit mehr

Raj Jamadagni

Here's a much better solution all-round.      Right-click and 'Translate to English'.

"Oracle SQL und PL/SQL" von Carsten Czarski: SQL LISTAGG mit CLOB-Ausgabe - kein VARCHAR2-Limit mehr

That is great, thank you for the mention, bookmarked and I'll test it. But as you can see it occasionally gives errors, so all that might just go away if this becomes a built-in solution.

ApexBine

Sven W. wrote:

The new syntax looks like this:

ON OVERFLOW TRUNCATE WITH COUNT

Truncating and thus modifying the result isntead of an error sounds scary. Though there might be occasion where this could be usefull.

Better would be to not add duplicates to the output (DISTINCT) and to increase the varchar2 length to varchar2(32767) as in PL/SQL.

As always: 'it depends'.

I can see use cases for all three: truncate, distinct and increase the length of a possible result

NSK2KSN

This would definitely make queries easier and avoid an additional step to subquery distinct values before doing the LISTAGG.

But I'd propose a slightly different format for the function. Since we already have DISTINCT functionality in some aggregate functions like COUNT, that should be consistent with LISTAGG. So I think the syntax should ideally look like -

LISTAGG ( [DISTINCT] <column_name>  [ , <delimiter> ] )  OVER ...

This makes it clearer that the DISTINCT is applied on the column values.

Agreed, Sort also should be there within the LISTAGG function and then only this can be used effectively ?

FatMartinR

Agreed, Sort also should be there within the LISTAGG function and then only this can be used effectively ?

Sort is there by default.       WITHIN GROUP (ORDER BY ....)

Dinidu Hewage

My suggestion is to introduce a function called

Distinct()

So then it could be used in other places also, such as varrays, table type variables......

William Robertson

My suggestion is to introduce a function called

Distinct()

So then it could be used in other places also, such as varrays, table type variables......

Sounds a bit like SET.

user10366751

Hi,

another option (as long as there is no DISTINCT available) would be to remove the duplicates after the fact using Oracle's regular expression features like so:

select name, signature, type, regexp_replace(listagg(usage,', '), '<pattern>') within group (order by usage) usages
from sys.all_identifiers
where owner     = 'ME'
and object_name = 'PKG_EXAMPLE'
and object_type = 'PACKAGE BODY'
group by name, signature, type

The following pattern works if the separator is just one character (a comma for example). If haven't tried your combination of comma plus blank. Should be a bit more tricky, though.

with testdata as (
    select 'a' as original from dual union all
    select 'a,b' as original from dual union all
    select 'a,b,c' as original from dual union all
    select 'a,b,b' as original from dual union all
    select 'a,b,b,b' as original from dual union all
    select 'a,b,b,b,b' as original from dual union all
    select 'a,b,b,b,b,c' as original from dual union all
    select 'a,b,b,b,b,c,c' as original from dual
) 
select 
    original,
     regexp_replace(original, '([^,]+)(,\1)+', '\1') as dedupped
from testdata

Bjoern

hi Bjoern

I've used your regexp proposal for my own needs and have changed it slightly to cover other use cases (related to substring match). I use now '(^|,)([^,]+)(,\2)+($|,)', '\1\2\4'. I'm providing new test cases below, which are managed correctly by the new version:

with testdata as (

    select 'a,' as original from dual union all

    select 'a,b' as original from dual union all

    select 'a,b,c' as original from dual union all

    select 'a,b,b' as original from dual union all

    select 'a,b,b,b' as original from dual union all

    select 'a,b,b,b,b' as original from dual union all

    select 'a,b,b,b,b,c' as original from dual union all

    select 'a,b,b,b,b,c,c' as original from dual union all

    select 'a,ab' as original from dual union all

    select 'a,b,bc,c' as original from dual union all

    select 'ab,b' as original from dual

)

select

    original,

    regexp_replace(original, '([^,]+)(,\1)+', '\1') as dedupped1,

    regexp_replace(original, '(^|,)([^,]+)(,\2)+($|,)', '\1\2\4') as dedupped2

from testdata;

Massimiliano Palese

I implemented this stored function:

CREATE TYPE LISTAGG_DISTINCT_PARAMS AS OBJECT (ELEMENTO VARCHAR2(2000), SEPARATORE VARCHAR2(10));
CREATE TYPE T_LISTA_ELEMENTI AS TABLE OF VARCHAR2(2000);
CREATE TYPE T_LISTAGG_DISTINCT AS OBJECT (

  LISTA_ELEMENTI T_LISTA_ELEMENTI
,
  SEPARATORE VARCHAR2
(10),

  STATIC
FUNCTION ODCIAGGREGATEINITIALIZE(SCTX  IN OUT T_LISTAGG_DISTINCT)
  
RETURN NUMBER,

  MEMBER
FUNCTION ODCIAGGREGATEITERATE  (SELF  IN OUT T_LISTAGG_DISTINCT,
  VALUE
IN  LISTAGG_DISTINCT_PARAMS )
  
RETURN NUMBER,

  MEMBER
FUNCTION ODCIAGGREGATETERMINATE (SELF  IN  T_LISTAGG_DISTINCT,
  RETURN_VALUE OUT VARCHAR2
,
  FLAGS 
IN  NUMBER  )
  
RETURN NUMBER,

  MEMBER
FUNCTION ODCIAGGREGATEMERGE  (SELF  IN OUT T_LISTAGG_DISTINCT,
  CTX2 
IN  T_LISTAGG_DISTINCT  )
  
RETURN NUMBER
);
CREATE OR REPLACE TYPE BODY T_LISTAGG_DISTINCT IS 

  STATIC
FUNCTION ODCIAGGREGATEINITIALIZE(SCTX IN OUT T_LISTAGG_DISTINCT) RETURN NUMBER IS
  
BEGIN
  SCTX
:= T_LISTAGG_DISTINCT(T_LISTA_ELEMENTI() , ',');
  
RETURN ODCICONST.SUCCESS;
  
END;

  MEMBER
FUNCTION ODCIAGGREGATEITERATE(SELF IN OUT T_LISTAGG_DISTINCT, VALUE IN LISTAGG_DISTINCT_PARAMS) RETURN NUMBER IS
  
BEGIN

  
IF VALUE.ELEMENTO IS NOT NULL THEN
  SELF
.LISTA_ELEMENTI.EXTEND;
  SELF
.LISTA_ELEMENTI(SELF.LISTA_ELEMENTI.LAST) := TO_CHAR(VALUE.ELEMENTO);
  SELF
.LISTA_ELEMENTI:= SELF.LISTA_ELEMENTI MULTISET UNION DISTINCT SELF
1 person found this helpful
Solomon Yakobson

I implemented this stored function:

CREATE TYPE LISTAGG_DISTINCT_PARAMS AS OBJECT (ELEMENTO VARCHAR2(2000), SEPARATORE VARCHAR2(10));
CREATE TYPE T_LISTA_ELEMENTI AS TABLE OF VARCHAR2(2000);
CREATE TYPE T_LISTAGG_DISTINCT AS OBJECT (

  LISTA_ELEMENTI T_LISTA_ELEMENTI
,
  SEPARATORE VARCHAR2
(10),

  STATIC
FUNCTION ODCIAGGREGATEINITIALIZE(SCTX  IN OUT T_LISTAGG_DISTINCT)
  
RETURN NUMBER,

  MEMBER
FUNCTION ODCIAGGREGATEITERATE  (SELF  IN OUT T_LISTAGG_DISTINCT,
  VALUE
IN  LISTAGG_DISTINCT_PARAMS )
  
RETURN NUMBER,

  MEMBER
FUNCTION ODCIAGGREGATETERMINATE (SELF  IN  T_LISTAGG_DISTINCT,
  RETURN_VALUE OUT VARCHAR2
,
  FLAGS 
IN  NUMBER  )
  
RETURN NUMBER,

  MEMBER
FUNCTION ODCIAGGREGATEMERGE  (SELF  IN OUT T_LISTAGG_DISTINCT,
  CTX2 
IN  T_LISTAGG_DISTINCT  )
  
RETURN NUMBER
);
CREATE OR REPLACE TYPE BODY T_LISTAGG_DISTINCT IS 

  STATIC
FUNCTION ODCIAGGREGATEINITIALIZE(SCTX IN OUT T_LISTAGG_DISTINCT) RETURN NUMBER IS
  
BEGIN
  SCTX
:= T_LISTAGG_DISTINCT(T_LISTA_ELEMENTI() , ',');
  
RETURN ODCICONST.SUCCESS;
  
END;

  MEMBER
FUNCTION ODCIAGGREGATEITERATE(SELF IN OUT T_LISTAGG_DISTINCT, VALUE IN LISTAGG_DISTINCT_PARAMS) RETURN NUMBER IS
  
BEGIN

  
IF VALUE.ELEMENTO IS NOT NULL THEN
  SELF
.LISTA_ELEMENTI.EXTEND;
  SELF
.LISTA_ELEMENTI(SELF.LISTA_ELEMENTI.LAST) := TO_CHAR(VALUE.ELEMENTO);
  SELF
.LISTA_ELEMENTI:= SELF.LISTA_ELEMENTI MULTISET UNION DISTINCT SELF
Stew Ashton

hi Bjoern

I've used your regexp proposal for my own needs and have changed it slightly to cover other use cases (related to substring match). I use now '(^|,)([^,]+)(,\2)+($|,)', '\1\2\4'. I'm providing new test cases below, which are managed correctly by the new version:

with testdata as (

    select 'a,' as original from dual union all

    select 'a,b' as original from dual union all

    select 'a,b,c' as original from dual union all

    select 'a,b,b' as original from dual union all

    select 'a,b,b,b' as original from dual union all

    select 'a,b,b,b,b' as original from dual union all

    select 'a,b,b,b,b,c' as original from dual union all

    select 'a,b,b,b,b,c,c' as original from dual union all

    select 'a,ab' as original from dual union all

    select 'a,b,bc,c' as original from dual union all

    select 'ab,b' as original from dual

)

select

    original,

    regexp_replace(original, '([^,]+)(,\1)+', '\1') as dedupped1,

    regexp_replace(original, '(^|,)([^,]+)(,\2)+($|,)', '\1\2\4') as dedupped2

from testdata;

Hello,

I think there is a bug in your regexp_replace. I find it easier to put a comma in front of the source, then remove it after the rexexp_replace. This will make more sense when I address some of the other requirements.

with testdata as (    select 'a,a,b,b' as original from dual)select    original,    regexp_replace(original, '(^|,)([^,]+)(,\2)+($|,)', '\1\2\4') as dedupped2,    substr(regexp_replace(','||original, '(,[^,]+)(\1)+', '\1'),2) as dedupped3    from testdata;

| ORIGINAL | DEDUPPED2 | DEDUPPED3 |
| a,a,b,b | a,b,b | a,b |

Best regards,

Stew

Stew Ashton

Having read everything up to now, I have several comments:

  1. I am for DISTINCT
  2. The SQL standard has already stated where the DISTINCT should go: LISTAGG(DISTINCT ...)
  3. I am for LISTAGG being able to return a CLOB.
    I don't know if the SQL standard mandates syntax for this, but there are already other functions that specify the returning datatype using AS.
  4. Below is my best effort at a workaround that provides the equivalent of LISTAGG(DISTINCT ...) AS CLOB or AS VARCHAR2.
select trim(',' from  regexp_replace(    xmlcast(xmlagg(xmlelement(E,','||USAGE) order by USAGE) AS CLOB),    '(,[^,]*)(\1)+', '\1'  )) deduppedfrom sys.all_identifiers;DEDUPPED------------------------------------------------ASSIGNMENT,CALL,DECLARATION,DEFINITION,REFERENCE

This will return a CLOB, but it can be CAST as a VARCHAR2. We can even return a VARCHAR2 from XMLCAST if the data allows it.

The TRIM at the end is to remove a trailing comma if there is one or more NULL values in USAGE.

Best regards,

Stew

Stew Ashton

And here is a solution that works using a double group by and the analytic version of listagg. It is a bit tricky to understand and therefore almost impossible to maintain.

select  min(leaders) as leaders

       ,min(members) as members

from (

  select is_leader, membername

        ,listagg(case when is_leader = 'Y' then membername end,',') within group (order by membername)

            over (partition by is_leader)

            as leaders

        ,listagg(case when is_leader = 'N' then membername end,',') within group (order by membername)

            over (partition by is_leader)

            as members

  from projects p

  left join teams t on t.project_id = p.project_id

  group by is_leader, membername

  );

However we lost the project count on the way...

Sven,

Applying my alternative to your problem:

with projects as (  select 1 project_id, 'IoT4.0' project_name from dual union all  select 2 project_id, 'VeryBigData' project_name from dual union all  select 3 project_id, 'Cloudify' project_name from dual),teams as (  select 1 team_id, 1 project_id, 'John' membername, 'Y' is_leader from dual union all  select 2 team_id, 1 project_id, 'Paul' membername, 'Y' is_leader from dual union all  select 3 team_id, 1 project_id, 'George' membername, 'N' is_leader from dual union all  select 4 team_id, 1 project_id, 'Ringo' membername, 'N' is_leader from dual union all  select 5 team_id, 2 project_id, 'John' membername, 'N' is_leader from dual union all  select 6 team_id, 2 project_id, 'Wayne' membername, 'N' is_leader from dual union all  select 7 team_id, 2 project_id, 'Paul' membername, 'Y' is_leader from dual union all  select 8 team_id, 3 project_id, 'Paul' membername, 'Y' is_leader from dual union all  select 9 team_id, 3 project_id, 'Simon' membername, 'N' is_leader from dual union all  select 10 team_id, 3 project_id, 'Art' membername, 'N' is_leader from dual union all  select 11 team_id, 3 project_id, null membername, 'N' is_leader from dual)select count(distinct p.project_id) project_count,trim(',' from  regexp_replace(    xmlcast(xmlagg(        case when is_leader = 'Y' then xmlelement(E, ','||membername) end        order by membername      )as varchar2(4000)),    '(,[^,]+)(\1)+', '\1'  )) leaders,trim(',' from  regexp_replace(    xmlcast(xmlagg(        case when is_leader = 'N' then xmlelement(E, ','||membername) end        order by membername      ) as varchar2(4000)),    '(,[^,]*)(\1)+', '\1'  )) membersfrom projects pleft join teams t on t.project_id = p.project_id;

Best regards,

Stew

P.S. Why join projects and teams?

dirkvanhaute

Having read everything up to now, I have several comments:

  1. I am for DISTINCT
  2. The SQL standard has already stated where the DISTINCT should go: LISTAGG(DISTINCT ...)
  3. I am for LISTAGG being able to return a CLOB.
    I don't know if the SQL standard mandates syntax for this, but there are already other functions that specify the returning datatype using AS.
  4. Below is my best effort at a workaround that provides the equivalent of LISTAGG(DISTINCT ...) AS CLOB or AS VARCHAR2.
select trim(',' from  regexp_replace(    xmlcast(xmlagg(xmlelement(E,','||USAGE) order by USAGE) AS CLOB),    '(,[^,]*)(\1)+', '\1'  )) deduppedfrom sys.all_identifiers;DEDUPPED------------------------------------------------ASSIGNMENT,CALL,DECLARATION,DEFINITION,REFERENCE

This will return a CLOB, but it can be CAST as a VARCHAR2. We can even return a VARCHAR2 from XMLCAST if the data allows it.

The TRIM at the end is to remove a trailing comma if there is one or more NULL values in USAGE.

Best regards,

Stew

Hi Stew

Nice solution by using xmlagg, but why do you prefer regexp_replace over the row_number solution ?

I do not notice much difference in execution time, but I would guess deduplicating before creating the text would be more efficient.

Moreover you must be sure that usage does never contain spaces.

with t as (    select decode( row_number() over (partition by usage order by 1) ,1,usage,null) as usage   from sys.all_identifiers)select trim(',' from    xmlcast(xmlagg(xmlelement(E,','||USAGE) order by USAGE) AS CLOB))as deduppedfrom t;

Regards,
Dirk

Stew Ashton

Hi Stew

Nice solution by using xmlagg, but why do you prefer regexp_replace over the row_number solution ?

I do not notice much difference in execution time, but I would guess deduplicating before creating the text would be more efficient.

Moreover you must be sure that usage does never contain spaces.

with t as (    select decode( row_number() over (partition by usage order by 1) ,1,usage,null) as usage   from sys.all_identifiers)select trim(',' from    xmlcast(xmlagg(xmlelement(E,','||USAGE) order by USAGE) AS CLOB))as deduppedfrom t;

Regards,
Dirk

Hi Dirk,

Thanks for your comment. I did a test with spaces and I don't see the problem; could you demonstrate?

I'm not sure I "prefer" the solution I provided. I was trying to find a solution that was a function (even with complex expressions in the parameters) and that could be plugged into the query without requiring subqueries or inline views.

If we are allowed subqueries, I would go for SELECT DISTINCT, not row_number. DISTINCT is the obvious and semantically meaningful notion, and I presume uses less memory.

Best regards,

Stew

Stew Ashton

Getting back to the original suggestion, I have a question about the idea itself: what should happen if we say

SELECT LISTAGG(DISTINCT COL1) WITHIN GROUP(ORDER BY COL2) ?

If there are multiple COL1 values, which COL2 should be chosen for ordering?

Best regards,

Stew

Solomon Yakobson

Getting back to the original suggestion, I have a question about the idea itself: what should happen if we say

SELECT LISTAGG(DISTINCT COL1) WITHIN GROUP(ORDER BY COL2) ?

If there are multiple COL1 values, which COL2 should be chosen for ordering?

Best regards,

Stew

Stew,

This is no different from:

SQL> select  distinct ename

  2    from  emp

  3    order by job

  4  /

  order by job

           *

ERROR at line 3:

ORA-01791: not a SELECTed expression

SQL>

SY,

Stew Ashton

Stew,

This is no different from:

SQL> select  distinct ename

  2    from  emp

  3    order by job

  4  /

  order by job

           *

ERROR at line 3:

ORA-01791: not a SELECTed expression

SQL>

SY,

It could be the same or it could be different.

According to this site https://modern-sql.com/feature/listagg

the SQL standard does not say that is an exception.

And the similar COLLECT function doesn't raise an exception either: it just ignores the DISTINCT.

SQL> select cast(collect(distinct job) as sys.odcivarchar2list) jobs  2  from emp;JOBS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               ------------------------------------------------------------------------ODCIVARCHAR2LIST('ANALYST', 'CLERK', 'MANAGER', 'PRESIDENT', 'SALESMAN')SQL> select cast(collect(distinct job order by ename) as sys.odcivarchar2list) jobs  2  from emp;JOBS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               -------------------------------------------------------------------------------------------------------------------------------------------------------------ODCIVARCHAR2LIST('CLERK', 'SALESMAN', 'MANAGER', 'MANAGER', 'ANALYST', 'CLERK', 'MANAGER', 'PRESIDENT', 'SALESMAN', 'CLERK', 'CLERK', 'SALESMAN', 'SALESMAN')

So my question stands, since the answer is not obvious. I do take your reply as a vote for raising an exception: that is a valid possibility. I certainly prefer it to what the COLLECT function does!

Stew

Solomon Yakobson

It could be the same or it could be different.

According to this site https://modern-sql.com/feature/listagg

the SQL standard does not say that is an exception.

And the similar COLLECT function doesn't raise an exception either: it just ignores the DISTINCT.

SQL> select cast(collect(distinct job) as sys.odcivarchar2list) jobs  2  from emp;JOBS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               ------------------------------------------------------------------------ODCIVARCHAR2LIST('ANALYST', 'CLERK', 'MANAGER', 'PRESIDENT', 'SALESMAN')SQL> select cast(collect(distinct job order by ename) as sys.odcivarchar2list) jobs  2  from emp;JOBS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               -------------------------------------------------------------------------------------------------------------------------------------------------------------ODCIVARCHAR2LIST('CLERK', 'SALESMAN', 'MANAGER', 'MANAGER', 'ANALYST', 'CLERK', 'MANAGER', 'PRESIDENT', 'SALESMAN', 'CLERK', 'CLERK', 'SALESMAN', 'SALESMAN')

So my question stands, since the answer is not obvious. I do take your reply as a vote for raising an exception: that is a valid possibility. I certainly prefer it to what the COLLECT function does!

Stew

I see your point but, IMHO, it's a bug. DISTINCT implies ORDER BY list is same or is a subset of DISTINCT expression list expressions (not even column list):

SQL> select  distinct sal + nvl(comm,0)

  2    from  emp

  3    order by sal

  4  /

  order by sal

           *

ERROR at line 3:

ORA-01791: not a SELECTed expression

SQL>

SY.

Racer I.

Hi,

> According to this site https://modern-sql.com/feature/listagg the SQL standard does not say that is an exception.

Arguably it doesn't allow the COLLECT result as it states that all duplicates must still be removed it just doesn't say which copy will survive. COLLECT apparently doesn't remove duplicates that aren't adjacent. Can anyone say if LISTAGG-DISTINCT shows the same behaviour?

regards,

Racer I.

Hi,

I just checked it and LISTAGG-DISTINCT works Ok with non-matching ORDER BY :

https://livesql.oracle.com/apex/livesql/s/hxn75ii2xjqfqhn0c1wxpu2zx

regards,

Racer I.

Hi,

Also arguably with the precedent of LISTAGG-DISTINCT the SELECT case could now behave the same, instead of raising ORA-01791...

regards,

1 - 48

Post Details

Added on Apr 4 2016
48 comments
110,160 views