Discussions
Categories
- 17.9K All Categories
- 3.4K Industry Applications
- 3.3K Intelligent Advisor
- 63 Insurance
- 535.7K On-Premises Infrastructure
- 138.1K Analytics Software
- 38.6K Application Development Software
- 5.6K Cloud Platform
- 109.3K Database Software
- 17.5K Enterprise Manager
- 8.8K Hardware
- 71K Infrastructure Software
- 105.2K Integration
- 41.5K Security Software
LISTAGG with DISTINCT option
Comments
-
Agreed, Sort also should be there within the LISTAGG function and then only this can be used effectively ?
Sort is there by default. WITHIN GROUP (ORDER BY ....)
-
My suggestion is to introduce a function called
Distinct()
So then it could be used in other places also, such as varrays, table type variables......
-
My suggestion is to introduce a function called
Distinct()
So then it could be used in other places also, such as varrays, table type variables......
Sounds a bit like
SET
. -
Hi,
another option (as long as there is no DISTINCT available) would be to remove the duplicates after the fact using Oracle's regular expression features like so:
select name, signature, type, regexp_replace(listagg(usage,', '), '<pattern>') within group (order by usage) usages from sys.all_identifiers where owner = 'ME' and object_name = 'PKG_EXAMPLE' and object_type = 'PACKAGE BODY' group by name, signature, type
The following pattern works if the separator is just one character (a comma for example). If haven't tried your combination of comma plus blank. Should be a bit more tricky, though.
with testdata as ( select 'a' as original from dual union all select 'a,b' as original from dual union all select 'a,b,c' as original from dual union all select 'a,b,b' as original from dual union all select 'a,b,b,b' as original from dual union all select 'a,b,b,b,b' as original from dual union all select 'a,b,b,b,b,c' as original from dual union all select 'a,b,b,b,b,c,c' as original from dual ) select original, regexp_replace(original, '([^,]+)(,\1)+', '\1') as dedupped from testdata
Bjoern
hi Bjoern
I've used your regexp proposal for my own needs and have changed it slightly to cover other use cases (related to substring match). I use now '(^|,)([^,]+)(,\2)+($|,)', '\1\2\4'. I'm providing new test cases below, which are managed correctly by the new version:
with testdata as (
select 'a,' as original from dual union all
select 'a,b' as original from dual union all
select 'a,b,c' as original from dual union all
select 'a,b,b' as original from dual union all
select 'a,b,b,b' as original from dual union all
select 'a,b,b,b,b' as original from dual union all
select 'a,b,b,b,b,c' as original from dual union all
select 'a,b,b,b,b,c,c' as original from dual union all
select 'a,ab' as original from dual union all
select 'a,b,bc,c' as original from dual union all
select 'ab,b' as original from dual
)
select
original,
regexp_replace(original, '([^,]+)(,\1)+', '\1') as dedupped1,
regexp_replace(original, '(^|,)([^,]+)(,\2)+($|,)', '\1\2\4') as dedupped2
from testdata;
-
-
-
hi Bjoern
I've used your regexp proposal for my own needs and have changed it slightly to cover other use cases (related to substring match). I use now '(^|,)([^,]+)(,\2)+($|,)', '\1\2\4'. I'm providing new test cases below, which are managed correctly by the new version:
with testdata as (
select 'a,' as original from dual union all
select 'a,b' as original from dual union all
select 'a,b,c' as original from dual union all
select 'a,b,b' as original from dual union all
select 'a,b,b,b' as original from dual union all
select 'a,b,b,b,b' as original from dual union all
select 'a,b,b,b,b,c' as original from dual union all
select 'a,b,b,b,b,c,c' as original from dual union all
select 'a,ab' as original from dual union all
select 'a,b,bc,c' as original from dual union all
select 'ab,b' as original from dual
)
select
original,
regexp_replace(original, '([^,]+)(,\1)+', '\1') as dedupped1,
regexp_replace(original, '(^|,)([^,]+)(,\2)+($|,)', '\1\2\4') as dedupped2
from testdata;
Hello,
I think there is a bug in your regexp_replace. I find it easier to put a comma in front of the source, then remove it after the rexexp_replace. This will make more sense when I address some of the other requirements.
with testdata as ( select 'a,a,b,b' as original from dual)select original, regexp_replace(original, '(^|,)([^,]+)(,\2)+($|,)', '\1\2\4') as dedupped2, substr(regexp_replace(','||original, '(,[^,]+)(\1)+', '\1'),2) as dedupped3 from testdata;
ORIGINAL DEDUPPED2 DEDUPPED3 a,a,b,b a,b,b a,b Best regards,
Stew
-
Having read everything up to now, I have several comments:
- I am for DISTINCT
- The SQL standard has already stated where the DISTINCT should go: LISTAGG(DISTINCT ...)
- I am for LISTAGG being able to return a CLOB.
I don't know if the SQL standard mandates syntax for this, but there are already other functions that specify the returning datatype using AS. - Below is my best effort at a workaround that provides the equivalent of LISTAGG(DISTINCT ...) AS CLOB or AS VARCHAR2.
select trim(',' from regexp_replace( xmlcast(xmlagg(xmlelement(E,','||USAGE) order by USAGE) AS CLOB), '(,[^,]*)(\1)+', '\1' )) deduppedfrom sys.all_identifiers;DEDUPPED------------------------------------------------ASSIGNMENT,CALL,DECLARATION,DEFINITION,REFERENCE
This will return a CLOB, but it can be CAST as a VARCHAR2. We can even return a VARCHAR2 from XMLCAST if the data allows it.
The TRIM at the end is to remove a trailing comma if there is one or more NULL values in USAGE.
Best regards,
Stew
-
And here is a solution that works using a double group by and the analytic version of listagg. It is a bit tricky to understand and therefore almost impossible to maintain.
select min(leaders) as leaders ,min(members) as members from ( select is_leader, membername ,listagg(case when is_leader = 'Y' then membername end,',') within group (order by membername) over (partition by is_leader) as leaders ,listagg(case when is_leader = 'N' then membername end,',') within group (order by membername) over (partition by is_leader) as members from projects p left join teams t on t.project_id = p.project_id group by is_leader, membername );
However we lost the project count on the way...
Sven,
Applying my alternative to your problem:
with projects as ( select 1 project_id, 'IoT4.0' project_name from dual union all select 2 project_id, 'VeryBigData' project_name from dual union all select 3 project_id, 'Cloudify' project_name from dual),teams as ( select 1 team_id, 1 project_id, 'John' membername, 'Y' is_leader from dual union all select 2 team_id, 1 project_id, 'Paul' membername, 'Y' is_leader from dual union all select 3 team_id, 1 project_id, 'George' membername, 'N' is_leader from dual union all select 4 team_id, 1 project_id, 'Ringo' membername, 'N' is_leader from dual union all select 5 team_id, 2 project_id, 'John' membername, 'N' is_leader from dual union all select 6 team_id, 2 project_id, 'Wayne' membername, 'N' is_leader from dual union all select 7 team_id, 2 project_id, 'Paul' membername, 'Y' is_leader from dual union all select 8 team_id, 3 project_id, 'Paul' membername, 'Y' is_leader from dual union all select 9 team_id, 3 project_id, 'Simon' membername, 'N' is_leader from dual union all select 10 team_id, 3 project_id, 'Art' membername, 'N' is_leader from dual union all select 11 team_id, 3 project_id, null membername, 'N' is_leader from dual)select count(distinct p.project_id) project_count,trim(',' from regexp_replace( xmlcast(xmlagg( case when is_leader = 'Y' then xmlelement(E, ','||membername) end order by membername )as varchar2(4000)), '(,[^,]+)(\1)+', '\1' )) leaders,trim(',' from regexp_replace( xmlcast(xmlagg( case when is_leader = 'N' then xmlelement(E, ','||membername) end order by membername ) as varchar2(4000)), '(,[^,]*)(\1)+', '\1' )) membersfrom projects pleft join teams t on t.project_id = p.project_id;
Best regards,
Stew
P.S. Why join projects and teams?
-
Having read everything up to now, I have several comments:
- I am for DISTINCT
- The SQL standard has already stated where the DISTINCT should go: LISTAGG(DISTINCT ...)
- I am for LISTAGG being able to return a CLOB.
I don't know if the SQL standard mandates syntax for this, but there are already other functions that specify the returning datatype using AS. - Below is my best effort at a workaround that provides the equivalent of LISTAGG(DISTINCT ...) AS CLOB or AS VARCHAR2.
select trim(',' from regexp_replace( xmlcast(xmlagg(xmlelement(E,','||USAGE) order by USAGE) AS CLOB), '(,[^,]*)(\1)+', '\1' )) deduppedfrom sys.all_identifiers;DEDUPPED------------------------------------------------ASSIGNMENT,CALL,DECLARATION,DEFINITION,REFERENCE
This will return a CLOB, but it can be CAST as a VARCHAR2. We can even return a VARCHAR2 from XMLCAST if the data allows it.
The TRIM at the end is to remove a trailing comma if there is one or more NULL values in USAGE.
Best regards,
Stew
Hi Stew
Nice solution by using xmlagg, but why do you prefer regexp_replace over the row_number solution ?
I do not notice much difference in execution time, but I would guess deduplicating before creating the text would be more efficient.
Moreover you must be sure that usage does never contain spaces.
with t as (
select decode( row_number() over (partition by usage order by 1) ,1,usage,null) as usage from sys.all_identifiers)select trim(',' from xmlcast(xmlagg(xmlelement(E,','||USAGE) order by USAGE) AS CLOB))as deduppedfrom t;Regards,
Dirk