Discussions
Categories
- 17.9K All Categories
- 3.4K Industry Applications
- 3.3K Intelligent Advisor
- 63 Insurance
- 535.7K On-Premises Infrastructure
- 138.1K Analytics Software
- 38.6K Application Development Software
- 5.6K Cloud Platform
- 109.3K Database Software
- 17.5K Enterprise Manager
- 8.8K Hardware
- 71K Infrastructure Software
- 105.2K Integration
- 41.5K Security Software
LISTAGG with DISTINCT option
Comments
-
You don't need two listagg functions, you just need to do a distinct on the underlying table(s). So instead of this:
select deptno,listagg (sal,', ') within group (order by sal) from scott.emp group by deptno;
do this
select deptno,listagg (sal,', ') within group (order by sal) from (select distinct deptno,sal from scott.emp) group by deptno;
An example:
Here I have some test data for projects and teams.
I want to have a list of all team leaders and a second list of all team members. "Paul" is a team leader in several teams, but I want him only to be listed once. This can not be easily solved useing the current logic. (it is possible). Bines suggestion would be simliar to the COUNT(distinct column) logic and would solve such scenarios very easily.
with projects as (select 1 project_id, 'IoT4.0' project_name from dual union all select 2 project_id, 'VeryBigData' project_name from dual union all select 3 project_id, 'Cloudify' project_name from dual ) ,teams as (select 1 team_id, 1 project_id, 'John' membername, 'Y' is_leader from dual union all select 2 team_id, 1 project_id, 'Paul' membername, 'Y' is_leader from dual union all select 3 team_id, 1 project_id, 'George' membername, 'N' is_leader from dual union all select 4 team_id, 1 project_id, 'Ringo' membername, 'N' is_leader from dual union all select 5 team_id, 2 project_id, 'John' membername, 'N' is_leader from dual union all select 6 team_id, 2 project_id, 'Wayne' membername, 'N' is_leader from dual union all select 7 team_id, 2 project_id, 'Paul' membername, 'Y' is_leader from dual union all select 8 team_id, 3 project_id, 'Paul' membername, 'Y' is_leader from dual union all select 9 team_id, 3 project_id, 'Simon' membername, 'N' is_leader from dual union all select 10 team_id, 3 project_id, 'Art' membername, 'N' is_leader from dual union all select 11 team_id, 3 project_id, 'Garfunkel' membername, 'N' is_leader from dual ) select count(distinct p.project_id) project_count ,listagg(case when is_leader = 'Y' then membername end,',') within group (order by membername) leaders ,listagg(case when is_leader = 'N' then membername end,',') within group (order by membername) members from projects p left join teams t on t.project_id = p.project_id group by ();
Result:
PROJECT_COUNT LEADERS MEMBERS 3 John,Paul,Paul,Paul Art,Garfunkel,George,John,Ringo,Simon,Wayne -
An example:
Here I have some test data for projects and teams.
I want to have a list of all team leaders and a second list of all team members. "Paul" is a team leader in several teams, but I want him only to be listed once. This can not be easily solved useing the current logic. (it is possible). Bines suggestion would be simliar to the COUNT(distinct column) logic and would solve such scenarios very easily.
with projects as (select 1 project_id, 'IoT4.0' project_name from dual union all select 2 project_id, 'VeryBigData' project_name from dual union all select 3 project_id, 'Cloudify' project_name from dual ) ,teams as (select 1 team_id, 1 project_id, 'John' membername, 'Y' is_leader from dual union all select 2 team_id, 1 project_id, 'Paul' membername, 'Y' is_leader from dual union all select 3 team_id, 1 project_id, 'George' membername, 'N' is_leader from dual union all select 4 team_id, 1 project_id, 'Ringo' membername, 'N' is_leader from dual union all select 5 team_id, 2 project_id, 'John' membername, 'N' is_leader from dual union all select 6 team_id, 2 project_id, 'Wayne' membername, 'N' is_leader from dual union all select 7 team_id, 2 project_id, 'Paul' membername, 'Y' is_leader from dual union all select 8 team_id, 3 project_id, 'Paul' membername, 'Y' is_leader from dual union all select 9 team_id, 3 project_id, 'Simon' membername, 'N' is_leader from dual union all select 10 team_id, 3 project_id, 'Art' membername, 'N' is_leader from dual union all select 11 team_id, 3 project_id, 'Garfunkel' membername, 'N' is_leader from dual ) select count(distinct p.project_id) project_count ,listagg(case when is_leader = 'Y' then membername end,',') within group (order by membername) leaders ,listagg(case when is_leader = 'N' then membername end,',') within group (order by membername) members from projects p left join teams t on t.project_id = p.project_id group by ();
Result:
PROJECT_COUNT LEADERS MEMBERS 3 John,Paul,Paul,Paul Art,Garfunkel,George,John,Ringo,Simon,Wayne And here is a solution that works using a double group by and the analytic version of listagg. It is a bit tricky to understand and therefore almost impossible to maintain.
select min(leaders) as leaders ,min(members) as members from ( select is_leader, membername ,listagg(case when is_leader = 'Y' then membername end,',') within group (order by membername) over (partition by is_leader) as leaders ,listagg(case when is_leader = 'N' then membername end,',') within group (order by membername) over (partition by is_leader) as members from projects p left join teams t on t.project_id = p.project_id group by is_leader, membername );
However we lost the project count on the way...
-
This option will be useful and will help in avoiding additional inline views.
-
And here is a solution that works using a double group by and the analytic version of listagg. It is a bit tricky to understand and therefore almost impossible to maintain.
select min(leaders) as leaders ,min(members) as members from ( select is_leader, membername ,listagg(case when is_leader = 'Y' then membername end,',') within group (order by membername) over (partition by is_leader) as leaders ,listagg(case when is_leader = 'N' then membername end,',') within group (order by membername) over (partition by is_leader) as members from projects p left join teams t on t.project_id = p.project_id group by is_leader, membername );
However we lost the project count on the way...
Thank you, Sven! I think this demonstrates nicely how much easier the suggested syntax (or something similar) would be
-
Hi @Chris Hunt,
this is fine as long as I want to concatenate only one column.
I'm still looking for a solution for concatenating two columns (separately)
or for using listagg with "over (partition by ...)"
Hi,
another option (as long as there is no DISTINCT available) would be to remove the duplicates after the fact using Oracle's regular expression features like so:
select name, signature, type, regexp_replace(listagg(usage,', '), '<pattern>') within group (order by usage) usages from sys.all_identifiers where owner = 'ME' and object_name = 'PKG_EXAMPLE' and object_type = 'PACKAGE BODY' group by name, signature, type
The following pattern works if the separator is just one character (a comma for example). If haven't tried your combination of comma plus blank. Should be a bit more tricky, though.
with testdata as ( select 'a' as original from dual union all select 'a,b' as original from dual union all select 'a,b,c' as original from dual union all select 'a,b,b' as original from dual union all select 'a,b,b,b' as original from dual union all select 'a,b,b,b,b' as original from dual union all select 'a,b,b,b,b,c' as original from dual union all select 'a,b,b,b,b,c,c' as original from dual ) select original, regexp_replace(original, '([^,]+)(,\1)+', '\1') as dedupped from testdata
Bjoern
-
Hi,
another option (as long as there is no DISTINCT available) would be to remove the duplicates after the fact using Oracle's regular expression features like so:
select name, signature, type, regexp_replace(listagg(usage,', '), '<pattern>') within group (order by usage) usages from sys.all_identifiers where owner = 'ME' and object_name = 'PKG_EXAMPLE' and object_type = 'PACKAGE BODY' group by name, signature, type
The following pattern works if the separator is just one character (a comma for example). If haven't tried your combination of comma plus blank. Should be a bit more tricky, though.
with testdata as ( select 'a' as original from dual union all select 'a,b' as original from dual union all select 'a,b,c' as original from dual union all select 'a,b,b' as original from dual union all select 'a,b,b,b' as original from dual union all select 'a,b,b,b,b' as original from dual union all select 'a,b,b,b,b,c' as original from dual union all select 'a,b,b,b,b,c,c' as original from dual ) select original, regexp_replace(original, '([^,]+)(,\1)+', '\1') as dedupped from testdata
Bjoern
Nice usage of a backreference there!
And since listagg returnes ordered output it should work for all normal cases.
-
Hi,
another option (as long as there is no DISTINCT available) would be to remove the duplicates after the fact using Oracle's regular expression features like so:
select name, signature, type, regexp_replace(listagg(usage,', '), '<pattern>') within group (order by usage) usages from sys.all_identifiers where owner = 'ME' and object_name = 'PKG_EXAMPLE' and object_type = 'PACKAGE BODY' group by name, signature, type
The following pattern works if the separator is just one character (a comma for example). If haven't tried your combination of comma plus blank. Should be a bit more tricky, though.
with testdata as ( select 'a' as original from dual union all select 'a,b' as original from dual union all select 'a,b,c' as original from dual union all select 'a,b,b' as original from dual union all select 'a,b,b,b' as original from dual union all select 'a,b,b,b,b' as original from dual union all select 'a,b,b,b,b,c' as original from dual union all select 'a,b,b,b,b,c,c' as original from dual ) select original, regexp_replace(original, '([^,]+)(,\1)+', '\1') as dedupped from testdata
Bjoern
And this only works for 1 character length data, right?
I have this from my notes regarding distinct options with listagg
rtrim(xmltype('<r><n>' || replace(REGEXP_REPLACE( comma_delimited_column, '[A-Za-z]' , '' ), ',', ',</n><n>')||',</n></r>' ).extract('//n[not(preceding::n = .)]/text()').getstringval(), ',')
A native feature would be awesome. I use listagg a lot, and often want distinct output. I'd also like to handle more than 4000 characters...
-
And this only works for 1 character length data, right?
I have this from my notes regarding distinct options with listagg
rtrim(xmltype('<r><n>' || replace(REGEXP_REPLACE( comma_delimited_column, '[A-Za-z]' , '' ), ',', ',</n><n>')||',</n></r>' ).extract('//n[not(preceding::n = .)]/text()').getstringval(), ',')
A native feature would be awesome. I use listagg a lot, and often want distinct output. I'd also like to handle more than 4000 characters...
swesley_perth wrote: And this only works for 1 character length data, right? ...
It depends. It can be adapted for multi character delimiters as well, but only as long as none of the characters is part of the main string.
-
This would definitely make queries easier and avoid an additional step to subquery distinct values before doing the LISTAGG.
But I'd propose a slightly different format for the function. Since we already have DISTINCT functionality in some aggregate functions like COUNT, that should be consistent with LISTAGG. So I think the syntax should ideally look like -
LISTAGG ( [DISTINCT] <column_name> [ , <delimiter> ] ) OVER ...
This makes it clearer that the DISTINCT is applied on the column values.
-
This would definitely make queries easier and avoid an additional step to subquery distinct values before doing the LISTAGG.
But I'd propose a slightly different format for the function. Since we already have DISTINCT functionality in some aggregate functions like COUNT, that should be consistent with LISTAGG. So I think the syntax should ideally look like -
LISTAGG ( [DISTINCT] <column_name> [ , <delimiter> ] ) OVER ...
This makes it clearer that the DISTINCT is applied on the column values.
I think this is a great idea too. Though I'd like the DISTINCT here:
LISTAGG (<column_name> [ , <delimiter> ] ) DISTINCT WITHIN GROUP..... ;-)
Although Yes, you can do a DISTINCT sub-query, what happens if you want to mix them in the same statement? That's why this would be really useful.
I wonder though, are there any other functions that would benefit from a similar change....?