Forum Stats

  • 3,828,088 Users
  • 2,260,862 Discussions
  • 7,897,472 Comments

Discussions

Remove duplicates in string without sorting(do not change the order)

Gor_Mahia
Gor_Mahia Member Posts: 1,146 Bronze Badge

All,

Ive strings like below and i want to remove duplicates without changing the sort order (so listagg might not work for me even with random() order clause)

sample:

i. [email protected], [email protected],[email protected],[email protected], [email protected],[email protected]


and correct output should be (remember dont change the order):

i. [email protected], [email protected],[email protected], [email protected]

oracle 11.x g,

thanks

Tagged:

Best Answers

  • Solomon Yakobson
    Solomon Yakobson Member Posts: 19,492 Red Diamond
    Answer ✓
    with t as (
               select  level lvl,
                       regexp_substr(emails,'[^, ]+',1,level) email
                 from  tbl
                 connect by rowid = prior rowid
                     and prior sys_guid() is not null
                     and level <= regexp_count(emails,',') + 1
              )
    select  rtrim(
                  xmlcast(
                          xmlagg(xmlelement(e,email || ',') order by min(lvl))
                          as clob
                         ),
                  ','
                 ) "Unique Emails"
      from  t
      group by email
    /
    
    Unique Emails
    --------------------------------------------------------------------------------
    [email protected],[email protected],[email protected],[email protected]
    
    SQL>
    

    SY.

  • Solomon Yakobson
    Solomon Yakobson Member Posts: 19,492 Red Diamond
    Answer ✓

    And if it is a column in multi-row table and also to consider emails are case insensitive:

    with t1 as (
                select  rowid rid,
                        level lvl,
                        regexp_substr(emails,'[^, ]+',1,level) email
                  from  tbl
                  connect by rowid = prior rowid
                      and prior sys_guid() is not null
                      and level <= regexp_count(emails,',') + 1
               ),
         t2 as (
                select  rid,
                        min(lvl) rn,
                        min(email) keep(dense_rank first order by lvl) email
                  from  t1
                  group by rid,
                           upper(email)
               )
    select  rtrim(
                  xmlcast(
                          xmlagg(xmlelement(e,email || ',') order by rn)
                          as clob
                         ),
                  ','
                 ) "Unique Emails"
      from  t2
      group by rid
    /
    
    Unique Emails
    --------------------------------------------------------------------------------
    [email protected],[email protected],[email protected],[email protected]
    
    SQL>
    

    SY.

Answers

  • Solomon Yakobson
    Solomon Yakobson Member Posts: 19,492 Red Diamond
    Answer ✓
    with t as (
               select  level lvl,
                       regexp_substr(emails,'[^, ]+',1,level) email
                 from  tbl
                 connect by rowid = prior rowid
                     and prior sys_guid() is not null
                     and level <= regexp_count(emails,',') + 1
              )
    select  rtrim(
                  xmlcast(
                          xmlagg(xmlelement(e,email || ',') order by min(lvl))
                          as clob
                         ),
                  ','
                 ) "Unique Emails"
      from  t
      group by email
    /
    
    Unique Emails
    --------------------------------------------------------------------------------
    [email protected],[email protected],[email protected],[email protected]
    
    SQL>
    

    SY.

  • _jum
    _jum Member Posts: 543 Bronze Trophy
    edited Apr 29, 2022 5:49AM

    You could alternatively use:

    WITH t(emails) AS
     (SELECT '[email protected],[email protected],[email protected],[email protected],[email protected],[email protected]' FROM dual)
    SELECT t.emails, x.column_value.getStringVal() dist_email
     FROM t
       , xmltable(('string-join(distinct-values(("' || replace(emails,',','","') || '")),",")')) x;
    
    EMAILS
    [email protected],[email protected],[email protected],[email protected],[email protected],[email protected]
    
    DIST_EMAIL
    [email protected],[email protected],[email protected],[email protected]
    


    This is only a principle solution, it does not work in every special case, e.g. with emails containing double quotes as @SY correctly noted - thanks!

    Gor_Mahia
  • mathguy
    mathguy Member Posts: 10,539 Blue Diamond

    "Don't change the order" is not well defined.

    The most common, well-defined clarification looks like this: keep the FIRST occurrence of each token, and remove all the later ones.

    Is that what you meant? That seems so based on your example.

    The way to achieve that is to keep track of order when you split the input string; then keep only the lowest ordinal for each distinct token.

    with
      sample_data as (
        select '[email protected],[email protected],[email protected],[email protected],[email protected],[email protected]'
               as str
        from   dual
      )
    select listagg(token, ',') within group (order by min(ord)) as new_str
    from   (
             select  level as ord, regexp_substr(str, '[^,]+', 1, level) as token
             from    sample_data
             connect by level <= 1 + regexp_count(str, ',')
          )
    group by token
    ;
    
    NEW_STR                                                               
    ----------------------------------------------------------------------
    [email protected],[email protected],[email protected],[email protected]
    


    Here I ignored your inconsistent use of space after comma (I assumed there should be no spaces either in the output or in the input), and assumed only one input string rather than several, etc. I focused strictly on how to preserve the "order of tokens" in the output based on first occurrence in the input.

    Note that listaggis available only in Oracle 11.2; "11.g" is only a marketing description used by Oracle. On a technical site like this one, you should give us your actual version, in full - something like 11.1.0.2.0 or 11.2.0.4.0.

  • Solomon Yakobson
    Solomon Yakobson Member Posts: 19,492 Red Diamond
    Answer ✓

    And if it is a column in multi-row table and also to consider emails are case insensitive:

    with t1 as (
                select  rowid rid,
                        level lvl,
                        regexp_substr(emails,'[^, ]+',1,level) email
                  from  tbl
                  connect by rowid = prior rowid
                      and prior sys_guid() is not null
                      and level <= regexp_count(emails,',') + 1
               ),
         t2 as (
                select  rid,
                        min(lvl) rn,
                        min(email) keep(dense_rank first order by lvl) email
                  from  t1
                  group by rid,
                           upper(email)
               )
    select  rtrim(
                  xmlcast(
                          xmlagg(xmlelement(e,email || ',') order by rn)
                          as clob
                         ),
                  ','
                 ) "Unique Emails"
      from  t2
      group by rid
    /
    
    Unique Emails
    --------------------------------------------------------------------------------
    [email protected],[email protected],[email protected],[email protected]
    
    SQL>
    

    SY.

  • Solomon Yakobson
    Solomon Yakobson Member Posts: 19,492 Red Diamond

    @_jum You could alternatively use:

    If I'm not mistaken double quote is allowed (with restrictions - must be preceded with backslash) in email address.

    SY.

  • odie_63
    odie_63 Member Posts: 8,493 Silver Trophy

    This solution should be using a bind variable for the XQuery expression, in which case we need an additional call to ora:tokenize(), and a proper casting of the result to VARCHAR2 (or CLOB) because getStringVal() is a serialization method, it's not meant to extract scalar values.

    WITH t(emails) AS (
     SELECT '[email protected],[email protected],[email protected],[email protected],[email protected],[email protected]' FROM dual
    )
    SELECT /*+ no_xml_query_rewrite */ 
        t.emails, x.dist_email
    FROM t
      , xmltable('string-join(distinct-values(ora:tokenize($emails, ",")), ",")'
        passing t.emails as "emails"
        columns dist_email varchar2(4000) path '.'
       ) x
    ;
    

    NB:

    The resulting sequence order after applying distinct-values() is not deterministic, depending on the algorithm used. It seems like forcing functional evaluation (via no_xml_query_rewrite hint) retains the original order as required, but who knows?

    ora:tokenize() errors out when no separator pattern is found in the input string. There are a few workarounds to that. Though it works with fn:tokenize(), Oracle is still documenting it as unsupported.

    _jum