13 Replies Latest reply: Nov 25, 2010 5:28 AM by Aketi Jyuuzou RSS

    pivot + removal of duplicates

    742431
      Hi, I am on 11gr1 and need to pivot and concatenate string values.
      drop table test_a;
      
      create table test_a (
      place_id number,
      place_name varchar2(30), 
      tag varchar2(30),
      category_name varchar2(30),
      brand_name varchar2(30),
      chain_name varchar2(30));
      
      insert into test_a values (1, 'thai thai',    'asiantag',  'restaurant',  'independent', null);
      insert into test_a values (2, 'mann mobilia', 'furniture', 'home-center', 'xxl', 'moebelchain');
      insert into test_a values (2, 'mann mobilia', 'furniture', 'shop',        'xxl', 'moebelchain');
      insert into test_a values (2, 'mann mobilia', 'office',    'home-center', 'xxl', 'moebelchain');
      insert into test_a values (2, 'mann mobilia', 'office',    'shop',        'xxl', 'moebelchain');
      commit;
      
      
      select
        tb.place_id ,
        substr ( sys_connect_by_path( tb.tag,           ',' ) , 2 ) as tags,
        substr ( sys_connect_by_path( tb.brand_name,    ',' ) , 2 ) as brandnames,
        substr ( sys_connect_by_path( tb.chain_name,    ',' ) , 2 ) as chainnames,
        substr ( sys_connect_by_path( tb.category_name, ',' ) , 2 ) as categorynames,
        substr ( sys_connect_by_path( tb.place_name,    ',' ) , 2 ) as names
      from
        ( select 
            place_id ,
            tag ,
            category_name,
            brand_name,
            chain_name,
            place_name,
            row_number() over
              ( partition by place_id
                order by tag
              )
              as indx
          from
            (test_a)
        ) tb
      where
        connect_by_isleaf = 1
      connect by
        indx = prior indx + 1 and
        place_id = prior place_id
      start with
        indx = 1
      ;
      output:
      |1|asian                            |independent    |<null>                 |restaurant                       |my sushi                                       |
      |2|furniture,furniture,office,office|xxl,xxl,xxl,xxl|chain,chain,chain,chain|home-center,shop,home-center,shop|great place,great place,great place,great place|
      the select above is a good startingpoint but contains duplicates I'd like to avoid. I could write some PL/SQL script for that but would prefer to stick to plain and efficient SQL. Any idea?

      This is how the output should look like:
      |1|asian           |independent|<null>|restaurant      |my sushi   |
      |2|furniture,office|xxl        |chain |home-center,shop|great place|
        • 1. Re: pivot + removal of duplicates
          odie_63
          Hi,

          Thanks for the sample data.

          A couple of additional techniques :
          SELECT t.place_id,
                 rtrim(xmlelement("e", cast(collect(distinct t.tag||',') as sys.odcivarchar2list)).extract('//text()'), ',') as tags,
                 rtrim(xmlelement("e", cast(collect(distinct t.brand_name||',') as sys.odcivarchar2list)).extract('//text()'), ',') as brandnames,
                 rtrim(xmlelement("e", cast(collect(distinct t.chain_name||',') as sys.odcivarchar2list)).extract('//text()'), ',') as chainnames,
                 rtrim(xmlelement("e", cast(collect(distinct t.category_name||',') as sys.odcivarchar2list)).extract('//text()'), ',') as categorynames,
                 rtrim(xmlelement("e", cast(collect(distinct t.place_name||',') as sys.odcivarchar2list)).extract('//text()'), ',') as placenames
          FROM test_a t
          GROUP BY t.place_id
          ;
          ...
          The last one is untested (I don't have a 11g at hand right now).

          Edited by: odie_63 on 4 nov. 2010 16:35 - removed last solution, was a nonsense
          • 2. Re: pivot + removal of duplicates
            742431
            Thanks for your reply, unfortunately there are still duplicates. So from a functionality perspective your solution is as good as mine.
            • 3. Re: pivot + removal of duplicates
              Hoek
              Thanks for posting sample data.
              Inspired by Marc's idea ( and probably way too much coffee):
              SQL> with t as ( select place_id 
                2              ,      nullif(tag, lag(tag) over (partition by place_id order by place_id, tag)) tag
                3              ,      nullif(category_name, lag(category_name) over (partition by place_id order by place_id, category_name)) category_name 
                4              ,      nullif(brand_name, lag(brand_name) over (partition by place_id order by place_id, brand_name)) brand_name
                5              ,      nullif(chain_name, lag(chain_name) over (partition by place_id order by place_id, chain_name)) chain_name
                6              ,      nullif(place_name, lag(place_name) over (partition by place_id order by place_id, place_name)) place_name
                7              ,      row_number() over (partition by place_id order by tag )indx
                8              from   test_a
                9            ) 
               10  SELECT t.place_id,
               11         rtrim(regexp_replace(ltrim(xmlelement("e", cast(collect(t.tag||',') as sys.odcivarchar2list)).extract('//text()'), ','), '[,]+', ','), ',') as tags,
               12         rtrim(regexp_replace(ltrim(xmlelement("e", cast(collect(t.brand_name||',') as sys.odcivarchar2list)).extract('//text()'), ','), '[,]+', ','), ',') as brandnames,
               13         rtrim(regexp_replace(ltrim(xmlelement("e", cast(collect(t.chain_name||',') as sys.odcivarchar2list)).extract('//text()'), ','), '[,]+', ','), ',') as chainnames,
               14         rtrim(regexp_replace(ltrim(xmlelement("e", cast(collect(t.category_name||',') as sys.odcivarchar2list)).extract('//text()'), ','), '[,]+', ','), ',') as categorynames,
               15         rtrim(regexp_replace(ltrim(xmlelement("e", cast(collect(t.place_name||',') as sys.odcivarchar2list)).extract('//text()'), ','), '[,]+', ','), ',') as placenames
               16  FROM t
               17  GROUP BY t.place_id;
              
              PLACE_ID TAGS                      BRANDNAMES      CHAINNAMES      CATEGORYNAMES             PLACENAMES
              -------- ------------------------- --------------- --------------- ------------------------- -------------------------
                     1 asiantag                  independent                     restaurant                thai thai
                     2 furniture,office          xxl             moebelchain     shop,home-center          mann mobilia
              
              2 rows selected.
              
              SQL> 
              disclaimer
              I wouldn't be surprised if this could be done in a much simpler way ;)
              • 4. Re: pivot + removal of duplicates
                odie_63
                unfortunately there are still duplicates.
                I tested the COLLECT solution on both 10.1 and 11.2, it gives expected output.
                SQL> SELECT t.place_id,
                  2         rtrim(xmlelement("e", cast(collect(distinct t.tag||',') as sys.odcivarchar2list)).extract('//text()'), ',') as tags,
                  3         rtrim(xmlelement("e", cast(collect(distinct t.brand_name||',') as sys.odcivarchar2list)).extract('//text()'), ',') as brandnames,
                  4         rtrim(xmlelement("e", cast(collect(distinct t.chain_name||',') as sys.odcivarchar2list)).extract('//text()'), ',') as chainnames,
                  5         rtrim(xmlelement("e", cast(collect(distinct t.category_name||',') as sys.odcivarchar2list)).extract('//text()'), ',') as categorynames,
                  6         rtrim(xmlelement("e", cast(collect(distinct t.place_name||',') as sys.odcivarchar2list)).extract('//text()'), ',') as placenames
                  7  FROM test_a t
                  8  GROUP BY t.place_id
                  9  ;
                 
                  PLACE_ID TAGS                 BRANDNAMES    CHAINNAMES      CATEGORYNAMES      PLACENAMES
                ---------- -------------------- ------------- --------------- ------------------ --------------
                         1 asiantag             independent                   restaurant         thai thai
                         2 furniture,office     xxl           moebelchain     home-center,shop   mann mobilia
                 
                Apparently 11.1 behave differently?
                • 5. Re: pivot + removal of duplicates
                  Hoek
                  Here's what 10.2 gave me:
                  BANNER
                  ----------------------------------------------------------------
                  Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
                  PL/SQL Release 10.2.0.4.0 - Production
                  CORE    10.2.0.4.0      Production
                  TNS for Linux: Version 10.2.0.4.0 - Production
                  NLSRTL Version 10.2.0.4.0 - Production
                  
                  5 rows selected.
                  
                  SQL> SELECT t.place_id,
                    2         rtrim(xmlelement("e", cast(collect(distinct t.tag||',') as sys.odcivarchar2list)).extract('//text()'), ',') as tags,
                    3         rtrim(xmlelement("e", cast(collect(distinct t.brand_name||',') as sys.odcivarchar2list)).extract('//text()'), ',') as brandnames,
                    4         rtrim(xmlelement("e", cast(collect(distinct t.chain_name||',') as sys.odcivarchar2list)).extract('//text()'), ',') as chainnames,
                    5         rtrim(xmlelement("e", cast(collect(distinct t.category_name||',') as sys.odcivarchar2list)).extract('//text()'), ',') as categorynames,
                    6         rtrim(xmlelement("e", cast(collect(distinct t.place_name||',') as sys.odcivarchar2list)).extract('//text()'), ',') as placenames
                    7  FROM test_a t
                    8  GROUP BY t.place_id
                    9  /
                  
                  PLACE_ID TAGS                      BRANDNAMES      CHAINNAMES      CATEGORYNAMES             PLACENAMES
                  -------- ------------------------- --------------- --------------- ------------------------- -------------------------
                         1 asiantag                  independent                     restaurant                thai thai
                         2 furniture,office,office,f xxl,xxl,xxl,xxl moebelchain,moe home-center,shop,shop,hom mann mobilia,mann mobilia
                           urniture                                  belchain,moebel e-center                  ,mann mobilia,mann mobili
                                                                     chain,moebelcha                           a
                                                                     in
                  
                  
                  2 rows selected.
                  
                  SQL> 
                  • 6. Re: pivot + removal of duplicates
                    Solomon Yakobson
                    hoek wrote:

                    I wouldn't be surprised if this could be done in a much simpler way ;)
                    Well, before we rush into looking for a better solutions, I'd say you should be careful with XML solutions for non-xml data. Assume place names are: 'thai & thai' or '<mann mobilia>' and guess what, your solution will return:
                      PLACE_ID TAGS                 BRANDNAMES  CHAINNAMES  CATEGORYNAMES        PLACENAMES
                    ---------- -------------------- ----------- ----------- -------------------- ------------------------------
                             1 asiantag             independent             restaurant           thai &amp; thai
                             2 furniture,office     xxl         moebelchain shop,home-center     &lt ;mann mobilia&gt ;
                    SY.
                    • 7. Re: pivot + removal of duplicates
                      Hoek
                      I'd say you should be careful with XML solutions for non-xml data.
                      Good point, thanks for pointing out.

                      Here's a non-XML solution, borrowing the table_to_string function from http://www.oracle-base.com/articles/misc/StringAggregationTechniques.php#collect .
                      I handle the value_error exception in order to handle possible NULLS.
                      SQL> create or replace function tbl2str
                        2  ( p_tbl in sys.odcivarchar2list 
                        3  , p_delimiter in varchar2 default ',')
                        4  return varchar2
                        5  is
                        6    l_string     varchar2(32767);
                        7  begin
                        8    for i in p_tbl.first .. p_tbl.last loop
                        9      if i != p_tbl.first then
                       10        l_string := l_string || p_delimiter;
                       11      end if;
                       12      l_string := l_string||p_tbl(i);
                       13    end loop;
                       14    --
                       15    return l_string;
                       16    --
                       17  exception 
                       18    when value_error then return null;
                       19  end tbl2str;
                       20  /
                      
                      Function created.
                      
                      SQL> with t as ( select place_id 
                        2              ,      nullif(tag, lag(tag) over (partition by place_id order by place_id, tag)) tag
                        3              ,      nullif(category_name, lag(category_name) over (partition by place_id order by place_id, category_name)) category_name 
                        4              ,      nullif(brand_name, lag(brand_name) over (partition by place_id order by place_id, brand_name)) brand_name
                        5              ,      nullif(chain_name, lag(chain_name) over (partition by place_id order by place_id, chain_name)) chain_name
                        6              ,      nullif(place_name, lag(place_name) over (partition by place_id order by place_id, place_name)) place_name
                        7              ,      row_number() over (partition by place_id order by tag )indx
                        8              from   test_a
                        9            ) 
                       10  select place_id
                       11  ,      tbl2str(cast(collect(tag) as sys.odcivarchar2list)) tag 
                       12  ,      tbl2str(cast(collect(category_name) as sys.odcivarchar2list)) category_name
                       13  ,      tbl2str(cast(collect(brand_name) as sys.odcivarchar2list)) brand_name
                       14  ,      tbl2str(cast(collect(chain_name) as sys.odcivarchar2list)) chain_name
                       15  ,      tbl2str(cast(collect(place_name) as sys.odcivarchar2list)) place_name
                       16  from   t
                       17  group by place_id;
                      
                      PLACE_ID TAG                       CATEGORY_NAME             BRAND_NAME                CHAIN_NAME                PLACE_NAME
                      -------- ------------------------- ------------------------- ------------------------- ------------------------- -------------------------
                             1 asiantag                  restaurant                independent                                         thai thai
                             2 furniture,office          shop,home-center          xxl                       moebelchain               mann mobilia
                      
                      2 rows selected.
                      • 8. Re: pivot + removal of duplicates
                        odie_63
                        And with a UDT such as
                        create type varchar2tab as table of varchar2(4000);
                        /
                        we can use the SET operator to remove the duplicates :
                        select place_id
                        ,      tbl2str(set(cast(collect(tag) as varchar2tab))) tag 
                        ,      tbl2str(set(cast(collect(category_name) as varchar2tab))) category_name
                        ,      tbl2str(set(cast(collect(brand_name) as varchar2tab))) brand_name
                        ,      tbl2str(set(cast(collect(chain_name) as varchar2tab))) chain_name
                        ,      tbl2str(set(cast(collect(place_name) as varchar2tab))) place_name
                        from test_a
                        group by place_id;
                        (of course the function also needs a little change to accept the collection type)

                        Edited by: odie_63 on 4 nov. 2010 18:09
                        • 9. Re: pivot + removal of duplicates
                          Hoek
                          Nice one, thanks!
                          • 10. Re: pivot + removal of duplicates
                            742431
                            Thanks a lot everyone.
                            Here is the final code
                            drop table test_a;
                            drop type t_string_list;
                             
                            create table test_a (
                            place_id number,
                            place_name varchar2(30), 
                            tag varchar2(30),
                            category_name varchar2(30),
                            brand_name varchar2(30),
                            chain_name varchar2(30));
                             
                            insert into test_a values (1, 'thai thai',    'asiantag',  'restaurant',  'independent', null);
                            insert into test_a values (2, 'mann mobilia', 'furniture', 'home-center', 'xxl', 'moebelchain');
                            insert into test_a values (2, 'mann mobilia', 'furniture', 'shop',        'xxl', 'moebelchain');
                            insert into test_a values (2, 'mann mobilia', 'office',    'home-center', 'xxl', 'moebelchain');
                            insert into test_a values (2, 'mann mobilia', 'office',    'shop',        'xxl', 'moebelchain');
                            commit;
                            
                            
                            
                            
                            CREATE TYPE T_STRING_LIST AS TABLE OF VARCHAR2(4000);
                            
                            
                            create or replace function tbl2str
                                ( p_tbl in T_STRING_LIST 
                                , p_delimiter in varchar2 default ',')
                                return varchar2
                                is
                                  l_string     varchar2(32767);
                            begin
                                  for i in p_tbl.first .. p_tbl.last loop
                                    if i != p_tbl.first then
                                     l_string := l_string || p_delimiter;
                                   end if;
                                   l_string := l_string||p_tbl(i);
                                 end loop;
                                 --
                                 return l_string;
                                 --
                               exception 
                                 when value_error then return null;
                            end tbl2str;
                            / 
                            
                            
                            select place_id
                            ,      tbl2str(set(cast(collect(tag) as T_STRING_LIST))) tag 
                            ,      tbl2str(set(cast(collect(category_name) as T_STRING_LIST))) category_name
                            ,      tbl2str(set(cast(collect(brand_name) as T_STRING_LIST))) brand_name
                            ,      tbl2str(set(cast(collect(chain_name) as T_STRING_LIST))) chain_name
                            ,      tbl2str(set(cast(collect(place_name) as T_STRING_LIST))) place_name
                            from test_a
                            group by place_id;
                            output:
                            |1|asiantag        |restaurant      |independent|           |thai thai   |
                            |2|furniture,office|home-center,shop|xxl        |moebelchain|mann mobilia|
                            • 11. Re: pivot + removal of duplicates
                              742431
                              Thanks a lot everyone.
                              Here is the final code
                              drop table test_a;
                              drop type t_string_list;
                               
                              create table test_a (
                              place_id number,
                              place_name varchar2(30), 
                              tag varchar2(30),
                              category_name varchar2(30),
                              brand_name varchar2(30),
                              chain_name varchar2(30));
                               
                              insert into test_a values (1, 'thai thai',    'asiantag',  'restaurant',  'independent', null);
                              insert into test_a values (2, 'mann mobilia', 'furniture', 'home-center', 'xxl', 'moebelchain');
                              insert into test_a values (2, 'mann mobilia', 'furniture', 'shop',        'xxl', 'moebelchain');
                              insert into test_a values (2, 'mann mobilia', 'office',    'home-center', 'xxl', 'moebelchain');
                              insert into test_a values (2, 'mann mobilia', 'office',    'shop',        'xxl', 'moebelchain');
                              commit;
                              
                              
                              
                              
                              CREATE TYPE T_STRING_LIST AS TABLE OF VARCHAR2(4000);
                              
                              
                              create or replace function tbl2str
                                  ( p_tbl in T_STRING_LIST 
                                  , p_delimiter in varchar2 default ',')
                                  return varchar2
                                  is
                                    l_string     varchar2(32767);
                              begin
                                    for i in p_tbl.first .. p_tbl.last loop
                                      if i != p_tbl.first then
                                       l_string := l_string || p_delimiter;
                                     end if;
                                     l_string := l_string||p_tbl(i);
                                   end loop;
                                   --
                                   return l_string;
                                   --
                                 exception 
                                   when value_error then return null;
                              end tbl2str;
                              / 
                              
                              
                              select place_id
                              ,      tbl2str(set(cast(collect(tag) as T_STRING_LIST))) tag 
                              ,      tbl2str(set(cast(collect(category_name) as T_STRING_LIST))) category_name
                              ,      tbl2str(set(cast(collect(brand_name) as T_STRING_LIST))) brand_name
                              ,      tbl2str(set(cast(collect(chain_name) as T_STRING_LIST))) chain_name
                              ,      tbl2str(set(cast(collect(place_name) as T_STRING_LIST))) place_name
                              from test_a
                              group by place_id;
                              output:
                              |1|asiantag        |restaurant      |independent|           |thai thai   |
                              |2|furniture,office|home-center,shop|xxl        |moebelchain|mann mobilia|
                              • 12. Re: pivot + removal of duplicates
                                742431
                                Thanks a lot everyone.
                                Here is the final code
                                drop table test_a;
                                drop type t_string_list;
                                 
                                create table test_a (
                                place_id number,
                                place_name varchar2(30), 
                                tag varchar2(30),
                                category_name varchar2(30),
                                brand_name varchar2(30),
                                chain_name varchar2(30));
                                 
                                insert into test_a values (1, 'thai thai',    'asiantag',  'restaurant',  'independent', null);
                                insert into test_a values (2, 'mann mobilia', 'furniture', 'home-center', 'xxl', 'moebelchain');
                                insert into test_a values (2, 'mann mobilia', 'furniture', 'shop',        'xxl', 'moebelchain');
                                insert into test_a values (2, 'mann mobilia', 'office',    'home-center', 'xxl', 'moebelchain');
                                insert into test_a values (2, 'mann mobilia', 'office',    'shop',        'xxl', 'moebelchain');
                                commit;
                                
                                
                                
                                
                                CREATE TYPE T_STRING_LIST AS TABLE OF VARCHAR2(4000);
                                
                                
                                create or replace function tbl2str
                                    ( p_tbl in T_STRING_LIST 
                                    , p_delimiter in varchar2 default ',')
                                    return varchar2
                                    is
                                      l_string     varchar2(32767);
                                begin
                                      for i in p_tbl.first .. p_tbl.last loop
                                        if i != p_tbl.first then
                                         l_string := l_string || p_delimiter;
                                       end if;
                                       l_string := l_string||p_tbl(i);
                                     end loop;
                                     --
                                     return l_string;
                                     --
                                   exception 
                                     when value_error then return null;
                                end tbl2str;
                                / 
                                
                                
                                select place_id
                                ,      tbl2str(set(cast(collect(tag) as T_STRING_LIST))) tag 
                                ,      tbl2str(set(cast(collect(category_name) as T_STRING_LIST))) category_name
                                ,      tbl2str(set(cast(collect(brand_name) as T_STRING_LIST))) brand_name
                                ,      tbl2str(set(cast(collect(chain_name) as T_STRING_LIST))) chain_name
                                ,      tbl2str(set(cast(collect(place_name) as T_STRING_LIST))) place_name
                                from test_a
                                group by place_id;
                                output:
                                |1|asiantag        |restaurant      |independent|           |thai thai   |
                                |2|furniture,office|home-center,shop|xxl        |moebelchain|mann mobilia|
                                • 13. Re: pivot + removal of duplicates
                                  Aketi Jyuuzou
                                  This is unDocumented solution :8}
                                  col place_name for a13
                                  col tag for a16
                                  col category_name for a16
                                  col brand_name for a15
                                  col chain_name for a15
                                  
                                  select place_id,
                                  wmsys.wm_concat(distinct place_name) as place_name,
                                  wmsys.wm_concat(distinct tag) as tag,
                                  wmsys.wm_concat(distinct category_name) as category_name,
                                  wmsys.wm_concat(distinct brand_name) as brand_name,
                                  wmsys.wm_concat(distinct chain_name) as chain_name
                                    from test_a
                                  group by place_id;
                                  
                                  PLACE_ID  PLACE_NAME     TAG               CATEGORY_NAME     BRAND_NAME       CHAIN_NAME 
                                  --------  -------------  ----------------  ----------------  ---------------  -----------
                                         1  thai thai      asiantag          restaurant        independent      null       
                                         2  mann mobilia   furniture,office  home-center,shop  xxl              moebelchain