Forum Stats

  • 3,839,036 Users
  • 2,262,440 Discussions
  • 7,900,837 Comments

Discussions

CSV column comparison to single value column - SQL query

Rengudi
Rengudi Member Posts: 568 Bronze Badge

Hi

Greetings


CREATE TABLE TABLE1(GROUP_NAME,Group_members) AS

SELECT 'Alpha,beta, gamma' AS GROUP_NAME,'con2,con3,con4' AS Group_members FROM dual;


CREATE TABLE TABLE2(GROUP_NAME,Group_Owners) AS

SELECT 'Alpha' AS GROUP_NAME,'con1,cont2' AS Group_members FROM dual

UNION ALL

SELECT 'beta','con3' FROM dual

UNION ALL

SELECT 'gamma','con4, con5' FROM dual

UNION ALL

SELECT 'delta','con6' FROM dual;


Tagged:

Best Answer

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 42,246 Red Diamond

    Hi, @Rengudi

    Here's an example of how you should give your requirements:

    1. In table1, group_name is unique; it contains a comma-delimited list of individual names. Spaces immediately before or after commas should be ignored.
    2. In table2, group name is also unique, but the name is misleading: it contains only individual names.
    3. The result set should contain one row for every row in table1, plus one row for every value of tabl2.group_name that is not part of any group_names in table1. (E.g., in this example, table2.group_names='delta' is not a comma-delimited part of any value in tabl1,group_names, so the result set contains a row with group_names='delta' and NULL group_members.)
    4. The group_owners column of the output should contain a comma-delimited list of all the group_owners values from table2 that match a substring of table1.group_names. These items can be in any order.

    If those happen to be your requirements, then here's one way to meet them:

    WITH  normalized_table1  AS
    (
      SELECT  t1.group_name
      ,       t1.group_members
      ,	  n1.single_name
      FROM  table1 t1
      CROSS APPLY (
      		  SELECT  TRIM  ( REGEXP_SUBSTR ( t1.group_name
    		  	  		 	 , '[^,]+'
    					 	 , 1
    					 	 , LEVEL
    						 )
    				 ) AS single_name
    		  FROM   dual
    		  CONNECT BY LEVEL <= REGEXP_COUNT ( t1.group_name
    		  	  		 	    , '[^,]+'
    						    )
      		)  n1
    )
    SELECT   COALESCE ( m.group_name
    	 	  , o.group_name
    		  )	AS group_name
    ,	 m.group_members
    ,	 LISTAGG (o.group_owners, ',')
    	      WITHIN GROUP (ORDER BY o.group_owners) AS group_owners
    FROM	  normalized_table1 m
    FULL JOIN table2	   o ON o.group_name = m.single_name
    GROUP BY COALESCE ( m.group_name
    	 	  , o.group_name
    		  )
    ,	 m.group_members
    ORDER BY group_name
    ;
    

    Notice that the sub-query normalized_table1 contains most of the code. None of that would be necessary if you stored the data in First Normal Form.

    The output from the given sample data is:

    GROUP_NAME        GROUP_MEMBERS GROUP_OWNERS
    ----------------- -------------- ------------------------------
    Alpha,beta, gamma con2,con3,con4 con1,cont2,con3,con4, con5
    delta             con6
    

    This is not quite what you posted: it contains 'cont2', as found in table1, not 'con2' as you posted. I'm guessing that was just a mistake in your message.

    Rengudi

Answers

  • BEDE
    BEDE Oracle Developer Member Posts: 2,454 Gold Trophy
    edited Jul 27, 2022 1:33PM

    That looks like bad design for a relational database. Personally, I'd rather avoid hacking on such a bad design, which is bound to cause problems in the future.

    A cleaner design should be something like below:

    create sequence group_id_s start with 1 increment by 1 nomaxvalue nocache
    ;
    create table group_mstr as (
    group_id number default group_id_s.nextval not null -- to be referred via FK
    ,group_cd varchar2(30) not null --- short code for group identification
    ,group_descript varchar2(4000)
    )
    ;
    alter table group_mstr add constraint group_mstr_pk primary key(group_id)
    ;
    alter table group_mstr add constraint group_mstr_cd_uq unique(group_cd)
    ;
    create table group_members (
    group_id number not null
    ,member_code varchar2(30) not null
    ,active verchar2(1) default 'Y' not null
    )
    ;
    alter table group_members  add constraint group_members_pk
    primary key (group_id, member_code)
    ;
    alter table add constraint group_members_grp_fk
    foreign key(group_id) references group_mstr(group_id)
    ;
    create table group_admins (
    group_id number not null
    ,admin_code varchar2(30) not null
    ,active varchar2(1) default 'Y' not null
    )
    ;
    alter table group_admins  add constraint group_admins_pk
    primary key (group_id, admin_code)
    ;
    alter table add constraint group_admins_grp_fk
    foreign key(group_id) references group_mstr(group_id)
    ;
    
    

    Or, even better:

    create sequence group_id_s start with 1 increment by 1 nomaxvalue nocache
    ;
    create table group_mstr as (
    group_id number default group_id_s.nextval not null -- to be referred via FK
    ,group_cd varchar2(30) not null --- short code for group identification
    ,group_descript varchar2(4000)
    )
    ;
    alter table group_mstr add constraint group_mstr_pk primary key(group_id)
    ;
    alter table group_mstr add constraint group_mstr_cd_uq unique(group_cd)
    ;
    create table group_members (
    group_id number not null
    ,member_code varchar2(30) not null
    ,is_admin varchar2(30) default 'N' not null --- indicate whether the member of the group is admin or not
    ,active varchar2(1) default 'Y' not null
    )
    ;
    alter table group_members  add constraint group_members_pk
    primary key (group_id, member_code)
    ;
    alter table add constraint group_members_grp_fk
    foreign key(group_id) references group_mstr(group_id)
    ;
    


    BluShadow
  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 42,246 Red Diamond
    edited Jul 27, 2022 9:42AM

    HI, @Rengudi

    Thanks for posting the sample data and desired results. Don't forget to explain (in general terms) how you get the desired results from the sample data.  Always post your complete Oracle version (e.g. 18.4.0.0.0).

    Relational databases like Oracle work best when each column of each table contains (at most) one piece of information, not a delimited list of several items. This is so basic to table design that it's called FIRST Normal Form. Not using First Normal Form is simply asking for complicated, inefficient, error-prone code; it is literally breaking the first rule of table design. The first step in any query using data that is not in First Normal Form will often be converting it to First Normal Form.

    BEDE
  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 42,246 Red Diamond

    Hi, @Rengudi

    Here's an example of how you should give your requirements:

    1. In table1, group_name is unique; it contains a comma-delimited list of individual names. Spaces immediately before or after commas should be ignored.
    2. In table2, group name is also unique, but the name is misleading: it contains only individual names.
    3. The result set should contain one row for every row in table1, plus one row for every value of tabl2.group_name that is not part of any group_names in table1. (E.g., in this example, table2.group_names='delta' is not a comma-delimited part of any value in tabl1,group_names, so the result set contains a row with group_names='delta' and NULL group_members.)
    4. The group_owners column of the output should contain a comma-delimited list of all the group_owners values from table2 that match a substring of table1.group_names. These items can be in any order.

    If those happen to be your requirements, then here's one way to meet them:

    WITH  normalized_table1  AS
    (
      SELECT  t1.group_name
      ,       t1.group_members
      ,	  n1.single_name
      FROM  table1 t1
      CROSS APPLY (
      		  SELECT  TRIM  ( REGEXP_SUBSTR ( t1.group_name
    		  	  		 	 , '[^,]+'
    					 	 , 1
    					 	 , LEVEL
    						 )
    				 ) AS single_name
    		  FROM   dual
    		  CONNECT BY LEVEL <= REGEXP_COUNT ( t1.group_name
    		  	  		 	    , '[^,]+'
    						    )
      		)  n1
    )
    SELECT   COALESCE ( m.group_name
    	 	  , o.group_name
    		  )	AS group_name
    ,	 m.group_members
    ,	 LISTAGG (o.group_owners, ',')
    	      WITHIN GROUP (ORDER BY o.group_owners) AS group_owners
    FROM	  normalized_table1 m
    FULL JOIN table2	   o ON o.group_name = m.single_name
    GROUP BY COALESCE ( m.group_name
    	 	  , o.group_name
    		  )
    ,	 m.group_members
    ORDER BY group_name
    ;
    

    Notice that the sub-query normalized_table1 contains most of the code. None of that would be necessary if you stored the data in First Normal Form.

    The output from the given sample data is:

    GROUP_NAME        GROUP_MEMBERS GROUP_OWNERS
    ----------------- -------------- ------------------------------
    Alpha,beta, gamma con2,con3,con4 con1,cont2,con3,con4, con5
    delta             con6
    

    This is not quite what you posted: it contains 'cont2', as found in table1, not 'con2' as you posted. I'm guessing that was just a mistake in your message.

    Rengudi
  • BluShadow
    BluShadow Member, Moderator Posts: 42,138 Red Diamond
    edited Jul 27, 2022 11:16AM

    Assuming the tables were normalized properly, then the querying would be fairly trivial...

    with grps(id, nm) as (
           select 1, 'Alpha' from dual union all
           select 2, 'Beta' from dual union all
           select 3, 'Gamma' from dual union all
           select 4, 'Delta' from dual
           )
        ,con(id, nm) as (
           select 1, 'Con1' from dual union all
           select 2, 'Con2' from dual union all
           select 3, 'Con3' from dual union all
           select 4, 'Con4' from dual union all
           select 5, 'Con5' from dual union all
           select 6, 'Con6' from dual
           )
        ,members (grp_id, con_id) as (
           select 1, 2 from dual union all -- Alpha - Con2
           select 1, 3 from dual union all -- Alpha - Con3
           select 1, 4 from dual union all -- Alpha - Con4
           select 2, 2 from dual union all -- Beta - Con2
           select 2, 3 from dual union all -- Beta - Con3
           select 2, 4 from dual union all -- Beta - Con4
           select 3, 2 from dual union all -- Gamma - Con2
           select 3, 3 from dual union all -- Gamma - Con3
           select 3, 4 from dual  -- Gamma - Con4
           )
        ,owners (grp_id, con_id) as (
           select 1, 1 from dual union all -- Alpha - Con1
           select 1, 2 from dual union all -- Alpha - Con2
           select 2, 3 from dual union all -- Beta - Con3
           select 3, 4 from dual union all -- Gamma - Con4
           select 3, 5 from dual union all -- Gamme - Con5
           select 4, 6 from dual  -- Delta - Con6
           )
    -- end of example data that is properly normalized
        ,grp_mem as (
          -- group together members for each group
          select g.id
                ,g.nm
                ,listagg(distinct mc.nm,',') within group (order by mc.nm) as grp_members
          from   grps g
                 left outer join members m on (m.grp_id = g.id)
                 left outer join con mc on (mc.id = m.con_id)
          group by g.id, g.nm
         )
    -- now group together groups with the same group members to combine the owners
    select listagg(distinct g.nm,',') within group (order by g.nm) as grp_names
          ,g.grp_members
          ,listagg(distinct oc.nm,',') within group (order by oc.nm) as grp_owners
    from   grp_mem g
           left outer join owners o on (o.grp_id = g.id)
           left outer join con oc on (oc.id = o.con_id)
    group by g.grp_members
    order by 1
    /
    
    
    GRP_NAMES          GRP_MEMBERS                    GRP_OWNERS
    ------------------ ------------------------------ ------------------------------
    Alpha,Beta,Gamma   Con2,Con3,Con4                 Con1,Con2,Con3,Con4,Con5
    Delta                                             Con6
    
    
    


    Using one query to group together the distinct members for each group and then another query following that which groups together the groups that have identical members and gives the distinct owners across those groups at the same time.

    (edit: slight correction forgot to distinct the group names)