8 Replies Latest reply: Jan 4, 2013 10:37 AM by 838196 RSS

    find duplication values from multiple columns in a big table

    601245
      Hi All,

      I am working on a 11gR2 database in linux. i want to display record that have duplicate values in 2 columns.

      1. Table Structure :-


      CREATE TABLE A
      (
      ID NUMBER(10),
      F_NAME VARCHAR2(100 BYTE),
      L_NAME VARCHAR2(100 BYTE)
      )

      2. Sample Data:-

      Insert into A
      (ID, F_NAME, L_NAME) Values (1,'TONY' ,'SUMIT');
      Insert into A
      (ID, F_NAME, L_NAME) Values (2,'SUMIT' ,'KEITH');
      Insert into A
      (ID, F_NAME, L_NAME) Values (3,'NORA','SMITH');
      Insert into A
      (ID, F_NAME, L_NAME) Values (4,'APRIL','TONY');
      Insert into A
      (ID, F_NAME, L_NAME) Values (5,'ROSS','TAM');


      ID F_NAME L_NAME
      -----------------------------------------------------------
      1 TONY SUMIT
      2 SUMIT KEITH
      3 NORA SMITH
      4 APRIL TONY
      5 ROSS TAM



      4. My requirement is i need display IDs that it's F_NAME or L_NAME has duplication in F_NAME or L_NAME columns.
      The result should be

      ID
      --------
      1 reason: F_NAME (TONY) equals to L_NAME of record 4, L_NAME (SUMIT) equals to F_NAME of record 2
      2 reason: F_NAME (SUMIT) equals to L_NAME of record 1
      4 reason: L_NAME (TONY) equals to F_NAME of record 1


      record 3, 5 aren't in the result because there is no duplication in F_NAME or L_NAME columns for NORA,SMITH, ROSS, TAM


      The table contains 10 million records, i really need to consider the performance.



      kindly suggest me the solution
        • 1. Re: find duplication values from multiple columns in a big table
          636309
          I haven't tested it, but how about something like this?

          select id
          from tableA a
          where exists (select 1 from tablea b where a.f_name = b.l_name)
          or exists (select 1 from tableB where a.l_name = b.f_name)
          • 2. Re: find duplication values from multiple columns in a big table
            982851
            select Distinct(A.id) from a A,a B where A.f_name=B.L_name or B.F_name=A.L_name;

            or

            select Distinct(B.id) from a A,a B where A.f_name=B.L_name or B.F_name=A.L_name;
            • 3. Re: find duplication values from multiple columns in a big table
              Gurjeet
              HI,

              select distinct a1.id ,a1.f_name,a1.l_name  from a a1, a a2 where a1.f_name = a2.l_name or a2.f_name = a1.l_name;



              or


              select distinct from (
              select distinct a1.id ,a1.f_name,a1.l_name from a a1, a a2 where a1.f_name = a2.l_name
              union all
              select distinct a1.id ,a1.f_name,a1.l_name from a a1, a a2 where a2.f_name = a1.l_name)

              Edited by: Mr. singh on Jan 4, 2013 11:43 AM
              • 4. Re: find duplication values from multiple columns in a big table
                MaheshKaila
                Trye this one
                select   *
                  from   A out_a
                 where   exists (select   'x'
                                   from   a in_a
                                  where   in_a.f_name = out_a.l_name or in_a.l_name = out_a.f_name)
                Regards,
                Mahesh Kaila
                • 5. Re: find duplication values from multiple columns in a big table
                  Manik
                  Note: Forum members please suggest better approach to this -- below.. convert into SQL :)

                  I know I will be opposed by many people in this forum for posting such in-efficient solution.
                  But trying to learn along with you.. its an interesting problem which must deal with all rows vs all rows to get all combinations.
                  But I am still thinking how to write it in SQL, probably will learn from this post after we receive some good SQL solution for the code what I am currently doing now.

                  step 1: created a table B similar to table A and added a column reason
                  CREATE TABLE B
                  (
                    ID      NUMBER(10),
                    F_NAME  VARCHAR2(100 BYTE),
                    L_NAME  VARCHAR2(100 BYTE),
                    REASON  VARCHAR2(1000 BYTE)  --- ADDED THIS
                  )
                  Definetely inefficient :(
                  BEGIN
                     FOR rec_outer IN (SELECT * FROM A) LOOP
                        FOR rec_inner IN (SELECT * FROM A) LOOP
                           IF (rec_outer.f_name = rec_inner.l_name) THEN
                              UPDATE B
                                 SET reason =
                                           rec_outer.id
                                        || ' reason: F_NAME ('
                                        || rec_outer.f_name
                                        || ') equals to L_NAME of record '
                                        || rec_inner.id
                               WHERE b.id = rec_outer.id;
                           END IF;
                        END LOOP;
                  
                        FOR rec_inner IN (SELECT * FROM A) LOOP
                           IF (rec_outer.l_name = rec_inner.f_name) THEN
                              UPDATE B
                                 SET reason =
                                        reason
                                        || CASE
                                              WHEN reason IS NULL THEN
                                                 rec_outer.id || ' reason: '
                                              ELSE
                                                 ','
                                           END
                                        || 'L_NAME ('
                                        || rec_inner.f_name
                                        || ') equals to F_NAME of record '
                                        || rec_inner.id
                               WHERE b.id = rec_outer.id;
                           END IF;
                        END LOOP;
                     END LOOP;
                  
                     COMMIT;
                  EXCEPTION
                     WHEN OTHERS THEN
                        rollback;
                        RAISE;
                  END;
                  OUTPUT:
                  ID     F_NAME     L_NAME     REASON
                  ------------------------------------------------------------
                  1     TONY     SUMIT     1 reason: F_NAME (TONY) equals to L_NAME of record 4,L_NAME (SUMIT) equals to F_NAME of record 2
                  2     SUMIT     KEITH     2 reason: F_NAME (SUMIT) equals to L_NAME of record 1
                  3     NORA     SMITH     
                  4     APRIL     TONY     4 reason: L_NAME (TONY) equals to F_NAME of record 1
                  5     ROSS     TAM     
                  Cheers,
                  Manik.

                  Edited : Added rollback
                  • 6. Re: find duplication values from multiple columns in a big table
                    Purvesh K
                    One way of doing so:
                    select id, listagg(res, ', ') within group (order by id) result
                      from (
                    select id, 'F_Name (' || f_name || ') matched for id ' || p_id res
                      from (
                            select distinct id, f_name, l_name, prior id p_id
                              from a
                            connect by nocycle f_name = prior l_name
                           )
                     where p_id is not null
                    union all
                    select id, 'L_Name (' || l_name || ') matched for id ' || p_id res
                      from (
                            select distinct id, f_name, l_name, prior id p_id
                              from a
                            connect by nocycle l_name = prior f_name
                           )
                     where p_id is not null
                    )
                    group by id;
                    ID RESULT 
                    - ----------------------------------------------------------------------------------------
                    1 F_Name (TONY) matched for id 4, L_Name (SUMIT) matched for id 2 
                    2 F_Name (SUMIT) matched for id 1 
                    4 L_Name (TONY) matched for id 1 
                    • 7. Re: find duplication values from multiple columns in a big table
                      Manik
                      aha nice :)

                      Cheers,
                      Manik.
                      • 8. Re: find duplication values from multiple columns in a big table
                        838196
                        Hi,
                        it is not exact the desired output, but probably the fastest solution.
                        SQL> select
                          2    iq2.id,
                          3    listagg( iq2.result, ' - ') within group( order by 1) as result
                          4  from
                          5    (
                          6      select
                          7        iq.id,
                          8        iq.name,
                          9        listagg( iq.id || '-' || type || '(' || name || ')', ' equals to ') within group( order by iq.id) over( partition by iq.name) as result,
                         10        count( distinct iq.id) over( partition by iq.name) as name_count
                         11      from
                         12        (
                         13          select id, f_name as name, 'f_name' as type from a
                         14          union all
                         15          select id, l_name, 'l_name' from a
                         16        ) iq
                         17    ) iq2
                         18  where
                         19    iq2.name_count > 1
                         20  group by
                         21    iq2.id;
                        
                                ID RESULT
                        ---------- ------------------------------------------------------------------------------------------------------------------------------------
                                 1 1-f_name(TONY) equals to 4-l_name(TONY) - 1-l_name(SUMIT) equals to 2-f_name(SUMIT)
                                 2 1-l_name(SUMIT) equals to 2-f_name(SUMIT)
                                 4 1-f_name(TONY) equals to 4-l_name(TONY)