5 Replies Latest reply: Dec 28, 2012 2:26 PM by 963739 RSS

    Delete Dups based on multiple Columns

    963739
      Hi,

      Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

      I want to delete dups from a table based on 3 columns
      with sample_table as (
                            select '101' as ID1, '201' as ID2, '4' as weight  from dual union all
                            select '101' as ID1, '201' as ID2, '5' as weight  from dual union all
                            select '105' as ID1, '205' as ID2, '6' as weight  from dual union all
                            select '105' as ID1, '205' as ID2, '6' as weight  from dual union all
                            select '105' as ID1, '205' as ID2, '6' as weight  from dual union all
                            select '110' as ID1, '215' as ID2, '7' as weight  from dual union all
                            select '110' as ID1, '215' as ID2, '8' as weight  from dual union all
                            select '110' as ID1, '215' as ID2, '9' as weight  from dual
                           )
                           select * from sample_table
      Based on ID1, ID2 and Weight....
      if weights are different for same id1 & id2 then delete Min(weight)
      if weights are same for same ID1 & Id2 keep one record and delete remaining

      Desired Output
      with sample_table as (
                            select '101' as ID1, '201' as ID2, '5' as weight  from dual union all
                            select '105' as ID1, '205' as ID2, '6' as weight  from dual union all
                            select '110' as ID1, '215' as ID2, '9' as weight  from dual
                           )
                           select * from sample_table
        • 1. Re: Delete Dups based on multiple Columns
          ranit B
          Try use this...( tested on v11.2.0 )
          Ranit>> select * from sample_table;
          
          ID1 ID2 W                                                                       
          --- --- -                                                                       
          101 201 4                                                                       
          101 201 5                                                                       
          105 205 6                                                                       
          105 205 6                                                                       
          105 205 6                                                                       
          110 215 7                                                                       
          110 215 8                                                                       
          110 215 9                                                                       
          
          8 rows selected.
          
          Ranit>> DELETE from sample_table x1
            2  where x1.rowid < (select max(x2.rowid) from sample_table x2
            3  where x1.id1 = x2.id1 and x1.id2 = x2.id2
            4  );
          
          5 rows deleted.
          
          Ranit>> select * from sample_table;
          
          ID1 ID2 W                                                                       
          --- --- -                                                                       
          101 201 5                                                                       
          105 205 6                                                                       
          110 215 9                                                                       
          Hope this Helps.
          Ranit B.

          NOTE: Please mark the post answered if you find your appropriate solution
          • 2. Re: Delete Dups based on multiple Columns
            Solomon Yakobson
            ranit B wrote:
            Try use this...( tested on v11.2.0 )
            Requirement was to keep max(weight), not max(rowid). The only reason you got right result is rows were entered in weight order. So correct solution would be:
            delete from sample_table
              where rowid not in (
                                  select  max(rowid) keep(dense_rank last order by weight)
                                    from  sample_table
                                    group by id1,
                                             id2
                                 )
            /
            Just flip first two rows to verify:
            SQL> create table sample_table as
              2                        select '101' as ID1, '201' as ID2, '5' as weight  from dual union all
              3                        select '101' as ID1, '201' as ID2, '4' as weight  from dual union all
              4                        select '105' as ID1, '205' as ID2, '6' as weight  from dual union all
              5                        select '105' as ID1, '205' as ID2, '6' as weight  from dual union all
              6                        select '105' as ID1, '205' as ID2, '6' as weight  from dual union all
              7                        select '110' as ID1, '215' as ID2, '7' as weight  from dual union all
              8                        select '110' as ID1, '215' as ID2, '8' as weight  from dual union all
              9                        select '110' as ID1, '215' as ID2, '9' as weight  from dual
             10  /
            
            Table created.
            
            SQL> DELETE from sample_table x1
              2    where x1.rowid < (select max(x2.rowid) from sample_table x2
              3    where x1.id1 = x2.id1 and x1.id2 = x2.id2);
            
            5 rows deleted.
            
            SQL> select * from sample_table;
            
            ID1 ID2 W
            --- --- -
            101 201 4
            105 205 6
            110 215 9
            
            SQL> drop table sample_table purge
              2  /
            
            Table dropped.
            
            SQL> create table sample_table as
              2                        select '101' as ID1, '201' as ID2, '5' as weight  from dual union all
              3                        select '101' as ID1, '201' as ID2, '4' as weight  from dual union all
              4                        select '105' as ID1, '205' as ID2, '6' as weight  from dual union all
              5                        select '105' as ID1, '205' as ID2, '6' as weight  from dual union all
              6                        select '105' as ID1, '205' as ID2, '6' as weight  from dual union all
              7                        select '110' as ID1, '215' as ID2, '7' as weight  from dual union all
              8                        select '110' as ID1, '215' as ID2, '8' as weight  from dual union all
              9                        select '110' as ID1, '215' as ID2, '9' as weight  from dual
             10  /
            
            Table created.
            
            SQL> delete from sample_table
              2    where rowid not in (
              3                        select  max(rowid) keep(dense_rank last order by weight)
              4                          from  sample_table
              5                          group by id1,
              6                                   id2
              7                       )
              8  /
            
            5 rows deleted.
            
            SQL> select * from sample_table;
            
            ID1 ID2 W
            --- --- -
            101 201 5
            105 205 6
            110 215 9
            
            SQL> 
            SY.
            • 3. Re: Delete Dups based on multiple Columns
              963739
              Thanks Solomon,

              The query was right....If i use select Statement.....with Million of records then its taking almost 1hr to get the results...
              SELECT *   FROM   sample_table
               WHERE   ROWID NOT IN
                             (  SELECT   MAX (ROWID) KEEP (DENSE_RANK LAST ORDER BY weight)
                                  FROM   sample_table
                              GROUP BY   id1, id2)
              • 4. Re: Delete Dups based on multiple Columns
                Solomon Yakobson
                960736 wrote:
                If i use select Statement.....with Million of records then its taking almost 1hr to get the results...
                It might go much faster if you rename original table, create new table with original table name and structure: no indexes, constraints, triggers... Populate it with rows you want to keep. Then drop renamed table. Recretae indexes, constraints, triggers, etc. and recompile invalid objects. In any case, what you are doing isn't an everyday job but rather an exception done once, so 1 hour maybe not too bad comparing to amount of work you need to do to make this one time job to run faster.

                SY.
                • 5. Re: Delete Dups based on multiple Columns
                  963739
                  Yes, Now i am using with the original table...It has indexes on ID1 & Id2. Total records which i have was 400 Million and it has 30 partitions.

                  When i use partition by partition also to retrieve it, it took an hr - hr and half soo..... Anyway I need to try with parallel hint also...i haven't done that yet..

                  Any other possible way to tune it better way...(if possible)


                  Thanks
                  Karthik