This discussion is archived
5 Replies Latest reply: Dec 28, 2012 12:26 PM by 963739 RSS

Delete Dups based on multiple Columns

963739 Newbie
Currently Being Moderated
Hi,

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

I want to delete dups from a table based on 3 columns
with sample_table as (
                      select '101' as ID1, '201' as ID2, '4' as weight  from dual union all
                      select '101' as ID1, '201' as ID2, '5' as weight  from dual union all
                      select '105' as ID1, '205' as ID2, '6' as weight  from dual union all
                      select '105' as ID1, '205' as ID2, '6' as weight  from dual union all
                      select '105' as ID1, '205' as ID2, '6' as weight  from dual union all
                      select '110' as ID1, '215' as ID2, '7' as weight  from dual union all
                      select '110' as ID1, '215' as ID2, '8' as weight  from dual union all
                      select '110' as ID1, '215' as ID2, '9' as weight  from dual
                     )
                     select * from sample_table
Based on ID1, ID2 and Weight....
if weights are different for same id1 & id2 then delete Min(weight)
if weights are same for same ID1 & Id2 keep one record and delete remaining

Desired Output
with sample_table as (
                      select '101' as ID1, '201' as ID2, '5' as weight  from dual union all
                      select '105' as ID1, '205' as ID2, '6' as weight  from dual union all
                      select '110' as ID1, '215' as ID2, '9' as weight  from dual
                     )
                     select * from sample_table
  • 1. Re: Delete Dups based on multiple Columns
    ranit B Expert
    Currently Being Moderated
    Try use this...( tested on v11.2.0 )
    Ranit>> select * from sample_table;
    
    ID1 ID2 W                                                                       
    --- --- -                                                                       
    101 201 4                                                                       
    101 201 5                                                                       
    105 205 6                                                                       
    105 205 6                                                                       
    105 205 6                                                                       
    110 215 7                                                                       
    110 215 8                                                                       
    110 215 9                                                                       
    
    8 rows selected.
    
    Ranit>> DELETE from sample_table x1
      2  where x1.rowid < (select max(x2.rowid) from sample_table x2
      3  where x1.id1 = x2.id1 and x1.id2 = x2.id2
      4  );
    
    5 rows deleted.
    
    Ranit>> select * from sample_table;
    
    ID1 ID2 W                                                                       
    --- --- -                                                                       
    101 201 5                                                                       
    105 205 6                                                                       
    110 215 9                                                                       
    Hope this Helps.
    Ranit B.

    NOTE: Please mark the post answered if you find your appropriate solution
  • 2. Re: Delete Dups based on multiple Columns
    Solomon Yakobson Guru
    Currently Being Moderated
    ranit B wrote:
    Try use this...( tested on v11.2.0 )
    Requirement was to keep max(weight), not max(rowid). The only reason you got right result is rows were entered in weight order. So correct solution would be:
    delete from sample_table
      where rowid not in (
                          select  max(rowid) keep(dense_rank last order by weight)
                            from  sample_table
                            group by id1,
                                     id2
                         )
    /
    Just flip first two rows to verify:
    SQL> create table sample_table as
      2                        select '101' as ID1, '201' as ID2, '5' as weight  from dual union all
      3                        select '101' as ID1, '201' as ID2, '4' as weight  from dual union all
      4                        select '105' as ID1, '205' as ID2, '6' as weight  from dual union all
      5                        select '105' as ID1, '205' as ID2, '6' as weight  from dual union all
      6                        select '105' as ID1, '205' as ID2, '6' as weight  from dual union all
      7                        select '110' as ID1, '215' as ID2, '7' as weight  from dual union all
      8                        select '110' as ID1, '215' as ID2, '8' as weight  from dual union all
      9                        select '110' as ID1, '215' as ID2, '9' as weight  from dual
     10  /
    
    Table created.
    
    SQL> DELETE from sample_table x1
      2    where x1.rowid < (select max(x2.rowid) from sample_table x2
      3    where x1.id1 = x2.id1 and x1.id2 = x2.id2);
    
    5 rows deleted.
    
    SQL> select * from sample_table;
    
    ID1 ID2 W
    --- --- -
    101 201 4
    105 205 6
    110 215 9
    
    SQL> drop table sample_table purge
      2  /
    
    Table dropped.
    
    SQL> create table sample_table as
      2                        select '101' as ID1, '201' as ID2, '5' as weight  from dual union all
      3                        select '101' as ID1, '201' as ID2, '4' as weight  from dual union all
      4                        select '105' as ID1, '205' as ID2, '6' as weight  from dual union all
      5                        select '105' as ID1, '205' as ID2, '6' as weight  from dual union all
      6                        select '105' as ID1, '205' as ID2, '6' as weight  from dual union all
      7                        select '110' as ID1, '215' as ID2, '7' as weight  from dual union all
      8                        select '110' as ID1, '215' as ID2, '8' as weight  from dual union all
      9                        select '110' as ID1, '215' as ID2, '9' as weight  from dual
     10  /
    
    Table created.
    
    SQL> delete from sample_table
      2    where rowid not in (
      3                        select  max(rowid) keep(dense_rank last order by weight)
      4                          from  sample_table
      5                          group by id1,
      6                                   id2
      7                       )
      8  /
    
    5 rows deleted.
    
    SQL> select * from sample_table;
    
    ID1 ID2 W
    --- --- -
    101 201 5
    105 205 6
    110 215 9
    
    SQL> 
    SY.
  • 3. Re: Delete Dups based on multiple Columns
    963739 Newbie
    Currently Being Moderated
    Thanks Solomon,

    The query was right....If i use select Statement.....with Million of records then its taking almost 1hr to get the results...
    SELECT *   FROM   sample_table
     WHERE   ROWID NOT IN
                   (  SELECT   MAX (ROWID) KEEP (DENSE_RANK LAST ORDER BY weight)
                        FROM   sample_table
                    GROUP BY   id1, id2)
  • 4. Re: Delete Dups based on multiple Columns
    Solomon Yakobson Guru
    Currently Being Moderated
    960736 wrote:
    If i use select Statement.....with Million of records then its taking almost 1hr to get the results...
    It might go much faster if you rename original table, create new table with original table name and structure: no indexes, constraints, triggers... Populate it with rows you want to keep. Then drop renamed table. Recretae indexes, constraints, triggers, etc. and recompile invalid objects. In any case, what you are doing isn't an everyday job but rather an exception done once, so 1 hour maybe not too bad comparing to amount of work you need to do to make this one time job to run faster.

    SY.
  • 5. Re: Delete Dups based on multiple Columns
    963739 Newbie
    Currently Being Moderated
    Yes, Now i am using with the original table...It has indexes on ID1 & Id2. Total records which i have was 400 Million and it has 30 partitions.

    When i use partition by partition also to retrieve it, it took an hr - hr and half soo..... Anyway I need to try with parallel hint also...i haven't done that yet..

    Any other possible way to tune it better way...(if possible)


    Thanks
    Karthik

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points