1 2 Previous Next 18 Replies Latest reply: Nov 20, 2012 6:53 PM by vansul RSS

    Deleting all duplicate rows from a table

    973608
      How can i Delete all duplicate rows from a table
        • 1. Re: Deleting all duplicate rows from a table
          Stew Ashton
          [EDIT: PLEASE SEE DISCUSSION BELOW, this solution is flawed and will only work properly if the whole table is locked first.]
          create table emp as select * from scott.emp;
          
          insert into emp select * from emp;
          
          insert into emp select * from emp;
          
          delete from emp where rowid in (
            select rid from (
              SELECT rowid rid, row_number() over(
                partition by EMPNO,
                ENAME     ,
                JOB       ,
                MGR       ,
                HIREDATE  ,
                SAL       ,
                COMM      ,
                DEPTNO order by null
              ) rn
              FROM EMP
            )
            where rn > 1
          );
          Edited by: Stew Ashton on Nov 20, 2012 11:26 PM
          • 2. Re: Deleting all duplicate rows from a table
            Hoek
            http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1224636375004
            and you can find more examples by doing a search on Tom's site or this forum.

            Side note:
            Make sure to read {message:id=9360002} before asking a new question.
            • 3. Re: Deleting all duplicate rows from a table
              ranit B
              Try this...
              delete from <your_table> xx
              where rowid > (select min(rowid) from <your_table> yy where xx.id = yy.id);
              NOTE - If you have a composite primary key, you have to provide all columns for join in the subquery.

              HTH
              Ranit B.
              • 4. Re: Deleting all duplicate rows from a table
                Stew Ashton
                http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1224636375004
                Warning to the OP: Tom's answer to the original question assumed that the compared columns were NOT NULL.

                The solution I proposed handles null values.
                • 5. Re: Deleting all duplicate rows from a table
                  Stew Ashton
                  ranit B wrote:
                  delete from <your_table> xx
                  where rowid > (select min(rowid) from <your_table> yy where xx.id = yy.id);
                  Won't that delete every row that has a ROWID greater than the one you found? Even if that row has a different ID?
                  [EDIT: of course not, as ranit explains below. Sorry...]

                  To the OP: if you have lots of duplicates and a big table, this alternative might be faster and create less UNDO & REDO:
                  create table new_emp as select distinct * from emp;
                  drop table emp;
                  alter table new_emp rename to emp;
                  Edited by: Stew Ashton on Nov 20, 2012 6:12 PM

                  Edited by: Stew Ashton on Nov 20, 2012 8:59 PM
                  • 6. Re: Deleting all duplicate rows from a table
                    ranit B
                    delete from <your_table> xx
                    where rowid > (select min(rowid) from <your_table> yy where xx.id = yy.id);
                    WHOA! won't that delete every row that has a ROWID greater than the one you found? Even if that row has a different ID?
                    No Stew. It won't delete. I guess you missed by given NOTE

                    We need to take all the primary key cols in the join.
                    • 7. Re: Deleting all duplicate rows from a table
                      user10857924
                      May below will help to understand how it will remove all rows EXCEPT the the row with min rowid


                      with t as 
                      (
                           select 1 rowid,0 id from dual union all
                           select 2 ,0 from dual union all
                           select 3 ,0 from dual union all
                           select 4 ,1 from dual union all
                           select 5 ,1 from dual union all
                           select 6 ,1 from dual union all
                           select 7 ,4 from dual union all
                           select 8 ,4 from dual union all
                           select 9 ,4 from dual union all
                           select 10,7 from dual union all
                           select 12,7 from dual union all
                           select 14,7 from dual union all
                           select 11,8 from dual union all
                           select 13,8 from dual union all
                           select 15,8 from dual          
                      
                      )
                      select  *
                      from t x
                      where rowid>(select min(rowid) from t y where x.id=y.id)
                      base on your approach you need to do something as below
                      delete
                      from t x
                      where (rowid, id) not in (select min(rowid), y.id from t y where x.id=y.id group by y.id )
                      HTH
                      • 8. Re: Deleting all duplicate rows from a table
                        Stew Ashton
                        You are right, my apologies. I missed the correlation between xx and yy. Note to self: never question without testing first!

                        I notice we have different assumptions on what "duplicate" means.

                        - You assume there are duplicates based on one or more columns that should be primary keys. In that case, there should be no NULL values at all, so the "=" comparison works.

                        - I assume the duplicates are based on all columns, or at least on some columns that may contain NULL values. If NULL values can occur, "=" cannot be used because equality comparisons don't work with NULL.
                        • 9. Re: Deleting all duplicate rows from a table
                          Sayan Malakshinov.
                          Stew Ashton wrote:
                          create table emp as select * from scott.emp;
                          
                          insert into emp select * from emp;
                          
                          insert into emp select * from emp;
                          
                          delete from emp where rowid in (
                          select rid from (
                          SELECT rowid rid, row_number() over(
                          partition by EMPNO,
                          ENAME     ,
                          JOB       ,
                          MGR       ,
                          HIREDATE  ,
                          SAL       ,
                          COMM      ,
                          DEPTNO order by null
                          ) rn
                          FROM EMP
                          )
                          where rn > 1
                          );
                          It can cause problems with "write consistancy"(update in another session+ mini-rollback)

                          Regards,
                          Sayan Malakshinov
                          http://orasql.org
                          • 10. Re: Deleting all duplicate rows from a table
                            Sayan Malakshinov.
                            First of all it is needed to lock table before deleting, because update in another session could cause problems with deleting wrong rows.

                            Regards,
                            Sayan Malakshinov
                            http://orasql.org
                            • 11. Re: Deleting all duplicate rows from a table
                              user10857924
                              You are right, my apologies. I missed the correlation between xx and yy
                              even though there is a coorelation it will delete all rows > min(rowid), pls see exaample above I just put dummy data and make rowid sequentail.

                              Thanks
                              • 12. Re: Deleting all duplicate rows from a table
                                Stew Ashton
                                I run your code and get
                                ORA-00923: FROM keyword not found where expected
                                00923. 00000 -  "FROM keyword not found where expected"
                                *Cause:    
                                *Action:
                                Error at Line: 3 Column: 11
                                • 13. Re: Deleting all duplicate rows from a table
                                  Stew Ashton
                                  xtender wrote:
                                  It can cause problems with "write consistancy"(update in another session+ mini-rollback)
                                  Any mass UPDATES or DELETES can have that problem. It is not specific to my query.

                                  [EDIT: I misunderstood xtender's point, see below.]

                                  Edited by: Stew Ashton on Nov 20, 2012 11:28 PM
                                  • 14. Re: Deleting all duplicate rows from a table
                                    ranit B
                                    with t as 
                                    (
                                         select 1 rowid,0 id from dual union all
                                         select 2 ,0 from dual union all
                                         select 3 ,0 from dual union all
                                         select 4 ,1 from dual union all
                                         select 5 ,1 from dual union all
                                         select 6 ,1 from dual union all
                                         select 7 ,4 from dual union all
                                         select 8 ,4 from dual union all
                                         select 9 ,4 from dual union all
                                         select 10,7 from dual union all
                                         select 12,7 from dual union all
                                         select 14,7 from dual union all
                                         select 11,8 from dual union all
                                         select 13,8 from dual union all
                                         select 15,8 from dual          
                                     
                                    )
                                    select  *
                                    from t x
                                    where rowid>(select min(rowid) from t y where x.id=y.id)
                                    Dude, Can u just tell me what is the Key over here to identify a row uniquely?

                                    Refer this -- http://www.toadworld.com/KNOWLEDGE/KnowledgeXpertforOracle/tabid/648/TopicID/TSQ416/Default.aspx
                                    1 2 Previous Next