3 Replies Latest reply: Jun 27, 2013 8:42 AM by 874273 RSS

    de-duplication

    874273

      Hi Masters,

       

      I have a master table which is contains duplicate records. I wanted to insert original record into one table, and remaining duplicate records into another table. I wanted to write a pl/sql block

      for this senerio. Please help me.

       

      create table sample(name varchar2(10),sal number(5));

       

      insert into sample  values('ABC',100);
      insert into sample  values('BCA',200);
      insert into sample  values('CAB',300);
      insert into sample  values('ABC',100);
      insert into sample  values('BCA',200);
      insert into sample  values('CAB',300);
      insert into sample  values('ABC',100);
      insert into sample  values('BCA',200);
      insert into sample  values('CAB',300);


      create table union_a (name varchar2(10),number(5));

      create table dup_b (name varchar2(10), number(5));

       

      please find above the table script and insert statements..I am using Oracle 11g(11.2.01). I wanted to insert one distinct record into table union_a .. duplicate records into dup_b table.

       

      Thanks in adv.

       

      Regards

      SA

       

       


        • 1. Re: de-duplication
          APC

          If the whole row is a duplicate what is the point of saving the duplicates?  Why not keep the distinct rows and just zap the clones?

           

          Alternate answer:

           

              create table tmp_sample as select distinct name, sal from sample

               /

               rename sample to old_sample

              /

              rename tmp_sample to sample

              /


          Obviously you'll need to transfer grants, constraints, etc but that will probably take way less time than any other approach.  Although, as you have duplicate rows you obviously haven't got many integrity constraints to worry about.  But now is a good time to put some on the cleaned table.


          Now, if your rows aren't complete duplicates but differ in certain respects, e.g. a timestamp, you'll need to use a sub-query to select the right rows on the basis of the relevant columns.   But you'll need to post more details if you can't figure that out.


          Cheers, APC





          • 2. Re: de-duplication
            Frank Kulash

            Hi,

             

            This sounds like a Top-N Query.

            Use the analytic ROW_NUMBER fucnton to assign numbers 1, 2, 3, ... to all rows, with a separate set of numebrs for each group

            Use a multi-table INSERT to put all the rows numbered 1 into one table, and all the other rows into the other table, like this:

             

            INSERT ALL

                WHEN  r_num = 1

                    THEN INTO union_a VALUES (name, sal)

                    ELSE INTO dup_b   VALUES (name, sal)

            SELECT  name, sal

            ,       ROW_NUMBER () OVER ( PARTITION BY  name, sal

                                         ORDER BY      NULL  -- or whatever

                                       )  AS r_num

            FROM    sample

            ;

             

            You must use an OREDER BY clause with ROW_NUMBER, even if you don;'t care which row is assigned 1.

            • 3. Re: de-duplication
              874273

              Thanks alot frank..!!!