This discussion is archived
3 Replies Latest reply: Jun 27, 2013 6:42 AM by 874273 RSS

de-duplication

874273 Newbie
Currently Being Moderated

Hi Masters,

 

I have a master table which is contains duplicate records. I wanted to insert original record into one table, and remaining duplicate records into another table. I wanted to write a pl/sql block

for this senerio. Please help me.

 

create table sample(name varchar2(10),sal number(5));

 

insert into sample  values('ABC',100);
insert into sample  values('BCA',200);
insert into sample  values('CAB',300);
insert into sample  values('ABC',100);
insert into sample  values('BCA',200);
insert into sample  values('CAB',300);
insert into sample  values('ABC',100);
insert into sample  values('BCA',200);
insert into sample  values('CAB',300);


create table union_a (name varchar2(10),number(5));

create table dup_b (name varchar2(10), number(5));

 

please find above the table script and insert statements..I am using Oracle 11g(11.2.01). I wanted to insert one distinct record into table union_a .. duplicate records into dup_b table.

 

Thanks in adv.

 

Regards

SA

 

 


  • 1. Re: de-duplication
    APC Oracle ACE
    Currently Being Moderated

    If the whole row is a duplicate what is the point of saving the duplicates?  Why not keep the distinct rows and just zap the clones?

     

    Alternate answer:

     

        create table tmp_sample as select distinct name, sal from sample

         /

         rename sample to old_sample

        /

        rename tmp_sample to sample

        /


    Obviously you'll need to transfer grants, constraints, etc but that will probably take way less time than any other approach.  Although, as you have duplicate rows you obviously haven't got many integrity constraints to worry about.  But now is a good time to put some on the cleaned table.


    Now, if your rows aren't complete duplicates but differ in certain respects, e.g. a timestamp, you'll need to use a sub-query to select the right rows on the basis of the relevant columns.   But you'll need to post more details if you can't figure that out.


    Cheers, APC





  • 2. Re: de-duplication
    Frank Kulash Guru
    Currently Being Moderated

    Hi,

     

    This sounds like a Top-N Query.

    Use the analytic ROW_NUMBER fucnton to assign numbers 1, 2, 3, ... to all rows, with a separate set of numebrs for each group

    Use a multi-table INSERT to put all the rows numbered 1 into one table, and all the other rows into the other table, like this:

     

    INSERT ALL

        WHEN  r_num = 1

            THEN INTO union_a VALUES (name, sal)

            ELSE INTO dup_b   VALUES (name, sal)

    SELECT  name, sal

    ,       ROW_NUMBER () OVER ( PARTITION BY  name, sal

                                 ORDER BY      NULL  -- or whatever

                               )  AS r_num

    FROM    sample

    ;

     

    You must use an OREDER BY clause with ROW_NUMBER, even if you don;'t care which row is assigned 1.

  • 3. Re: de-duplication
    874273 Newbie
    Currently Being Moderated

    Thanks alot frank..!!!

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points