Forum Stats

  • 3,826,362 Users
  • 2,260,636 Discussions
  • 7,896,916 Comments

Discussions

delete duplicate rows without pseudo columns like rowid

ora_1978
ora_1978 Member Posts: 522 Bronze Badge
edited May 26, 2022 5:34PM in SQL & PL/SQL

create table dept (dno number, dname varchar2(100));

insert into dept values (10, 'IT');

insert into dept values (10, 'IT');

COMMIT;


AFTER DELETION UPON EXECUTING THE QUERY TEH RESULT SHOULD BE


SELECT * FROM DEPT;

10, IT

Tagged:
«1

Answers

  • mathguy
    mathguy Member Posts: 10,539 Blue Diamond

    This seems pretty trivial, did you try it and you ran into issues? Or did you just not know how to do it?

    delete from dept where rowid not in (select max(rowid) from dept group by dno, dname);
    
  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 42,052 Red Diamond

    @ora_1978

    Why do you want to

    delete duplicate rows without pseudo columns like rowid

    ? What's wrong with ROWID?

    If you don't want to use ROWID, then use the primary key. If you don't have a primary key, why not? Create one.

  • Warren Tolentino
    Warren Tolentino Member Posts: 5,542 Silver Trophy

    these examples might give you some ideas.

    SQL> select * from employees;
    
    
    YEAR EM NAME       PO
    ---- -- ---------- --
    2001 02 Scott      91
    2001 02 Scott      01
    2001 02 Scott      07
    2001 03 Tom        81
    2001 03 Tom        84
    2001 03 Tom        87
    
    
    6 rows selected.
    
    
    SQL> select year, empcode, name, position,
      2         row_number() over (partition by year, empcode, name
      3                            order by year, empcode, name, position) as rn
      4    from employees;
    
    
    YEAR EM NAME       PO         RN
    ---- -- ---------- -- ----------
    2001 02 Scott      01          1
    2001 02 Scott      07          2
    2001 02 Scott      91          3
    2001 03 Tom        81          1
    2001 03 Tom        84          2
    2001 03 Tom        87          3
    
    
    6 rows selected.
    
    
    SQL> Select year, empcode, name, position
      2    From (Select year, empcode, name, position,
      3                 row_number() over (partition by year, empcode, name
      4                                    order by year, empcode, name, position) as rn
      5            From employees) emp
      6   Where rn = 1;
    
    
    YEAR EM NAME       PO
    ---- -- ---------- --
    2001 02 Scott      01
    2001 03 Tom        81
    
    
    SQL> Delete From employees
      2   Where rowid in (Select emp.rid
      3                     From (Select year, empcode, name, position,
      4                                  rowid as rid,
      5                                  row_number() over (partition by year, empcode, name
      6                                            order by year, empcode, name, position) as rn
      7                             From employees) emp
      8                    Where emp.rn > 1);
    
    
    4 rows deleted.
    
    
    SQL> select * from employees;
    
    
    YEAR EM NAME       PO
    ---- -- ---------- --
    2001 02 Scott      01
    2001 03 Tom        81
    
    
    SQL> 
    
    
    


  • ora_1978
    ora_1978 Member Posts: 522 Bronze Badge

    Today I attended an interview. In that they asked this question

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 42,052 Red Diamond
    edited May 26, 2022 6:34PM

    Hi, @ora_1978

    Today I attended an interview. In that they asked this question

    Then I'd say the correct answer is: "Why? What's wrong with ROWID?" They probably asked you the question to see if you were able to recognize and resist a stupid request.

  • ora_1978
    ora_1978 Member Posts: 522 Bronze Badge

    I answered. But they asked me not to use the pseudo column.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,974 Blue Diamond
    edited May 26, 2022 7:52PM

    If the interviewer wanted an answer that didn't involve rowid then they might have been waiting to see if you would come up with an analytic solution (e.g. a row_number() approach like the one above).

    They might also have been waiting for questions about what "duplicate" meant - duplicate keys, or duplicate of the the entire row, or duplicate of a subset of columns.

    Then there's the question of how much duplication - just a tiny fraction of the rows, or a massive proportion of the rows, many duplicates per "real" row for a small fraction of the rows, or a couple of duplicates for many rows - and would your strategy change depending on the pattern.


    Regards

    Jonathan Lewis

  • BluShadow
    BluShadow Member, Moderator Posts: 41,978 Red Diamond

    It's not good rejecting people's answers because they don't meet an interview questions requirements, when you didn't even mention it was an interview question in the first place. You should have made that clear.

    The point of interview questions is to see how you would tackle problems.

    So, the obvious way to deal with deletion of duplicates in a table is to use rowid as that uniquely identifies a row at a point in time. They've stipulated you're not to use rowid, probably because a common interview question is how to delete duplicates and the 'cheat sheet' way of doing that is to use rowid, but they want to see that you understand the issue rather than having just learn answers verbatim from the internet - there are many people who try and get jobs by learning answers without actually understanding. So, as an interviewer (and it depends on the level of job on offer too), I would expect a couple of possible trains of thought to come up...

    a) Deleting truly duplicate rows (all the columns are the same), requires a means to identify rows uniquely and then delete all but one of them. Without rowid, we could use a cursor update that uses a query to apply row_number() to each row within each duplicate group, and then if that row number is > 1 we UPDATE ... WHERE CURRENT OF... to set a specific column in the data to a specific value, then follow that cursor loop with a DELETE to remove all rows with that value. However, this will not be very performant on large sets of data due to the row-by-row nature and the two step UPDATE then DELETE methodology.

    b) Another option would be to query the distinct rows to another table, then either (1) drop the first table and rename the second table to the original name, or (2) truncate the data from the first table and re-insert the distinct data back in to it or (3) some variant of that that only determines the duplicates and stores those elsewhere, deletes those from the original table and re-inserts just 1 of each back in. This latter option has issues if there are triggers, indexes, constraints etc. on the original table as things can be 'lost' or 'broken'

    c) A final train of thought from it would also be to discuss why there are duplicates in the first place and, once fixed, what can be done to prevent further duplicates, such as instating a primary key.

    And that's the point of interviews... not to try and trick you to find a perfect solution when certain contraints are given, but to test your understanding of the original solution and the methodology behind other solutions and issues they would present. When you look at the amount of understanding of a person's knowledge that comes from the information given in just a, b and c above (and there are potentially other options) then the interviewer gets a better understanding of your technical abilities, and not just your ability to learn answers from interview 'cheat sheets'

    Jonathan Lewis
  • mathguy
    mathguy Member Posts: 10,539 Blue Diamond

    If the interviewer wanted an answer that didn't involve rowid then they might have been waiting to see if you would come up with an analytic solution (e.g. a row_number() approach like the one above).

    where "the one above" is a solution using analytic functions and rowid.

  • mathguy
    mathguy Member Posts: 10,539 Blue Diamond

    The only question I would ask the interviewer, if I was in your place, would be "why without using ROWID?" That makes absolutely no sense. If I didn't hear a satisfactory answer, I would thank them for their interest and I would move on. (That is what _ I _ would do; whether it would be appropriate for you, I have no idea.)