
Best way for data archiving

4135353 Member Posts: 3
edited Jul 8, 2020 10:00AM in SQL & PL/SQL

Hi All,

I have a query. This was asked in one of my interviews and I couldn't answer it well. I need your expertise in understanding this process.

What is the best way to do the archiving process for a large set of data?

Ex: I have to move 50 million rows from Table A into Table B, then delete those records from Table A. Table A is a transaction table where DML is still happening, and the archive process needs to run at the same time. The interviewer asked: while the rows are being inserted into Table B, how can we stop those same records from being updated in Table A?

Thank you.


Answers

  • EdStevens
    EdStevens Member Posts: 28,778 Gold Crown
    edited Jul 7, 2020 9:12AM
    4135353 wrote: Hi All, I have a query. This was asked in one of my interviews and I couldn't answer it well. I need your expertise in understanding this process. What is the best way to do the archiving process for a large set of data? Ex: I have to move 50 million rows from Table A into Table B, then delete those records from Table A. Table A is a transaction table where DML is still happening, and the archive process needs to run at the same time. The interviewer asked: while the rows are being inserted into Table B, how can we stop those same records from being updated in Table A? Thank you.

    And how did you respond?

  • Gaz in Oz
    Gaz in Oz Member Posts: 3,785 Bronze Crown
    edited Jul 7, 2020 9:29AM

    How do you know what the "cut off point" is?

    Seems like a badly proffered question with too little information to give a sensible answer. No wonder you "couldn't answer so well"...

  • EdStevens
    EdStevens Member Posts: 28,778 Gold Crown
    edited Jul 7, 2020 9:36AM
    Gaz in Oz wrote: How do you know what the "cut off point" is? Seems like a badly proffered question with too little information to give a sensible answer. No wonder you "couldn't answer so well"...

    Perhaps that was the point ... to see how the candidate would respond to poorly stated "requirements".  To see if he would recognize them as such and what kind of questions he'd ask to clarify.  Perhaps even to see if he had the self-confidence to push back.

    But then again, it could be a clueless interviewer who himself is out of his depth.

  • EdStevens
    EdStevens Member Posts: 28,778 Gold Crown
    edited Jul 7, 2020 2:06PM

    One other thought about the interview question .....

    Perhaps everything was as it seemed: the interviewer was reasonably technically competent but simply hadn't thought the question through, and it seemed reasonable to him on the face of it. Still, that possibility reminded me of when I interviewed for my current job, eight years ago.

    Before I met with the hiring manager, the HR dept. sat me down for a written 'test' of Oracle technical questions, mostly multiple-choice. Several of the questions had 'issues': none of the provided answers was accurate, or the question appeared to be based on a misunderstanding of how things worked. I had no problem at all providing 'write-in' answers in the margins, stating what was wrong with the question and the provided answers, and giving my own expanded answer.

  • 4135353
    4135353 Member Posts: 3
    edited Jul 7, 2020 1:41PM

    Yes, as you guessed, I couldn't answer well. I told him that purging operations shouldn't be done during peak time, and that the scenario he described wasn't clear to me. I just explained the general approaches for purging: partitioning, bulk operations, and dropping indexes and constraints to make the deletes faster.
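
    For the "bulk operations" part, a minimal sketch of a batched purge (table_a, the created_date cutoff, and the 100K chunk size are illustrative assumptions, not from the thread):

    -- Delete in limited chunks so each transaction stays small,
    -- instead of running one huge 50M-row delete.
    BEGIN
      LOOP
        DELETE FROM table_a
         WHERE created_date < ADD_MONTHS(SYSDATE, -12)  -- assumed cutoff
           AND ROWNUM <= 100000;                        -- chunk size
        EXIT WHEN SQL%ROWCOUNT = 0;
        COMMIT;
      END LOOP;
    END;
    /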

  • Tubby
    Tubby Oracle Consultant Member Posts: 6,990 Bronze Crown
    edited Jul 7, 2020 2:01PM
    4135353 wrote: Hi All, I have a query. This was asked in one of my interviews and I couldn't answer it well. I need your expertise in understanding this process. What is the best way to do the archiving process for a large set of data? Ex: I have to move 50 million rows from Table A into Table B, then delete those records from Table A. Table A is a transaction table where DML is still happening, and the archive process needs to run at the same time. The interviewer asked: while the rows are being inserted into Table B, how can we stop those same records from being updated in Table A? Thank you.

    When you have the chance to be interactive, it never hurts to get clarification on the question. For example, in this case I would ask why we need to use a second table at all; I find that's typically "old school" thinking. Why not just have archive partitions and set them to read only, for example? The answer from the interviewer may be that you don't have licensing for partitioning, that you can't partition on the archive column(s), or maybe it will be "that's a good idea", who knows?
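
    A minimal sketch of the read-only partition idea (names are invented; marking a partition read only requires Oracle 12.2 or later):

    -- Hypothetical range-partitioned transaction table.
    CREATE TABLE orders (
      order_id   NUMBER,
      order_date DATE,
      status     VARCHAR2(20)
    )
    PARTITION BY RANGE (order_date) (
      PARTITION p2018  VALUES LESS THAN (DATE '2019-01-01'),
      PARTITION p2019  VALUES LESS THAN (DATE '2020-01-01'),
      PARTITION p_curr VALUES LESS THAN (MAXVALUE)
    );

    -- "Archive" a closed period in place by freezing it; the rows stay
    -- queryable but can no longer be updated or deleted.
    ALTER TABLE orders MODIFY PARTITION p2018 READ ONLY;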

    Cheers,

  • Mark D Powell
    Mark D Powell Member Posts: 5,914 Blue Diamond
    edited Jul 7, 2020 2:55PM

    user4135353, there is no way I can know for sure, but the questioner may have been looking for you to reply that you would use SELECT FOR UPDATE to lock the selected rows against DML from any session other than yours. So: open a SELECT ... FOR UPDATE cursor, insert each row into Table B, then delete WHERE CURRENT OF the cursor.
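
    A minimal sketch of that approach (table names and the archival predicate are illustrative; it assumes table_a and table_b have identical column lists):

    DECLARE
      -- FOR UPDATE locks the selected rows, so no other session can
      -- update or delete them while they are being moved.
      CURSOR c_arch IS
        SELECT *
          FROM table_a
         WHERE created_date < ADD_MONTHS(SYSDATE, -12)  -- assumed cutoff
           FOR UPDATE;
    BEGIN
      FOR r IN c_arch LOOP
        INSERT INTO table_b VALUES r;                 -- copy to archive
        DELETE FROM table_a WHERE CURRENT OF c_arch;  -- remove original
      END LOOP;
      COMMIT;  -- don't commit inside the loop: that releases the locks
               -- and the next fetch raises ORA-01002
    END;
    /

    For 50 million rows that is one very large transaction, so in practice you would likely combine it with some form of batching.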


    HTH -- Mark D Powell --

  • L. Fernigrini
    L. Fernigrini Database Practice Lead, Rosario, Argentina Member Posts: 4,196 Gold Crown
    edited Jul 8, 2020 9:37AM

    There is no simple and direct way to achieve that, as others have mentioned.

    I once had to implement a similar solution, but slightly more complex, with a master/detail pair of tables. We had to:

    1) Make users stop seeing data that was going to get purged.

    2) Insert the data in the archive DB in batches.

    3) Check the batches were all OK on the archive.

    4) Delete from the source tables.

    We ended up doing this:

    1) Added a BatchID column to the tables.

    2) Renamed the tables and created 2 views with the original table names and a WHERE clause: WHERE BatchID IS NULL.

    3) Ran a process that updated the BatchID using some conditions. We had to "remove" orders older than X months, except if they were in some special status or had some dependency. We had to order by a set of statuses and creation date (starting from the oldest), then break the process into batches of 100K rows (a customer decision, with no real rationale).

    4) Inserted each batch into the archive.

    5) Checked everything was OK once all batches were processed.

    6) Deleted from the source tables where BatchID was not null.

    7) Removed the column, dropped the views, and renamed the tables back.

    Pretty ugly and complex, but it worked at the time.
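
    A rough sketch of those steps (all object names, the cutoff, and the statuses are invented for illustration; it assumes the archive table has a matching column list):

    -- 1) + 2) add the tag column, then hide tagged rows behind a view
    ALTER TABLE orders ADD (batch_id NUMBER);
    ALTER TABLE orders RENAME TO orders_t;
    CREATE VIEW orders AS
      SELECT * FROM orders_t WHERE batch_id IS NULL;

    -- 3) tag one batch of archivable rows (repeated per batch)
    UPDATE orders_t
       SET batch_id = :batch_no
     WHERE batch_id IS NULL
       AND created_date < ADD_MONTHS(SYSDATE, -12)   -- "older than X months"
       AND status NOT IN ('OPEN', 'ON_HOLD')         -- assumed special statuses
       AND ROWNUM <= 100000;                         -- 100K-row batches
    COMMIT;

    -- 4) copy the tagged batch to the archive
    INSERT INTO orders_archive
      SELECT * FROM orders_t WHERE batch_id = :batch_no;
    COMMIT;

    -- 5) verify the archive, then
    -- 6) remove the archived rows from the source
    DELETE FROM orders_t WHERE batch_id IS NOT NULL;
    COMMIT;

    -- 7) put the original names back
    DROP VIEW orders;
    ALTER TABLE orders_t DROP COLUMN batch_id;
    ALTER TABLE orders_t RENAME TO orders;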

  • EdStevens
    EdStevens Member Posts: 28,778 Gold Crown
    edited Jul 8, 2020 10:00AM

    But then there is always the possibility that the questioner did have a specific solution in mind and would only accept that solution. I call this the "guess what's in my pocket" type of question. If I sensed that was the game being played, I'd have to consider whether I really wanted to work for such a manager.