Best way for data archiving

Hi All,
I have a question. This was asked in one of my interviews and I couldn't answer it well. I need your expertise in understanding this process.
What is the best way to carry out an archiving process for a large set of data?
Example: I have to move 50 million rows from Table A into Table B and then delete those records from Table A. Table A is a transaction table on which DML is still happening, and the archive process has to run at the same time. The interviewer asked: during the insertion into Table B, how can we avoid the same records (the ones being inserted into Table B) getting updated in Table A?
Thank you.
Answers
-
4135353 wrote: What is the best way to carry out an archiving process for a large set of data? I have to move 50 million rows from Table A into Table B and then delete those records from Table A. Table A is a transaction table on which DML is still happening while the archive process runs. During the insertion into Table B, how can we avoid the same records (the ones being inserted into Table B) getting updated? Thank you.
And how did you respond?
-
How do you know what the "cut off point" is?
Seems like a badly proffered question with too little information to give a sensible answer. No wonder you "couldn't answer so well"...
-
Gaz in Oz wrote: How do you know what the "cut off point" is? Seems like a badly proffered question with too little information to give a sensible answer. No wonder you "couldn't answer so well"...
Perhaps that was the point ... to see how the candidate would respond to poorly stated "requirements". To see if he would recognize them as such and what kind of questions he'd ask to clarify. Perhaps even to see if he had the self-confidence to push back.
But then again, it could be a clueless interviewer who himself is out of his depth.
-
One other thought about the interview question .....
Perhaps everything was as it seemed: the interviewer was reasonably technically competent, but simply hadn't thought the question through and believed it was reasonable on the face of it. Still, that situation/possibility reminded me of when I interviewed for my current job, eight years ago.
Before I met with the hiring manager, the HR dept. sat me down for a written 'test' of Oracle technical questions, mostly multiple-choice. Several of the questions had 'issues': none of the provided answers were accurate, or the question itself appeared to be based on a misunderstanding of how things worked. I had no problem at all providing 'write-in' answers in the margins, stating what was wrong with the question and the provided answers, and giving my own expanded answer.
-
Yes, as you guessed, I couldn't answer well. I told him that purging operations shouldn't be done during peak time and that the scenario he described wasn't clear to me. I just explained the general approaches for purging, like partitioning, bulk operations, and dropping indexes and constraints to make the deletes faster.
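A minimal sketch of the batched bulk idea (hypothetical TABLE_A/TABLE_B names and a CREATED_DT cutoff column; note it does nothing about concurrent DML, which was the interviewer's real concern):

DECLARE
  TYPE rid_tab IS TABLE OF ROWID;
  l_rids rid_tab;
BEGIN
  LOOP
    -- pick one batch of archivable rows by ROWID
    SELECT rowid BULK COLLECT INTO l_rids
      FROM table_a
     WHERE created_dt < ADD_MONTHS(SYSDATE, -12)
       AND ROWNUM <= 100000;

    EXIT WHEN l_rids.COUNT = 0;

    -- copy exactly those rows to the archive table ...
    FORALL i IN 1 .. l_rids.COUNT
      INSERT INTO table_b
        SELECT * FROM table_a WHERE rowid = l_rids(i);

    -- ... and delete the same rows from the source
    FORALL i IN 1 .. l_rids.COUNT
      DELETE FROM table_a WHERE rowid = l_rids(i);

    COMMIT;   -- one transaction per batch keeps undo/redo manageable
  END LOOP;
END;
/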
-
4135353 wrote: What is the best way to carry out an archiving process for a large set of data? I have to move 50 million rows from Table A into Table B and then delete those records from Table A. Table A is a transaction table on which DML is still happening while the archive process runs. During the insertion into Table B, how can we avoid the same records (the ones being inserted into Table B) getting updated? Thank you.
When you have the chance to be interactive, it never hurts to get clarification on the questions. For example, in this case I would ask why we need to use a second table at all; I find that's typically "old school" thinking. Why not just have archive partitions and set them to read only, for example? The answer from the interviewer may be that you don't have licensing for partitioning, that you can't partition on the archive column(s), or maybe it will be "that's a good idea"; who knows?
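As a rough illustration of that idea (hypothetical ORDERS table range-partitioned by date; the READ ONLY clause assumes Oracle 12.2 or later and the Partitioning option):

-- mark an old partition read only so its rows can no longer be changed
ALTER TABLE orders MODIFY PARTITION orders_2015 READ ONLY;

-- older alternative: swap the partition with an empty standalone table,
-- turning the old rows into an "archive" table without moving any data
ALTER TABLE orders
  EXCHANGE PARTITION orders_2015 WITH TABLE orders_2015_arch
  INCLUDING INDEXES WITHOUT VALIDATION;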
Cheers,
-
user4135353, there is no way I can know for sure, but the questioner may have been looking for you to reply that you would use SELECT FOR UPDATE to lock the selected rows against DML activity from other sessions. So open a SELECT FOR UPDATE cursor, insert each row into Table B, then delete WHERE CURRENT OF the cursor.
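A minimal sketch of that approach (hypothetical names and cutoff, and it assumes Table B has the same column layout as Table A; the FOR UPDATE locks stop other sessions from changing a row between its copy and its delete, at the cost of holding those locks for the whole run):

DECLARE
  CURSOR c_arch IS
    SELECT *
      FROM table_a
     WHERE created_dt < ADD_MONTHS(SYSDATE, -12)
       FOR UPDATE;                          -- lock the candidate rows
BEGIN
  FOR r IN c_arch LOOP
    INSERT INTO table_b VALUES r;           -- archive the locked row
    DELETE FROM table_a
     WHERE CURRENT OF c_arch;               -- remove it from the source
  END LOOP;
  COMMIT;  -- committing inside the loop would release the FOR UPDATE locks
END;
/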
- -
HTH -- Mark D Powell --
-
There is no simple and direct way to achieve that, as others have mentioned.
I once had to implement a similar but slightly more complex solution, involving a master/detail pair of tables. We had to:
1) Make users stop seeing data that was going to get purged.
2) Insert the data in the archive DB in batches.
3) Check the batches were all OK on the archive.
4) Delete from the source tables.
We ended up doing this:
1) Added a BatchID column to the tables
2) Renamed the tables and created two views with the original table names and a WHERE clause: WHERE BatchID IS NULL
3) Ran a process that updated the BatchID using some conditions. We had to "remove" orders older than X months, except if they were in some special status or had some dependency. We had to order by a set of statuses and creation date (starting from the oldest), then break the process into batches of 100K rows (a customer decision, with no real technical rationale)
4) Inserted each batch into the archive
5) Checked everything was OK once all batches were processed
6) Deleted from the source tables where BatchID was not null
7) Removed the column, dropped the views and renamed the tables back.
Pretty ugly and complex, but it worked at the time. A rough sketch of the idea follows.
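In very rough SQL, the shape of it was something like this (hypothetical names, and a single table shown instead of the master/detail pair):

-- one-time setup: add the flag column and hide the base table behind a view
ALTER TABLE orders ADD (batch_id NUMBER);
ALTER TABLE orders RENAME TO orders_base;
CREATE VIEW orders AS
  SELECT * FROM orders_base WHERE batch_id IS NULL;

-- per batch: flag about 100K archivable rows, then copy them to the archive
UPDATE orders_base
   SET batch_id = :batch_no
 WHERE batch_id IS NULL
   AND created_dt < ADD_MONTHS(SYSDATE, -12)
   AND ROWNUM <= 100000;
COMMIT;

INSERT INTO orders_arch
SELECT * FROM orders_base WHERE batch_id = :batch_no;
COMMIT;

-- after all batches have been verified on the archive side
DELETE FROM orders_base WHERE batch_id IS NOT NULL;
COMMIT;

-- cleanup: drop the view, rename the table back, drop the flag column
DROP VIEW orders;
ALTER TABLE orders_base RENAME TO orders;
ALTER TABLE orders DROP COLUMN batch_id;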
-
But then there is always the possibility that the questioner did have a specific solution in mind, and would only accept that solution. I call this the "guess what's in my pocket" type of question. If I sensed that was the game being played, I'd have to consider if I really wanted to work for such a manager.