I have to improve a legacy "batch type" application that processes 400,000 records from a database each time it runs. The process is painfully slow because of the validation that must occur on each record before a task is executed (roughly 0.5 seconds). This is a simplistic explanation of the app.
I would like to update the program to use multithreading in hopes of reducing this processing time, and I am wondering what the best way to do this would be. Currently all 400k records are grabbed from the database at one time and put into a List, then the List is iterated through, grabbing each object and processing it one at a time. It has been suggested that we add a flag in the database to show which records are currently being "processed" and have several threads grab from the DB, but this just "feels wrong" to me (not that grabbing all 400k at once "feels right"!).
Looking to learn more about multi-threading and how I may be able to make this app better in the process.
Thanks for the help
1. Start several threads to do the retrievals and the validations, but instead of adding a database flag just set different values for the first row and maximum number of rows, so they each get a distinct set of rows.
2. Do the select in a single thread and then partition the result set, giving a sublist to each of N new threads.
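Here is a rough sketch of how the splitting in both options could look. The `ranges` helper produces {firstRow, maxRows} pairs that could feed either the per-thread queries in option 1 or the sublists in option 2; the code below shows option 2 with an in-memory list. `PartitionSketch`, `ranges`, `validate`, and `processAll` are names I made up for illustration, and `validate` is only a placeholder for the real per-record check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionSketch {

    // Splits [0, total) into at most `parts` contiguous ranges,
    // each returned as {firstRow, maxRows}.
    static List<int[]> ranges(int total, int parts) {
        List<int[]> out = new ArrayList<>();
        int chunk = (total + parts - 1) / parts; // ceiling division
        for (int first = 0; first < total; first += chunk) {
            out.add(new int[] {first, Math.min(chunk, total - first)});
        }
        return out;
    }

    // Hypothetical stand-in for the real (slow) per-record validation.
    static boolean validate(int record) {
        return record >= 0;
    }

    // Option 2: one list, partitioned into sublists, one task per sublist.
    // Returns how many records passed validation.
    static int processAll(List<Integer> records, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int[] r : ranges(records.size(), threads)) {
                // subList is a view, so no records are copied here.
                List<Integer> slice = records.subList(r[0], r[0] + r[1]);
                futures.add(pool.submit(() -> {
                    int ok = 0;
                    for (int rec : slice) {
                        if (validate(rec)) ok++;
                    }
                    return ok;
                }));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get(); // waits for each task to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 400_000; i++) records.add(i);
        System.out.println(processAll(records, 4));
    }
}
```

For option 1, each thread would instead pass its own {firstRow, maxRows} pair to its query. That only yields distinct sets of rows if every query uses the same stable ordering.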
Thanks, I had thought about the second option (retrieve all records, then divide them up and toss them to different threads for processing) but didn't like the "feeling" of grabbing all 400k and parsing out that List. But the first option you mention seems like a no-brainer; not sure why I didn't think of that (well, the reason is likely experience!).
Thanks for the help, I think #1 is a great choice
This is how I'd approach it:
1. A single thread does the DB select, iterates through the result set, and puts each record into a BlockingQueue.
2. N threads that are the consumers of the queue to do the work.
3. The queue should be bounded, so you don't produce work much faster than it can be consumed. The producer will block when the queue is full and wait for the consumers to catch up.
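A minimal sketch of those three steps, assuming a bounded `ArrayBlockingQueue` and a "poison pill" sentinel to tell the consumers to stop (the shutdown mechanism is my assumption, not something spelled out above). `QueueSketch`, `Row`, and `run` are hypothetical names, and the loop that creates `Row` objects stands in for iterating the real result set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class QueueSketch {

    // Hypothetical record type standing in for one database row.
    static final class Row {
        final int id;
        Row(int id) { this.id = id; }
    }

    // Sentinel that tells a consumer to stop; compared by identity.
    static final Row POISON = new Row(-1);

    // Returns the number of rows the consumers processed.
    static int run(int recordCount, int consumers, int capacity)
            throws InterruptedException {
        // Bounded queue: put() blocks the producer while the queue is full.
        BlockingQueue<Row> queue = new ArrayBlockingQueue<>(capacity);
        AtomicInteger processed = new AtomicInteger();

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < consumers; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        Row row = queue.take(); // blocks while the queue is empty
                        if (row == POISON) break;
                        // The real validation and task would go here.
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            workers.add(t);
        }

        // Producer: in the real app this would walk the result set.
        for (int id = 0; id < recordCount; id++) {
            queue.put(new Row(id));
        }
        // One sentinel per consumer so every worker sees one and exits.
        for (int i = 0; i < consumers; i++) {
            queue.put(POISON);
        }
        for (Thread t : workers) {
            t.join();
        }
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(400_000, 4, 1_000));
    }
}
```

The queue capacity caps how many records sit in memory at once, so the full 400k never has to be held in a List.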
See, this is why I ask these questions. I had never come across BlockingQueue; I just looked at the API, and it is very interesting. Producer/Consumer seems to be exactly the type of scenario I described... Thanks for the help!