I am putting together a server spec for a new project and need help estimating how much processor power I need.
Here is a short outline of the main process:
- Load the data:
The incoming file contains 160 million rows and is 250 GB in size. It has about 60 columns, of which I load 50 into my table.
Each record in the table is 1,500 bytes long.
After the load, I validate EACH row in this table against 20 rules defined in the RULES table.
The rules are all arithmetic, for example: one column plus another column must equal a given number, and a third column must be checked against another number.
- Output stage:
Records that pass validation go to one table, and those that fail go to another.
The first table (good records) is then extracted to a file. Records from the "wrong" table are also inserted into a "review" table.
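The validate-and-route steps above can be sketched as follows. This is a minimal illustration, not the actual implementation: it assumes each row of the RULES table can be expressed as "sum of some columns, compared with a constant", and the `Rule` structure, `failed_rules`, and `route` names are hypothetical. Integer values are used so that `==` comparisons are exact.

```python
import operator
from typing import NamedTuple

# Hypothetical in-memory version of one row of the RULES table:
# "sum of these columns <op> value". The real rule layout is not known.
class Rule(NamedTuple):
    rule_id: int
    columns: tuple   # column names whose values are summed
    op: str          # comparison operator, e.g. "==" or "<="
    value: int       # constant the sum is compared against

# Map operator strings to Python comparison functions.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

def failed_rules(row, rules):
    """Return the ids of every rule this row violates (empty list = valid)."""
    failed = []
    for rule in rules:
        total = sum(row[col] for col in rule.columns)
        if not OPS[rule.op](total, rule.value):
            failed.append(rule.rule_id)
    return failed

def route(rows, rules):
    """Split rows into good and wrong; wrong rows also go to review with their failed rule ids."""
    good, wrong, review = [], [], []
    for row in rows:
        failed = failed_rules(row, rules)
        if failed:
            wrong.append(row)
            review.append((row, failed))
        else:
            good.append(row)
    return good, wrong, review
```

Each rule costs a few additions and one comparison per row, so the total work scales with rows × rules: 160 million × 20 = 3.2 billion rule checks for the 6-month file.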
This whole process should take about 5-6 hours.
The file contains 6 months of data.
Sometimes I will run validation against 2 years' worth of data, meaning two "annual" tables of 500 GB each. According to the requirements, this process must take no more than 1 day.
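The time limits above translate into sustained rates the server has to hit. The sketch below does that arithmetic under stated assumptions: decimal units (1 GB = 10^9 bytes), the tighter 5-hour end of the 6-month target, a single read of the input, and a 2-year row count estimated from the 1,500-byte record length (the question does not give that count). The `targets` helper is hypothetical.

```python
# Back-of-envelope throughput targets from the numbers in the question.
# Assumptions: decimal units (1 GB = 10**9 bytes), the tighter 5-hour target
# for the 6-month run, and one read of the input data (no intermediate tables).
SECONDS_PER_HOUR = 3600
RULES_PER_ROW = 20

def targets(total_bytes, rows, hours):
    """Sustained rates needed to get through the data within the time limit."""
    seconds = hours * SECONDS_PER_HOUR
    return {
        "MB_per_s": total_bytes / seconds / 1e6,
        "rows_per_s": rows / seconds,
        "rule_checks_per_s": rows * RULES_PER_ROW / seconds,
    }

# 6-month run: 250 GB, 160 million rows, 5 hours.
six_month = targets(250e9, 160e6, 5)

# 2-year run: two 500 GB tables in 24 hours. The row count is not given,
# so estimate it from the 1,500-byte record length (an assumption).
two_year_rows = 2 * 500e9 / 1500
two_year = targets(2 * 500e9, two_year_rows, 24)

for name, t in (("6-month", six_month), ("2-year", two_year)):
    print(f"{name}: {t['MB_per_s']:.1f} MB/s, {t['rows_per_s']:,.0f} rows/s, "
          f"{t['rule_checks_per_s']:,.0f} rule checks/s")
```

For the 6-month file this works out to roughly 14 MB/s, about 8,900 rows/s, and about 178,000 rule checks per second. The described pipeline touches the data several times (load, validate, write the good and wrong tables, extract the file, insert into review), so the real I/O volume is a multiple of this single-read figure.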
I am writing the hardware spec for a new server to run this process.
I am having a hard time figuring out how much processor power I need to validate this much data in that much time.
Is there a formula or guideline for choosing a processor? I was looking at the configuration below, but I would like to estimate before I buy and try (and possibly fail, and end up with a biiiiig problem):
Intel Xeon E5-2640 (6 cores, 2.50 GHz, 15 MB cache, 95 W)
Number of processors: 2
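One rough way to relate the proposed hardware to the workload is a cycle budget per row: total core-cycles available in the time window divided by the rows to process. This is a napkin heuristic, not a vendor formula, and the sketch below assumes every physical core stays busy at base clock for the whole window, work splits perfectly across cores, and Hyper-Threading and Turbo Boost are ignored. The `cycles_per_row` helper is hypothetical, and the 2-year row count is again estimated from the 1,500-byte record length.

```python
# Rough CPU cycle budget per row for the proposed server.
# Assumptions: all physical cores stay busy for the whole window at base clock,
# work splits perfectly across cores, and Hyper-Threading / Turbo Boost are ignored.
SECONDS_PER_HOUR = 3600

def cycles_per_row(sockets, cores_per_socket, base_ghz, hours, rows):
    """Total core-cycles available in the time window, divided by the rows to process."""
    total_cycles = sockets * cores_per_socket * base_ghz * 1e9 * hours * SECONDS_PER_HOUR
    return total_cycles / rows

# 2 x Xeon E5-2640: 6 cores each at 2.50 GHz.
budget_6m = cycles_per_row(2, 6, 2.5, 5, 160e6)

# 2-year run, with the row count estimated from the 1,500-byte record length.
budget_2y = cycles_per_row(2, 6, 2.5, 24, 2 * 500e9 / 1500)

print(f"6-month: {budget_6m:,.0f} cycles/row, 2-year: {budget_2y:,.0f} cycles/row")
```

The budget is only meaningful next to a measured cost: timing the load and validation on a representative sample on an existing machine gives an approximate cycles-per-row figure to compare against it. Disk, database, and network waits are not captured by this number, so a large budget alone does not guarantee the job fits the window.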