4. The type of disk storage you are using for the filesystem containing the database.
We are not using any external storage; we are using the Linux ext3 filesystem.
CJ>> I wanted to know the kind of disk drive (type and speed). Is it flash? A 7,200 RPM disk? 10,000 RPM? 15,000 RPM?
6. The number of rows of data in the CSV file.
My comment - each CSV file contains around 50,000 records.
CJ>> Timing 50,000 inserts is not going to give a very accurate number. You should ideally time far more.
7. Originally you said 'I am only getting 15000 to 17000 TPS'. How are you measuring this? Do you mean TPS (i.e. commits per second) or 'rows inserted per second'? Note that by default ttBulkCp commits every 1024 rows, so if you are measuring commits then the insert rate is 1024 x that.
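The commits-versus-rows distinction in point 7 is easy to get wrong, so here is a small sketch of the arithmetic (Python purely for illustration; the 1024-row default commit interval and the 15,000 figure are taken from the thread):

```python
# ttBulkCp commits every 1024 rows by default, so a figure counted in
# "commits per second" understates the row insert rate by that factor.
COMMIT_INTERVAL = 1024  # ttBulkCp default rows per commit

def rows_per_second(commits_per_second: float, interval: int = COMMIT_INTERVAL) -> float:
    """Convert a measured commit rate into the corresponding row insert rate."""
    return commits_per_second * interval

# If 15,000 really were commits/sec, the implied row rate would be over
# 15 million rows/sec, which suggests the quoted figure is rows/sec, not TPS.
print(rows_per_second(15_000))  # 15360000
```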
My comment - I have now turned timing on at the bash prompt. When I run the ./ttbulkcp command I note down the time, and when the command completes I note down the time again, and from that I calculate the TPS. Further to this, I have one file for ttBulkCp containing 50,000 records, of which around 38,000 records succeed, and that is how I am calculating the TPS.
CJ>> Okay, I see the problem(s). (1) Handling errors is much slower than inserting a row, so timing a load where 12,000 records have errors is not useful. You need to time a load where all records succeed; otherwise you are not measuring TimesTen insert speed, you are measuring how fast ttBulkCp can report and log errors. (2) By default the TimesTen database is loaded into memory from disk when an application connects, remains in memory while something is connected, and is then unloaded from memory when the last connection closes. Loading the database into and out of memory can be quite time consuming, especially for a large database, so your test is including this time in the overall time and is therefore reporting a much longer time than that taken just to do the inserts. Please use manual ramPolicy and load the database explicitly:
ttAdmin -ramPolicy manual yourDSN
ttAdmin -ramLoad yourDSN
Run your load test (with no error records).
Let me know how it turns out. I think it will be somewhat faster now...
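The measurement discipline CJ describes (connect and RAM-load first, then time only a clean insert run) can be sketched as follows. This is an illustrative stand-in, not TimesTen code: sqlite3 substitutes for the real database connection, and load_rows/measure_tps are hypothetical names.

```python
import sqlite3
import time

def load_rows(conn, rows):
    """Stand-in loader: insert all rows in one batch and return how many succeeded."""
    cur = conn.executemany("INSERT INTO t(a, b) VALUES (?, ?)", rows)
    conn.commit()
    return cur.rowcount

def measure_tps(conn, rows):
    """Time only the insert phase and compute rows/sec from successful rows.
    The connection must already be open (and, for TimesTen, the database
    already loaded into RAM) before the clock starts, so connect/ramLoad
    time is excluded from the measurement."""
    start = time.perf_counter()
    inserted = load_rows(conn, rows)
    elapsed = time.perf_counter() - start
    if inserted != len(rows):
        # Error rows are processed far more slowly than clean inserts,
        # so a run with failures does not measure insert speed.
        raise ValueError("data file has error rows; clean it before timing")
    return inserted / elapsed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(a INTEGER, b TEXT)")
tps = measure_tps(conn, [(i, "x") for i in range(50_000)])
print(f"{tps:,.0f} rows/sec")
```

The key design point is that connection setup and data-file validation happen before `start` is taken, so the elapsed time covers inserts alone.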
4) It is a 15,000 RPM SAS disk.
I agree with you; I do feel that I am only monitoring the output of the ./ttbulkcp command. When I load multiple CSV files my reading is similar (2,454,436 records loaded in 116 seconds, i.e. around 12,301 records per second loaded from CSV into TimesTen). Can you suggest another possible way to measure the insertion time?
For your information, I am taking readings by loading data, and I have also taken a result with 50 million records. I have tried the commands below:
ttAdmin -ramPolicy manual yourDSN
ttAdmin -ramLoad yourDSN
With the above commands, the RAM load and unload each took around 2 minutes.
Please guide me on how I can take this further.
So when you have already loaded the database into memory using ttAdmin -ramLoad, how long does the ttBulkCp command take compared to previously? If ttAdmin -ramLoad takes 2 minutes, then 2 minutes (120 seconds) of your original ttBulkCp time was actually spent loading the database into memory, not doing inserts. Having said that, this should not be the case if you have Temporary=1; in that case database creation (which will occur on every load, since the DB is temporary) should be fairly fast. Also, you cannot use manual ramPolicy with a Temporary database, so that also does not make sense...
My recommendation is to:
1. Use a normal, permanent database (Temporary=0), with manual ramPolicy and manually load the database into memory prior to running any insert tests.
2. If you do that then it is okay to time the execution of the ttBulkCp command since for a database that is already in memory connect/disconnect is fast.
3. If you want to get an accurate time for TimesTen insertion rates do not use load files that have many records with errors.
4. Note that timing 'insert rate' using ttBulkCp is not just timing the rate at which you can insert data into TimesTen. ttBulkCp has to read the records from the file, parse them, and validate them, and only then does it do the insert. If you want to investigate pure 'insert rate' then you should take the bulkinsert.c demo program (in the quickstart/sample_code/odbc directory) as a starting point and modify it to meet your requirements. This will give a more accurate measure of pure insert performance.
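The distinction in point 4 (parse/validate time versus pure insert time) can be sketched like this. Again sqlite3 stands in for an ODBC connection to TimesTen, and the structure is only in the spirit of a bulkinsert.c-style loop; names and the batch size are illustrative:

```python
import csv
import io
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(a INTEGER, b TEXT)")

# Phase 1: parse and validate the CSV up front. ttBulkCp does this work
# inline, so its elapsed time is parse + validate + insert, not insert alone.
csv_data = io.StringIO("".join(f"{i},row{i}\n" for i in range(10_000)))
rows = [(int(a), b) for a, b in csv.reader(csv_data)]

# Phase 2: time only the inserts, executing a prepared statement in batches
# and committing per batch (1024 here is an arbitrary illustrative value).
BATCH = 1024
start = time.perf_counter()
for i in range(0, len(rows), BATCH):
    conn.executemany("INSERT INTO t(a, b) VALUES (?, ?)", rows[i:i + BATCH])
    conn.commit()
elapsed = time.perf_counter() - start
print(f"inserted {len(rows)} rows in {elapsed:.3f}s")
```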
A couple more points:
1. I'm not sure what you mean by turning on timing at the bash prompt, but the most accurate way to time things is to use the 'time' command:
time ttbulkcp ........
2. Based on the table definition, a row in this table could be anything from ~175 bytes up to ~27,000 bytes. What size are the actual rows? i.e. what amount of data is actually being stored into all the VARCHAR2 columns when you do the load? There is a huge difference between loading 12,000 175 byte rows per second and loading 12,000 27 KB rows per second...
The row size is about 915 KB.
One row is on average 915 KB? That's not possible based on the table definition you posted earlier. The largest size for any one row is around 27 KB. Could you please clarify...
Chris, sorry for the above reply.
Correction: the average row size is around 4 KB.
I have taken another reading by loading around 10 million records, which took around 10 minutes to insert, i.e. a TPS of around 25K. I am still not achieving the expected TPS in TimesTen.
3340678 rows inserted
2356538 rows not inserted due to errors
5697216 rows read from 98 files
As per your guidance I tried time ./ttbulkcp, and it gives me results like the above.
We have created a sample Java batch program and tested bulk insertion; we can achieve around 60,000 TPS. I would like to thank you for your support.