Ashok_Ora wrote:Thank YOU, my first post was indeed a bit confusing, at least about the reads on DB2.
Great -- thanks for the clarification.
Ashok_Ora wrote:That's right, this is quite a huge lot of data, and will keep growing, and growing... Although the main goal of the app is to be able to achieve (almost) real time reporting, it will also need to be able (potentially) to compute data over different time ranges, including yearly ranges for instance - but in this case, the real time capabilities wouldn't be relevant, I guess: if you look at some data on a year span, you probably don't need it to be accurate on a dayly interval, for instance (well, I guess), so this part of the app would probably only use the "very old" data (not the current day data), whatever it is stored in...
Based on this information, it appears that you're generating about 12 GB/day into DB2, which is about a terabyte of data every 3 months. Here are some things to consider for ad-hoc querying of about 1 TB of data (which is not a small amount of data).
Ashok_Ora wrote:Thanks for pointing that out! I did envisage using indexes, but my concern was (and you guessed it) the expectable overhead that it brings. At this stage (but I may be wrong, this is just a study in progress, that will also need proper tests and benchmarking), I plan to favour write speed over anything else, to insure that all the incoming data is indeed stored, even if it is quite tough to handle in the primary stored form.
Query performance is dramatically improved by using indexes. On the other hand, indexing data during the insert operation is going to add some overhead to the insert - this will vary depending on how many fields you want to index (how many secondary indices you want to create). BDB automatically indexes the primary key. Generally, any approach that you consider for satisfying the reporting requirement will benefit from indexing the data.>
Ashok_Ora wrote:This is pretty much "in line" with what I had in mind when posting my question here :).
Here are some alternatives to consider, for the reporting application:
- Move the data to another system like MongoDB or CouchDB as you suggest and run the queries there. The obvious cost is the movement of data and maintaining two different repositories. You can implement the data movement in the way I suggested earlier (close "old" and open "new" periodically).
Ashok_Ora wrote:I read a bit about it, and this is indeed very interesting capabilities - especially as I know how to write decent SQL statements.
- Use BDB's SQL API to insert and read data in DB1 and DB2. You should be able to run ad-hoc queries using SQL. After doing some experiments, you might decide to add a few indices to the system. This approach eliminates the need to move the data and maintaining separate repositories. It's simpler.
Ashok_Ora wrote:This is VERY interesting. VERY.
- Use the Oracle external table mechanism (Overview and how-to - http://docs.oracle.com/cd/B28359_01/server.111/b28319/et_concepts.htm) to query the data from Oracle database. Again, you don't need to move the data. You won't be able to create indices on the external tables. If you do want to move data from the BDB repository into Oracle DB, you can run a "insert into <oracle_table> select * from <external_table_in_DB2>;". As you know, Oracle database is excellent database for all sorts of applications, including complex reporting applications.
Ashok_Ora wrote:It definitely is so much useful! Makes things clearer and allow me to get more into BDB (and Oracle as well with your latest reply), and that's much appreciated. :)
Hope this was helpful. Let me know your thoughts.
Ashok_Ora wrote:Of course, I owe you a bit more details as you help me a lot in my research/study :)
Just out of curiousity, can you tell us some additional details about your application?
Ashok_Ora wrote:Many thanks again, Ashok!
Thanks and warm regards.