
    Read optimization for time-series data

    989556
      I am using Berkeley DB JE to store fairly high-frequency (10 Hz) time-series data collected from ~80 sensors. The idea is to import a large number of CSV files containing this data, and to allow quick access to time ranges of data to plot with a web front end. I have created a "Sample" entity to hold these sampled metrics, keyed by timestamp. My entity looks like this:

      import java.util.LinkedHashMap;
      import java.util.Map;

      import com.sleepycat.persist.model.Entity;
      import com.sleepycat.persist.model.PrimaryKey;

      @Entity
      public class Sample {

          // Unix time; seconds since the Unix epoch
          @PrimaryKey
          private double time;

          private Map<String, Double> metricMap = new LinkedHashMap<String, Double>();
      }

      As you can see, there is quite a lot of data in each entity (~70-80 doubles), and I'm not sure storing it this way is best. That is my first question.

      I am accessing the database from a web front end. I am not too worried about insertion performance, since imports are infrequent and generally done in bulk. For smaller ranges (~1-2 hours of samples) the read performance is decent enough for web calls. For larger ranges, the read operations take quite a while. What would be the best approach for configuring this application?

      Also, I want to control the granularity of returned samples: if the number of samples matched by a query is very large, I want to return only a fraction of them. Is there an easy way to count the number of entities a cursor will iterate over without actually iterating over them?

      Here are my current configuration parameters:

      environmentConfig.setAllowCreateVoid(true);
      environmentConfig.setTransactionalVoid(true);
      environmentConfig.setTxnNoSyncVoid(true);
      environmentConfig.setCacheModeVoid(CacheMode.EVICT_LN);
      environmentConfig.setCacheSizeVoid(1000000000);

      databaseConfig.setAllowCreateVoid(true);
      databaseConfig.setTransactionalVoid(true);
      databaseConfig.setCacheModeVoid(CacheMode.EVICT_LN);
        • 1. Re: Read optimization for time-series data
          Greybird-Oracle
          Hi Ben, sorry for the slow response.
          > As you can see, there is quite a lot of data in each entity (~70-80 doubles), and I'm not sure storing it this way is best. That is my first question.
          That doesn't sound like a large record, so I don't see a problem. If the map keys are repeated in each record, that's wasted space that you might want to store differently.
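          For example, here is a sketch of one way to store the names once (the MetricNames entity and the fixed value ordering are illustrative assumptions, not anything JE requires):

          import java.util.ArrayList;
          import java.util.List;

          import com.sleepycat.persist.model.Entity;
          import com.sleepycat.persist.model.PrimaryKey;

          @Entity
          class MetricNames {
              // A single record holding the metric names once, in the
              // same order as the values array in each Sample.
              @PrimaryKey
              private int id = 1;

              private List<String> names = new ArrayList<String>();
          }

          @Entity
          class Sample {
              @PrimaryKey
              private double time;

              // One value per metric, ordered as in MetricNames.
              private double[] values;
          }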
          > For larger ranges, the read operations take quite a while. What would be the best approach for configuring this application?
          What isolation level do you require? Do you need the keys and the data? If the amount you're reading is a significant portion of the index, have you looked at using DiskOrderedCursor?
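          For example, a full scan with DiskOrderedCursor looks roughly like this (a base-API sketch; records arrive in disk order rather than key order, so it suits whole-index reads, not range queries):

          import com.sleepycat.je.Database;
          import com.sleepycat.je.DatabaseEntry;
          import com.sleepycat.je.DiskOrderedCursor;
          import com.sleepycat.je.DiskOrderedCursorConfig;
          import com.sleepycat.je.OperationStatus;

          class DiskOrderedScan {
              // Reads every record in disk order; the null LockMode gives
              // the READ_UNCOMMITTED-style isolation this cursor uses.
              static void scanAll(Database db) {
                  DiskOrderedCursor cursor =
                      db.openCursor(new DiskOrderedCursorConfig());
                  try {
                      DatabaseEntry key = new DatabaseEntry();
                      DatabaseEntry data = new DatabaseEntry();
                      while (cursor.getNext(key, data, null) ==
                             OperationStatus.SUCCESS) {
                          // Deserialize and aggregate the record here.
                      }
                  } finally {
                      cursor.close();
                  }
              }
          }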
          > Also, I want to control the granularity of returned samples: if the number of samples matched by a query is very large, I want to return only a fraction of them. Is there an easy way to count the number of entities a cursor will iterate over without actually iterating over them?
          Not currently. Using the DPL, reading with a key-only cursor is the best available option. If you want to drop down to the base API, you can use Cursor.skipNext and skipPrev, which are further optimized.
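          For example, counting a time range with a key-only DPL cursor might look like this (a sketch; the PrimaryIndex<Double, Sample> is the one behind your entity):

          import com.sleepycat.persist.EntityCursor;
          import com.sleepycat.persist.PrimaryIndex;

          class RangeCount {
              // Counts the samples in [from, to] without fetching the data
              // portion of each record; keys() asks JE for keys only.
              static long count(PrimaryIndex<Double, Sample> index,
                                double from, double to) {
                  EntityCursor<Double> keys = index.keys(from, true, to, true);
                  try {
                      long n = 0;
                      for (Double k : keys) {
                          n++;
                      }
                      return n;
                  } finally {
                      keys.close();
                  }
              }
          }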
          > environmentConfig.setAllowCreateVoid(true);
          Please use the method names without the Void suffix -- those are just for bean editors.
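          For example, your settings with the standard method names:

          environmentConfig.setAllowCreate(true);
          environmentConfig.setTransactional(true);
          environmentConfig.setTxnNoSync(true);
          environmentConfig.setCacheMode(CacheMode.EVICT_LN);
          environmentConfig.setCacheSize(1000000000L);

          databaseConfig.setAllowCreate(true);
          databaseConfig.setTransactional(true);
          databaseConfig.setCacheMode(CacheMode.EVICT_LN);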

          --mark