1 Reply Latest reply on Sep 19, 2013 4:11 PM by Charles Lamb

    How to calculate the maximum necessary filesystem space for a NoSQL Store v2.1.18 Node?




      How can I calculate the maximum disk usage for a NoSQL Store v2.1.18 node?


      Example Figures for a sample calculation:


      3 Servers, Replication Factor 3

      Key size 100 bytes + value size 3,900 bytes = 4,000 bytes total per record


      How large is the internal overhead of the Berkeley DB?


      In our first experiments, the db files of one RN were 4 to 6 times larger than the raw data we inserted!


      Max. gross size per record = 6 * 4,000 bytes = 24,000 bytes


      Is this a normal value? Have you seen such ratios as well? I compare the amount of data inserted with the size of the db files on disk in one env directory.


      If I understand Berkeley DB correctly:


      • Each data entry / log entry is appended to the db file, so the files grow with every operation
      • When the db file reaches x bytes in size, a new file is started (je.log.fileMax)
      • A db file is only deleted once its utilization falls below x% (je.cleaner.minFileUtilization)
      • The cleaner only considers a file if at least x newer files lie between it and the newest one (je.cleaner.minAge)
      • The cleaner only runs after x bytes have been written to the db (je.cleaner.bytesInterval)
      • The cleaner tries to keep the files filled to the value of je.cleaner.minUtilization
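
      The rules listed above can be put into a toy model to make the interplay of the thresholds easier to discuss. This is only a sketch of the understanding described in this post, not the actual JE cleaner logic: the function name `files_to_clean`, the way overall utilization is averaged, the interpretation of the age rule, and all example values are assumptions made for illustration.

```python
def files_to_clean(file_utils, min_utilization, min_file_utilization, min_age):
    """Toy model: return indices of log files a cleaner might pick.

    file_utils           -- utilization in percent per log file, oldest first;
                            the last entry is the file currently being written
    min_utilization      -- stands in for je.cleaner.minUtilization
    min_file_utilization -- stands in for je.cleaner.minFileUtilization
    min_age              -- stands in for je.cleaner.minAge
    """
    if not file_utils:
        return []

    active = len(file_utils) - 1
    # Assumption: a file is old enough when it is at least min_age files
    # behind the active file.
    eligible = [i for i in range(len(file_utils)) if active - i >= min_age]

    # Files below the per-file threshold are always picked.
    picked = {i for i in eligible if file_utils[i] < min_file_utilization}

    # Assumption: overall utilization is the plain average (equal file sizes).
    overall = sum(file_utils) / len(file_utils)
    if overall < min_utilization:
        rest = [i for i in eligible if i not in picked]
        if rest:
            # Pick the emptiest remaining file to raise overall utilization.
            picked.add(min(rest, key=lambda i: file_utils[i]))

    return sorted(picked)


# Hypothetical utilization figures, not measured values.
mostly_empty = files_to_clean([30, 45, 3, 60, 50], 40, 5, 2)
well_filled = files_to_clean([55, 60, 70, 80, 50], 40, 5, 2)
print(mostly_empty, well_filled)
```

      In this model, a store whose files all sit above both thresholds is never cleaned, which is exactly the situation asked about further down.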


      All the JE log cleaner settings can be checked in the je.config.csv file in the env directory.


      In our first tests it looks as if the files are only filled until je.cleaner.minUtilization is reached. Is this possible?


      Can I set these values in the same way as in a plain replicated Berkeley DB environment? It works, but is it a good idea?



      But how much disk space do I need at the end of one year?


      First load into the store (record count): 10,000,000
      New records per day:                          10,000
      Changed records per day:                       5,000
      Deleted records per day:                      10,000


      For one node of the store, the db files therefore have to hold, at the end of the year, about 19,100,000 log entries (the 10,000,000 initial inserts plus 365 days of 25,000 operations each) and 10,000,000 live records.
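
      The counts above can be re-derived with a few lines of Python. This is just a sketch of the arithmetic under the assumption that every insert, change, and delete produces exactly one log entry and that a year has 365 days; the function name `yearly_totals` is made up for the example.

```python
def yearly_totals(initial, new_per_day, changed_per_day, deleted_per_day, days=365):
    """Return (log entries written, live records) after the given number of days."""
    # Assumption: one log entry per insert, update, or delete.
    log_entries = initial + days * (new_per_day + changed_per_day + deleted_per_day)
    # Inserts add a record, deletes remove one, updates leave the count unchanged.
    live_records = initial + days * (new_per_day - deleted_per_day)
    return log_entries, live_records


log_entries, live_records = yearly_totals(10_000_000, 10_000, 5_000, 10_000)
print(f"{log_entries:,} log entries, {live_records:,} live records")
```

      The exact result is 19,125,000 log entries, which the post rounds to 19,100,000; the number of live records stays at 10,000,000 because inserts and deletes cancel out.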


      The delete ratio of the net data is below 19% and the change ratio is 9% (all together under 40%). Does this mean that in the end the cleaner will not delete any db files, because all files are balanced at je.cleaner.minUtilization and no file can then fall below je.cleaner.minFileUtilization?


      To calculate the max. size for one year, I use this formula:


      Max. disk size for one server for one year = replication factor * max. gross size of one record * number of operations in one year


      The resulting figure looks rather high: more than 1,200 GByte of disk space for only 37 GByte of net data?
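
      For reference, here is the formula evaluated with the figures from this post. It is only a sketch of the estimate as stated above, not a validated sizing method; the function name `max_disk_bytes` is made up, and GByte is taken to mean 2^30 bytes, which matches the 37 GByte quoted for the net data.

```python
GIB = 2 ** 30  # assumption: "GByte" in this post means 2^30 bytes


def max_disk_bytes(replication_factor, gross_bytes_per_record, operations):
    """Worst-case disk estimate exactly as the formula in the post states it."""
    return replication_factor * gross_bytes_per_record * operations


gross_per_record = 6 * 4_000             # observed ratio of 6 times 4,000 bytes per record
estimate = max_disk_bytes(3, gross_per_record, 19_100_000)
net_data = 10_000_000 * 4_000            # live records times raw record size

print(f"estimate: {estimate / GIB:,.0f} GByte")
print(f"net data: {net_data / GIB:,.1f} GByte")
```

      With these inputs the estimate comes to roughly 1,281 GByte against about 37 GByte of net data, a factor of around 34.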


      Can I calculate my max. storage requirement this way, or is there a flaw in my reasoning?


      Thanks for the help.