How can I calculate the max. disk usage for one node of a NoSQL store environment (v2.1.18)?
Example figures for a sample calculation:
3 servers, replication factor 3
Key size 100 bytes + value size 3,900 bytes = 4,000 bytes total
How large is the internal overhead of Berkeley DB?
In our first experiments, the db files of one replication node (RN) were 4 to 6 times the size of the raw data we inserted!
Max. gross size per record = 6 * 4,000 bytes = 24,000 bytes
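The arithmetic behind this figure, as a minimal Python sketch. The factor 6 is only the upper end of the ratio we measured, not a documented Berkeley DB constant, and the variable names are purely illustrative.

```python
KEY_BYTES = 100
VALUE_BYTES = 3_900
# Upper end of the 4-6 ratio observed in our tests (an observation, not a documented constant)
MEASURED_RATIO_MAX = 6

# Net size of one record: key plus value
net_record_bytes = KEY_BYTES + VALUE_BYTES
# Worst-case on-disk size of one record under the measured ratio
max_gross_record_bytes = MEASURED_RATIO_MAX * net_record_bytes

print(net_record_bytes)        # 4000
print(max_gross_record_bytes)  # 24000
```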
Is this a normal value? Have you seen similar ratios? I measure the amount of data inserted and compare it with the size of the db files on disk in one env directory.
If I understand Berkeley DB correctly:
- Each data entry / log entry is appended to the database file, so the files grow with every operation
- When the db file reaches x bytes, a new file is started (je.log.fileMax)
- A file is not deleted as long as its utilization is not below x% (je.cleaner.minFileUtilization)
- The cleaner only runs if there are older, uncleaned files besides the newest one (je.cleaner.minAge)
- The cleaner only runs after x bytes have been written to the db (je.cleaner.bytesInterval)
- The cleaner tries to fill each file up to the value of je.cleaner.minUtilization
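To make my reading of the two utilization thresholds concrete, here is a toy model in Python. It is only a sketch of my understanding, not JE's actual cleaner algorithm; the file sizes, the live-byte counts, and the threshold values (50 and 5) are made-up assumptions for illustration.

```python
# Toy model of log-file utilization; NOT the real JE cleaner logic.
MIN_UTILIZATION = 50       # assumed value for je.cleaner.minUtilization (percent)
MIN_FILE_UTILIZATION = 5   # assumed value for je.cleaner.minFileUtilization (percent)

# (total bytes, live bytes) per log file; hypothetical numbers
log_files = [
    (1_000_000, 30_000),
    (1_000_000, 450_000),
    (1_000_000, 900_000),
]

def utilization(total, live):
    """Percentage of a file (or of the whole log) that is still live data."""
    return 100.0 * live / total

total_bytes = sum(t for t, _ in log_files)
live_bytes = sum(l for _, l in log_files)
overall = utilization(total_bytes, live_bytes)

# Indices of files whose own utilization is below the per-file threshold
below_file_threshold = [
    i for i, (t, l) in enumerate(log_files)
    if utilization(t, l) < MIN_FILE_UTILIZATION
]

print(round(overall, 1))          # 46.0
print(overall < MIN_UTILIZATION)  # True
print(below_file_threshold)       # [0]
```

In this made-up example the overall utilization (46%) is under the assumed 50% target, and only the first file is under the assumed 5% per-file threshold.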
All the JE log cleaner settings can be checked in je.config.csv in the env directory.
In our first tests it looks as if the files are only filled until je.cleaner.minUtilization is reached.
Is this possible?
Can I set these values in the same way as in a pure replicated Berkeley DB environment?
It works, but is it a good idea?
But how much disk space do I need at the end of one year?
Initial load of the store (records) : 10,000,000
New records per day                 : 10,000
Changed records per day             : 5,000
Deleted records per day             : 10,000
For one node of the store, the db files have to hold, at the end of one year, about 19,100,000 log entries and 10,000,000 records.
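How I arrive at these two counts, as a short Python sketch. It assumes a 365-day year and that every insert, update, and delete appends exactly one log entry; the exact result is 19,125,000, which I round to 19,100,000.

```python
DAYS_PER_YEAR = 365          # assumption: non-leap year
INITIAL_RECORDS = 10_000_000
NEW_PER_DAY = 10_000
CHANGED_PER_DAY = 5_000
DELETED_PER_DAY = 10_000

# Assumption: every insert, update, and delete appends one log entry.
log_entries = INITIAL_RECORDS + DAYS_PER_YEAR * (
    NEW_PER_DAY + CHANGED_PER_DAY + DELETED_PER_DAY
)

# Updates do not change the record count; inserts and deletes cancel out here.
records_at_year_end = INITIAL_RECORDS + DAYS_PER_YEAR * (
    NEW_PER_DAY - DELETED_PER_DAY
)

print(log_entries)          # 19125000
print(records_at_year_end)  # 10000000
```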
The delete ratio of the net data is less than 19% and the change ratio is 9% (all together under 40%), so in the end the cleaner will not delete any db files, because all files are balanced to je.cleaner.minUtilization and no file can fall to je.cleaner.minFileUtilization?
To calculate the max. size for one year, I use this formula:
Max. disk size for one server for one year = replication factor * max. gross size of one record * number of operations in one year
This figure looks rather high in the end: over 1,200 GByte of disk space for 37 GByte of net data?
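The formula evaluated with the figures from this post, as a minimal Python sketch. It assumes 1 GByte = 2^30 bytes, which is the unit that reproduces the 37 GByte of net data; the constant names are only illustrative.

```python
REPLICATION_FACTOR = 3
MAX_GROSS_RECORD_BYTES = 24_000    # 6 * 4,000 bytes, from the measured ratio
OPERATIONS_PER_YEAR = 19_100_000   # rounded log-entry count for one year
NET_RECORD_BYTES = 4_000
RECORDS_AT_YEAR_END = 10_000_000
GIB = 2**30                        # assumption: GByte means 2^30 bytes

# The formula exactly as stated in the post
max_disk_bytes = REPLICATION_FACTOR * MAX_GROSS_RECORD_BYTES * OPERATIONS_PER_YEAR
# Net data: live records at year end times net record size
net_bytes = NET_RECORD_BYTES * RECORDS_AT_YEAR_END

print(max_disk_bytes)               # 1375200000000
print(round(max_disk_bytes / GIB))  # 1281
print(round(net_bytes / GIB, 1))    # 37.3
```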
Can I calculate my max. storage requirement this way?
Or is there a flaw in my reasoning?
Thanks for the help.