How can I calculate the max. disk usage for a node in a NoSQL Store environment (v2.1.18)?
Example Figures for a sample calculation:
3 Servers, Replication Factor 3
Key size 100 bytes + value size 3,900 bytes = 4,000 bytes total
How large is the internal overhead of the Berkeley DB?
In our first experiments, we obtained a ratio of 4 to 6 between the size of the db files of one RN and the pure inserted data!
Max. gross size per record = 6 * 4,000 bytes = 24,000 bytes
Is this a normal value? Did you also get such ratios? I measure the amount of data input and the size of the DB files on disk in one env directory.
If I understand Berkeley DB correctly:
All of the JE log cleaner settings can be checked in the je.config.csv file in the env directory.
In our first tests it looks as if the files are only filled until je.cleaner.minUtilization is reached.
Is this possible?
Can I set these values in the same way as in a plain replicated Berkeley DB environment?
It works, but is this a good idea?
But how much disk space do I need at the end of one year?
Initial load into the store, number of records: 10,000,000
New Records per Day : 10,000
Changed Records per Day : 5,000
Deleted Records per Day : 10,000
So at the end of the year, one node of the store has to hold about 19,100,000 log entries and 10,000,000 records in its db files.
The delete ratio of the net data is less than 19% and the change ratio is 9% (all together under 40%), so in the end the cleaner will not delete any db files, because all files are balanced to je.cleaner.minUtilization and no file can then reach je.cleaner.minFileUtilization?
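The yearly totals can be reproduced with a small Python sketch. It assumes that every insert, update and delete writes exactly one log entry, that a year has 365 days, and that the cleaner reclaims nothing; the function and variable names are only illustrative and are not part of any NoSQL or JE API.

```python
# Workload figures copied from the post; all names here are illustrative.
INITIAL_RECORDS = 10_000_000
NEW_PER_DAY = 10_000
CHANGED_PER_DAY = 5_000
DELETED_PER_DAY = 10_000
DAYS_PER_YEAR = 365  # assumption: no leap year


def yearly_totals(initial, new, changed, deleted, days):
    """Return (log_entries, live_records) after `days` days.

    Assumes each insert, update and delete writes exactly one log entry
    and that no log entry is reclaimed by the cleaner (worst case).
    """
    log_entries = initial + (new + changed + deleted) * days
    live_records = initial + (new - deleted) * days
    return log_entries, live_records


log_entries, live_records = yearly_totals(
    INITIAL_RECORDS, NEW_PER_DAY, CHANGED_PER_DAY, DELETED_PER_DAY, DAYS_PER_YEAR
)
print(f"log entries:  {log_entries:,}")   # 19,125,000
print(f"live records: {live_records:,}")  # 10,000,000
```

The exact count of 19,125,000 is what the figure of roughly 19,100,000 log entries rounds from.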
To calculate the max. size for one year, I use this formula:
Max Disk Size for one Server and for one year = Replication factor * Max gross size of one record * Count of operations for one year
The resulting figure looks a bit high: more than 1,200 GByte of disk space for about 37 GByte of net data at the end?
Can I calculate my max. storage requirement in this way?
Or is there a flaw in my reasoning?
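To make the arithmetic explicit, here is a minimal Python sketch of the formula exactly as written above. It takes the observed worst-case ratio of 6 and the rounded 19,100,000 operations from the post, and it assumes that "GByte" means 2^30 bytes; the names are illustrative only, and the sketch does not model what the JE cleaner actually does.

```python
GIB = 2**30  # assumption: "GByte" in the post means 2^30 bytes

REPLICATION_FACTOR = 3
NET_RECORD_BYTES = 100 + 3_900       # key + value
OVERHEAD_RATIO = 6                   # worst ratio observed in the first tests
OPERATIONS_PER_YEAR = 19_100_000     # rounded log-entry count from the post
LIVE_RECORDS = 10_000_000            # records remaining after one year


def max_disk_bytes(replication_factor, gross_record_bytes, operations):
    """Formula from the post: RF * max gross record size * operations."""
    return replication_factor * gross_record_bytes * operations


gross_record_bytes = OVERHEAD_RATIO * NET_RECORD_BYTES  # 24,000 bytes
max_bytes = max_disk_bytes(REPLICATION_FACTOR, gross_record_bytes, OPERATIONS_PER_YEAR)
net_bytes = LIVE_RECORDS * NET_RECORD_BYTES

print(f"max disk size: {max_bytes / GIB:,.0f} GiB")
print(f"net data:      {net_bytes / GIB:,.1f} GiB")
```

With these inputs the formula gives roughly 1,280 GiB against roughly 37 GiB of net data, which matches the figures quoted in the post.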
Thanks for the help.
As you know, NoSQL Database uses Berkeley DB Java Edition as the underlying storage engine, so this is really a JE question. JE uses a log-structured storage system, and it is typical for there to be overheads of at least 2x.
You don't say what your heap and cache sizes are set to, but I have to wonder if that is part of the issue you are seeing. You should go through the sizing exercises described in Appendix B. Initial Capacity Planning.