Could anyone answer my question? Thanks.
Seriously ?!? You might as well ask "How long is a piece of string?"
So, if you want a serious answer you need to think a bit more about what you are asking.
Question: Where does Coherence store its data (by default)? Answer: In memory.
Given what we now know, what will determine the amount of data you can store? Yes, you guessed it: the JVM heap size.
So, how big is the heap that you are going to run your 64-bit JVM with?
Even now, asking how big the heap is can be an open-ended question. A 64-bit JVM can have very big heaps, so you could put a lot of data in a single JVM, but that might not be the best solution for your application and use case. You need to tune your heap sizes for your given application and the GC times you can put up with.
As a typical rule of thumb to start with, you can estimate that the data you can store in a cluster is 1/3 of the total heap size of all the storage-enabled JVMs in the cluster. So if you had a cluster of 100 storage-enabled members, each with a heap of 4 GB, that would be 400 GB of heap, so 400/3 = about 133 GB of storage for data. You can then adjust this figure higher if you have no indexes, or lower if you have a lot of indexes. You might also need to go a little lower if you need a lot of free heap in the cluster for processing, for example if you run a lot of filters, entry processors, aggregators, etc. that do a lot of deserialization of data.
Mostly I have found Coherence to be better with more members with smaller heaps than with a few members with massive heaps but, as I said, it depends on your data access use cases.
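The arithmetic behind that rule of thumb can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical, nothing here is Coherence API, and the usable fraction is just a starting estimate that you would adjust for your own index and processing overhead.

```java
// Rough data-capacity estimate for a cluster, based on the rule of thumb above.
// CapacityEstimate and its methods are hypothetical names, not part of Coherence.
class CapacityEstimate {

    /** Total heap across all storage-enabled members, in GB. */
    static double totalHeapGb(int storageMembers, double heapPerMemberGb) {
        return storageMembers * heapPerMemberGb;
    }

    /**
     * Estimated data capacity in GB.
     * usableFraction is the share of heap assumed to be available for data,
     * e.g. 1.0 / 3.0 as a starting point (lower it if you have many indexes).
     */
    static double dataCapacityGb(int storageMembers, double heapPerMemberGb, double usableFraction) {
        if (usableFraction <= 0.0 || usableFraction > 1.0) {
            throw new IllegalArgumentException("usableFraction must be in (0, 1]");
        }
        return totalHeapGb(storageMembers, heapPerMemberGb) * usableFraction;
    }

    public static void main(String[] args) {
        int members = 100;          // storage-enabled members
        double heapPerMember = 4.0; // GB of heap per member

        System.out.printf("Total heap: %.0f GB%n", totalHeapGb(members, heapPerMember));
        System.out.printf("Estimated data capacity at 1/3: %.0f GB%n",
                dataCapacityGb(members, heapPerMember, 1.0 / 3.0));
    }
}
```

The fraction is a parameter rather than a constant because it is the number you would revise after load testing.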
JK
Hi JK,
This is the key information from your answer: "the data you can store in a cluster is 1/3 of the total heap size of all the storage enabled JVMs in the cluster"
I suppose this 1/3 rule is correct and proven?
Thank you so much, as always!
regards,
Xuebin
Hi Xuebin
As I said, the 1/3 heap is a good starting point for an application. Coherence, just like a database or any other data storage system, is a finite resource; you cannot just keep putting data into it forever. When you start a project using Coherence you need to do some sort of capacity planning to work out how big the cluster needs to be. Once you have the estimate then, as part of the project, you need to run some performance testing and load testing before you go live to see if your estimate is correct. For example, the system I work on for my current client holds less than 1/3 data, but it has a lot of indexes (possibly too many, but that is another story).
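For the capacity planning step, the same rule can be turned around to give a first guess at cluster size. The sketch below is an illustration only: ClusterSizing and membersNeeded are hypothetical names, not Coherence API, and the result is an estimate to be checked by load testing, not a guarantee.

```java
// First-pass cluster sizing: how many storage-enabled members for a given data size?
// ClusterSizing and membersNeeded are hypothetical names, not part of Coherence.
class ClusterSizing {

    /**
     * Estimated number of storage-enabled members needed.
     * ruleDenominator expresses the rule of thumb as 1/ruleDenominator of heap
     * being usable for data (3 for the 1/3 rule; a larger value is more conservative).
     * Integer arithmetic is used so the result always rounds up to whole members.
     */
    static long membersNeeded(long dataGb, long heapPerMemberGb, int ruleDenominator) {
        if (dataGb < 0 || heapPerMemberGb <= 0 || ruleDenominator <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long heapNeededGb = dataGb * ruleDenominator;                  // total heap required
        return (heapNeededGb + heapPerMemberGb - 1) / heapPerMemberGb; // ceiling division
    }

    public static void main(String[] args) {
        long dataGb = 133;        // expected data size in GB
        long heapPerMemberGb = 4; // heap per storage-enabled member in GB

        System.out.println("Members needed with the 1/3 rule: "
                + membersNeeded(dataGb, heapPerMemberGb, 3));
    }
}
```

If load testing shows more overhead than expected (for example from indexes), rerunning the estimate with a larger denominator gives the revised member count.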
JK
A 64-bit JVM can address up to 2^64 bytes (16 exabytes). If you follow the 1/3 rule, you'd be limited to storing 5 1/3 exabytes of data on a single node, so you would have to start scaling horizontally if you needed more than 5 1/3 exabytes of data.
GC tuning for a 16 exabyte heap would likely be difficult. If you can afford hardware with that much RAM though, you can probably find an outside consultant willing to help you tune it.
Thank you for your replies.
Hey JK/Xuebin,
I find the 1/4 or even 1/5th rule is usually more accurate. It may sound pessimistic, but when you consider you never want your grid reaching 80% capacity...