If I understand correctly, you have four distinct NoSQL deployments, not four nodes in a single store?
Disk space reclamation for NoSQL DB nodes is automatic, and there is no explicit user-level command to reclaim space. The underlying storage is record based, as opposed to page based, so space reclamation (or cleaning, as we call it) should be responsive to deletions and updates to the store.
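To illustrate what record-based cleaning of an append-only log means, here is a toy Java model (Java 16+ for the `record`). It is not the NoSQL DB storage engine: the class name `LogCleanerSketch`, the 4-entry file size, and the 50% utilization threshold are all assumptions chosen for illustration. Writes only append; deletes and updates merely make older entries obsolete; the cleaner later copies the still-live entries out of sparse files and drops those files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of record-based cleaning in an append-only log (hypothetical,
// not the actual NoSQL DB storage engine).
public class LogCleanerSketch {
    // One log entry; identity (==) distinguishes versions of the same key.
    record Entry(String key, int size) {}

    static final int FILE_CAPACITY = 4;         // entries per log file (toy value)
    static final double MIN_UTILIZATION = 0.5;  // assumed cleaning threshold

    final List<List<Entry>> files = new ArrayList<>();
    final Map<String, Entry> live = new HashMap<>(); // key -> current version

    LogCleanerSketch() { files.add(new ArrayList<>()); }

    // Writes always append; any older version of the key becomes obsolete.
    void put(String key, int size) {
        List<Entry> head = files.get(files.size() - 1);
        if (head.size() == FILE_CAPACITY) {
            head = new ArrayList<>();
            files.add(head);
        }
        Entry e = new Entry(key, size);
        head.add(e);
        live.put(key, e);
    }

    // Simplification: a delete only marks the current version obsolete.
    void delete(String key) { live.remove(key); }

    boolean isLive(Entry e) { return live.get(e.key()) == e; }

    // Fraction of a file's bytes that still belong to live entries.
    double utilization(List<Entry> file) {
        long total = 0, used = 0;
        for (Entry e : file) {
            total += e.size();
            if (isLive(e)) used += e.size();
        }
        return total == 0 ? 1.0 : (double) used / total;
    }

    long diskBytes() {
        long sum = 0;
        for (List<Entry> f : files) for (Entry e : f) sum += e.size();
        return sum;
    }

    // Reclaim every non-head file below the threshold: its live entries are
    // re-appended at the head of the log and the file is dropped.
    int clean() {
        int headIndex = files.size() - 1;
        List<List<Entry>> kept = new ArrayList<>();
        List<Entry> toMigrate = new ArrayList<>();
        for (int i = 0; i < headIndex; i++) {
            List<Entry> file = files.get(i);
            if (utilization(file) < MIN_UTILIZATION) {
                for (Entry e : file) if (isLive(e)) toMigrate.add(e);
            } else {
                kept.add(file);
            }
        }
        kept.add(files.get(headIndex)); // the file being written is never cleaned
        int reclaimed = files.size() - kept.size();
        files.clear();
        files.addAll(kept);
        for (Entry e : toMigrate) put(e.key(), e.size()); // copy live data forward
        return reclaimed;
    }

    public static void main(String[] args) {
        LogCleanerSketch log = new LogCleanerSketch();
        for (int i = 0; i < 16; i++) log.put("k" + i, 100);
        for (int i = 0; i < 16; i++) if (i % 4 != 0) log.delete("k" + i); // 25% live per file
        System.out.println("before: " + log.diskBytes() + " bytes");
        int reclaimed = log.clean();
        System.out.println("reclaimed " + reclaimed + " files, after: " + log.diskBytes() + " bytes");
    }
}
```

Running it prints `before: 1600 bytes` and `reclaimed 3 files, after: 700 bytes`. Only 400 bytes are live, yet the copied-forward entries and the not-yet-cleaned former head file still occupy space, which gives a feel for why on-disk size in a log-structured store sits above the live data size.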
It is not unusual for user data to occupy 50% or less of the disk space used, because additional space goes to internal metadata, such as indices and per-record headers, and to the append-only, log-structured storage that holds the data. But if the four stores hold similar data, one would expect similar disk space usage. Things to think about are:
- how do the usage patterns vary? Does the store that behaves differently have a very different rate or mix of write operations? Is there a very different read access pattern (though reads should not matter much)?
- do the different stores have different resources available? In particular, is the available memory different, and/or has the cache size been set explicitly and differently on the stores?
- we have two known issues with disk space reclamation in this release, which we hope to fix in the next.
(a) SR #21069 - disk space reclamation can stall in some cases if all write operations cease. In particular, this can show up if a lot of data is loaded into the store and all following operations are read-only.
(b) SR #21488 - disk space cleaning can be inefficient if the record key is large and the record value is tiny.
There are per-node statistics that are off by default. Enabling them and comparing a node on the well-behaved system with a node on the problem system can provide more information. http://www.oracle.com/technetwork/database/nosqldb/learnmore/nosqldb-faq-518364.html#ReplicationNodeparameters describes the collectEnvStats and statsInterval parameters; you would want to set "collectEnvStats=true" and "statsInterval=5 MINUTES".
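Once stats from one node on each store have been collected and parsed into name/value pairs (the export format is not covered here, so that parsing step is assumed), a small helper can flag which stats diverge most between the two nodes. This is a hypothetical sketch: `StatsDiff`, `divergentStats`, and the `statA`/`statB`/`statC` names are placeholders, not real NoSQL DB statistic names. Rate-type stats naturally scale with load, so pick the ratio threshold (3.0 here) with the stores' relative traffic in mind.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class StatsDiff {
    // Names of stats whose values differ by more than `factor` between the two
    // nodes (in either direction); stats present on only one node are skipped.
    static List<String> divergentStats(Map<String, Double> good, Map<String, Double> bad, double factor) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Double> e : good.entrySet()) {
            Double other = bad.get(e.getKey());
            if (other == null) continue;
            double a = Math.abs(e.getValue()), b = Math.abs(other);
            double hi = Math.max(a, b), lo = Math.min(a, b);
            // A zero on one side and a non-zero on the other always counts as divergent.
            if (lo == 0 ? hi > 0 : hi / lo > factor) out.add(e.getKey());
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        // Placeholder stat names and values, not real NoSQL DB statistics.
        Map<String, Double> goodNode = Map.of("statA", 100.0, "statB", 5.0, "statC", 0.0);
        Map<String, Double> badNode = Map.of("statA", 110.0, "statB", 40.0, "statC", 3.0);
        System.out.println(divergentStats(goodNode, badNode, 3.0)); // [statB, statC]
    }
}
```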
Thanks for your quick reply.
Regarding the four deployments, you are right: there are four different stores using 8 physical servers (two per store).
The usage pattern is:
- A high volume of transactions (B2B XML messages) is written during the day
- A few record reads per day (only some checks, heartbeats, and manual checking of failed transactions)
- A daily job that deletes many of the month-old records, keeping 2% of the stored records
- A daily job that uploads that 2% of records to the RDBMS (reading them from NoSQL)
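As a rough check on how disk usage should scale with this workload, the sketch below estimates live data after a number of days of steady writes with 30-day retention and 2% of older records kept, then divides by an assumed log utilization to get disk usage. Everything here is an assumption for illustration: the daily volumes are placeholders in arbitrary units, the retained 2% is assumed never to be deleted, and the 50% utilization figure only echoes the "50% or less" remark earlier in the thread.

```java
public class RetentionEstimate {
    // Live data after `days` days of steady writes: the most recent
    // retentionDays of data, plus keepFraction of everything older
    // (assumes the retained fraction is never deleted).
    static double liveVolume(double dailyVolume, int days, int retentionDays, double keepFraction) {
        int recent = Math.min(days, retentionDays);
        int older = Math.max(0, days - retentionDays);
        return dailyVolume * recent + keepFraction * dailyVolume * older;
    }

    // Disk needed if cleaning keeps the log at the given utilization.
    static double diskVolume(double live, double utilization) {
        return live / utilization;
    }

    public static void main(String[] args) {
        double busyDaily = 10.0;  // placeholder volume per day for a busy store
        double quietDaily = 5.0;  // a store receiving ~50% of that traffic
        for (double daily : new double[] {busyDaily, quietDaily}) {
            double live = liveVolume(daily, 90, 30, 0.02);
            System.out.printf("%.1f/day -> live %.1f, disk at 50%% utilization %.1f%n",
                    daily, live, diskVolume(live, 0.5));
        }
    }
}
```

With these placeholder numbers the busier store needs about twice the disk of the quieter one (624 vs 312 units). If the observed ratio between stores is much larger than their traffic ratio, that would suggest cleaning falling behind rather than the data volume itself.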
All stores have identical hardware (CPU, disk, memory).
The only difference with the store that works as expected is that it receives fewer messages per day, around 50% of the traffic of the others.
SR #21069 does not apply: all the stores receive writes continuously throughout the day.
SR #21488 does not apply either: the key length is 80 characters and the record value is 21 KB.
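A quick calculation supports ruling out SR #21488: with these figures the key is a tiny share of each record. The sketch assumes one byte per key character (e.g. ASCII) and 1 KB = 1024 bytes; the 10-byte value in the second case is only an illustrative "tiny value", not a figure from this thread.

```java
public class KeyValueRatio {
    // Fraction of a record's key+value bytes taken up by the key.
    static double keyFraction(int keyBytes, int valueBytes) {
        return (double) keyBytes / (keyBytes + valueBytes);
    }

    public static void main(String[] args) {
        int keyBytes = 80;           // 80-character key, 1 byte per character assumed
        int valueBytes = 21 * 1024;  // 21 KB value
        System.out.printf("this store: key is %.2f%% of the record%n",
                100 * keyFraction(keyBytes, valueBytes));
        System.out.printf("tiny-value case: key is %.2f%% of the record%n",
                100 * keyFraction(keyBytes, 10));
    }
}
```

It prints 0.37% for this store versus 88.89% for the illustrative tiny-value case, which is the large-key, small-value shape SR #21488 describes.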
I'll configure the stores to collect stats and see whether they give any clue about what is happening.
Best regards, Marcelo.