Have you looked at Chapter 3 of the GSG yet?
In general, the major key is used to hash to a shard and the minor key is used to look up records within the shard. You want to keep your key size as small as possible.
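To make the major/minor split concrete, here is a minimal toy model in Python. It is not the Oracle NoSQL API: `ToyStore`, `shard_for`, the shard count, and the MD5-based hash are all assumptions made for illustration, and the real product uses its own hashing and storage. The sketch only shows the idea that the major path alone picks the shard, while the minor path locates a record inside it.

```python
import hashlib

NUM_SHARDS = 3  # hypothetical shard count for this sketch


def shard_for(major_path):
    """Pick a shard using the major key components only."""
    joined = "/".join(major_path).encode("utf-8")
    digest = hashlib.md5(joined).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS


class ToyStore:
    """In-memory stand-in for a sharded key-value store (illustrative only)."""

    def __init__(self):
        # One dict per shard: {major_path: {minor_path: value}}
        self.shards = [dict() for _ in range(NUM_SHARDS)]

    def put(self, major_path, minor_path, value):
        shard = self.shards[shard_for(major_path)]
        shard.setdefault(tuple(major_path), {})[tuple(minor_path)] = value

    def get(self, major_path, minor_path):
        shard = self.shards[shard_for(major_path)]
        return shard.get(tuple(major_path), {}).get(tuple(minor_path))


store = ToyStore()
# Both records share a major path, so they land on the same shard.
store.put(["Smith", "Bo"], ["birthdate"], b"1970-01-01")
store.put(["Smith", "Bo"], ["phonenumber"], b"555-0100")
print(store.get(["Smith", "Bo"], ["birthdate"]))
```

Because the minor path never enters the hash, every record for one major key stays together, which is what makes operations across them cheap.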
There are also sizing and planning spreadsheets included in the distribution that you may want to take a look at.
To add to the reply from Charles, using the example you put forward it would make sense to keep relatively static information in a single record under a major key of user id, e.g. a key of
with the data being the serialized (using Avro schema) value of the mostly-static information for a user.
Given that, your Products would be minor key components under a given user, e.g. a product owned by a user might have the key
With a value that is the serialized (again Avro) value of the product state.
The documentation reference Charles provided has a lot of good information. A few highlights:
o keep keys short (hence use of "U" vs "User" above)
o it is fast, efficient, and transactional to get/modify records that have the same major key (e.g. all products owned by user with id "1").
o it's generally a good idea to separate static and dynamic data; use the update rates to decide whether something belongs in its own record or as one more field in a record whose value holds multiple "fields".
o know your access patterns, updates, etc and factor them into the design
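The highlights above can be sketched with a small in-memory model. This is not the Oracle NoSQL API, and the key layout is an assumption: the "U" prefix and user id "1" come from the discussion, but the "P" prefix, the product id, the field names, and the `put`/`multi_get` helpers are hypothetical. JSON stands in for Avro serialization to keep the sketch dependency-free.

```python
import json

# {major_path: {minor_path: serialized_value}}; a stand-in for one shard
store = {}


def put(major, minor, record):
    """Serialize a record and store it under major/minor key paths."""
    raw = json.dumps(record).encode("utf-8")
    store.setdefault(tuple(major), {})[tuple(minor)] = raw


def multi_get(major):
    """Return every record that shares one major key."""
    found = store.get(tuple(major), {})
    return {minor: json.loads(raw) for minor, raw in found.items()}


user = ("U", "1")

# Mostly-static user information: one record, no minor key.
put(user, (), {"name": "Pat", "joined": "2012-01-15"})

# Frequently updated product state: its own record under a minor key.
put(user, ("P", "42"), {"name": "widget", "stock": 7})

# An update rewrites only the small product record, not the user record.
put(user, ("P", "42"), {"name": "widget", "stock": 6})

records = multi_get(user)
print(sorted(records))
```

Keeping the product state in its own record means a high update rate on products never rewrites the static user data, while a single lookup by major key still returns everything for that user.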
Thanks...both answers were very helpful. I did read through the GSG and have been reading "Oracle NoSQL Database: Real-Time Big Data Management for the Enterprise" on Safari.
I think my confusion comes from the differing examples in the GSG and the book: some store whole objects under coarse keys, while others use fine-grained major/minor keys with individual properties as values. The question, which gmfeinberg alluded to, is how to decide between creating very granular major/minor key combinations that each store an individual bit of information and storing the whole object under a single major/minor key.
The GSG sample (the Oracle NoSQL Database book has the same type of example in Chapter 6 where they have a 3 major/4 minor key combination):
More of a model based on the example in gmfeinberg's response with an Avro Schema:
/Smith/Bo=Avro Binding of Person
/Smith/Patricia=Avro Binding of Person
/Wong/Bill=Avro Binding of Person
Is the discriminator for how granular the minor key gets, and for the level of the value (an individual property or a whole object), how much data one needs to access at a time? For example, I may query for a person's birthday many times but need the whole person record only a few times, so it would make more sense to use granular minor keys and values so that only the birthday is read. Conversely, if interacting with a Person always requires the whole data set, would it benefit more from a major key (with little or no minor key) and an Avro binding of the whole Person?
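To make the trade-off in that question concrete, here is a rough Python sketch comparing the two layouts. Everything in it is an assumption for illustration: the field names and values are made up, plain dicts stand in for the store, and JSON stands in for an Avro binding. It only counts bytes and records read, not real I/O cost.

```python
import json

person = {
    "first": "Bo",
    "last": "Smith",
    "birthdate": "1970-01-01",
    "phone": "555-0100",
    "address": "1 Main St",
}
MAJOR = ("Smith", "Bo")


def encode(value):
    return json.dumps(value).encode("utf-8")


# Coarse layout: the whole object under the major key, no minor key.
coarse = {MAJOR: {(): encode(person)}}

# Fine-grained layout: one record per property, keyed by minor key.
fine = {MAJOR: {(field,): encode(value) for field, value in person.items()}}


def birthdate_coarse():
    """Read the whole object just to extract one field."""
    raw = coarse[MAJOR][()]
    return json.loads(raw)["birthdate"], len(raw)


def birthdate_fine():
    """Read only the birthdate record."""
    raw = fine[MAJOR][("birthdate",)]
    return json.loads(raw), len(raw)


def whole_person_fine():
    """Reassemble the object from every record under the major key."""
    records = fine[MAJOR]
    rebuilt = {minor[0]: json.loads(raw) for minor, raw in records.items()}
    return rebuilt, len(records)


print(birthdate_coarse(), birthdate_fine(), whole_person_fine()[1])
```

In this toy model the fine-grained layout reads fewer bytes for the birthday alone, while the coarse layout needs one record instead of several when the whole Person is wanted, which is the access-pattern question being asked.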
Thanks for the direction and information!