let's assume that I have the following major keys:
If I specify a KeyRange with start="10" and end="12" then I will retrieve ALL keys, instead of "10", "11" and "12". As far as I understand, this is because Keys.compareTo() uses the String.compareTo().
So, my question is this. Is it possible to specify that I would like an Integer order and not a String order so, in the above scenario, I can retrieve only the keys: "10", "11" and "12"?
Would it be possible to specify a comparator (such as the Alphanum Comparator http://www.davekoelle.com/files/AlphanumComparator.java) for comparing the keys?
Thank you very much for your help. It is very much appreciated.
I should point out that all the stored keys should have a fixed length in order for this to work.
Correct, and the length cannot be changed without breaking the sort order. Which implies that you have to decide in advance what maximum numeric value you require, for the lifetime of the store.
I think you encoded the character digits as base64, which won't save space.
Base64 converts byte arrays to characters, and converts each sequence of 3 bytes to 4 characters. To take advantage of this, you have to first encode your number as a byte array, for example using ByteBuffer.putInt and getInt, and then encode that byte array as base64. Additionally you can strip off the = characters at the end after encoding, and add them back before decoding to make the number of characters a multiple of 4 -- this is possible because your byte array will be fixed length (based on the maximum numeric value you've decided on). In base64, = characters are added when the number of encoded bytes is not an exact multiple of 3. See Base64 - Wikipedia, the free encyclopedia
However, I think I misled you -- sorry about that. Although base64 is a compact character representation for numbers, apparently it doesn't preserve sort order, which is something I wasn't aware of. If you were only concerned about space, and not sort order, it would be a good choice. But you do want to preserve sort order, so it's not appropriate.
There may be other encodings that are more appropriate, for example base32hex. A colleague of mine has done some work in this area but he's on vacation. I'll ask him for information about this when he's back and post it.
To save space (compared to decimal encoding) and also preserve sort order, you may want to consider using hexadecimal encoding. For 'int' values, you can use Integer.toHexString to encode and Integer.valueOf(String,16) to decode, so this is very easy to do. In hexadecimal encoding each binary byte is converted to two characters, so for example a two byte integer is represented as 4 characters. Before using it in a Key, be sure to pad the encoded string with zeros on the left to create a fixed length string.
Please be aware of that is I have not addressed negative numbers above. What I said only applies to positive values. If you need to encode negative numbers and you need them to sort correctly, you'll need to come up with a scheme for that. Again, my colleague may have ideas on this as well and I'll ask him about it when he's back.
Mark, you were right that base64 did not preserve sort order. I later identified some marginal cases that the sort was broken. I used instead base32hex and it seems to work fine now.
At first I used the apache commons codec base32 (with the hex option enabled) implementation, but it didn't preserve the sort order. Then, I used this implementation of base32hex (GrepCode: com.buck.common.codec.Base32Hex (.java) - Class - Source Code View) that seems to work as expected.
Thank you again very much,