8 Replies Latest reply on Aug 25, 2013 11:19 PM by Apostolis

Retrieve keys by Integer order

Hello,

let's assume that I have the following major keys:

"10"

"11"

"12"

"1001"

"1011"

"1112"

If I specify a KeyRange with start="10" and end="12" then I will retrieve ALL keys, instead of "10", "11" and "12". As far as I understand, this is because Keys.compareTo() uses the String.compareTo().

So, my question is this. Is it possible to specify that I would like an Integer order and not a String order so, in the above scenario, I can retrieve only the keys: "10", "11" and "12"?

Would it be possible to specify a comparator (such as the Alphanum Comparator http://www.davekoelle.com/files/AlphanumComparator.java) for comparing the keys?

Thank you very much for your help. It is very much appreciated.

Best regards,

Apostolis

• 1. Re: Retrieve keys by Integer order

To cause a decimal string to sort in numeric order, you can pad it on the left with zeros.  Note that you'll have to choose a maximum number of digits.  Since smaller keys are more efficient, you may want to use base64 encoding rather than decimal.

--mark

• 2. Re: Retrieve keys by Integer order

Hello Mark,

It seems that padding along with base64 encoding does the trick.

I should point out that all the stored keys should have a fixed length in order for this to work.

Cheers,

Apos

• 3. Re: Retrieve keys by Integer order
`I should point out that all the stored keys should have a fixed length in order for this to work.`

Correct, and the length cannot be changed without breaking the sort order.  Which implies that you have to decide in advance what maximum numeric value you require, for the lifetime of the store.

--mark

• 4. Re: Retrieve keys by Integer order

Would it be possible to elaborate more on the reason that you suggest to encode it using base64?

For example. why is "MDAwNA==" a more efficient key than "0004"?

Thank you very much for your assistance,

Apostolis

• 5. Re: Retrieve keys by Integer order

I think you encoded the character digits as base64, which won't save space.

Base64 converts byte arrays to characters, and converts each sequence of 3 bytes to 4 characters.  To take advantage of this, you have to first encode your number as a byte array, for example using ByteBuffer.putInt and getInt, and then encode that byte array as base64.  Additionally you can strip off the = characters at the end after encoding, and add them back before decoding to make the number of characters a multiple of 4 -- this is possible because your byte array will be fixed length (based on the maximum numeric value you've decided on).  In base64, = characters are added when the number of encoded bytes is not an exact multiple of 3.  See Base64 - Wikipedia, the free encyclopedia

However, I think I misled you -- sorry about that.  Although base64 is a compact character representation for numbers, apparently it doesn't preserve sort order, which is something I wasn't aware of.  If you were only concerned about space, and not sort order, it would be a good choice.  But you do want to preserve sort order, so it's not appropriate.

There may be other encodings that are more appropriate, for example base32hex.  A colleague of mine has done some work in this area but he's on vacation.  I'll ask him for information about this when he's back and post it.

To save space (compared to decimal encoding) and also preserve sort order, you may want to consider using hexadecimal encoding.  For 'int' values, you can use Integer.toHexString to encode and Integer.valueOf(String,16) to decode, so this is very easy to do.  In hexadecimal encoding each binary byte is converted to two characters, so for example a two byte integer is represented as 4 characters.  Before using it in a Key, be sure to pad the encoded string with zeros on the left to create a fixed length string.

Please be aware of that is I have not addressed negative numbers above.  What I said only applies to positive values.  If you need to encode negative numbers and you need them to sort correctly, you'll need to come up with a scheme for that.  Again, my colleague may have ideas on this as well and I'll ask him about it when he's back.

--mark

• 6. Re: Retrieve keys by Integer order

Mark, you were right that base64 did not preserve sort order. I later identified some marginal cases that the sort was broken. I used instead base32hex and it seems to work fine now.

At first I used the apache commons codec base32 (with the hex option enabled) implementation, but it didn't preserve the sort order. Then, I used this implementation of base32hex (GrepCode: com.buck.common.codec.Base32Hex (.java) - Class - Source Code View) that seems to work as expected.

Thank you again very much,

Apostolis

• 7. Re: Retrieve keys by Integer order

Awesome, thanks for letting us know and thanks for the info on base32hex!  I'm sure other users will benefit from your experience.

--mark

• 8. Re: Retrieve keys by Integer order

True, that's an issue that will definitely also affect others and this thread might benefit them.

Again thank you for your help,

Apostolis