6 Replies. Latest reply: Nov 8, 2011 3:19 AM by 622100

secondary indexing

622100 Newbie
Hello,

I currently use an object persistence mechanism developed on top of BDB. I built a lot of rich indexing/query features into the implementation. I am interested in implementing the same persistence interface on top of Oracle NoSQL.

My main use case is sharding the key space by user/record owner. So I envision my major key path to be something like:

USERID/Entity type/Entity id

So if I were storing an order with a user, the order key path would look like:

4142526162515/Order/43535253

I want to be able to index my entity types arbitrarily. For example, I might like to index my order entities by city.

This leads to my first question. Since I am normally dealing with data in a user-centric context, I am thinking of grouping the indexes with the records themselves to preserve locality.

So in the example above, if I had an index named IDX-BY-CITY, the index entry key would look like:

4142526162515/Order/IDX-BY-CITY/San Francisco

How would you go about supporting duplicate keys for the index entries? Would I need to append some sequence number to the tail end of my index path and ignore it when I am doing a lookup?

Does the approach I am outlining even make sense for secondary indexing? I figure the cases where I would need to select all records by a secondary index (as opposed to by user) would be some sort of map-reduce/batch situation.

Thank you for your time.
Topher


I read the docs some more and am changing my approach, so you can ignore the question above. For secondary indexes I am going to attempt to use the minor part of the key to simulate the duplicates strategy I was using with BDB. I have also decided not to cluster the index with the record by default; instead I partition the index by the index key itself.

So, continuing the example above, if I have three orders where the city is San Francisco, my secondary keys will look like:

Order/IDX-BY-CITY/San Francisco.${order_1_id}
Order/IDX-BY-CITY/San Francisco.${order_2_id}
Order/IDX-BY-CITY/San Francisco.${order_3_id}


This should put all the records containing San Francisco on the same partition.
I still haven't decided whether or not these sec indexes will be fat.
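
The key layout above can be sketched with a small in-memory model. This is only an illustration of the idea, assuming a store that groups entries by major path and distinguishes them by minor path; `ToyStore`, `index_order_by_city` and `orders_in_city` are hypothetical names, not part of the Oracle NoSQL API, and the index values are left empty (a non-fat index) purely as an assumption.

```python
from collections import defaultdict


class ToyStore:
    """In-memory stand-in for a major/minor key store (not the Oracle NoSQL API)."""

    def __init__(self):
        # major path (tuple) -> {minor path (tuple) -> value}
        self._rows = defaultdict(dict)

    def put(self, major, minor, value):
        self._rows[tuple(major)][tuple(minor)] = value

    def multi_get(self, major):
        """Return every (minor, value) pair stored under one major path."""
        return sorted(self._rows.get(tuple(major), {}).items())


def index_order_by_city(store, city, order_id):
    # The order ID is the minor component, so several orders in the same
    # city get distinct keys without needing a sequence number.
    store.put(("Order", "IDX-BY-CITY", city), (order_id,), b"")


def orders_in_city(store, city):
    # One lookup by major path returns all "duplicates" for that city.
    return [minor[0] for minor, _ in store.multi_get(("Order", "IDX-BY-CITY", city))]


store = ToyStore()
for oid in ("43535253", "43535254", "43535255"):
    index_order_by_city(store, "San Francisco", oid)
index_order_by_city(store, "Oakland", "43535256")
```

Calling `orders_in_city(store, "San Francisco")` on this toy data returns the three order IDs, which is the behaviour the duplicates strategy is meant to give.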

onwards.

Thank you.

Edited by: topherlafata on Nov 6, 2011 12:58 PM
  • 1. Re: secondary indexing
    greybird Expert
    > So continuing the example above if i have three orders where the city is san francisco my sec keys will look like.
    > Order/IDX-BY-CITY/San Francisco.${order_1_id}
    > Order/IDX-BY-CITY/San Francisco.${order_2_id}
    > Order/IDX-BY-CITY/San Francisco.${order_3_id}
    > This should put all the records containing San Francisco on the same partition.

    You haven't said which is the major and minor path in this key, so I'm not sure that's true. If you want all index records for San Francisco on the same partition, the key should be /Order/IDX-BY-CITY/San Francisco/-/OrderID, in other words, the major key path must be /Order/IDX-BY-CITY/San Francisco and the minor key path must be the order ID. Only keys with the same major path are guaranteed to be in the same partition.
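
A minimal sketch of that rule, assuming partition placement depends only on a hash of the major path. `format_key` and `partition_for` are hypothetical helpers, the partition count and the order IDs are made up, and `crc32` is used only because it is deterministic; the real store's hash function and any escaping done by `Key.toString()` are not modelled here.

```python
import zlib


def format_key(major, minor=()):
    """Render a key as /major/components/-/minor/components."""
    text = "/" + "/".join(major)
    if minor:
        text += "/-/" + "/".join(minor)
    return text


def partition_for(major, n_partitions=10):
    # Assumption: only the major path takes part in choosing the partition,
    # so the minor path (the order ID here) cannot move an entry elsewhere.
    return zlib.crc32("/".join(major).encode("utf-8")) % n_partitions


major = ("Order", "IDX-BY-CITY", "San Francisco")
keys = [format_key(major, (oid,)) for oid in ("1001", "1002", "1003")]
partitions = {partition_for(major) for _ in keys}
```

All three keys share one major path, so `partitions` contains a single value in this model. If the order ID were moved into the major path instead, each key would be hashed separately and the entries could land on different partitions.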

    Also, you may already realize this, but when you update a secondary key pair and its related primary key pair, you cannot update both in an atomic transaction. This means that your application will have to deal with the possibility that the index could be out of date after a failure.

    --mark
  • 2. Re: secondary indexing
    622100 Newbie
    Thanks Mark.

    Sorry about the lack of clarity. I was using the convention used in the documentation to denote major and minor keys. '/' for major key components and '.' for the minor key component.

    Yes. The inability to do transactions across partitions in this case seems like it may be a bit of a problem. I'm not sure how I am going to handle that yet. It seems like I might need to maintain some sort of basic transaction log on insert/update/delete. Any insight you have into solving the latter problem would be appreciated.

    Thanks again.

    Topher
  • 3. Re: secondary indexing
    greybird Expert
    > Sorry about the lack of clarity. I was using the convention used in the documentation to denote major and minor keys. '/' for major key components and '.' for the minor key component.

    That's not your fault. I used the wrong convention myself (I'll edit my earlier post), and our documentation is inconsistent. The correct convention is to separate the major and minor paths with a /-/ as described here in the javadoc:
    http://download.oracle.com/docs/cd/NOSQL/html/javadoc/oracle/kv/Key.html#toString()

    > Yes.The inability to do transactions across partitions in this case seems like it may be a bit of a problem. Not sure how i am gonna handle that yet. Seems like i might need to have some sort of basic transaction log i am maintaining on insert/update/delete. Any insight you have into solving the latter problem would be appreciated.

    I can only say that it is quite a difficult problem to solve in a perfectly general way. If you can limit the use cases for your indices in some way, it may make the problem less difficult. I realize that's a very general statement and may not be very helpful.

    --mark
  • 4. Re: secondary indexing
    622100 Newbie
    > I can only say that it is quite a difficult problem to solve in a perfectly general way. If you can limit the use cases for your indices in some way, it may make the problem less difficult. I realize that's a very general statement and may not be very helpful.

    ;-) If you had a specific answer to that I was going to be very impressed. Ha Ha.

    In terms of making the problem less difficult by limiting the use cases, what sorts of constraints tend to make distributed transactions more manageable? Any resources you can suggest?

    I am thinking of something along the lines of persisting things on disk until all operations complete successfully. If one fails undo the ones that have been done. If an undo fails keep retrying at some interval until successful so eventually I will be back to where I started.
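
The undo-and-retry idea could look roughly like the sketch below. It is a simplification under stated assumptions: the list of completed steps is kept in memory rather than persisted on disk, so a crash of the client would lose it, and undo retries are bounded by `max_undo_attempts` instead of continuing forever. `run_with_undo` and the demo functions are hypothetical names.

```python
import time


def run_with_undo(steps, retry_delay=0.0, max_undo_attempts=5):
    """Run (do, undo) pairs in order; on failure, undo completed steps in reverse.

    Returns True if every step succeeded, False if the work was rolled back.
    """
    done = []
    for do, undo in steps:
        try:
            do()
        except Exception:
            for completed_undo in reversed(done):
                _retry(completed_undo, retry_delay, max_undo_attempts)
            return False
        done.append(undo)
    return True


def _retry(action, delay, max_attempts):
    # Keep retrying a failed undo at a fixed interval, up to a limit.
    for _ in range(max_attempts):
        try:
            action()
            return
        except Exception:
            time.sleep(delay)
    raise RuntimeError("undo did not succeed; manual repair needed")


# Demo with plain dicts standing in for the index and primary records.
index, primary = {}, {}


def write_index():
    index[("San Francisco", "43535253")] = b""


def undo_index():
    index.pop(("San Francisco", "43535253"), None)


def write_primary():
    raise IOError("simulated failure while writing the primary record")


def undo_primary():
    primary.pop("43535253", None)


ok = run_with_undo([(write_index, undo_index), (write_primary, undo_primary)])
```

In the demo the second write fails, so the index entry written by the first step is removed again and `ok` is `False`.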

    Thanks again.

    Topher
  • 5. Re: secondary indexing
    greybird Expert
    I think what you're talking about is building your own local transaction log and recovery mechanism. This could be a lot of work if you need to make it truly reliable. If you can tolerate errors when you have a failure and can't access your local machine storage, then it may be easier.

    I was referring to a different implementation, which is where your application is tolerant of the index being out of date. For example, if you add the index record first and update the primary record second (two transactions), then your application may have to tolerate the situation where the index record is stored and the change to the primary record is not. But that is just one case. This is a very complex topic and I don't have any simple suggestions at this point in time, sorry.
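
One way to picture the index-first ordering, as a sketch only: the reader treats an index entry as a hint and confirms it against the primary record before using it. Plain dicts stand in for the two sets of records, and `save_order` and `orders_in_city` are hypothetical names; this covers just the single failure case mentioned above, not the general problem.

```python
def save_order(primary, index, order_id, city, fail_before_primary=False):
    # Two separate writes, index first, matching the ordering described above.
    index.setdefault(city, set()).add(order_id)
    if fail_before_primary:
        raise RuntimeError("simulated crash between the two writes")
    primary[order_id] = {"city": city}


def orders_in_city(primary, index, city):
    """Look up by index, then keep only entries the primary record confirms."""
    confirmed = []
    for order_id in sorted(index.get(city, ())):
        record = primary.get(order_id)
        if record is not None and record["city"] == city:
            confirmed.append(order_id)
    return confirmed


primary, index = {}, {}
save_order(primary, index, "43535253", "San Francisco")
try:
    save_order(primary, index, "43535254", "San Francisco", fail_before_primary=True)
except RuntimeError:
    pass  # the index entry for 43535254 now has no matching primary record

result = orders_in_city(primary, index, "San Francisco")
```

Here `result` contains only the first order: the dangling index entry for the second order is skipped because no primary record backs it up.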

    --mark
  • 6. Re: secondary indexing
    622100 Newbie
    Thanks a lot Mark.

    Your suggestion makes a lot of sense.
