This discussion is archived
5 Replies Latest reply: Aug 13, 2012 12:25 PM by 912175 RSS

Clustering Hash Table

912175 Newbie
Currently Being Moderated
I'm pondering the idea of using BDB to do some clustering using 'locality sensitve hashing' on an [extremely] large scale. I know I can set my own hash function, but I'm wondering if there is any(simple?) way to access all of the values in the same hash bucket.

Has anyone tried this before with BDB?

Are there any major issues I am overlooking here that might cause problems?

Any thoughts ideas would be appreciated, thanks.
  • 1. Re: Clustering Hash Table
    Ashok_Ora Explorer
    Currently Being Moderated
    Hi,

    If you describe the problem scenario in more detail, we might be able to come up with a solution together.

    Thanks and warm regards.
    ashok
  • 2. Re: Clustering Hash Table
    912175 Newbie
    Currently Being Moderated
    I am trying to cluster high dimensional data (dim = 100 - 1000+) using 'Locality sensitive hashing'(http://en.wikipedia.org/wiki/Locality-sensitive_hashing)

    This method requires hashing all feature vectors to a hash bucket through some hash function. The idea is that high dimensional data that is close in high dimensions should also be close after hashing(this hash function hash special properties which make this approximately true) . So once we have hashed all of our data we want to compare the data that has fallen in the same bucket and perhaps run some local clustering on that data(eg k-nearest neighbor).

    I know I can use BDB to hash my data using my specialhash function, but I am less sure about how to get all the data that has fallen into the same bucket efficiently. I am also wondering if there are other technical issues I might run into trying to do this procedure using a BDB hash table.

    Furthermore, and one reason I think using BDB is a good solution is that I want this table to be persistent, instead of just running the method on an in memory table, I can add data, eg daily, and continue to grow the table.

    The main question is how do I access all data in the same hash bucket?

    Does that clarify a bit?
  • 3. Re: Clustering Hash Table
    Ashok_Ora Explorer
    Currently Being Moderated
    Thanks. This is very helpful. I'll think about this issue and connect with you directly. You can reach me at ashok dot joshi at Oracle dot com. I'm based in the US in the Boston area.

    Thanks and warm regards.
    ashok
  • 4. Re: Clustering Hash Table
    Ashok_Ora Explorer
    Currently Being Moderated
    Here's the answer I got from a colleague of mine:
    We do not export any notion of pages through the API, so he can't do exactly what he wants, but you can certainly use a cursor and get a set of objects that are all "next to each other" which would mean either in the same bucket or in adjacent buckets. I think that would allow him to get K nearest neighbors (you search for the item and then ask for the K/2 items Prev and the K/2 items Next)

    It's not exactly what you requested, but may be sufficient.

    Hope this helps.
    Thanks and warm regards.
    ashok
  • 5. Re: Clustering Hash Table
    912175 Newbie
    Currently Being Moderated
    Thank you for the reply, I figured that would be the best method as well, except I will Likely use either values in same bucket(through iterating) or actual bucket plus previous and next, same idea though.

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points