This discussion is archived
17 Replies. Latest reply: Sep 26, 2007 9:33 PM by 529636

Strange behaviour with custom data comparator

578286 Newbie
I found a result that I cannot understand.
When I put a record with the same key and the same data (that is, the custom comparator returns 0) into a database, the behaviour depends on how many records exist for that key: if there is only one record for the key, the put method correctly overrides the data; if there is more than one record with the same key, the put does not override the data.

This is an example:

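import java.io.File;
import java.io.Serializable;
import java.util.Comparator;

import com.sleepycat.je.Cursor;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.DatabaseException;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;
import com.sleepycat.je.LockMode;
import com.sleepycat.je.OperationStatus;

public class DBTest {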


private static Environment env;
private static Database store;

/* The comparator */
static class MyComparator implements Comparator,Serializable {

public int compare(Object o1, Object o2) {
byte[] b1=(byte[]) o1;
byte[] b2=(byte[]) o2;

return b1[0]-b2[0];
}

}


/* Used for printing the result */
static String byteview(byte[] b) {
StringBuffer sb=new StringBuffer("[");
for(int i=0;i<b.length;i++) {
if(i>0) sb.append(',');
sb.append(Byte.toString(b[i]));
}
sb.append(']');
return sb.toString();
}




public static void main(String[] args) { 

EnvironmentConfig envConfig = new EnvironmentConfig();
envConfig.setTransactional(true);
envConfig.setAllowCreate(true);
envConfig.setCacheSize(1000000);

try {
env = new Environment(new File("E:\\Knowledgebase\\DEACache\\"), envConfig);
DatabaseConfig dataConfig = new DatabaseConfig();
dataConfig.setAllowCreate(true);
dataConfig.setSortedDuplicates(true);
dataConfig.setOverrideDuplicateComparator(true);
dataConfig.setDuplicateComparator(new MyComparator());
store = env.openDatabase(null, "_data_", dataConfig);

DatabaseEntry k=new DatabaseEntry();
DatabaseEntry d=new DatabaseEntry();

k.setData(new byte[]{0,1,2});
d.setData(new byte[]{0,0,1,2,3,4});
store.put(null, k, d);

k.setData(new byte[]{0,1,2});
d.setData(new byte[]{0,0,1,4,2,3});
store.put(null, k, d);

k.setData(new byte[]{0,1,2});
d.setData(new byte[]{1,0,10,20});
store.put(null, k, d);

k.setData(new byte[]{0,1,2});
d.setData(new byte[]{1,0,25,35});
store.put(null, k, d);

// print the content of the database (2 records expected)
Cursor c=store.openCursor(null, null);
c.getFirst(k, d, LockMode.DEFAULT);
System.out.println("key:"+byteview(k.getData())+" data:"+byteview(d.getData()));
while(c.getNext(k, d, LockMode.DEFAULT)==OperationStatus.SUCCESS) {
System.out.println("key:"+byteview(k.getData())+" data:"+byteview(d.getData()));
}
c.close();


k.setData(new byte[]{0,1,2});
d.setData(new byte[]{0,0,4,3,2,1});
store.put(null, k, d);

k.setData(new byte[]{0,1,2});
d.setData(new byte[]{1,0,40,30,20,10});
store.put(null, k, d);

// print the content of the database (2 records expected)
c=store.openCursor(null, null);
c.getFirst(k, d, LockMode.DEFAULT);
System.out.println("key:"+byteview(k.getData())+" data:"+byteview(d.getData()));
while(c.getNext(k, d, LockMode.DEFAULT)==OperationStatus.SUCCESS) {
System.out.println("key:"+byteview(k.getData())+" data:"+byteview(d.getData()));
}
c.close();

}
catch (DatabaseException e) {
e.printStackTrace();
}


}

}

When I run this program I get these results:

key:[0,1,2] data:[0,0,1,4,2,3] correct, it overrides
key:[0,1,2] data:[1,0,10,20] strange, should be [1,0,25,35]
key:[0,1,2] data:[0,0,1,4,2,3] strange, should be [0,0,4,3,2,1]
key:[0,1,2] data:[1,0,10,20] strange, should be [1,0,40,30,20,10]

Am I missing something?

Thank you in advance
  • 1. Re: Strange behaviour with custom data comparator
    Oracle, Sandra Whitman Journeyer
    hello,

    I will take a look and let you know.

    thanks,
    Sandra
  • 2. Re: Strange behaviour with custom data comparator
    greybird Expert
    Hi,

    I tried running your test program and took a look at what's happening. Using a duplicate comparator is fairly uncommon and I have to admit that our documentation is not as good as it should be in this area.

    The bottom line is that you can't replace a record with data that is different, but compares as equal using a custom duplicate comparator. Your duplicate comparator must compare all bytes and must not return zero if there are any differences. So unfortunately, you may not be able to do what you're attempting to do.

    In addition, with a duplicate comparator that does not compare all bytes (such as the one in your test), JE is behaving badly as your test shows. We'll work on fixing this so that an error is reported. And we'll also work on improving the docs in this area.
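
    To illustrate the rule above, here is a minimal sketch of a comparator that still orders primarily by the first byte, as in the test program, but keeps comparing the remaining bytes so that it returns zero only for identical data. The class name FullBytesComparator is made up for this example, and the signed byte ordering is an assumption chosen to match the original b1[0]-b2[0].

```java
import java.io.Serializable;
import java.util.Comparator;

// Hypothetical example class; not part of JE.
final class FullBytesComparator implements Comparator<byte[]>, Serializable {
    private static final long serialVersionUID = 1L;

    @Override
    public int compare(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        // The first differing byte decides the order, so byte 0 remains
        // the primary sort criterion (signed comparison, as in the original).
        for (int i = 0; i < common; i++) {
            int diff = Byte.compare(a[i], b[i]);
            if (diff != 0) {
                return diff;
            }
        }
        // All shared bytes are equal: the shorter array sorts first.
        // Zero is returned only when both arrays are byte-for-byte identical.
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        FullBytesComparator c = new FullBytesComparator();
        byte[] x = {0, 0, 1, 2, 3, 4};
        byte[] y = {0, 0, 1, 4, 2, 3};
        System.out.println("x vs y: " + c.compare(x, y));
        System.out.println("x vs x: " + c.compare(x, x));
    }
}
```

    With a comparator like this, the data values in the test program all compare as different, so each put would add a new duplicate rather than replace an existing one. It avoids the inconsistent behaviour but does not give replace-by-first-byte semantics.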

    My apologies for the confusion caused.

    Mark
  • 3. Re: Strange behaviour with custom data comparator
    greybird Expert
    Also, if you can describe what you're trying to accomplish at a higher level, I'll be happy to suggest other approaches.

    Mark
  • 4. Re: Strange behaviour with custom data comparator
    578286 Newbie
    Hi,

    thank you very much for your rapid answer.

    Here is what I'm doing:
    I have to build a database of answers to different surveys for a panel of respondents. The key represents a respondent ID, while the data represents the answers for that respondent.
    The first bytes of the data are a counter that identifies the survey. I need a special comparator because I want to sort the records in descending order, so that the most recent record is the one found by getSearchKey (normally I don't know which survey a respondent answered last).
    Of course, because the questionnaire changes for each survey, I want to compare only the first bytes, which identify the survey.
    I didn't put the survey ID into the key because the respondent ID is a byte array of variable length, so it was not straightforward to recover the survey ID from the key; it seemed easier to keep the survey ID out of the key and put it into the data.
    Everything seems to work well except the update (and perhaps the delete, which I didn't check).
    Naturally, it is not a problem to move the survey ID into the key if that is the safest thing to do.

    Thanks again
    Giampiero
  • 5. Re: Strange behaviour with custom data comparator
    greybird Expert
    Yes, I think it is best to put the survey ID into the key, since logically the survey ID identifies the survey record. The use of duplicates could also work, but is more difficult to model and program.

    I can think of a couple ways to do this. I'll describe this using the DPL (com.sleepycat.persist) because it is much simpler than the base API for modeling. Have you considered using the DPL? If you are decided on the base API, I can give more information on doing the same thing with the base API.

    Approach #1:
    @Entity
    class Survey {
       @PrimaryKey
       int surveyId;
       @SecondaryKey(relate=MANY_TO_ONE)
       int respondentId;
       // survey data
       Survey(int surveyId, int respondentId, ...) { ... }
       private Survey() {}
    }

    // After creating the EntityStore, get the indexes:
    PrimaryIndex<Integer,Survey> priIndex =
       store.getPrimaryIndex(Integer.class, Survey.class);
    SecondaryIndex<Integer,Integer,Survey> secIndex =
       store.getSecondaryIndex(priIndex, Integer.class, "respondentId");

    // To store a Survey:
    priIndex.put(new Survey(...));

    // To get a Survey by surveyId
    Survey survey = priIndex.get(surveyId);

    // To get all Surveys for a respondentId, in reverse order of Survey number
    EntityCursor<Survey> surveys = secIndex.subIndex(respondentId).entities();
    try {
       Survey survey = surveys.last();
       while (survey != null) {
          // do something with survey
          survey = surveys.prev();
       }
    } finally {
       surveys.close();
    }
    Approach #2:
    @Persistent
    class SurveyKey {
       @KeyField(1)
       int respondentId;
       @KeyField(2)
       int surveyId;
       private SurveyKey() {}
       SurveyKey(int respondentId, int surveyId) { ... }
    }

    @Entity
    class Survey {
       @PrimaryKey
       SurveyKey key;
       // survey data
       Survey(SurveyKey key, ...) { ... }
       private Survey() {}
    }

    PrimaryIndex<SurveyKey,Survey> priIndex =
       store.getPrimaryIndex(SurveyKey.class, Survey.class);

    priIndex.put(new Survey(new SurveyKey(...), ...));
    Survey survey = priIndex.get(new SurveyKey(...));

    // Use a key range to get all Surveys for a respondentId
    SurveyKey firstKey = new SurveyKey(respondentId, 0);
    SurveyKey lastKey = new SurveyKey(respondentId, Integer.MAX_VALUE);
    EntityCursor<Survey> surveys = priIndex.entities(firstKey, true, lastKey, true);
    try {
       Survey survey = surveys.last();
       while (survey != null) {
          // do something with survey
          survey = surveys.prev();
       }
    } finally {
       surveys.close();
    }
    The second approach is almost exactly what you mentioned -- the survey ID and the respondent ID are both part of the primary key. However, in neither approach did I use a custom key comparator. Instead, I simply use the cursor to walk in reverse order. It is possible to use a custom key comparator instead, but it is not necessary.

    The first approach is the simplest from a modeling point of view. Each survey has a unique ID and can be updated or deleted by just knowing that survey ID -- you don't have to know the respondent ID.

    However, the first approach requires two indexes while the second approach only requires a single primary index. If performance is a critical factor, and you always know the respondent ID when accessing a survey, then the second approach will provide better performance. If performance is not a critical factor, or you need to access surveys by survey ID alone, then I recommend the first approach.
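
    For the base API, the composite-key idea behind the second approach can be sketched without the DPL. The layout below is only an assumption for illustration, not something JE prescribes: a 2-byte length prefix, the variable-length respondent ID, then a 4-byte big-endian survey ID. The class SurveyKeyCodec and its methods are hypothetical names. The length prefix keeps the survey ID easy to recover even though the respondent ID varies in length.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper; the key layout is an assumption made for this sketch:
// [2-byte respondent ID length][respondent ID bytes][4-byte survey ID]
final class SurveyKeyCodec {
    private static final int LENGTH_BYTES = 2;
    private static final int SURVEY_BYTES = 4;

    static byte[] encode(byte[] respondentId, int surveyId) {
        if (respondentId.length > Short.MAX_VALUE) {
            throw new IllegalArgumentException("respondent ID too long");
        }
        // ByteBuffer writes multi-byte values big-endian by default.
        ByteBuffer buf = ByteBuffer.allocate(
            LENGTH_BYTES + respondentId.length + SURVEY_BYTES);
        buf.putShort((short) respondentId.length);
        buf.put(respondentId);
        buf.putInt(surveyId);
        return buf.array();
    }

    // The survey ID always occupies the last four bytes of the key.
    static int surveyId(byte[] key) {
        return ByteBuffer.wrap(key).getInt(key.length - SURVEY_BYTES);
    }

    // The respondent ID sits between the length prefix and the survey ID.
    static byte[] respondentId(byte[] key) {
        return Arrays.copyOfRange(key, LENGTH_BYTES, key.length - SURVEY_BYTES);
    }

    public static void main(String[] args) {
        byte[] key = encode(new byte[] {0, 1, 2}, 7);
        System.out.println("key bytes:  " + Arrays.toString(key));
        System.out.println("respondent: " + Arrays.toString(respondentId(key)));
        System.out.println("survey:     " + surveyId(key));
    }
}
```

    Assuming non-negative survey IDs and byte-by-byte unsigned key ordering, all keys for one respondent are adjacent and sorted by survey ID, so a cursor can walk them in reverse to reach the most recent survey first. The resulting array would be passed to DatabaseEntry.setData for the key entry.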

    Mark
  • 6. Re: Strange behaviour with custom data comparator
    578286 Newbie
    I hadn't considered the DPL, but looking at your solution it seems a good one.
    I'll give it a try.
    I agree that your solution 1 is easier to implement. I don't think performance will be a problem, so it appears to be the best one.

    Thanks a lot
    Giampiero
  • 7. Re: Strange behaviour with custom data comparator
    579218 Newbie
    Hi user575283,
    From the statement of the problem, it is not very clear to me what is meant by 'put doesn't override the data'. However, I think you meant that, for a particular key, the last entered data should be what is stored in the DB (correct me if I got it wrong). In the given test program, duplicate data entries are allowed for a particular key by this statement:
    dataConfig.setOverrideDuplicateComparator(true);
    So, for a particular key, several data sets could be entered in the DB, which happens as expected.
    key:[0,1,2] data:[0,0,1,4,2,3]
    key:[0,1,2] data:[0,0,4,3,2,1]
    key:[0,1,2] data:[1,0,10,20]
    key:[0,1,2] data:[1,0,25,35]
    key:[0,1,2] data:[1,0,40,30,20,10]
    In all of these entries the key is the same but the data is different. The order of the numbers in the data gives each entry a unique byte value. If the same data set, e.g. [1,0,10,20], is entered 3 times, the DB will register it only once, but if the ordering of the numbers in the data is changed, the DB will register it as a new entry because it yields another unique byte value.
    Also, the compare() method compares only the first byte of the two data objects. Shouldn't it be comparing all of the bytes? I think it is insignificant in this case, however.

    Regards,
    DR
  • 8. Re: Strange behaviour with custom data comparator
    greybird Expert
    Hello DR,
    DR wrote:
    So, for a particular key, several data sets could be entered in the DB, which happens as expected.
    key:[0,1,2] data:[0,0,1,4,2,3]
    key:[0,1,2] data:[0,0,4,3,2,1]
    key:[0,1,2] data:[1,0,10,20]
    key:[0,1,2] data:[1,0,25,35]
    key:[0,1,2] data:[1,0,40,30,20,10]
    I'm afraid what you have listed above is not the correct expected output. The output the poster identifies as expected in his original post is correct, because data items should be replaced when they are considered equal by the comparison method. While this is normally the case, because his comparator does not compare all bytes JE behaves badly in this test case.

    Please see my first post above -- the first reply to his post -- for more details on this situation. As I mention in this post, we are working to make JE perform more predictably in this scenario. Additionally, as you can see in the rest of this thread, the user has taken our advice and is using a different approach that does not involve using a duplicate comparator.

    Mark
  • 9. Re: Strange behaviour with custom data comparator
    594690 Newbie
    This behaviour of not allowing duplicate data to be overwritten is puzzling me!

    If I use putCurrent when dups are allowed in the db, I always get an error ("can't replace duplicate with different data"). Surely I want to change the data; that's why I called putCurrent. The workaround is to call cursor.delete() and then put(). But this workaround is not what I'd expect to have to do; from the docs it looks like I'd just be able to call putCurrent().

    Is this going to be addressed, or am I missing something here?

    Rob.
  • 10. Re: Strange behaviour with custom data comparator
    greybird Expert
    Hi,

    You're right, with duplicates configured the putCurrent method is not very useful. The only thing it gives you is a write lock on the record. It is useful without dups of course.

    In this thread earlier I mentioned that JE was behaving badly when trying to change the data via putCurrent with dups configured. This was causing data corruption. We now prevent the data corruption problem, but don't allow changing the dup data.

    We have a task on our road map for allowing changing the dup data, as long as the custom comparator returns 0. This would allow you to change data that doesn't impact the sort order. However, this is not high priority right now. The data corruption bug has been present since JE was released 4 years ago, so in that time period no one has reported a need for the feature. If there is a need for it, we will prioritize it higher. If you can describe a use case, that will help us to prioritize.

    --mark
  • 11. Re: Strange behaviour with custom data comparator
    greybird Expert
    Hi all,

    I was discussing this duplicate comparator issue with another forum user, Dane (studdugie), over email, and I got his permission to re-post it and continue the discussion here. I realize that this issue is a little confusing and may be interesting to others on the forum.

    For reference, this discussion is about this change log item, which is the fix for the problem discussed earlier in this thread:
    ---
    Fixed a bug that caused incorrect query results, and possibly errors during recovery, when a custom duplicate comparison method is configured and that method does not compare all bytes of its operands. A custom comparator should always compare all bytes of its operands and should return non-zero if they are unequal. When updating existing records, JE will detect this kind of incorrect comparator in order to prevent database inconsistency. The problem was reported in this JE Forum thread: "Strange behaviour with custom data comparator" [#15527]
    ---

    I'll post the email discussion with Dane below and then reply in a second post:
    ------
    studdugie wrote:
    I have a problem w/ #8 in the change log. Should I respond on the
    forum thread that triggered it or are you the man to complain to?
    On 8/20/07, Mark Hayes <mark.hayes@oracle.com> wrote:
    Hello Dane,

    I'm Mark Hayes on the BDB JE team. You wrote below that you're having a
    problem with a fix in the latest JE 3.2 release that Charles Lamb gave
    you. Could you please supply more details on the problem you're having?
    studdugie wrote:
    It's more of a speculative problem than a concrete problem. As I read
    through the release notes I followed the link referenced. In the forum
    conversation (and the change log) it is suggested that the comparator
    MUST compare all the bytes of its operands in order to work
    successfully and that je would detect when it doesn't. After reading
    that, all sorts of alarm bells went off in my head based on its
    implications. So when I originally said something about it I hadn't
    actually run any tests to confirm that my concerns were real so I
    thought it was pointless to beat you guys up until I had. Eventually,
    I got around to testing my suspicions and for my use case it turns out
    that it doesn't affect me, so I just let it lie.
    Mark Hayes wrote:
    Thanks for your reply Dane. I understand your concerns, and I should
    clarify that the fix mentioned -- to check that when the comparator
    returns zero, the data bytes are equal -- is meant to prevent
    corruption. We have a TODO to remove that limitation, although we
    haven't had any requests for it.

    If you can describe your use of duplicates and comparators, I would
    appreciate it. It's always good to understand how people are using
    JE as we plan our work.
    studdugie wrote:
    Hey Mark. Thanx for the response. Let me begin by being the first to
    request that the "limitation" be removed.

    As I said in my previous post, the limitation does not affect my
    current use-cases, but I have very strong feelings against the
    limitations nonetheless. My current project is a port of a BDB-C
    version of the app to JE. BDB-C supports both sorted and unsorted
    duplicates. A feature which is sorely missed in JE but can be
    replicated using a duplicate comparator. So when you guys start
    imposing restrictions on what the developer can and can't do in their
    own comparator I get irritated because I wouldn't need to do anything
    at all had JE not done away w/ non-sorted duplicates. So that's the
    catalyst of my discontent but that's not all.

    There is an argument to be made as to why no limitations should be
    imposed. The argument begins with the question, whose needs should
    sorting satisfy? Given that the ordering rules are externalized
    vis-a-vis a Comparator that the developer is allowed to override, I
    submit that it should satisfy the needs of the developer, not the API.
    If the API wants things sorted a certain way it should make sorting
    private. But if the API is going to leave the ordering rules up to the
    developer then it can't turn around and tell the developer "what you
    think is sorted is WRONG!". That's really bad form. Either the API
    shuts up and does what it's told or get rid of custom sorting.
    As an example of why it's bad form, nowhere in Java's collection APIs
    are such restrictions imposed. When a collection accepts a comparator
    it does what the comparator tells it to do. Period. Anything less is a
    bug.

    So that's the big picture argument against the new limitations. Now
    let's discuss use cases. In my current project I use one and only one
    custom comparator. The database that it's used on acts like a
    sequential log file, where each [data] entry is a log record. The log
    file/log record nomenclature is something I'm sure you are familiar w/
    because internally JE's logging behaves the same way (I've been
    reading the source). The keys are 64 bit unsigned integers that
    represent a user id. A single user will generate multiple log events
    in sequential order. So each log record has a unique 32 bit unsigned
    integer id that is used to order the events as they occurred. So it
    should be apparent that the rest of the data in the log record has
    absolutely no meaning from an ordering/sorting POV. Now let's assume
    that the log record sizes are fixed at 1MB. What the change log and
    forum thread demands is that for every entry the comparator must
    examine 1MB of data when only 4 bytes are needed to make an ordering
    decision. Does that make a lot of sense to you? The performance hit
    alone justifies eliminating the "limitation".

    The reason the effects described in the forum thread don't affect my
    current project is because log records are never updated. They are
    immutable snapshots in time. But I'm sure you can imagine a variation
    of my use-case where sequential ordering of events are required but
    updates are allowed. Again I ask the question, why should the
    comparator have to examine 1,048,576 bytes when only 4 of them are
    needed to make an ordering decision? That's 262,144 times as many bytes as
    needed to make a decision. So my recommendation is do whatever you
    have to eliminate this [potentially abhorrent] waste of processor cycles
    and memory bandwidth.
    ------
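
    As a side note on the comparison with Java's collection APIs in the email above, the following sketch shows what java.util.TreeSet does when given a comparator with the same first-byte rule as the test program. It only illustrates standard-library behaviour, not JE; the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical demo class; it uses only java.util and has no relation to JE internals.
final class FirstByteOrderDemo {
    // Same rule as the comparator in the original post: only byte 0 is compared.
    static final Comparator<byte[]> FIRST_BYTE = (a, b) -> a[0] - b[0];

    // Adds both arrays to a TreeSet ordered by FIRST_BYTE and returns the set.
    static TreeSet<byte[]> addBoth(byte[] first, byte[] second) {
        TreeSet<byte[]> set = new TreeSet<>(FIRST_BYTE);
        set.add(first);
        set.add(second); // left out if it compares equal to an existing element
        return set;
    }

    public static void main(String[] args) {
        byte[] first = {0, 0, 1, 2, 3, 4};
        byte[] second = {0, 0, 1, 4, 2, 3};
        TreeSet<byte[]> set = addBoth(first, second);
        System.out.println("size: " + set.size());
        System.out.println("kept: " + Arrays.toString(set.first()));
    }
}
```

    Because the two arrays compare as equal, TreeSet treats the second one as already present and keeps the first, so it neither stores both values nor replaces the old one.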

    --mark
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
  • 12. Re: Strange behaviour with custom data comparator
    greybird Expert
    Hi again Dane,

    You've brought up several good issues in your response and I'd like to address them individually.

    You said:
    My current project is a port of a BDB-C
    version of the app to JE. BDB-C supports both sorted and unsorted
    duplicates. A feature which is sorely missed in JE but can be
    replicated using a duplicate comparator.

    Unsorted dups are not available in JE because when JE was designed, this was a feature in BDB-C that was not considered mainstream. JE was never intended to have all features in BDB-C. We picked the features used the most by BDB-C users, according to our company's experience with BDB-C. Some examples of features in BDB-C that are not in JE are: access methods other than btree (queue, recno, btree-recnum), unsorted dups, and duplicate duplicates. We have had a very small number of requests for these features. If the number of requests is large enough, then of course we'll consider adding them. At the moment we have no plans to do so. We should not increase complexity (and risk decreasing reliability) for new features that are not useful to a large number of JE users.

    I'm not sure how you can simulate unsorted dups in JE with the dup comparator, because you can't insert at a specific position (like you can in BDB-C) unless the data itself can be sorted in that order. With a dup comparator you can sort in any order you'd like to based on the data, but you can't really have unsorted dups as provided by BDB-C.

    You said:
    Given that the ordering rules are externalized
    vis-à-vis a Comparator that the developer is allowed to override, I
    submit that it should satisfy the needs of the developer, not the API.

    I completely agree. What I'd like to clarify is that in JE, it has never been possible to change the data in a dup record, so we did not add a limitation when we made the change described previously in the change log. From day one, there has been a bug that caused data corruption if this was attempted. This is unfortunate. The fix we made (for the change log entry mentioned above) prevents this data corruption. This is a step forward, not a new limitation. Preventing data corruption or data loss is our highest priority, and is more important than a restriction in the API.

    You said:
    Again I ask the question, why should the
    comparator have to examine 1,048,576 bytes when only 4 of them are
    needed to make an ordering decision?

    This is a very good point, and I think we have made a mistake in the change log entry. I apologize for that. The dup comparator may compare a subset of the data bytes.
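
A minimal sketch of a comparator that looks at only a subset of the data bytes, assuming the ordering field is the first 4 bytes of the data and that bytes are compared as unsigned values. The names PrefixComparator and PREFIX_LENGTH are hypothetical, and wiring the class into a database (for example through DatabaseConfig.setDuplicateComparator, as in the original post) is not shown. Keep in mind that two data values for which the comparator returns 0 count as the same data, which is the situation the original post describes.

```java
import java.io.Serializable;
import java.util.Comparator;

// Hypothetical comparator: orders data by its first PREFIX_LENGTH bytes only.
// Anything after the prefix is never examined, however large the data is.
class PrefixComparator implements Comparator<byte[]>, Serializable {
    private static final long serialVersionUID = 1L;

    // Assumption for this sketch: the ordering field is 4 bytes wide.
    static final int PREFIX_LENGTH = 4;

    @Override
    public int compare(byte[] a, byte[] b) {
        int la = Math.min(a.length, PREFIX_LENGTH);
        int lb = Math.min(b.length, PREFIX_LENGTH);
        int n = Math.min(la, lb);
        for (int i = 0; i < n; i++) {
            // Mask to compare bytes as unsigned values (0..255).
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // Equal over the shared prefix: data shorter than the prefix sorts first.
        return la - lb;
    }
}
```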

    The current limitation is that you may not change the data of an existing record, when using dups. You must instead first delete the record and then insert it (using put). This limitation has always been present in JE, but before it caused data corruption (if you changed the data and the comparator did not compare all bytes), and now you get an exception if you try it.

    This limitation will be removed in a future release. Please note that even when this limitation is removed, you will not be able to change the order of duplicates by replacing the data with Cursor.putCurrent. To change the order of a dup record, you must delete and insert.
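
The delete-then-insert rule can be illustrated with a plain JDK model rather than JE itself. The sketch below assumes the duplicates of one key behave like a TreeSet ordered by a comparator on the first data byte (the rule used in the original post). DupSetModel and its methods are hypothetical names, not JE API.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical model of the duplicate set of a single key. This is NOT JE code;
// it only shows why changing an entry's position takes a delete plus an insert.
class DupSetModel {
    // Orders entries by their first byte only, as in the original post.
    static final Comparator<byte[]> BY_FIRST_BYTE = (x, y) -> x[0] - y[0];

    private final TreeSet<byte[]> dups = new TreeSet<>(BY_FIRST_BYTE);

    // Returns false if an entry that compares equal is already present.
    boolean insert(byte[] data) {
        return dups.add(data);
    }

    // Moves an entry to the position implied by newData: delete, then insert.
    boolean move(byte[] oldData, byte[] newData) {
        if (!dups.contains(oldData)) {
            return false; // nothing to move
        }
        boolean sameSlot = BY_FIRST_BYTE.compare(oldData, newData) == 0;
        if (!sameSlot && dups.contains(newData)) {
            return false; // target position is taken by another entry
        }
        dups.remove(oldData); // delete the old record
        dups.add(newData);    // insert; the set places it in sorted order
        return true;
    }

    // Renders the entries in their current sort order.
    String dump() {
        StringBuilder sb = new StringBuilder();
        for (byte[] d : dups) {
            sb.append(Arrays.toString(d));
        }
        return sb.toString();
    }
}
```

In this model, moving {1,10} to {4,10} in a set holding entries that start with 1, 2, and 3 leaves the entry at the end of the set, because the insert re-evaluates its position.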

    We have not removed this limitation in the 3.2.x release line for a couple of reasons:

    1) Obviously, no one has replaced dup data successfully in JE because if they tried it, it would cause data corruption. So no one is currently relying on this capability for JE.

    2) In general, the vast majority of the use of dups is for secondaries. With secondaries, the primary key is the dup data and all bytes of the primary key are normally compared. A custom dup comparator for a secondary is a very rare use case.

    This led us to make the decision to remove the limitation in a future release, not in the 3.2.x line. In general, we try to avoid changes in the current release line that could destabilize it.

    --mark
  • 13. Re: Strange behaviour with custom data comparator
    529636 Newbie
    I said:
    My current project is a port of a BDB-C
    version of the app to JE. BDB-C supports both sorted and unsorted
    duplicates. A feature which is sorely missed in JE but can be
    replicated using a duplicate comparator.
    You said:
    Unsorted dups are not available in JE because when JE was designed, this was a feature in BDB-C that was not considered mainstream. JE was never intended to have all features in BDB-C. We picked the features used the most by BDB-C users, according to our company's experience with BDB-C. Some examples of features in BDB-C that are not in JE are: access methods other than btree (queue, recno, btree-recnum), unsorted dups, and duplicate duplicates. We have had a very small number of requests for these features. If the number of requests is large enough, then of course we'll consider adding them. At the moment we have no plans to do so. We should not increase complexity (and risk decreasing reliability) for new features that are not useful to a large number of JE users.

    I'm saying:
    I've always viewed the unsorted duplicates feature of BDB-C as a simple append operation. In Java terms, one can think of a database configured for unsorted duplicates as a ConcurrentMap with byte[] keys and a List as its value, where each duplicate put simply appends the entry to the end of the List. A database configured for sorted duplicates can be envisioned as a ConcurrentMap with byte[] keys and a SortedSet as its value. The advantage of the unsorted configuration is that duplicate put operations are faster because they incur no sort-comparison overhead. The advantage of the sorted configuration is that, given a large collection of duplicates, moving a Cursor to a specific key/value pair, or getting one, is faster because the library does not have to evaluate every duplicate entry to find the matching pair.
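
The analogy can be written out as a small sketch. It is only a mental model, not how BDB-C or JE store data. String keys stand in for byte[] keys here, because a Java byte[] uses identity equality and would not work as a map key. DupAnalogy and its members are hypothetical names.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical illustration of the two duplicate configurations as Java collections.
class DupAnalogy {
    // "Unsorted duplicates": each key maps to a List, and a put appends.
    final ConcurrentMap<String, List<String>> unsorted = new ConcurrentHashMap<>();

    // "Sorted duplicates": each key maps to a SortedSet, and a put is placed by order.
    final ConcurrentMap<String, SortedSet<String>> sorted = new ConcurrentHashMap<>();

    void putUnsorted(String key, String data) {
        // No comparison is needed: the entry goes to the end of the list.
        unsorted.computeIfAbsent(key, k -> new CopyOnWriteArrayList<>()).add(data);
    }

    void putSorted(String key, String data) {
        // The set compares entries to find the position; an equal entry is not added twice.
        sorted.computeIfAbsent(key, k -> new ConcurrentSkipListSet<>()).add(data);
    }
}
```

Putting "b", "a", "b" under one key leaves the list holding all three values in arrival order, while the sorted set holds "a" and "b" once each.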

    So, given that unsorted duplicates is an alias for "append", it is possible to emulate that behavior using a duplicate comparator. This is exactly what my use case does. It uses a monotonically increasing integer (j.u.c.AtomicInteger) to add a 32-bit integer field to each entry and sorts on that int. The resulting behavior emulates the append-only nature of a BDB-C database configured for unsorted duplicates.
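
A rough sketch of that approach, under stated assumptions: the sequence number is stored big-endian in the first 4 bytes of each entry, and the comparator reads only those bytes. AppendOrderEncoder and SequenceComparator are hypothetical names, and this is not the poster's actual code. Restoring the counter after a restart and handling wrap-around of the 32-bit value are left out.

```java
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Comparator;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical encoder: prefixes each payload with a monotonically increasing int.
class AppendOrderEncoder {
    private final AtomicInteger sequence = new AtomicInteger();

    // Returns [4-byte big-endian sequence number][payload].
    byte[] encode(byte[] payload) {
        return ByteBuffer.allocate(Integer.BYTES + payload.length)
                .putInt(sequence.getAndIncrement())
                .put(payload)
                .array();
    }

    // Strips the sequence prefix and returns the original payload.
    static byte[] payloadOf(byte[] entry) {
        return Arrays.copyOfRange(entry, Integer.BYTES, entry.length);
    }
}

// Hypothetical duplicate comparator: sorts entries by the sequence prefix only,
// so iteration order equals insertion order (append semantics).
class SequenceComparator implements Comparator<byte[]>, Serializable {
    private static final long serialVersionUID = 1L;

    @Override
    public int compare(byte[] a, byte[] b) {
        // Reads the first 4 bytes of each entry as a signed big-endian int.
        // Entries shorter than 4 bytes would throw BufferUnderflowException.
        return Integer.compare(ByteBuffer.wrap(a).getInt(), ByteBuffer.wrap(b).getInt());
    }
}
```

Because every entry receives a distinct sequence number, the comparator never returns 0 for two different entries, so no put is treated as a replacement of an existing duplicate.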



    You:
    The current limitation is that you may not change the data of an existing record, when using dups. ... This limitation has always been present in JE, but before it caused data corruption (if you changed the data and the comparator did not compare all bytes), and now you get an exception if you try it.

    Me:
    This isn't made clear in the documentation. I have always labored under the assumption that if I did Cursor.getSearchXXX and then Cursor.putCurrent(partial_database_entry_object), it would update the current entry. I'm sure I'm not the only one who never gave it a second thought, and since unit testing and code coverage analysis aren't ubiquitous, I am sure there is data corruption taking place that developers and admins are unaware of. In other words, the fact that you haven't received tons of complaints doesn't mean its impact is small. People just may not know it's happening.
  • 14. Re: Strange behaviour with custom data comparator
    greybird Expert
    You said:
    So, given that unsorted duplicates is an alias for
    "append", it is possible to emulate that behavior
    using a duplicate comparator. This is exactly what my
    use case does. It uses a monotonically increasing
    integer (j.u.c.AtomicInteger) to add a 32-bit integer
    field to each entry and sorts on that int. The
    resulting behavior emulates the append-only nature of
    a BDB-C database configured for unsorted duplicates.

    I see, for this specific use case, you can use JE sorted dups then. This should work today. Thanks for clarifying.

    You said:
    This isn't made clear in the documentation. I have
    always labored under the assumption that if I did
    Cursor.getSearchXXX and then
    Cursor.putCurrent(partial_database_entry_object), it
    would update the current entry. I'm sure I'm not the
    only one who never gave it a second thought, and since
    unit testing and code coverage analysis aren't
    ubiquitous, I am sure there is data corruption taking
    place that developers and admins are unaware of. In
    other words, the fact that you haven't received tons of
    complaints doesn't mean its impact is small.
    People just may not know it's happening.

    I understand that it is possible that someone is attempting to use this feature, but I believe it's very unlikely, and for the reasons I outlined I think we're making the right decision by fixing it in a future release rather than in the 3.2.x release line. I feel strongly that our priorities are correct in this case, given the information we have about the use of the product, including the information posted to this forum.

    --mark