comparison function returns incorrect data

Answers

  • Carol.Sandstrom-Oracle
    Carol.Sandstrom-Oracle Member Posts: 38
    edited Oct 24, 2016 10:42AM

    I have confirmed that the example passes as written on 6.0.20 on a basic Oracle Linux 7 host, running the test with the command:

    ./ex_heap -h TESTDIR -b

    What system and command line are you using?  Does the test pass prior to your modifications?

    Thanks

  • 3333809
    3333809 Member Posts: 15
    edited Oct 24, 2016 11:21AM

    My concern isn't whether the test passes, but the values received by the comparison function for the keys a and b.  They appear to be truncated.  Is this expected?  I'm running on CentOS release 6.3.

  • 3333809
    3333809 Member Posts: 15
    edited Oct 24, 2016 11:39AM

    The command I'm running is ./ex_heap -b-h database

  • 3333809
    3333809 Member Posts: 15
    edited Oct 24, 2016 11:48AM

    The design of my comparison function assumes that the keys will match the ones in the db exactly, so if that isn't guaranteed I will need to redesign the comparison function.  Please advise.

  • 3333809
    3333809 Member Posts: 15
    edited Oct 24, 2016 12:22PM

    The example prior to my changes appears to work as expected. However, in the example the keys are ints, whereas in my case the keys are expected to be 24-character strings.  This is where the truncation is noticeable.

  • userBDBDMS-Oracle
    userBDBDMS-Oracle Member Posts: 783 Employee
    edited Oct 24, 2016 1:26PM

    The purpose of the comparison routine is to build/traverse the btree in a well-ordered manner. 

    In the comparison routine, a is the key from the application and b is the key from the database.

    Your claim seems to be that the data portion of the b key is being truncated.

    So what we need is a print of the data portion of the key, taken inside the comparison routine before any manipulation is done to it, and a dump of the database page holding that key.  Your code will need modifications for this.  You can dump with db_dump.

    It is not the role of the comparison routine to always match.  Its role is to compare keys and determine whether key a is less than, equal to, or greater than key b.  You should not assume the keys will match; that would not be correct usage of the comparison function.  If the keys do match and the comparison was triggered by an insert, the key is a duplicate and you will need to have duplicates enabled.

    BDB can certainly handle alpha keys.   We can do a test case with alpha keys and verify.

    Most likely this is a bug in your code in the way you are handling alpha keys.

    At this point we ran a test, and the test example works.

    Before we can take this on as a bug, we will need more proof that a bug exists.

    Using db_dump and a proper print can do that.

    Our internal comparison routine handles alpha strings.

    Usually customers write their own comparison if they want a special sorting.

    What is your intended purpose?
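
    To make the "less than, equal, or greater than" contract concrete, here is a minimal sketch of a three-way comparison over length-delimited keys. The key_view struct and the compare_keys name are hypothetical stand-ins so the snippet compiles on its own; real code would include db.h, use the DBT data and size fields, and follow the callback signature declared there, which takes additional arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Berkeley DB DBT: a pointer plus a byte count. */
typedef struct {
    const void *data;
    size_t size;
} key_view;

/*
 * Three-way comparison of two length-delimited keys.
 * Returns <0 if a sorts before b, 0 if they are equal, >0 if a sorts after b.
 * Only the first `size` bytes of each key are read, so the keys do not
 * need to be NUL-terminated.
 */
int compare_keys(const key_view *a, const key_view *b)
{
    assert(a != NULL && b != NULL);

    size_t n = a->size < b->size ? a->size : b->size;
    int c = (n > 0) ? memcmp(a->data, b->data, n) : 0;
    if (c != 0)
        return c;

    /* The common prefix is identical: the shorter key sorts first. */
    if (a->size < b->size)
        return -1;
    if (a->size > b->size)
        return 1;
    return 0;
}
```

    A result of 0 here means the two keys are byte-for-byte identical, which is the only case in which the tree should treat them as the same key.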

  • 3333809
    3333809 Member Posts: 15
    edited Oct 24, 2016 1:52PM

    Thanks.  I will make the necessary changes in my code to produce the results you are looking for.

    To give you some more context, our db is growing at a rapid pace due to the "holes" resulting from all the deletions we're doing with a btree db.  This is with the default lexicographical sort that is provided by BDB, i.e. we are not overriding the comparison function.  My thinking is that if we use a smarter comparison function, it's possible that a broader range of keys will map to any one key that is removed from the db, hence the comparison function that is given in the initial post.  Please let me know if my thinking is incorrect here.

  • userBDBDMS-Oracle
    userBDBDMS-Oracle Member Posts: 783 Employee
    edited Oct 24, 2016 2:17PM

    This background is very useful.

    The default lexicographic ordering is done by our internal comparison routine.  Yes, if you delete out of the btree, we do not garbage collect, so there will be "holes", as you call them.

    In general, what you are describing is duplicates, except that in your case you want multiple different keys to behave as a single duplicate key.  I would view this as pretty tricky to do, and I am not sure it will get you what you want.  If you have duplicates but only want to delete the data from one of them, then you have to handle that case as well, so you would need a mapping of the data back to the original key.  You will also need duplicates enabled for this to work.  This could also be viewed as not well-ordered.

    Let me explain.  You insert key 'abc', then you insert key 'cba', and your comparison routine makes key cba match key abc.  In BDB terms this is a duplicate.  Then you store the data from key cba.  Now you want to delete key abc, but connected to that key are 2 pieces of data, one from abc and one from cba.  This is all extremely complex and very error prone.

    What is far safer is to do a dump and a reload of the data.   This will rebuild your btree.  We have info in the docs on this.

    We are not going to be able to consult with you on your code because we do not have an API to do what you are looking to do.

    What I want to be sure of is that BDB is not truncating the key being sent to the comparison routine.  If it were, this would be a bug.
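
    The 'abc' / 'cba' scenario above can be illustrated with a small sketch. The comparison below is purely hypothetical (it is not the function from the initial post): it looks only at how often each byte value occurs, and so it reports two different keys as equal. The key_view struct is again a stand-in for a DBT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Berkeley DB DBT: a pointer plus a byte count. */
typedef struct {
    const void *data;
    size_t size;
} key_view;

/*
 * Hypothetical comparison that considers only the count of each byte value.
 * Keys made of the same letters in a different order compare as equal.
 */
int compare_by_byte_counts(const key_view *a, const key_view *b)
{
    assert(a != NULL && b != NULL);

    size_t ca[256] = {0};
    size_t cb[256] = {0};
    const unsigned char *pa = a->data;
    const unsigned char *pb = b->data;

    for (size_t i = 0; i < a->size; i++)
        ca[pa[i]]++;
    for (size_t i = 0; i < b->size; i++)
        cb[pb[i]]++;

    /* Compare the two count vectors position by position. */
    for (int v = 0; v < 256; v++) {
        if (ca[v] != cb[v])
            return ca[v] > cb[v] ? -1 : 1;
    }
    return 0; /* same multiset of bytes: reported as the same key */
}
```

    With a routine like this, 'abc' and 'cba' return 0, so from the tree's point of view they are one key, which is exactly the duplicate situation described above.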

  • 3333809
    3333809 Member Posts: 15
    edited Oct 24, 2016 3:08PM

    The keys are unique.  Sorry, when I said "map" I meant that whenever a record is deleted and another record is being added, the new record can fill the hole if it falls at the deleted record's position in the sort order.  My thinking is that this is more likely if the sort order follows something like the frequency of the letters, which is what I'm trying to do.  So if the key fits in the sort order, it will fill the hole that existed before.  This is the intent.

    Re dumping and reloading data, how expensive is that to do?  Is that not essentially compacting the db?

    So I added this code to the very beginning of my comparison function

    printf("a = %s\n", a->data);

    printf("b = %s\n", b->data);

    char b_data[50];

    strcpy(b_data, b->data);

    char command[50];

    strcpy(command, "db5.3_dump -da /root/BerkeleyDB/build_unix/cmid.db | grep ");

    strcat(command, b_data);

    system(command);

    and the output is as follows

    a = --abcDefgJijklmnopqrstuv

    b = V5OWtJ8A

    a = --mnopqrstuvabcDefgJijkl

    b = V3B

      [000] 4068 len:  25 data: V3BnopqrstuvabcDefgJijkl\0

      [002] 4028 len:  25 data: V3BnopqrstupqrstfgJBn---\0

    Please let me know if this is sufficient info.
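
    Two details of the debugging snippet above are worth checking before reading much into its output. First, printing with %s and copying with strcpy both assume the key bytes end in a NUL. Second, the literal copied into command is 58 characters long, which does not fit in a 50-byte buffer. A minimal sketch of a length-bounded alternative follows; key_view, format_key and build_grep_command are hypothetical names, with key_view standing in for a DBT so the snippet compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a Berkeley DB DBT: a pointer plus a byte count. */
typedef struct {
    const void *data;
    size_t size;
} key_view;

/*
 * Write "label = <key bytes>" into out. The precision in %.*s limits the
 * read to k->size bytes, so the key does not need to be NUL-terminated.
 * Returns what snprintf returns (the length the full text would need).
 */
int format_key(char *out, size_t outsz, const char *label, const key_view *k)
{
    assert(out != NULL && label != NULL && k != NULL);
    return snprintf(out, outsz, "%s = %.*s",
                    label, (int)k->size, (const char *)k->data);
}

/*
 * Build "<dump_cmd> | grep -F -- '<key>'" into out.
 * Returns 0 on success, -1 if the command would not fit in the buffer.
 * Quoting is naive: it assumes the key contains no single quote.
 */
int build_grep_command(char *out, size_t outsz, const char *dump_cmd,
                       const key_view *k)
{
    assert(out != NULL && dump_cmd != NULL && k != NULL);
    int n = snprintf(out, outsz, "%s | grep -F -- '%.*s'",
                     dump_cmd, (int)k->size, (const char *)k->data);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

    Passing the dump command in as a string and checking the return value means an undersized buffer is reported instead of silently overwriting neighbouring stack variables.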

  • userBDBDMS-Oracle
    userBDBDMS-Oracle Member Posts: 783 Employee
    edited Oct 24, 2016 3:44PM

    Could you post the entire program so we can grab it and test in house?  Better yet, trim it down so it just shows this problem and post that.

    b = V5OWtJ8A does not match anything in what you have printed, so maybe there are other pages that need to be dumped.

    b should be from the database.

    We do have tests in our test system with alpha keys.  We are not seeing any issues with those and how they run.

    Since we will do the compile/link, also give us your config.log file.  This will have the flags used during compile.

This discussion has been closed.