bulk loading B-tree with unsigned long keys does not work.

Hi,

I am trying to bulk load a BDB B-tree with "unsigned long int" keys and then read it back.

When I make some minor modifications to the Bulkexample code, the tree is not generated correctly.

What I have done: I changed the key type to u_int32_t and changed the variable types in the compare_int function as well.

It seemed to me that the problem was about alignment.

So I put my key data in a byte array and used marshalling/unmarshalling techniques, but I still cannot read the data sorted by key :(

The code looks something like this:

u_int32_t keyvalue = rnd->nextUniformUnsignedLong(); // rnd is my random generator
uint32_t sizeOfKey = sizeof(u_int32_t);
uint8_t *bufkey = new uint8_t[sizeOfKey]; // allocate a buffer for the key
uint8_t *it = bufkey; // pointer used to iterate over the buffer
memcpy(it, &keyvalue, sizeOfKey);

uint8_t *bufdata = new uint8_t[DATALEN]; // allocate a buffer for the data
it = bufdata; // pointer used to iterate over the buffer
memcpy(it, &(data_val->id), sizeof(int));
it += sizeof(int);
memcpy(it, data_val->str, STRLEN);
...
if (ptrkd->append(bufkey, sizeOfKey, bufdata, DATALEN) == false)
  throwException(dbenv, txnp, EXIT_FAILURE,...)


In the compare function:

uint32_t sizeOfKey = sizeof(u_int32_t); //uint_fast32_t
uint8_t *it = (uint8_t *)a->get_data(); // raw bytes of key a
memcpy(&ai, it, sizeOfKey);
it = nullptr;
it = (uint8_t *)b->get_data(); // raw bytes of key b
memcpy(&bi, it, sizeOfKey);

// memcpy(&ai, a->get_data(), sizeof(int));
// memcpy(&bi, b->get_data(), sizeof(int));
return (ai - bi);

Similar problems occur when I try to load the B-tree with double keys: it does not sort correctly.

Thanks for the help. Happy holidays.
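
One thing worth checking in the compare function above is the final `return (ai - bi);`. Assuming `ai` and `bi` are declared as `u_int32_t` (the post does not show their declarations), the subtraction is done in unsigned arithmetic and then converted to `int`, so the sign comes out wrong whenever the two keys differ by more than 2^31. The sketch below shows a three-way comparison that avoids the subtraction. The name `compare_u32_keys` is hypothetical and is not part of the Berkeley DB API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not a BDB function): three-way comparison of two
// 4-byte keys held in raw, possibly unaligned buffers.
int compare_u32_keys(const void *a, const void *b) {
    std::uint32_t ai, bi;
    std::memcpy(&ai, a, sizeof ai);  // memcpy sidesteps alignment problems
    std::memcpy(&bi, b, sizeof bi);
    // Comparisons instead of subtraction: the result is always -1, 0, or 1.
    return (ai > bi) - (ai < bi);
}

// Small self-check using a pair of keys more than 2^31 apart, the case
// where an unsigned subtraction converted to int would flip the sign.
void check_compare_u32_keys() {
    const std::uint32_t big = 3000000000u, small = 1u;
    assert(compare_u32_keys(&big, &small) > 0);
    assert(compare_u32_keys(&small, &big) < 0);
    assert(compare_u32_keys(&big, &big) == 0);
}
```

Inside the compare callback from the post, the two arguments would be `a->get_data()` and `b->get_data()`. The same pattern (copy into a typed variable, then compare with `<` and `>`) would also apply to double keys, where returning a difference truncated to `int` loses information as well.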

Answers

  • User_DAIKJ

    Is this a 32-bit or 64-bit compiled program? The "long" data type changes from 32-bit to 64-bit depending on how you compile, so it is best never to use the plain C data type names any more. Use the unambiguous fixed-width types, like uint32_t and uint64_t, etc.

    I see some use of uint32_t in your example code, but you are still saying "long" in your description. You are also assuming "int" is 32-bit, which is less of a problem, but it is a bad assumption to make.

    The "double" type in C is 64-bit on practically every current platform, so it may or may not be the same size as a "long", depending on what I mentioned above. Use the unambiguous types provided by your compiler/OS headers.

    Based on your function name, nextUniformUnsignedLong(), I cannot tell what size of data to expect from it (32-bit or 64-bit).

    Since BDB stores keys as byte arrays and, by default, compares them byte by byte, little-endian keys derived from native types will not sort correctly without your own comparison function. Alternatively (and more easily), you can byte-swap the key to big-endian before storing it or using it for a lookup. That is my preference, since it saves having to mess around with custom compare functions and such.

    I have been using 32-bit and 64-bit keys in my BDBs for over a decade.
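
A minimal sketch of the byte-swapping approach described in this answer, assuming 32-bit unsigned keys. The helper names `encode_u32_be`, `decode_u32_be`, and `check_be_order` are hypothetical and not part of Berkeley DB. The idea is to write the key most-significant byte first, so that a plain byte-by-byte comparison of the stored keys agrees with numeric order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: write a 32-bit key most-significant byte first
// (big-endian), independent of the host's byte order.
void encode_u32_be(std::uint32_t v, std::uint8_t out[4]) {
    out[0] = static_cast<std::uint8_t>(v >> 24);
    out[1] = static_cast<std::uint8_t>(v >> 16);
    out[2] = static_cast<std::uint8_t>(v >> 8);
    out[3] = static_cast<std::uint8_t>(v);
}

// Hypothetical helper: rebuild the native value from a big-endian buffer.
std::uint32_t decode_u32_be(const std::uint8_t in[4]) {
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
           static_cast<std::uint32_t>(in[3]);
}

// Self-check: byte-wise order of the encoded keys matches numeric order,
// and decoding returns the original value.
void check_be_order() {
    std::uint8_t k1[4], k256[4];
    encode_u32_be(1u, k1);
    encode_u32_be(256u, k256);
    assert(std::memcmp(k1, k256, 4) < 0);  // 1 sorts before 256
    assert(decode_u32_be(k256) == 256u);
}
```

With keys encoded this way, the 4-byte buffer would be what gets passed as the key when storing or looking up a record, and the value read back from a cursor would go through `decode_u32_be`. This only covers unsigned integers. Signed or floating-point keys would need a different encoding to sort correctly byte by byte.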