10 Replies Latest reply: Nov 28, 2013 10:26 PM by greybird RSS

    Insert with a duplicate constraint

    kiwiclive

      Hello guys,

       

      Version 5.0.97

       

      Could I please clarify if it is possible to distinguish an insert from an update in BerkeleyDB JE (Collections API) ?

       

      I can provide some code if you need it but I thought I'd ask the question first, as I could be doing something stupid.

       

      In the BerkeleyDB-GSG (page 74), there is this statement:

       

      • DatabaseConfig.setSortedDuplicates()

        If true, duplicate records are allowed in the database. If this value is false, then putting a duplicate record into the database results in an error return from the put call. Note that this property can be set only at database creation time. Default is false.

      I set this value as part of the setup mechanism:

       

      private void initDBConfig(boolean readOnly) {

              dbCfg = new DatabaseConfig();

              dbCfg.setReadOnly(readOnly);                   // we want Read/Write

              dbCfg.setAllowCreate(true);                 // create if it does not exist

              dbCfg.setSortedDuplicates(false);           // do not allow duplicates in primaryDB

              dbCfg.setTemporary(false);                  // this must be a persistent database, not in-memory

              dbCfg.setDeferredWrite(false);              // No deferred writes, it must be transactional

      //        dbCfg.setTransactional(true);               // explicitly make DB transactional TODO: make !readOnly ??

          }

       

       

      I can insert an item into the primaryDB but if I try to insert a second record using the same key, I find the new record overwrites the old and does not throw an error.

       

      Have I misinterpreted this or am I missing a configuration key somewhere ?

       

      I'm happy to provide an example if this helps although I suspect I am not setting things up correctly.

       

      Thanks for any help.

      Clive

        • 1. Re: Insert with a duplicate constraint
          Andrei Costache, Oracle

          Hi Clive,

           

          Well, the BDB JE GSG documentation is a bit misleading in that particular section, because it does not detail all possible cases of doing the Database.put*() calls.

          DatabaseConfig.setSortedDuplicates() configures the database to support duplicates (duplicates records having the same key) or not. Note that this database property is persistent and cannot be changed once set;  the default is false, that is, duplicates are not allowed.

          If the database is configured to not support duplicates -- setSortedDuplicates(false) -- as in your case, then a Database.put() call to insert a record having a key that already exists in the database will result in overwriting the record with the same key, basically replacing the data associated with the key (an update);  it will not result in an error.  A Database.putNoOverwrite() call to insert a record having a key that already exists in the database will result in an OperationStatus.KEYEXIST error being returned, regardless of whether the database is configured to support duplicates or not (this is what you would want to attempt an insert). 

          There's also Database.putNoDupData() which stores the key/data pair into the database if it  does not already appear in the database, but this method may only be called if the  database supports sorted duplicates.

           

          Though, you mentioned you are using the Collections API.  Also, as your database is a primary database, then you've correctly configured it so that it does not allow duplicates.

          In the JE Collection API, if the database is configured to not allow duplicates then the StoredMap.put() call will result in inserting a new record if the key does not already exist in the database or in updating the data if the key already exists in the database.  Note that the return value will be null if the key was not present or it will be the previous value associated with the key, if the key was present in the database.

          So, if you've configured to not allow duplicates, and if you want to prevent a StoredMap.put() call from replacing/overwriting existing data if the key is already present, then you would need to first check if the key is present, by using Map.containsKey(). See the Adding Database Items section in the Java Collections Tutorial documentation.

           

          Regards,

          Andrei

          • 2. Re: Insert with a duplicate constraint
            Bogdan Coman

            Hi Clive,

             

            • DatabaseConfig.setSortedDuplicates()

              If true, duplicate records are allowed in the database. If this value is false, then putting a duplicate record into the database results in an error return from the put call. Note that this property can be set only at database creation time. Default is false.

             

            Well, that's not very intuitive, isn't it? I think it should get corrected.

            If the database doesn't support duplicates, you know for sure there can only by one record (record=key,value) with a certain key in the database. So any put you do for an already existing key in the database should update the value of the record, while any put for a new key should insert the record.

             

            Now, if you go deeper into the BASE API doc you'll find out exactly how put works. There is a database Put method, a putNoOverwrite and a putNoDupData. When no duplicates, Put will replace the data if it exists (no error returned to the application), putNoOverwrite returns OpeationStatus.KEYEXIST if the key exists, while putNoDupData will not work if no duplicates: http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/Database.html#putNoDupData%28com.sleepycat.je.Transaction,%20com.sleepycat.je.DatabaseEntry,%20com.sleepycat.je.DatabaseEntry%29

             

            For more on storing: http://docs.oracle.com/cd/E17277_02/html/GettingStartedGuide/applicationoverview.html#storing-intro

            • Could I please clarify if it is possible to distinguish an insert from an update in BerkeleyDB JE (Collections API) ?

            Since you are using the standard Java collections API I think you are calling Map.put() for writes, which doesn't distinguish between the two cases (insert/update), so probably using the Map.get method prior to using Map.put is the way to find out whether or not it already exists.

             

            Thanks,

            Bogdan

            • 3. Re: Insert with a duplicate constraint
              kiwiclive

              Hi Andrei and Bogdan,

               

              Thank you very much for the detailed and helpful answers. This clarifies things tremendously although this does pose me some problems. It may be necessary for me to abandon the Collections API and resort to the raw Berkeley API which  is a real shame as the Collections API fits so well with our program infrastructure. What concerns me about checking to see if the item exists in the DB first is the possibility of needing to iterate through the whole collection which could contain 500,000 records. I presume this will mean opening and reading all the je log files rather than just appending to the latest one. However, if the raw API uses Database.putNoOverwrites(), then I presume something similar need to be done there so there may not be any gain by moving in that direction?

               

              I've just been reading the StoredMap API and see there is a putIfAbsent() method. If this calls the underlying Database.putNoOverwrites(), then I get my cake and eat it, although it may be at a performance cost.

               

              If my thinking is correct, I would get a more performant solution if I moved to a predominantly-update architecture and only insert when we really need to be sure that an update must not happen (eg when versioning records etc).

               

              I'll run some tests and let you know what mileage I get !

               

              Once again, many many thanks for the great feedback.

              Clive

              • 4. Re: Insert with a duplicate constraint
                Bogdan Coman

                kiwiclive wrote:

                I've just been reading the StoredMap API and see there is a putIfAbsent() method. If this calls the underlying Database.putNoOverwrites(), then I get my cake and eat it, although it may be at a performance cost.

                You are actually right on the money, we missed that method. Good find!

                 

                Bogdan

                • 5. Re: Insert with a duplicate constraint
                  greybird

                  Just a small addition:  putIfAbsent() does not have an additional performance cost.  It simply calls putNoOverwrite().

                  --mark

                  • 6. Re: Insert with a duplicate constraint
                    kiwiclive

                    Thanks Mark, that's good to know

                    • 7. Re: Insert with a duplicate constraint
                      kiwiclive

                      Hi Guys,

                       

                      Just FYI: I have run some throughput tests comparing putIfAbsent() to put().  putIfAbsent() is about 40% slower than put in our environment, which is to be expected if it needs to make a round-trip to check for existence. As Berkeley is so fast, this is well within tolerance so all is good :-)


                      Clive

                      • 8. Re: Insert with a duplicate constraint
                        greybird

                        putIfAbsent does less work than put, so there may be something fishy with your test.  put does a putIfAbsent, and if that fails then does a putCurrent to replace the existing record.  Either way, there is only one Btree search.

                         

                        --mark

                        • 9. Re: Insert with a duplicate constraint
                          kiwiclive

                          Hi Mark,

                           

                          That is very interesting, so putIfAbsent() should be faster than put() ?. I'm certainly not seeing that. Admittedly my 'tests' have just been running for a period using 'put()', then doing the same using 'putIfAbsent()' and counting log transactions. The same database was however larger by the time I ran the second test and there are networks in the way so I'm the first to agree its not a solid lab test.

                           

                          However, if put() does call putIfAbsent(), then it must always fail in my case because the record in the DB is always overwritten. I cannot find putCurrent(). I am using StoredMaps so maybe I am using the wrong collection ?

                           

                          I don't wish to take up your time as the performance I am getting is still very good but if there is a way to make an insert faster than an update, I'm all ears :-)

                           

                          Clive

                          • 10. Re: Insert with a duplicate constraint
                            greybird

                            I'm not suggesting there is a way to get better performance.  I'm only questioning the validity of the test results, since put() does more work than putIfAbsent().  put() attempts to do an insert, and then if the insert fails, it does an update.  putIfAbsent() only attempts to do an insert.

                             

                            --mark