4 Replies Latest reply: Jan 27, 2013 5:10 AM by rukbat RSS

    Simply delete all entries in a database.

    886619
      Hello,

      how do I simply delete all entries in a database (which must be thread-safe, and most probably is)? For instance, I need this because I'm developing a versioned open source XML/JSON database system in which I use a BerkeleyDB environment/database as a per-revision transaction log (resource/log/version_number/...) for dirty blocks/pages, and I now want to introduce checkpointing (currently with only a single write-transaction per resource). That is, one thread might read a transaction log (a BerkeleyDB environment/database) while another thread, the checkpointer (most probably a daemon thread), commits, i.e. writes the log into the real resource periodically or during low workload. After the data is committed, the transaction log must be emptied, but a reading transaction may still be reading from the log; it falls back to reading from the real resource if a page is not in the log. So I can't remove the database; instead I probably have to delete all its entries, together with a simple .commit-file flag that indicates whether the data has been written back to the real resource or the checkpointer still has to write it back sometime in the future (if the .commit-file still exists). Do I have to iterate through the database with a cursor (and .getNext())? Or does a dedicated method exist?

      kind regards
      Johannes
        • 1. Re: Simply delete all entries in a database.
          Greybird-Oracle
          Hi Johannes,

          As I think you've already discovered, there is no built-in method for deleting all records of a database that is open. The only similar built-in methods are those for removing or truncating an entire database (removeDatabase and truncateDatabase), and the database must be closed.
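          For reference, the truncate path might be sketched as follows; `env` is an assumed open Environment handle and `"myDbName"` is a placeholder name, and the Database handle must be closed before the call:

          ```java
          import com.sleepycat.je.Database;
          import com.sleepycat.je.Environment;

          final class TruncateSketch {
              // Sketch: discard every record with a single call.
              // The database handle must be closed first.
              static long truncate(final Environment env, final Database db,
                      final String dbName) {
                  db.close();
                  // null txn => auto-commit; passing true asks JE to count
                  // the discarded records (slightly slower but informative).
                  return env.truncateDatabase(null, dbName, true);
              }
          }
          ```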

          If you can't find a way to use removeDatabase or truncateDatabase, then you'll have to iterate through the records and delete them individually. If this is done for a large number of records in a single transaction, it will be expensive on a number of fronts, including memory usage for the locks: each record is individually locked.

          If you don't need to delete all records in a single transaction (I couldn't completely understand your use case), then you can iterate with a cursor using READ_UNCOMMITTED and delete the records in individual transactions using Database.delete. This avoids using lots of memory for the locks, since only one record is locked at a time.

          In either case the cost can be reduced by using DatabaseEntry.setPartial(0, 0, true) for the DatabaseEntry that is passed as the data parameter. You only need the key to delete a record, not the data, and avoiding a fetch of the data is a big cost savings (if the record data is not in cache). This optimization is only available in JE 5.0 and above; in JE 4.1 and earlier it has no advantage because the data is always fetched internally as part of the deletion operation.
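          Putting the two suggestions together, the per-record approach might look like this sketch (the `db` parameter and class name are placeholders):

          ```java
          import com.sleepycat.je.Cursor;
          import com.sleepycat.je.CursorConfig;
          import com.sleepycat.je.Database;
          import com.sleepycat.je.DatabaseEntry;
          import com.sleepycat.je.LockMode;
          import com.sleepycat.je.OperationStatus;

          final class DeleteAllSketch {
              // Iterate with a READ_UNCOMMITTED cursor and delete each record
              // in its own auto-commit transaction, so only one record is
              // locked at a time.
              static void deleteAll(final Database db) {
                  final DatabaseEntry key = new DatabaseEntry();
                  final DatabaseEntry data = new DatabaseEntry();
                  // Only the key is needed to delete; skip fetching the data
                  // (effective in JE 5.0 and above).
                  data.setPartial(0, 0, true);
                  final Cursor cursor =
                          db.openCursor(null, CursorConfig.READ_UNCOMMITTED);
                  try {
                      while (cursor.getNext(key, data, LockMode.READ_UNCOMMITTED)
                              == OperationStatus.SUCCESS) {
                          db.delete(null, key); // individual (auto-commit) txn
                      }
                  } finally {
                      cursor.close();
                  }
              }
          }
          ```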

          --mark
          • 2. Re: Simply delete all entries in a database.
            886619
            Hm, the environment/database is used to persist changes made by a write-transaction when a simple in-memory LRUCache overflows (or when a write-to-disk-before-commit option is activated). But once I've implemented the checkpointing, which should now be really easy (just adding the daemon thread in the commit() method), another thread might read from the transaction write-ahead log (the BerkeleyDB environment/database) while the checkpointer is deleting all entries in the log that have already been written to the resource data-file. Thus it is always possible to read committed versions, either from the BerkeleyDB environment/database (if the checkpointer has not yet written the log to the resource data-dir) or from the data-dir itself. I've simply implemented:

                 @Override
                 public void clearPersistent() throws SirixIOException {
                      final Cursor cursor = mDatabase.openCursor(null, null);
                      try {
                           final DatabaseEntry keyEntry = new DatabaseEntry();
                           final DatabaseEntry valueEntry = new DatabaseEntry();
                           // Only the key is needed for deletion; skip fetching the data.
                           valueEntry.setPartial(0, 0, true);
                           while (cursor.getNext(keyEntry, valueEntry, LockMode.DEFAULT)
                                     == OperationStatus.SUCCESS) {
                                remove(mKeyBinding.entryToObject(keyEntry));
                           }
                      } catch (final DatabaseException e) {
                           throw new SirixIOException(e);
                      } finally {
                           cursor.close();
                      }
                 }

            A read-transaction simply reads a committed revision/version like this:

            final boolean doesExist = session.commitFile(revision).exists();
            mNodeLog = doesExist ? Optional
                      .of(new TransactionLogCache<UnorderedKeyValuePage>(
                                session.mResourceConfig.mPath, revision, "node", this)) : Optional
                      .<TransactionLogCache<UnorderedKeyValuePage>> absent();
            mNodeCache = CacheBuilder
                      .newBuilder()
                      .maximumSize(1000)
                      .expireAfterWrite(10, TimeUnit.SECONDS)
                      .expireAfterAccess(10, TimeUnit.SECONDS)
                      .concurrencyLevel(1)
                      .build(
                                new CacheLoader<Long, RecordPageContainer<UnorderedKeyValuePage>>() {
                                     public RecordPageContainer<UnorderedKeyValuePage> load(
                                               final Long key) throws SirixException {
                                          return mNodeLog
                                                    .isPresent() ? mNodeLog.get().get(key)
                                                    : pageReadTrx
                                                    .<Long, Record, UnorderedKeyValuePage> getRecordPageContainer(
                                                              key, PageKind.NODEPAGE);
                                     }
                                });

            This should work, I think: it first reads the committed transaction log and falls back to reading and reconstructing pages from the resource data-directory if the data is not in the transaction log or the log has been cleared. I could also delete the entries while reading them to write the transaction log to the resource data-file during the commit, thus avoiding the cost of traversing the database a second time for deletion. However, that might make failure detection after a power failure or the like hard to implement, because of the different versioning strategies and the recursive commit. It would be easier to truncate the file to the last committed revision, reapply the whole log, and possibly some other pages if a full dump must be written.

            kind regards
            Johannes
            • 3. Re: Simply delete all entries in a database.
              886619
              BTW: Sorry, I don't know how to format code in this messageboard.

              kind regards
              Johannes
              • 4. Re: Simply delete all entries in a database.
                rukbat
                I don't know how to format code in this messageboard.
                It is described simply and clearly in the OTN Forums FAQ
                https://forums.oracle.com/forums/help.jspa
                which is prominently linked at the top corner of every forum page, and should have been one of the first things you read after joining more than a year ago (along with the site's Terms Of Use, linked at the bottom of every forum page).