Database corrupted according to DbVerify - next steps?

mgrandi Member Posts: 18
edited Jun 15, 2016 5:18PM in Berkeley DB Java Edition

Our database has been running fine for years, but a recent run of DbVerify reports that it is partially corrupted. It gives the same error on two different computers (I archived the database and extracted it on each one). This is the master's copy of the database; I haven't checked the copies on any of the replica servers yet.

[2016-05-31 13:02:31] /Volumes/mgrandi_256SSD/security_service_data/bdb$ java -cp ~/.m2/repository/com/sleepycat/je/6.4.25/je-6.4.25.jar:/Users/markgrandi/Downloads/security-5.46.255.jar com.sleepycat.je.util.DbVerify -h /Volumes/mgrandi_256SSD/security_service_data/bdb/ -s "persist#IntraData#com.examplecompany.security.model.entity.UnadjustedIntraDataPoint\$ArchiveUnadjustedIntraDataPoint"

Verifying database persist#IntraData#com.examplecompany.security.model.entity.UnadjustedIntraDataPoint$ArchiveUnadjustedIntraDataPoint
Checking tree for persist#IntraData#com.examplecompany.security.model.entity.UnadjustedIntraDataPoint$ArchiveUnadjustedIntraDataPoint
com.sleepycat.je.EnvironmentFailureException: (JE 6.4.25) fetchIN of null lsn parent IN=990103 IN class=com.sleepycat.je.tree.IN lastFullLsn=0xffffffff/0xffffffff lastLoggedLsn=0xffffffff/0xffffffff parent.getDirty()=false state=2 NULL_LSN in upper IN UNEXPECTED_STATE: Unexpected internal state, may have side effects.
    at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:426)
    at com.sleepycat.je.tree.IN.fetchIN(IN.java:2612)
    at com.sleepycat.je.tree.Tree.getNextIN(Tree.java:1127)
    at com.sleepycat.je.tree.Tree.getNextBin(Tree.java:982)
    at com.sleepycat.je.dbi.CursorImpl.getNext(CursorImpl.java:2587)
    at com.sleepycat.je.dbi.DatabaseImpl.walkDatabaseTree(DatabaseImpl.java:1950)
    at com.sleepycat.je.dbi.DatabaseImpl.verify(DatabaseImpl.java:1899)
    at com.sleepycat.je.util.DbVerify.verifyOneDbImpl(DbVerify.java:413)
    at com.sleepycat.je.util.DbVerify.verify(DbVerify.java:327)
    at com.sleepycat.je.util.DbVerify.main(DbVerify.java:134)
Exit status = false
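The \$ in the command above is shell escaping: inside double quotes, an unescaped $ArchiveUnadjustedIntraDataPoint would be expanded as a shell variable (normally empty), and DbVerify would be asked for the wrong database name. A minimal Java sketch, assuming you launch the utility from Java rather than from a shell, builds the same invocation as an argument list for ProcessBuilder, where the $ needs no escaping. The class name DbVerifyCommand and the shortened jar names are hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the DbVerify invocation from the post as an argv list.
// ProcessBuilder passes each element verbatim, so no shell quoting applies.
class DbVerifyCommand {

    static List<String> build(String classpath, String envHome, String dbName) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add("com.sleepycat.je.util.DbVerify");
        cmd.add("-h");
        cmd.add(envHome);
        cmd.add("-s");
        cmd.add(dbName); // literal '$' for the nested class, no backslash needed
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = build(
                "je-6.4.25.jar:security-5.46.255.jar", // placeholder jar paths
                "/Volumes/mgrandi_256SSD/security_service_data/bdb/",
                "persist#IntraData#com.examplecompany.security.model.entity."
                        + "UnadjustedIntraDataPoint$ArchiveUnadjustedIntraDataPoint");
        System.out.println(String.join(" ", cmd));
        // To actually run it (requires the JE jar on disk):
        // Process p = new ProcessBuilder(cmd).inheritIO().start();
        // int exitCode = p.waitFor();
    }
}
```

Because no shell is involved, a leading "~" in the classpath would not be expanded either, so a real command built this way should use absolute paths.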

The documentation doesn't say much about what happens in this case. Is this a JE problem? Should I rebuild the database using the DbDump and DbLoad utilities? Will this eventually fix itself if the log file gets merged into another log file? I am also concerned that actual data may be corrupted, and about what happens if the application reads data near the corrupted section: could that take our entire application down with it? Any advice would be appreciated!

Answers

  • Greybird-Oracle
    Member Posts: 2,690
    edited Jun 2, 2016 10:15AM

    Yes, this does look like a corruption, but I can't tell whether it is a persistent or transient problem. We have not seen it before.


    Are you using DeferredWrite mode?

    If the problem is transient, you should be able to shut down your application (close the Environment) and try again. I suggest doing a DbDump of all your databases to try to capture the data. But if the problem is persistent, you may hit the same error during the dump; in that case you may have to revert to a backup.

    --mark
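If you follow the DbDump suggestion and later want to rebuild, as asked in the original question, the pairing of dump and load commands can be sketched as below. It assumes DbDump and DbLoad both accept -h (environment home), -s (database name) and -f (file), and that the database names come from running DbDump with -l; the class DumpLoadPlan, the directories and the file-naming scheme are hypothetical. For a DPL store, dumping every name that -l reports, not only the entity databases, is the safer assumption.

```java
import java.util.List;

// Hypothetical helper: one DbDump command per database in the old environment,
// and the matching DbLoad command into a fresh, empty environment directory.
class DumpLoadPlan {

    // Database names contain '#' and '$'; map anything unusual to '_' for file names.
    static String dumpFileName(String dbName) {
        return dbName.replaceAll("[^A-Za-z0-9._-]", "_") + ".dump";
    }

    static List<String> dumpCommand(String jeJar, String oldEnv, String dbName, String outDir) {
        return List.of("java", "-cp", jeJar, "com.sleepycat.je.util.DbDump",
                "-h", oldEnv, "-s", dbName, "-f", outDir + "/" + dumpFileName(dbName));
    }

    static List<String> loadCommand(String jeJar, String newEnv, String dbName, String outDir) {
        return List.of("java", "-cp", jeJar, "com.sleepycat.je.util.DbLoad",
                "-h", newEnv, "-s", dbName, "-f", outDir + "/" + dumpFileName(dbName));
    }

    public static void main(String[] args) {
        // Placeholder names; real ones would come from "DbDump -h <env> -l".
        List<String> dbNames = List.of(
                "persist#IntraData#com.example.A", "persist#IntraData#com.example.B");
        for (String db : dbNames) {
            System.out.println(String.join(" ", dumpCommand("je.jar", "/old/env", db, "/dumps")));
        }
        for (String db : dbNames) {
            System.out.println(String.join(" ", loadCommand("je.jar", "/new/env", db, "/dumps")));
        }
    }
}
```

Loading into a separate, empty directory leaves the original environment untouched, so it can still be sent for analysis or restored from a backup if the dump itself hits the corruption.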

  • mgrandi
    Member Posts: 18
    edited Jun 2, 2016 6:50PM

    I just checked: we are not using DeferredWrite mode, and the problem seems to be persistent. If I run DbVerify multiple times, I always get the same error. I haven't seen an error while the application is running normally, but that could simply be because the application hasn't read the data in the corrupted log file. We have replicas on different servers that I haven't checked with DbVerify yet, but I suspect they could also be corrupted if replication had the master send the corrupted log file to the replicas.

  • Greybird-Oracle
    Member Posts: 2,690
    edited Jun 2, 2016 6:58PM

    I'm sorry that I don't know what could have caused this, since we haven't seen it. If you can save your .jdb files and je.* files for the corrupted environment, and make them available to me somehow, I can try to find the cause of the problem. If you're willing to do this, we can coordinate over email.
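Before sharing the files, it is worth copying them out of the environment directory. A minimal sketch using only the standard library, assuming the Environment is closed while it runs (so the set of .jdb files cannot change mid-copy); the class SaveEnvFiles and both paths are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: copies the log files (*.jdb) and je.* files out of an
// environment directory. Assumes the Environment is closed while this runs.
class SaveEnvFiles {

    static boolean isWanted(String name) {
        return name.endsWith(".jdb") || name.startsWith("je.");
    }

    static List<Path> copyWanted(Path envHome, Path target) throws IOException {
        List<Path> candidates;
        try (Stream<Path> files = Files.list(envHome)) {
            candidates = files.filter(Files::isRegularFile)
                    .filter(p -> isWanted(p.getFileName().toString()))
                    .sorted()
                    .collect(Collectors.toList());
        }
        Files.createDirectories(target);
        List<Path> copied = new ArrayList<>();
        for (Path p : candidates) {
            Path dest = target.resolve(p.getFileName());
            // No REPLACE_EXISTING: refuse to overwrite an earlier copy by accident.
            Files.copy(p, dest, StandardCopyOption.COPY_ATTRIBUTES);
            copied.add(dest);
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder paths: the environment home and a separate save location.
        Path envHome = Paths.get(args.length > 0 ? args[0] : "bdb");
        Path target = Paths.get(args.length > 1 ? args[1] : "bdb-saved");
        for (Path p : copyWanted(envHome, target)) {
            System.out.println("copied " + p);
        }
    }
}
```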

    Replication does not normally copy log files; transaction replay is logical rather than physical. So unless you copied the files between nodes manually or performed a network restore, the problem should not be duplicated on other nodes.

    --mark

  • mgrandi
    Member Posts: 18
    edited Jun 9, 2016 6:10PM

    The database doesn't contain any sensitive information, but I would have to check with people higher up in my company to see whether they are OK with sending it. Also, the database is quite big (around 50 GB), so transferring it might be an issue. Is there a way to tell which file or set of files is corrupted? If that would be enough, it would be a considerably smaller download, but I'm not sure whether what you plan to look at requires the entire database.

    ~Mark
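On the question of which files are involved: LSNs in JE messages identify a log file and an offset within it. The sketch below assumes JE packs an LSN as (file number << 32) | file offset, uses -1 as the null LSN, and names log files with the file number as eight hex digits plus ".jdb"; LsnInfo is a hypothetical class. Under those assumptions every LSN in the error above is the null value, so that particular message does not point at a specific .jdb file, although LSNs printed elsewhere can be mapped to a file this way.

```java
// Hypothetical helper, assuming LSN = (fileNumber << 32) | fileOffset and
// log files named with the file number as 8 hex digits plus ".jdb".
class LsnInfo {

    static final long NULL_LSN = -1L; // all 64 bits set

    static long fileNumber(long lsn) {
        return lsn >>> 32; // upper 32 bits, unsigned
    }

    static long fileOffset(long lsn) {
        return lsn & 0xffffffffL; // lower 32 bits
    }

    // Same "0xFILE/0xOFFSET" layout as lastFullLsn in the error message.
    static String format(long lsn) {
        return "0x" + Long.toHexString(fileNumber(lsn))
                + "/0x" + Long.toHexString(fileOffset(lsn));
    }

    static String logFileName(long lsn) {
        return String.format("%08x.jdb", fileNumber(lsn));
    }

    public static void main(String[] args) {
        System.out.println(format(NULL_LSN)); // 0xffffffff/0xffffffff
        long lsn = (0x1aL << 32) | 0x3f2cL;   // hypothetical LSN
        System.out.println(format(lsn) + " -> " + logFileName(lsn)); // 0x1a/0x3f2c -> 0000001a.jdb
    }
}
```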

  • Greybird-Oracle
    Member Posts: 2,690
    edited Jun 9, 2016 6:30PM

    The simplest approach is for you to put the .jdb files (and je.* files) on a machine local to you (so the copy is fast), then give me ssh access to that machine only, with permission to access only the directory containing these files.

    If that isn't possible, I will have to ask you to zip and upload one file at a time to our ftp server. When I look at one file, I will know which files to ask for next, and so on. If you want to take this approach, please email me the complete list of files (use "ls -l"). mark.hayes is my email address at oracle.com. I will send you instructions for uploading.
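The one-file-at-a-time approach needs two things: a file listing for the email and a zip of each requested file. A minimal Java sketch using only the standard library; the class UploadPrep is hypothetical, and its size listing is a portable stand-in for "ls -l", not a reproduction of its exact output.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class UploadPrep {

    // One line per regular file: size in bytes, then name, sorted by name.
    static List<String> listing(Path envHome) throws IOException {
        List<Path> files;
        try (Stream<Path> s = Files.list(envHome)) {
            files = s.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> lines = new ArrayList<>();
        for (Path p : files) {
            lines.add(String.format("%12d  %s", Files.size(p), p.getFileName()));
        }
        return lines;
    }

    // Zips a single file (e.g. one .jdb) into zipDir/<name>.zip.
    static Path zipOne(Path file, Path zipDir) throws IOException {
        String name = file.getFileName().toString();
        Path zip = zipDir.resolve(name + ".zip");
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zip))) {
            zos.putNextEntry(new ZipEntry(name));
            Files.copy(file, zos);
            zos.closeEntry();
        }
        return zip;
    }

    public static void main(String[] args) throws IOException {
        Path envHome = Paths.get(args.length > 0 ? args[0] : "bdb"); // placeholder path
        listing(envHome).forEach(System.out::println);
        if (args.length > 1) {
            // args[1]: the file requested next, e.g. a hypothetical "0000001a.jdb"
            System.out.println("wrote " + zipOne(envHome.resolve(args[1]), Paths.get(".")));
        }
    }
}
```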

    BTW, if you have an Oracle support contract for BDB JE, you must file a support request through the official Oracle channel, and an Oracle support person will facilitate this; your support contract requires it. If not, I will be happy to help you directly. It is very important to us to find and fix any bug causing data corruption.

    Were you able to restore the corrupted node from another node in the replication group?

    --mark

  • mgrandi
    Member Posts: 18
    edited Jun 15, 2016 1:39PM

    Question: will you need the .class files or jar for the classes stored in the database, or can you do what you need by opening the database in raw mode?

  • Greybird-Oracle
    Member Posts: 2,690
    edited Jun 15, 2016 5:18PM

    I only need access to the data files, a je.jar for running DbPrintLog, and some scratch space for storing dumps while I'm looking at them. Right, DbPrintLog does not need your key comparator classes.

    --mark

This discussion has been closed.