27 Replies. Latest reply on Oct 31, 2009 2:17 AM by Charles Lamb
      • 15. Re: Corruption on Windows 7 ?
        Any progress on this issue? Does there appear to be a workaround? Or do we need to wait for a patch?
        • 16. Re: Corruption on Windows 7 ?
          Charles Lamb
          I have successfully reproduced it using 3.3.87. I would like you to try out 4.0 on your machine to see if it happens there (it does not on my machine with 4.0). If you send me an email I will give you instructions on how to download 4.0.

          Charles Lamb
          • 17. Re: Corruption on Windows 7 ?
            I've been testing a 100 GB+ dataset using the 4.0.60 build, and so far no issues at all. It seems to have resolved the problem. What was the root cause? Is Windows 7 doing something funky with the filesystem now?
            • 18. Re: Corruption on Windows 7 ?
              Charles Lamb
              I'm not ready to say what the cause is quite yet. I am reasonably certain I understand the problem, but am not ready to go out on a limb and say what it is in case I'm wrong. I'll post a report when I am certain.

              Charles Lamb
              • 19. Re: Corruption on Windows 7 ?
                Charles Lamb
                At this point I am reasonably certain that the problem has to do with a write() call being initiated on a file when an fsync() is already in progress in another thread (i.e. a concurrent fsync and write on the same file, but not with the same file descriptors). JE routinely performs concurrent IO operations on a given file. In the particular test case that user Ambber sent me, it is by virtue of the checkpointer initiating an fsync while the user application thread is writing.

                It turns out that in ext3 we previously encountered a performance slowdown because that file system takes an exclusive mutex on the inode for any IO operation, and therefore an fsync will block reads and writes. JE 4.0 has a "fix" to this problem which is described here.

                That said, there seems to be a true Windows 7 bug here, if for no other reason than I can observe corruption on sector boundaries in the log files (JE does no operations on sector boundaries).
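The access pattern described above (a write() on one descriptor while an fsync() runs on another descriptor for the same file) can be sketched in plain Java. This is only an illustration of the pattern, not JE's code and not a reproduction of the bug: the class and method names are hypothetical, the block size is arbitrary, and the read-back check will normally be served from the OS cache, so on a healthy system it is expected to report no mismatches.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

public class ConcurrentFsyncSketch {
    static final int BLOCK = 4096; // arbitrary block size for the sketch

    /**
     * Writes `blocks` blocks through one descriptor while a second thread
     * repeatedly fsyncs the same file through a different descriptor.
     * Returns the number of blocks that do not read back as written.
     */
    static int run(File file, int blocks) {
        AtomicBoolean done = new AtomicBoolean(false);
        AtomicReference<IOException> syncError = new AtomicReference<>();
        try (RandomAccessFile writer = new RandomAccessFile(file, "rw");
             RandomAccessFile syncer = new RandomAccessFile(file, "rw")) {
            // Stand-in for a checkpointer-like thread issuing fsync calls.
            Thread syncThread = new Thread(() -> {
                while (!done.get()) {
                    try {
                        syncer.getFD().sync();
                    } catch (IOException e) {
                        syncError.set(e);
                        return;
                    }
                }
            });
            syncThread.start();
            byte[] buf = new byte[BLOCK];
            try {
                // Stand-in for the application thread appending data.
                for (int i = 0; i < blocks; i++) {
                    Arrays.fill(buf, (byte) i);
                    writer.write(buf);
                }
            } finally {
                done.set(true);
                syncThread.join();
            }
            if (syncError.get() != null) {
                throw syncError.get();
            }
            // Read everything back and count blocks that differ.
            int bad = 0;
            byte[] expected = new byte[BLOCK];
            writer.seek(0);
            for (int i = 0; i < blocks; i++) {
                writer.readFully(buf);
                Arrays.fill(expected, (byte) i);
                if (!Arrays.equals(buf, expected)) {
                    bad++;
                }
            }
            return bad;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Convenience wrapper that runs the sketch against a temporary file. */
    static int runOnTempFile(int blocks) {
        try {
            File tmp = Files.createTempFile("fsync-sketch", ".dat").toFile();
            try {
                return run(tmp, blocks);
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("mismatched blocks: " + runOnTempFile(256));
    }
}
```

To look for on-disk damage rather than cached data, the file would have to be re-read after the cache is dropped (for example after a reboot), which this sketch does not attempt.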

                Charles Lamb

                Edited by: Charles Lamb on Oct 28, 2009 7:58 AM
                • 20. Re: Corruption on Windows 7 ?
                  It reproduces not only on Windows 7. I have the same issue, this time on a Linux machine. It was reproduced on Java 1.5_07 and 1.6_16. I'm using MontaVista Linux with an ext3 filesystem. The exception I got was the same:

                  <DaemonThread name="Cleaner-1"/> caught exception: com.sleepycat.je.log.DbChecksumException: (JE 3.3.87) Location 0x0/0x488bd70 expected 3031505114 got 3055098078
                  com.sleepycat.je.log.DbChecksumException: (JE 3.3.87) Location 0x0/0x488bd70 expected 3031505114 got 3055098078
                       at com.sleepycat.je.log.ChecksumValidator.validate(ChecksumValidator.java:96)
                       at com.sleepycat.je.log.FileReader.validateChecksum(FileReader.java:593)
                       at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:314)
                       at com.sleepycat.je.cleaner.FileProcessor.processFile(FileProcessor.java:386)
                       at com.sleepycat.je.cleaner.FileProcessor.doClean(FileProcessor.java:233)
                       at com.sleepycat.je.cleaner.FileProcessor.onWakeup(FileProcessor.java:138)
                       at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:141)
                       at java.lang.Thread.run(Unknown Source)

                  It was reproduced 3 times as the database file was getting big. The maximum size of each JE log file was set to 200 MB.
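For readers unfamiliar with what an "expected X got Y" checksum failure means, here is a minimal sketch of the idea, assuming an Adler-32 style checksum stored alongside each entry. The class name, method names, and payload are hypothetical, and JE's actual log entry layout and validator are not shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

public class ChecksumSketch {
    /** Computes an Adler-32 checksum over the given bytes. */
    static long checksum(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length);
        return adler.getValue();
    }

    /** Returns true when the stored checksum matches the bytes read back. */
    static boolean validate(long expected, byte[] readBack) {
        return checksum(readBack) == expected;
    }

    public static void main(String[] args) {
        // Hypothetical payload standing in for a log entry.
        byte[] entry = "hypothetical log entry payload".getBytes(StandardCharsets.UTF_8);
        long expected = checksum(entry);   // stored when the entry is written

        byte[] damaged = entry.clone();
        damaged[5] ^= 0x01;                // simulate a single flipped bit on disk

        System.out.println("intact entry valid:  " + validate(expected, entry));
        System.out.println("damaged entry valid: " + validate(expected, damaged));
    }
}
```

A reader that recomputes the checksum over the bytes it finds on disk and compares it with the stored value will reject the damaged copy, which is the kind of mismatch the exception above reports.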
                  • 21. Re: Corruption on Windows 7 ?
                    Charles Lamb
                    Can I have a look at the 00000000.jdb log file? You can use ftp.oracle.com and put it in the berkeleydb/incoming directory.

                    Charles Lamb
                    • 22. Re: Corruption on Windows 7 ?
                      Charles Lamb
                      Also, could you please let us know what parameters you had set, including VERIFY_CHECKSUMS? Or was this the JETester program that you gave us? Does DbPrintLog -s 0x0 -e 0x1 produce the same exception? Does it happen on ext3 w/ JE 4.0?

                      • 23. Re: Corruption on Windows 7 ?
                        The database configuration is as follows:
                        EnvironmentConfig envConf = new EnvironmentConfig();
                        envConf.setConfigParam(EnvironmentConfig.LOG_FILE_MAX, "200000000");
                        envConf.setConfigParam(EnvironmentConfig.EVICTOR_LRU_ONLY, "false");
                        envConf.setConfigParam(EnvironmentConfig.EVICTOR_FORCED_YIELD, "false");
                        envConf.setConfigParam(EnvironmentConfig.ENV_FORCED_YIELD, "true");
                        envConf.setConfigParam(EnvironmentConfig.LOG_USE_NIO, "true");
                        envConf.setConfigParam(EnvironmentConfig.LOG_DIRECT_NIO, "true");
                        envConf.setConfigParam(EnvironmentConfig.LOG_CHUNKED_NIO, "4096");

                        The database is used in non-transactional mode, and VERIFY_CHECKSUMS, as you can see, is not set.
                        After getting this error I restarted everything from scratch, so 00000000.jdb was lost. I'll try to reproduce the error and will send you the file.
                        I will also try JETester with that configuration.

                        I haven't tried JE 4.0 because I don't know where to get it.
                        • 24. Re: Corruption on Windows 7 ?
                          Charles Lamb
                          Please turn off all of the NIO stuff. It is of no performance benefit. In fact, it is disabled in 4.0. This is likely the cause of the checksum errors (I hate to admit it).
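As a rough sketch of what "turning off the NIO stuff" could look like, the settings from the configuration posted above can be expressed as plain properties with the three NIO parameters switched off. The property name strings below are an assumption about the values behind EnvironmentConfig.LOG_FILE_MAX, LOG_USE_NIO, LOG_DIRECT_NIO, and LOG_CHUNKED_NIO in the 3.3.x line, and the class name is hypothetical. Check them against the javadoc for your JE version before relying on them.

```java
import java.util.Properties;
import java.util.TreeSet;

public class JeNioOffSketch {
    /** Builds the settings with every NIO-related parameter disabled. */
    static Properties nioDisabled() {
        Properties p = new Properties();
        // Assumed property names; verify against your JE version.
        p.setProperty("je.log.fileMax", "200000000"); // same log file size as before
        p.setProperty("je.log.useNIO", "false");      // was "true"
        p.setProperty("je.log.directNIO", "false");   // was "true"
        p.setProperty("je.log.chunkedNIO", "0");      // was "4096"
        return p;
    }

    public static void main(String[] args) {
        Properties p = nioDisabled();
        // Print in name=value form, sorted for stable output.
        for (String name : new TreeSet<>(p.stringPropertyNames())) {
            System.out.println(name + "=" + p.getProperty(name));
        }
    }
}
```

The same effect should be achievable by simply removing the three NIO setConfigParam calls from the original configuration; the explicit values here only make the intent visible.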

                          Charles Lamb
                          • 25. Re: Corruption on Windows 7 ?
                            Thanks a lot. It seems to have helped.
                            • 26. Re: Corruption on Windows 7 ?
                              Thanks for the update.

                              I've run JETester against 4.0.60 several times now. I haven't seen the checksum error at all, in either JETester or my own app. Is this fix likely to be put into a 3.x series release?
                              • 27. Re: Corruption on Windows 7 ?
                                Charles Lamb
                                We will definitely do something for the 3.3.x codeline.

                                Charles Lamb