This discussion is archived
9 Replies Latest reply: Dec 7, 2008 11:50 AM by 673348 RSS

Infinite loop in getNextIN()

673348 Newbie
There is a hang in \src\com\sleepycat\je\evictor\SharedEvictor.java (version 3.3.75) in the function getNextIN().
If rotationIndex is zero on entry and isEvictionAllowed(subject) returns false for every subject, the function loops forever.
The cause is that rotationIndex is incremented too early in the loop, so that by the time line 247 is reached: if (rotationIndex == initialIndex) {
initialIndex is 0 but rotationIndex can only be in the range 1 to nSubjects, and the exit test never succeeds.

Please move the increment of rotationIndex to the bottom of the while loop.
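
The failure mode described above can be modelled in a few lines. This is only a sketch under assumptions: it is not the SharedEvictor source, the class and method names are invented for illustration, subjects are represented by their list index, and an iteration cap stands in for the real hang so the demo terminates. The second method is one possible way to restructure the loop, not necessarily the fix that was shipped.

```java
import java.util.function.IntPredicate;

/** Simplified model of a round-robin scan; NOT the actual JE SharedEvictor code. */
class RotationSketch {

    /** Safety cap (an assumption of this demo) so the buggy variant returns instead of hanging. */
    static final int MAX_ITERATIONS = 1_000;

    /**
     * Increment placed before the exit test, as described in the report.
     * Returns the number of iterations used before the loop ended,
     * or -1 if the safety cap was reached (the "infinite loop" case).
     */
    static int scanIncrementEarly(int rotationIndex, int nSubjects, IntPredicate evictionAllowed) {
        int initialIndex = rotationIndex;
        for (int iterations = 1; iterations <= MAX_ITERATIONS; iterations++) {
            if (rotationIndex >= nSubjects) {
                rotationIndex = 0;                 // wrap around before reading
            }
            int subject = rotationIndex;
            rotationIndex += 1;                    // now always in 1..nSubjects
            if (evictionAllowed.test(subject)) {
                return iterations;                 // found an eligible subject
            }
            if (rotationIndex == initialIndex) {   // can never match when initialIndex == 0
                return iterations;                 // full circle, nothing eligible
            }
        }
        return -1;
    }

    /**
     * Variant with the increment at the bottom, wrapping as part of the increment
     * so the index stays in 0..nSubjects-1. Requires nSubjects > 0.
     */
    static int scanIncrementLate(int rotationIndex, int nSubjects, IntPredicate evictionAllowed) {
        int initialIndex = rotationIndex % nSubjects;
        rotationIndex = initialIndex;
        for (int iterations = 1; iterations <= MAX_ITERATIONS; iterations++) {
            if (evictionAllowed.test(rotationIndex)) {
                return iterations;                 // found an eligible subject
            }
            rotationIndex = (rotationIndex + 1) % nSubjects;
            if (rotationIndex == initialIndex) {
                return iterations;                 // full circle, nothing eligible
            }
        }
        return -1;
    }
}
```

With three subjects, none of them eligible, and a starting index of 0, the first method only stops because of the cap, while the second stops after visiting each subject once. Starting the first method at index 1 also terminates, which matches the observation that the hang needs rotationIndex to be zero on entry.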
  • 1. Re: Infinite loop in getNextIN()
    greybird Expert
    Hi,

    You are correct that this is a bug. Thank you very much for reporting this and for pointing out exactly what is wrong! We will fix this in a future release.

    This bug should only occur when all environments that are sharing the cache have a small number of Btree nodes in memory. The isEvictionAllowed method that you mentioned returns false if the environment is using less than 500 KB of cache memory for Btree nodes. So either the total cache is small or the number of environments is very large.

    What is the cache size and number of environments you are using?

    Thanks again!
    --mark
  • 2. Re: Infinite loop in getNextIN()
    673348 Newbie
    We are using the default cache size. How would one change it? Here is our config:
    EnvironmentConfig envConf = new EnvironmentConfig();
    envConf.setSharedCache(true);
    envConf.setAllowCreate(!readonly);
    envConf.setReadOnly(readonly);
    envConf.setTransactional(!readonly);
    envConf.setConfigParam(EnvironmentConfig.ENV_CHECK_LEAKS, "true");
    envConf.setConfigParam(EnvironmentConfig.LOG_USE_NIO, "true");
    envConf.setConfigParam(EnvironmentConfig.LOG_DIRECT_NIO, "true");
    if (readonly) {
        envConf.setConfigParam(EnvironmentConfig.ENV_RUN_CLEANER, "false");
    } else {
        envConf.setConfigParam(EnvironmentConfig.CLEANER_EXPUNGE, "true");
    }
    env = new Environment(getStoreLocation(), envConf);

    We do have a large number of environments. This test was run with about 50-100 threads opening up readonly environments randomly out of a pool of about 500.
    In the future we will have more than 100K environments. Each environment has one database and about 5K records. The average record size is about 12K.
    The large total data size prevents using just one environment.

    Any suggestions as to the proper configuration are appreciated.
  • 3. Re: Infinite loop in getNextIN()
    greybird Expert
    Hi,

    With the default Java heap size (64 MB) and JE cache size (60% of the heap), the JE cache size will be around 38 MB. With 100 environments open, and all using equal portions of the cache, each environment will only have around 390 KB of cache. That's on the extreme low end of the amount of cache needed to function. If only a small number of environments are "hot" (active), then the hot environments will use the majority of the cache. But if all 100 are active, then this isn't enough cache per environment to get good performance.
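
    The figures above can be checked with a little arithmetic. The sketch below is illustrative only: the class and method names are invented, 1 MB is taken as 1024 * 1024 bytes, and it assumes the cache is split evenly across open environments, which is the simplification used in the estimate.

```java
/** Back-of-the-envelope cache budget; names are hypothetical, not part of JE. */
class CacheBudgetSketch {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;

    /** Total cache when it is configured as a percentage of the Java heap. */
    static long cacheBytes(long heapBytes, int cachePercent) {
        return heapBytes * cachePercent / 100;
    }

    /** Share per environment, assuming all open environments use equal portions. */
    static long perEnvironmentBytes(long heapBytes, int cachePercent, int openEnvironments) {
        return cacheBytes(heapBytes, cachePercent) / openEnvironments;
    }

    public static void main(String[] args) {
        long heap = 64 * MB;       // default heap size quoted in the thread
        int percent = 60;          // default JE cache percentage quoted in the thread
        int environments = 100;    // number of open environments in the estimate

        System.out.println("cache (MB): " + cacheBytes(heap, percent) / MB);
        System.out.println("per environment (KB): "
                + perEnvironmentBytes(heap, percent, environments) / KB);
    }
}
```

    With these inputs the total comes to 38 MB and the per-environment share to 393 KB, in line with the "around 38 MB" and "around 390 KB" estimates. Changing the heap value shows how much an explicit heap setting moves the per-environment share.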

    The cache size is set using EnvironmentConfig (or je.properties):
    http://www.oracle.com/technology/documentation/berkeley-db/je/java/com/sleepycat/je/EnvironmentConfig.html#MAX_MEMORY

    In general, performance is directly related to the amount of cache configured -- the more the better. If you configure cache size by percentage of the heap size, be sure to specify the Java heap size explicitly.

    The following should be removed. NIO is deprecated and should not be used:
    envConf.setConfigParam(EnvironmentConfig.LOG_USE_NIO, "true");
    envConf.setConfigParam(EnvironmentConfig.LOG_DIRECT_NIO, "true");

    The following can be removed since they are defaults:
    envConf.setConfigParam(EnvironmentConfig.ENV_CHECK_LEAKS, "true");
    if (readonly) {
        envConf.setConfigParam(EnvironmentConfig.ENV_RUN_CLEANER, "false");
    } else {
        envConf.setConfigParam(EnvironmentConfig.CLEANER_EXPUNGE, "true");
    }
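
    Putting the advice in this reply together, the configuration might be trimmed to something like the fragment below. It is a sketch rather than a recommended setup: the class name, the method name, and the 512 MB cache figure are assumptions chosen for illustration, and it needs the JE library on the classpath. The NIO parameters and the settings described as defaults are simply left out, and the cache size is set explicitly.

```java
import java.io.File;

import com.sleepycat.je.DatabaseException;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

/** Hypothetical helper; only the EnvironmentConfig calls come from the JE API. */
class TrimmedConfigSketch {

    static Environment open(File storeLocation, boolean readonly) throws DatabaseException {
        EnvironmentConfig envConf = new EnvironmentConfig();
        envConf.setSharedCache(true);
        envConf.setAllowCreate(!readonly);
        envConf.setReadOnly(readonly);
        envConf.setTransactional(!readonly);

        // Illustrative value only (an assumption, not a tuning recommendation).
        // setCachePercent(int) is the alternative; if it is used, the Java heap
        // size should be specified explicitly as well.
        envConf.setCacheSize(512L * 1024 * 1024);

        return new Environment(storeLocation, envConf);
    }
}
```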

    For the data set you describe, it sounds like the log size would be about 100 MB for each environment and 10 TB total. JE supports logs that large. So you don't need to use multiple environments because of the log size. A large log does require some tuning, but it is not difficult and we can advise you. For example, you may want to configure the log file size to be a larger value, say 50 or 100 MB instead of the default 10 MB.

    Normally, multiple environments are only used when it is important to be able to move or delete the individual environment directories (rather than the entire single environment directory) on disk. For example, if you have a hosting service for multiple clients, you may want to keep a separate environment per client so that it is completely separate from other clients and can be backed up and restored separately.

    Multiple environments also allow using multiple disks, which can have a performance advantage for some applications. Also, since only a single writer process may access an environment, multiple environments allow for multiple writer processes.

    Note that you cannot perform a transaction for records in more than one environment.

    --mark
  • 4. Re: Infinite loop in getNextIN()
    greybird Expert
    Hi Kevin,

    I've been trying to reproduce the problem you pointed out in a test case, and I have a couple questions for you.

    I see the problem in the source code that you pointed out, but it never seems to occur in my tests. As you say, rotationIndex must be zero upon entry. But this only occurs once in the entire lifetime of the shared cache. And in that one case, the cache has overflowed, so isEvictionAllowed should return true for at least one environment.

    Obviously, there is something I'm not seeing.

    Did you encounter this problem during an Environment open (during the constructor execution) for an existing environment? Or at some other time?

    Is there anything else you can tell me about the circumstances when the problem occurred? By chance did you take a thread dump (please send it if you have one)?

    Thanks,
    --mark
  • 5. Re: Infinite loop in getNextIN()
    673348 Newbie
    The thread in getNextIN is:
    Thread "httpSSLWorkerThread-8080-40" thread-id 1,158 thread-state RUNNABLE
         at: com.sleepycat.je.evictor.SharedEvictor.getNextIN(SharedEvictor.java:262)
         at: com.sleepycat.je.evictor.Evictor.selectIN(Evictor.java:498)
         at: com.sleepycat.je.evictor.Evictor.evictBatch(Evictor.java:323)
         at: com.sleepycat.je.evictor.Evictor.doEvict(Evictor.java:244)
         at: com.sleepycat.je.evictor.Evictor.doCriticalEviction(Evictor.java:269)
         at: com.sleepycat.je.dbi.CursorImpl.close(CursorImpl.java:711)
         at: com.sleepycat.je.Cursor.close(Cursor.java:326)
         at: com.sleepycat.je.Database.get(Database.java:769)

    There is one thread with this stack:
    Thread "httpSSLWorkerThread-8080-18" thread-id 1,124 thread-state BLOCKED Waiting on lock: com.sleepycat.je.evictor.SharedEvictor@1381f90
         Owned by: httpSSLWorkerThread-8080-40 Id: 1,158
         at: com.sleepycat.je.evictor.Evictor.doEvict(Evictor.java:228)
         at: com.sleepycat.je.evictor.Evictor.doEvict(Evictor.java:208)
         at: com.sleepycat.je.dbi.EnvironmentImpl.invokeEvictor(EnvironmentImpl.java:1663)
         at: com.sleepycat.je.recovery.RecoveryManager.redoLNs(RecoveryManager.java:1176)
         at: com.sleepycat.je.recovery.RecoveryManager.buildTree(RecoveryManager.java:465)
         at: com.sleepycat.je.recovery.RecoveryManager.recover(RecoveryManager.java:158)
         at: com.sleepycat.je.dbi.EnvironmentImpl.<init>(EnvironmentImpl.java:389)
         at: com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:147)
         at: com.sleepycat.je.Environment.<init>(Environment.java:210)
         at: com.sleepycat.je.Environment.<init>(Environment.java:150)

    And many threads with this stack:
    Thread "httpSSLWorkerThread-8080-19" thread-id 1,127 thread-state BLOCKED Waiting on lock: com.sleepycat.je.dbi.DbEnvPool@c0e9d6
         Owned by: httpSSLWorkerThread-8080-18 Id: 1,124
         at: com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:106)
         at: com.sleepycat.je.Environment.<init>(Environment.java:210)
         at: com.sleepycat.je.Environment.<init>(Environment.java:150)


    There were a few OutOfMemoryError exceptions thrown before this hang.
  • 6. Re: Infinite loop in getNextIN()
    673348 Newbie
    One more thing. The VMArgs were "-Xmx1024M"
  • 7. Re: Infinite loop in getNextIN()
    673348 Newbie
    I am only able to reproduce this problem with both LOG_USE_NIO and LOG_DIRECT_NIO set to true, which you stated above should not be used.
  • 8. Re: Infinite loop in getNextIN()
    greybird Expert
    Thanks Kevin, that's interesting. Do you get an OutOfMemoryError prior to the infinite loop?

    If so, the OOME could be causing JE methods to exit, leaving things in a bad state. And the OOME could be the result of using NIO, since NIO has problems with Java garbage collection.

    --mark
  • 9. Re: Infinite loop in getNextIN()
    673348 Newbie
    I only got the OutOfMemoryErrors with both LOG_USE_NIO and LOG_DIRECT_NIO. There were many OOMEs before the hang.

    Kevin
