5 Replies. Latest reply: Apr 1, 2012 11:41 AM by athompson88

Grid Infrastructure intense IO on local device

athompson88 Newbie
I have installed 11gR2 and configured a 2-node RAC on some home systems. After a reboot, and at seemingly random times, one of the nodes begins experiencing intense IO on the local drive where the software is installed. The shared storage where the OCR/VDISK/DB files are located sees almost no activity, even with the database started. The IO jumps as soon as CRS is started and disappears as soon as the final CRS process stops. No matter how long I leave the host up, it just keeps hammering away on that local drive. I don't even know where to begin looking.

Restoring the entire host from a backup resolves the issue, but inevitably, the problem will return on one host or the other. I'm running centOS-5 and the grid version 11.2.0.3. The host has 4 gigs of memory in it. Oddly enough, I checked to see which logs were getting written to regularly in the GI home. There were of course many that are constantly getting updated. So I compared what was being generated before and after the issue presents. I found something in the crflogd.log.

2012-03-31 22:25:40.296: [CRFLDREP][1092643136]Error inserting record into bdb: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
2012-03-31 22:25:40.296: [CRFLDREP][1092643136]msg :Error inserting record :: error :DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
2012-03-31 22:26:37.455: [CRFLDREP][1092643136]Error inserting record into bdb: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
2012-03-31 22:26:37.455: [CRFLDREP][1092643136]msg :Error inserting record :: error :DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
2012-03-31 22:27:32.262: [CRFLDREP][1092643136]Error inserting record into bdb: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
2012-03-31 22:27:32.262: [CRFLDREP][1092643136]msg :Error inserting record :: error :DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock

This is now constantly being generated. It was not prior to the issue. Might be unrelated, but again, I don't know for sure. What is odd is that I appear to have plenty of memory available:

             total       used       free     shared    buffers     cached
Mem:       4043760    2364212    1679548          0      79588    1153528
-/+ buffers/cache:    1131096    2912664
Swap:      5144568          0    5144568
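
For anyone reading the "free" output above: the "-/+ buffers/cache" row is just the "Mem:" row with buffers and page cache counted as free instead of used. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and it assumes the six-column layout shown above (values in KB, the default for "free").

```python
def buffers_cache_adjusted(mem_line):
    """Return (used, free) in KB with buffers and page cache counted as free.

    Assumes the "Mem:" row layout shown in the post:
    Mem: total used free shared buffers cached
    """
    fields = mem_line.split()
    total, used, free, shared, buffers, cached = (int(f) for f in fields[1:7])
    reclaimable = buffers + cached
    return used - reclaimable, free + reclaimable


# The "Mem:" row from the 4 GB host in the original post.
mem = "Mem: 4043760 2364212 1679548 0 79588 1153528"
app_used, app_free = buffers_cache_adjusted(mem)
print(app_used, app_free)  # 1131096 2912664
```

The result matches the second row of the output, so roughly 2.9 GB was free or reclaimable at that moment, which is why the box does not look short of memory from "free" alone.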

Could it be related to the ASM instance? Hope someone can help me out on this one.
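
One way to check whether the deadlock messages line up with the IO spike is to count them per minute and compare a healthy period against a bad one. The script below is only a sketch: the function name is hypothetical, the timestamp pattern is inferred from the crflogd.log lines quoted above, and it counts only the "into bdb" line so that the paired "msg :" line is not counted twice.

```python
import re
from collections import Counter

# Timestamp layout inferred from the log excerpt in this post (an assumption).
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d+: \[CRFLDREP\]")


def deadlocks_per_minute(lines):
    """Count 'Error inserting record into bdb' deadlock lines per minute."""
    counts = Counter()
    for line in lines:
        if "into bdb: DB_LOCK_DEADLOCK" not in line:
            continue  # skip unrelated lines and the paired "msg :" line
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# Four of the lines quoted above, used as sample input.
sample = [
    "2012-03-31 22:25:40.296: [CRFLDREP][1092643136]Error inserting record into bdb: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock",
    "2012-03-31 22:25:40.296: [CRFLDREP][1092643136]msg :Error inserting record :: error :DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock",
    "2012-03-31 22:26:37.455: [CRFLDREP][1092643136]Error inserting record into bdb: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock",
    "2012-03-31 22:26:37.455: [CRFLDREP][1092643136]msg :Error inserting record :: error :DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock",
]

for minute, count in sorted(deadlocks_per_minute(sample).items()):
    print(minute, count)
```

To run it against the real log, pass an open file object instead of the sample list; the function only needs an iterable of lines.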

Edited by: athompson88 on Mar 31, 2012 10:30 PM
  • 1. Re: Grid Infrastructure intense IO on local device
    Levi-Pereira Guru
    Hi,
    I'm running centOS-5 and the grid version 11.2.0.3. The host has 4 gigs of memory in it.
    Why are you not using a certified platform? You cannot open an SR with Oracle Support because you are using CentOS, so if Oracle cannot help you, hardly anyone will be able to resolve this.
    crflogd.log
    2012-03-31 22:25:40.296: [CRFLDREP][1092643136]Error inserting record into bdb: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
    2012-03-31 22:25:40.296: [CRFLDREP][1092643136]msg :Error inserting record :: error :DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
    2012-03-31 22:26:37.455: [CRFLDREP][1092643136]Error inserting record into bdb: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
    2012-03-31 22:26:37.455: [CRFLDREP][1092643136]msg :Error inserting record :: error :DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
    2012-03-31 22:27:32.262: [CRFLDREP][1092643136]Error inserting record into bdb: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
    2012-03-31 22:27:32.262: [CRFLDREP][1092643136]msg :Error inserting record :: error :DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
    As a test, try stopping CHM (the errors above should then stop):
    $GRID_HOME/bin/crsctl stop res ora.crf -init 
    I also recommend reading:
    Cluster Health Monitor (CHM) FAQ [ID 1328466.1]

    Regards,
    Levi Pereira
  • 2. Re: Grid Infrastructure intense IO on local device
    athompson88 Newbie
    Just to eliminate it as a cause, I increased the memory to 8GB and the issue went away. There must have been some caching going on somewhere that wasn't showing up in the "free" command. I dropped it back to 4GB and the IO returned. I then tried 5GB, then 6GB -- both had the increased IO. So I put it back on 8GB and again, the IO went away.

    Entries to that log ceased entirely at the 8GB setting, so there's definitely some correlation. I just don't have the expertise in CRS to know what that might be or how to determine it. At the very least, the processes dumping to that log were pacified. Here is the current output from the "free" command:

                 total       used       free     shared    buffers     cached
    Mem:       8174560    1591124    6583436          0      62872     744480
    -/+ buffers/cache:     783772    7390788
    Swap:      5144568          0    5144568

    Less memory is being used overall and less is being cached, but the ratios seem to be about the same.

    The question remains: What is triggering the CRS to suddenly require considerably more memory? Starting the database and an OEM agent produced the following memory changes:

                 total       used       free     shared    buffers     cached
    Mem:       8174560    2627828    5546732          0      80128    1223612
    -/+ buffers/cache:    1324088    6850472
    Swap:      5144568          0    5144568
  • 3. Re: Grid Infrastructure intense IO on local device
    athompson88 Newbie
    Levi Pereira wrote:
    Why are you not using a certified platform? You cannot open an SR with Oracle Support because you are using CentOS, so if Oracle cannot help you, hardly anyone will be able to resolve this.
    If you read my original post closely, this is something I'm running into on my systems at home which are just test labs for me to explore software options and configurations before considering them for the company. I will never open a support ticket for any of these systems.

    I'll take a look at your suggestions. Hopefully something crops up.
  • 4. Re: Grid Infrastructure intense IO on local device
    Levi-Pereira Guru
    Hi,
    Why are you not using a certified platform? You cannot open an SR with Oracle Support because you are using CentOS, so if Oracle cannot help you, hardly anyone will be able to resolve this.
    If you read my original post closely, this is something I'm running into on my systems at home which are just test labs for me to explore software options and configurations before considering them for the company. I will never open a support ticket for any of these systems.
    The point is not whether you can open an SR.
    The software is not supported on that platform. This means that dozens of problems can occur that nobody has experienced before. The issue that opened this thread may be one of them.
    [CRFLDREP][1092643136]msg :Error inserting record :: error :DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
    The line above is an error returned by CHM, which was designed to detect and analyze operating system (OS) and cluster resource related degradation and failures. Because it interacts with the operating system, it was released only for specific platforms. As you are using a non-certified platform, the error shows up.

    Regardless of the purpose, we must use a certified platform. (The Oracle platform and software are available for download free of charge from the Oracle Software Delivery Cloud site.)

    Levi Pereira
  • 5. Re: Grid Infrastructure intense IO on local device
    athompson88 Newbie
    Shutting down the CRF eliminated the IO issues. I'm not sure if it just hadn't been running, and something caused it to start up, but at least now I have a target.

    Incidentally, the reason I use centOS instead of Oracle Enterprise Linux is that centOS is closer to RHEL (what we use at the company) than OEL is. However, RHEL requires a support license to access package and kernel updates and centOS does not. It's quite possible some discrepancy in centOS would cause this problem, but I am skeptical of that. centOS is RHEL without the branding. They don't make any other changes to the OS, and in fact, as you'll see online, many do-it-yourself guides by very astute Oracle professionals recommend centOS as a stand-in for RHEL for these types of configurations.

    Nonetheless, I have a target now, so I can begin looking deeper into it. Thank you for this command and insight, it was very helpful and appears spot-on.
