2 Replies. Latest reply: Dec 19, 2012 8:24 AM by 980644

log corruption during dataguard transfer

980644 Newbie
Disclaimer - I am not a DBA. I am a storage admin looking at this because to the DBAs it appears to be a storage issue....

We have two 11.2 databases in a DG relationship. The logs are written to an NFS share on the standby node, then applied. I am seeing what appears to be corruption in the logs, either during the log transfer or when they are written to disk on the standby node. I say the logs appear corrupted because a) Oracle complains when trying to apply them (ORA-00317: file type 0 in header is not log file); b) running the destination log file through od shows almost all nulls; and c) running the destination log file through strings gives only a few lines. Running the source log files through od and strings yields a lot of information, which I'm using as the definition of a "good" log file.
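A minimal sketch of check (b) above, for anyone who wants a number instead of eyeballing od output: it reports what fraction of a file's bytes are nulls. The function name `null_fraction` is made up for illustration, and nothing here is Oracle tooling; it only reads the file as raw bytes.

```python
def null_fraction(path, chunk_size=1 << 20):
    """Return the fraction of bytes in the file that are 0x00 (0.0 for an empty file)."""
    total = 0
    nulls = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
            # Count null bytes in this chunk without loading the whole file.
            nulls += chunk.count(b"\x00")
    return nulls / total if total else 0.0
```

Running it on the source log and on the destination log gives two figures to compare; a destination value close to 1.0 would match the "almost all nulls" symptom.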

When the log files are manually copied over to the standby with ssh and then applied, it works fine, which initially points me to the log transfer process. But if we use the same transfer process and point it at a different NFS share, it also works fine (which points me to the NFS share, yet the same data written to that share via ssh appears fine...).

I'm currently trying to trace the data path using tcpdump and strace; I'd like to watch the output stream from the production node, compare it to the source log file, the input stream on the standby node, and the destination log file.

One thing I've noticed is that the logs on the standby seldom match the logs on the production node. Sometimes they do - but sometimes they differ, usually only by a few bytes, but not at the same spot. For example, one set of logs will match, another will differ by 2 bytes at 100 and 101, and another will differ by 2 bytes at 110 and 115. Also, in my initial attempts at comparing the network traffic with the source log file, I've been unable to find an exact match.
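The byte-level comparison described above can be sketched as follows. This is only an illustration, not the method actually used in this thread: the function name `differing_offsets` and the `limit` parameter are assumptions, offsets are 0-based, and only the common length of the two files is compared, so a size mismatch has to be checked separately.

```python
def differing_offsets(path_a, path_b, chunk_size=1 << 20, limit=100):
    """Return up to `limit` 0-based offsets at which the two files differ.

    Only the overlapping length is compared; compare file sizes separately.
    """
    offsets = []
    base = 0  # offset of the current chunk within the files
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while len(offsets) < limit:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            if a != b:
                # Walk the overlapping part of the chunk byte by byte.
                for i in range(min(len(a), len(b))):
                    if a[i] != b[i]:
                        offsets.append(base + i)
                        if len(offsets) >= limit:
                            break
            base += max(len(a), len(b))
    return offsets
```

Collecting the offsets for several log pairs makes it easier to see whether the differences fall in a consistent region of the file or move around from one log to the next.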

So my questions are:
1. Is it expected that the log on the standby node may not be identical to the log on the production node, and is there any pattern to the differences?
2. Is it expected that the stream written to the network during log transfers may not match the production log?
3. Is there a better way to troubleshoot this?

I appreciate any help
Bill
  • 1. Re: log corruption during dataguard transfer
    mseberg Guru
    Bill;

    1. Is it expected that the log on the standby node may not be identical to the log on the production node, and is there any pattern to the differences?

    I would expect them to be the same.

    2. Is it expected that the stream written to the network during log transfers may not match the production log?

    No.

    ORA-00317 - tells me your file transfers are either corrupt or incomplete.

    The "will differ by 2 bytes at 100 and 101" statement just confirms this.

    3. Is there a better way to troubleshoot this?

    This MOS doc might be worth a look.

    NetApp: Using 'nolock' NFS Mount Option with non-RAC Systems Results in Database Corruption [ID 430920.1]

    OERR: ORA-27054 NFS file system where the file is created or resides is not mounted with correct options [ID 338086.1]


    It's not a very common error. If you have Oracle support I would use it.

    Best Regards

    mseberg
  • 2. Re: log corruption during dataguard transfer
    980644 Newbie
    mseberg:

    Thanks for the reply! I neglected to mention that the logs that differ by a few bytes on the standby node get applied just fine - which is why I thought it might be possible that the log transfer process might alter the file a bit - though if that were the case, I would expect ALL log files to differ with the same byte count and location. The log files that are corrupted are obviously so - they are full of nulls.

    Of the two links you suggested - when I'm switching between the working NFS mount and the non-working NFS mount, the mount options are the same, so I don't think that can be the issue. The only difference in the NFS mounts is that one sits behind a 10G LACP interface, the other sits behind a 1G interface - shouldn't really be an issue that I can see. I DO see a few errors on the 1G link - which I find highly suspect - but I can't explain why the errors would only manifest during the DG log transfer and not an ssh copy. I am working with NetApp on that topic, but I wanted to try troubleshooting this from the Oracle side as well.

    Bill
