Disclaimer: I am not a DBA. I am a storage admin looking at this because, to the DBAs, it appears to be a storage issue.
We have two Oracle 11.2 databases in a Data Guard (DG) relationship. The logs are written to an NFS share on the standby node and then applied. I am seeing what appears to be corruption in the logs, either during the log transfer or when they are written to disk on the standby node. I say the logs appear corrupted because a) Oracle complains when trying to apply them (ORA-00317: file type 0 in header is not log file); b) running the destination log file through od shows almost all nulls; and c) running the destination log file through strings gives only a few lines. Running the source log files through od and strings yields a lot of information, which I'm using as the definition of a "good" log file.
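To put numbers on the od/strings comparison, here is a minimal Python sketch (my own helper, not an Oracle tool) that reports, for each file given on the command line, its size, the fraction of NUL bytes, and how many runs of four or more printable ASCII characters it contains. The run count only roughly approximates what `strings` prints by default, and the function names are hypothetical.

```python
import re
import sys

# Runs of 4+ printable ASCII characters (tab, or space through tilde).
# This roughly approximates the default behaviour of the `strings` utility.
PRINTABLE_RUN = re.compile(rb"[\t\x20-\x7e]{4,}")


def summarize(data: bytes) -> dict:
    """Return simple content statistics for a blob of bytes."""
    total = len(data)
    nuls = data.count(0)  # number of 0x00 bytes
    return {
        "size": total,
        "nul_fraction": nuls / total if total else 0.0,
        "printable_runs": len(PRINTABLE_RUN.findall(data)),
    }


def summarize_file(path: str) -> dict:
    """Read a whole file into memory and summarize it."""
    with open(path, "rb") as f:
        return summarize(f.read())


if __name__ == "__main__":
    for path in sys.argv[1:]:
        stats = summarize_file(path)
        print(f"{path}: {stats['size']} bytes, "
              f"{stats['nul_fraction']:.1%} NUL, "
              f"{stats['printable_runs']} printable runs")
```

Run it on a source log and its standby copy side by side; a file that is "almost all nulls" should show a NUL fraction close to 100% and very few printable runs. It reads each file fully into memory, which should be acceptable for log files of a few hundred megabytes.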
When the log files are manually copied over to the standby with ssh and then applied, they apply fine, which initially points me to the log transfer process. But if we use the same transfer process and point it to a different NFS share, that also works fine, which points me back to the NFS share; yet the same data written to it via ssh appears fine...
I'm currently trying to trace the data path using tcpdump and strace. I'd like to watch the output stream from the production node and compare it to the source log file, the input stream on the standby node, and the destination log file.
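For the network comparison, one option is to stop looking for an exact match and instead check which fixed-size blocks of the source log appear verbatim somewhere in the captured data. The sketch below assumes the capture has already been reduced to a raw byte dump of the reassembled TCP payload (for example, Wireshark's "Follow TCP Stream" saved as raw); the 512-byte block size and the function name are assumptions to adjust. Protocol framing interleaved with the file data (for example, Oracle Net headers on the production side or NFS/RPC headers on the standby side), or transport compression if it is enabled, could explain why a whole-file match fails.

```python
import sys


def locate_blocks(source: bytes, stream: bytes, block_size: int = 512) -> list:
    """For each block_size-aligned chunk of `source`, return a tuple of
    (offset in source, position in stream), where the position is -1 if
    that chunk never appears verbatim in the stream."""
    results = []
    for offset in range(0, len(source), block_size):
        chunk = source[offset:offset + block_size]
        results.append((offset, stream.find(chunk)))
    return results


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python locate_blocks.py SOURCE_LOG RAW_STREAM_DUMP
    with open(sys.argv[1], "rb") as f:
        source = f.read()
    with open(sys.argv[2], "rb") as f:
        stream = f.read()
    results = locate_blocks(source, stream)
    missing = [off for off, pos in results if pos == -1]
    print(f"{len(results) - len(missing)} of {len(results)} blocks found in stream")
    for off in missing[:20]:
        print(f"  block at source offset {off} not found")
```

The search is brute force (every block is searched across the whole dump), so it gets slow on full-size files; running it on the first few megabytes of each is a reasonable start. Blocks reported as missing are the spots worth examining by hand.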
One thing I've noticed is that the logs on the standby seldom match the logs on the production node. Sometimes they do, but sometimes they differ, usually by only a few bytes, and not at the same offsets. For example, one pair of logs will match, another will differ by 2 bytes at offsets 100 and 101, and another will differ by 2 bytes at 110 and 115. Also, in my initial attempts at comparing the network traffic with the source log file, I've been unable to find an exact match.
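To pin down exactly where each pair of files differs, `cmp -l` on the two files is enough. The Python sketch below does the same comparison but returns the differing offsets as data, so they can be collected across several production/standby pairs. Note that its offsets are 0-based, whereas `cmp -l` prints 1-based byte numbers (with the byte values in octal); the function name is hypothetical.

```python
import sys


def diff_offsets(a: bytes, b: bytes) -> list:
    """Return (offset, byte_in_a, byte_in_b) for every position where a and b
    differ. A length mismatch is reported once, at the shorter length, with
    None for both byte values."""
    diffs = [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(a) != len(b):
        diffs.append((min(len(a), len(b)), None, None))
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python diff_offsets.py PRODUCTION_LOG STANDBY_LOG
    with open(sys.argv[1], "rb") as f1, open(sys.argv[2], "rb") as f2:
        diffs = diff_offsets(f1.read(), f2.read())
    print(f"{len(diffs)} differing positions")
    for off, x, y in diffs[:50]:
        print(off, x, y)
```

Running it over several pairs shows directly whether the differing offsets repeat from file to file or wander around.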
So my questions are:
1. Is it expected that the log on the standby node may not be identical to the log on the production node, and is there any pattern to the differences?
2. Is it expected that the stream written to the network during log transfers may not match the production log?
3. Is there a better way to troubleshoot this?
Thanks for the reply! I neglected to mention that the logs that differ by a few bytes on the standby node get applied just fine, which is why I thought the log transfer process might alter the file slightly. Though if that were the case, I would expect ALL log files to differ by the same number of bytes at the same locations. The log files that are corrupted are obviously so: they are full of nulls.
Regarding the two links you suggested: when I switch between the working NFS mount and the non-working NFS mount, the mount options are the same, so I don't think that can be the issue. The only difference between the NFS mounts is that one sits behind a 10G LACP interface and the other behind a 1G interface, which shouldn't really be an issue as far as I can see. I DO see a few errors on the 1G link, which I find highly suspect, but I can't explain why the errors would only manifest during the DG log transfer and not during an ssh copy. I am working with NetApp on that topic, but I wanted to try troubleshooting this from the Oracle side as well.