Several weeks ago we upgraded from OEL 5.6 to 5.9 and upgraded my NICs from 1 Gb to 10 Gb, all in a supported config. I've had 4 random node evictions: 3 of node 1 and 1 of node 2. No logs. Anywhere. Nothing in ocssd, nothing in any alert log or the messages log, and the BIOS and DRAC cards aren't showing anything. I've had a call open with Oracle for several days, but nothing is coming fast. It's like a plug pull when it happens.
I had soaked the config on several single instances without issue, and I've been running production single instances without issue on the exact same architecture and config for the same timeframe. The only failures I have are in the RAC config, so I'm thinking a RAC bug of some sort is causing the complete eviction.
I downloaded the raccheck tool and it reported a failure on differing NTP servers:
Check:- FAIL => All nodes are not using same NTP server across cluster
Whilst there are known bugs with the NTPD daemon causing node evictions, the NTP servers it lists are all valid servers and there shouldn't be an issue with them. Here's the thing: I can't find the config file it's getting this list from. (Replace A, B, C, X, Y, Z with valid timeserver IPs.)
node 1 has NTP servers X,Y,Z
node 2 has NTP servers A,B,C
All are pingable and reachable, and all are valid timeservers both internal and external to my organization. But where does raccheck look for this information? It's not my /etc/ntp.conf.
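One thing that can be compared directly is what ntpd on each node is actually syncing against, as shown by `ntpq -pn`. Below is a rough Python sketch for diffing that output between nodes. The sample output and addresses are made up (documentation IP ranges), the function names are my own, and I don't know that this is the source raccheck reads, so treat it as a cross-check only.

```python
def parse_ntpq_peers(output):
    """Return the set of remote server addresses found in `ntpq -pn` output."""
    peers = set()
    for line in output.splitlines():
        line = line.rstrip()
        stripped = line.strip()
        # Skip blank lines, the column header, and the ===== separator.
        if not stripped or stripped.startswith("remote") or set(stripped) == {"="}:
            continue
        # Column 1 is a one-character tally code (space, *, +, -, ...),
        # followed by the remote address.
        fields = line[1:].split()
        if fields:
            peers.add(fields[0])
    return peers


def compare_nodes(peers_by_node):
    """For each node, list the servers that are not common to every node."""
    common = set.intersection(*peers_by_node.values())
    return {node: sorted(peers - common) for node, peers in peers_by_node.items()}


# Hypothetical `ntpq -pn` output captured on each node (not real data).
NODE1_OUTPUT = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.10      .GPS.            1 u   33   64  377    0.412    0.021   0.015
+192.0.2.11      192.0.2.10       2 u   12   64  377    0.530   -0.044   0.020
"""

NODE2_OUTPUT = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.10      .GPS.            1 u   40   64  377    0.398    0.018   0.012
+198.51.100.20   192.0.2.10       2 u   51   64  377    1.204    0.102   0.031
"""

if __name__ == "__main__":
    nodes = {
        "node1": parse_ntpq_peers(NODE1_OUTPUT),
        "node2": parse_ntpq_peers(NODE2_OUTPUT),
    }
    for node, extra in compare_nodes(nodes).items():
        print(node, "servers not shared by all nodes:", extra)
```

If the peers reported this way match /etc/ntp.conf on both nodes but still differ from what raccheck prints, that at least narrows the question down to where the tool is reading from.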
Any other areas to troubleshoot for the node evictions are welcome.