This discussion is archived
3 Replies Latest reply: Nov 1, 2013 7:31 PM by Karki

When one node reboots another node in RAC

Karki Newbie

Hi Friends,

 

I faced a situation where one node of a RAC cluster was rebooted by the other node. This happened because of link fluctuation on the network interconnect.

 

Sep 13 16:23:48 kkvs1a su: [ID 810491 auth.crit] 'su admin' failed for wipro1 on /dev/pts/3

Sep 14 00:22:17 kkvs1a ixgbe: [ID 611667 kern.info] NOTICE: ixgbe3: link down

Sep 14 00:22:21 kkvs1a ixgbe: [ID 611667 kern.info] NOTICE: ixgbe3: link up, , full duplex

Sep 14 00:22:31 kkvs1a ixgbe: [ID 611667 kern.info] NOTICE: ixgbe1: link down

Sep 14 00:22:31 kkvs1a ixgbe: [ID 611667 kern.info] NOTICE: ixgbe3: link down


/opt/oracle/product/10.2.0/crs/log/node1/alertkk1a.log

==============================================

2013-09-14 00:22:05.180

[cssd(12561)]CRS-1612:node kk1b (2) at 50% heartbeat fatal, eviction in 14.251 seconds

2013-09-14 00:22:12.180

[cssd(12561)]CRS-1611:node kk1b (2) at 75% heartbeat fatal, eviction in 7.251 seconds

2013-09-14 00:22:13.180

[cssd(12561)]CRS-1611:node kk1b (2) at 75% heartbeat fatal, eviction in 6.251 seconds

2013-09-14 00:22:17.179

[cssd(12561)]CRS-1610:node kk1b (2) at 90% heartbeat fatal, eviction in 2.251 seconds

2013-09-14 00:22:18.180

[cssd(12561)]CRS-1610:node kkvs1b (2) at 90% heartbeat fatal, eviction in 1.251 seconds

 

This clearly shows that CSSD on node kkvs1a issued node eviction warnings for node kkvs1b.
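
As a rough cross-check of the alert log excerpt above, the sketch below adds each "eviction in N seconds" value to the timestamp of its message to see when CSSD expected the eviction to happen. It assumes the log layout is exactly as pasted (a timestamp line followed by a CRS-161x message line). The function name `projected_evictions` and the regular expression are illustrative and not part of any Oracle tool.

```python
import re
from datetime import datetime, timedelta

# Alert log excerpt copied from the post above (node names as pasted).
ALERT_LOG = """\
2013-09-14 00:22:05.180
[cssd(12561)]CRS-1612:node kk1b (2) at 50% heartbeat fatal, eviction in 14.251 seconds
2013-09-14 00:22:12.180
[cssd(12561)]CRS-1611:node kk1b (2) at 75% heartbeat fatal, eviction in 7.251 seconds
2013-09-14 00:22:13.180
[cssd(12561)]CRS-1611:node kk1b (2) at 75% heartbeat fatal, eviction in 6.251 seconds
2013-09-14 00:22:17.179
[cssd(12561)]CRS-1610:node kk1b (2) at 90% heartbeat fatal, eviction in 2.251 seconds
2013-09-14 00:22:18.180
[cssd(12561)]CRS-1610:node kkvs1b (2) at 90% heartbeat fatal, eviction in 1.251 seconds
"""

# Hypothetical pattern, written only to match the message format shown above.
MESSAGE_RE = re.compile(
    r"CRS-\d+:node (\S+) \(\d+\) at (\d+)% heartbeat fatal, "
    r"eviction in ([\d.]+) seconds"
)


def projected_evictions(text):
    """Return (node, percent, projected eviction time) for each warning."""
    results = []
    stamp = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            # A timestamp line precedes each CSSD message line.
            stamp = datetime.strptime(line, "%Y-%m-%d %H:%M:%S.%f")
            continue
        except ValueError:
            pass
        match = MESSAGE_RE.search(line)
        if match and stamp is not None:
            remaining = timedelta(seconds=float(match.group(3)))
            results.append((match.group(1), int(match.group(2)), stamp + remaining))
    return results


if __name__ == "__main__":
    for node, percent, when in projected_evictions(ALERT_LOG):
        print(f"{node} at {percent}%: eviction projected for {when.time()}")
```

On this excerpt, all five warnings project to about 00:22:19.43, which is consistent with a single countdown toward one eviction deadline rather than several separate events.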


I got the following messages on the instance that was rebooted:

ASM alert log:

Sat Sep 14 00:22:25 IST 2013

Error: KGXGN aborts the instance (6)

Sat Sep 14 00:22:25 IST 2013

Errors in file /opt/oracle/admin/+ASM/bdump/+asm2_lmon_8527.trc:

ORA-29702: error occurred in Cluster Group Service operation

LMON: terminating instance due to error 29702

A network fluctuation shouldn't cause a reboot like this. Why did Oracle design it this way? Is this a bug? My Oracle version is 10.2.0.5.0.


Could you tell me the other possible situations in which one RAC instance reboots another RAC instance?



