3 Replies Latest reply: Oct 3, 2013 7:31 AM by DBsync

    Scan Listener is getting down frequently

    DBsync

      Hi,

       

      I have a two-node cluster with three SCAN IPs. Recently the SCAN listener has started going down frequently, and when it does, the local listener shuts down as well. I then have to start everything manually as follows.

       

      oracle>crs_start ora.scan1.vip

      oracle>crs_start ora.scan2.vip

      oracle>crs_start ora.scan3.vip

       

      oracle>srvctl start listener

       

      What are the ways to correct this, and which logs should I check to trace the issue?

        • 1. Re: Scan Listener is getting down frequently
          Sukrut


          Hi

           

          Was there any change, e.g. a patch installation, in the recent past?

          Could you please send the alert log from the time the system goes down?

           

          Sukrut

          • 2. Re: Scan Listener is getting down frequently
            DBsync

            Hi ,

             

            There have been no changes to the system in the last 7 months, but a few things were observed in crsd.log and /var/adm/messages.

             

            crsd.log

             

            Failover cannot be completed for [ora.pr-oradb2.vip 1 1]. Stopping it and the resource tree

            CRS-5017: The resource action "ora.scan2.vip start" encountered the following error

            CRS-5008: Invalid attribute value: igb0 igb1 for the network interface

            CRSPE][51] {0:3:21341} Received reply to action [Start] message ID: 1084879

            CRSPE][51] {0:3:21341} Start action failed with error code: 2

            CRSRPT][52] {0:3:21341} Published to EVM CRS_ACTION_FAILURE for ora.scan2.vip

             

            /var/adm/messages

            The messages below appear continuously in the OS log.

             

            NIC failure detected on igb0 of group ipmp0

            Successfully failed over from NIC igb0 to NIC igb1

            NIC repair detected on igb0 of group ipmp0

            Successfully failed back to NIC igb0

             

            Does this give any clue?

            • 3. Re: Scan Listener is getting down frequently
              DBsync

              Hi,

               

              This issue has been resolved by correcting the IPMP configuration on the server side.

               

              System configuration info

               

              Oracle 11gR2, two-node cluster

              Operating system: Sun Solaris 10

               

              Node 1 IP: 172.16.6.16

              Node 2 IP: 172.16.6.19

               

               

              Error

               

              /oracle/11.2.0/grid/log/node1/crsd/crsd.log

               

              CRS-5008: Invalid attribute value: igb0 igb1 for the network interface
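
              To see how often this error shows up, the CRS error codes in crsd.log can be tallied. This is only a minimal sketch and was not part of the original fix: the function name count_crs_errors is made up for illustration, and it assumes an awk with match() (on Solaris 10 that would be nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk).

```shell
#!/bin/sh
# Tally every CRS-<digits> code found in a crsd.log-style file.
# count_crs_errors is a hypothetical helper name, not an Oracle tool.
count_crs_errors() {
    awk '
        {
            line = $0
            # Pull out each CRS-nnnn token on the line, one at a time.
            while (match(line, /CRS-[0-9]+/)) {
                count[substr(line, RSTART, RLENGTH)]++
                line = substr(line, RSTART + RLENGTH)
            }
        }
        END { for (code in count) print count[code], code }
    ' "$1" | sort -rn    # most frequent code first
}

# Example (path as quoted in this post):
#   count_crs_errors /oracle/11.2.0/grid/log/node1/crsd/crsd.log
```

              A steadily growing count for CRS-5008 on both nodes would point away from a single bad node and towards something shared, such as the network.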

               

               

              Impact on the system:

               

              Whenever this CRS-5008 error occurred, the SCAN, VIP and listener services went down. It was confusing at first because CRS-5008 appeared on both nodes, so I dug further into the OS logs.

               

               

              Solution followed.

               

              To stop the NIC flapping, static routes were added to the routing table.

               

              # route -p add -host 172.16.6.16 172.16.6.16 -static

              # route -p add -host 172.16.6.19 172.16.6.19 -static

               

               

              Analysing the OS log.

               

              more /var/adm/messages

               

              NIC failure detected on igb1 of group ipmp0

              Successfully failed over from NIC igb1 to NIC igb0

              All Interfaces in group ipmp0 have failed

              NIC repair detected on igb0 of group ipmp0

              Successfully failed back to NIC igb0

              At least 1 interface (igb0) of group ipmp0 has repaired

              NIC repair detected on igb1 of group ipmp0

              Successfully failed back to NIC igb1

               

              Here is the real culprit: this log shows that our NICs were flapping frequently.
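
              The flapping is easier to judge with a per-interface count of the failure and repair messages. The following is a rough sketch, assuming the message wording shown above and an awk that supports regex alternation (nawk or /usr/xpg4/bin/awk on Solaris 10); ipmp_flap_summary is a hypothetical helper name.

```shell
#!/bin/sh
# Count "NIC failure detected on X" / "NIC repair detected on X" messages
# per interface in a /var/adm/messages-style file.
# ipmp_flap_summary is a hypothetical helper name used for illustration.
ipmp_flap_summary() {
    awk '
        /NIC (failure|repair) detected on / {
            # Locate the words "detected on"; the NIC name follows them
            # and the event kind (failure/repair) precedes them.
            for (i = 2; i < NF; i++) {
                if ($i == "detected" && $(i + 1) == "on") {
                    count[$(i + 2) " " $(i - 1)]++
                }
            }
        }
        END { for (key in count) print key, count[key] }
    ' "$1" | sort
}

# Example: ipmp_flap_summary /var/adm/messages
```

              For the excerpt above this would report one failure and one repair for igb1 and one repair for igb0. On a healthy link these counts should stay flat over time.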

               

              Reason for the flapping.

               

              We have configured IPMP as probe-based, active/active.

               

              Probe-based IPMP uses ICMP Echo packets to check network health status. IPMP can pick up the following probe targets:

               

                    - default router if configured,

                    - static routes if configured,

                    - multicast subnet neighbours if nothing else is configured.

               

              Now we need to check whether we have any static routes. If not, we should add static routes to avoid flapping on the NIC interfaces.
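
              One way to make that check is to filter the netstat -rn output for host routes, i.e. rows whose Flags column contains H. This is a minimal sketch, assuming the Solaris column layout shown in this post (Destination, Gateway, Flags, ...); list_host_routes is a hypothetical helper name.

```shell
#!/bin/sh
# Read `netstat -rn` output on stdin and print "destination gateway"
# for every host route (Flags column contains H).
# list_host_routes is a hypothetical helper name used for illustration.
list_host_routes() {
    # Field 3 is the Flags column. Requiring it to be all upper-case
    # letters skips the "Routing Table", header and separator lines.
    awk '$3 ~ /^[A-Z]+$/ && $3 ~ /H/ { print $1, $2 }'
}

# Example: netstat -rn | list_host_routes
```

              If this prints nothing, there are no host routes and IPMP has to fall back on the default router or on multicast neighbours as probe targets.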

               

              Here is the routing table. The worry is that we have only one default router as the gateway, so the default router is a single point of failure for probing. If the router cannot answer the ICMP Echo packets, we will see link flaps. Note also that ICMP Echo traffic has very low priority, so it is easily dropped when the network is busy.

               

              oracle>netstat -rn

               

              Routing Table: IPv4
                Destination           Gateway           Flags  Ref     Use     Interface
              -------------------- -------------------- ----- ----- ---------- ---------
              default              172.16.5.1           UG        1        829
              169.254.0.0          169.254.230.26       U         1         65 aggr1:1
              172.16.5.0           172.16.6.16          U         1     330956 igb0
              172.16.5.0           172.16.6.16          U         1          0 igb0:1
              172.16.5.0           172.16.6.16          U         1          0 igb0:2
              172.16.5.0           172.16.6.16          U         1          0 igb0:3
              172.16.5.0           172.16.6.16          U         1          0 igb0:4
              172.16.5.0           172.16.6.16          U         1         10 igb1

               

               

              Log in as root and add the static routes as follows.

               

              node1 # route -p add -host 172.16.6.16 172.16.6.16 -static

              node1 # route -p add -host 172.16.6.19 172.16.6.19 -static
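
              With more than a couple of probe targets it can help to print the route commands first and review them before running anything as root. This is just a sketch: print_probe_routes is a hypothetical helper, and the command form is copied from the two lines above. It only prints the commands and does not change the routing table.

```shell
#!/bin/sh
# Print (do not execute) one persistent host-route command per target IP.
# print_probe_routes is a hypothetical helper name used for illustration.
print_probe_routes() {
    for target in "$@"; do
        # Same form as the commands used in this post.
        echo "route -p add -host $target $target -static"
    done
}

# Example: print_probe_routes 172.16.6.16 172.16.6.19
```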

               

               

              oracle>netstat -rn

               

              Routing Table: IPv4
                Destination           Gateway           Flags  Ref     Use     Interface
              -------------------- -------------------- ----- ----- ---------- ---------
              default              172.16.5.1           UG        1        829
              169.254.0.0          169.254.230.26       U         1         65 aggr1:1
              172.16.5.0           172.16.6.16          U         1     330956 igb0
              172.16.5.0           172.16.6.16          U         1          0 igb0:1
              172.16.5.0           172.16.6.16          U         1          0 igb0:2
              172.16.5.0           172.16.6.16          U         1          0 igb0:3
              172.16.5.0           172.16.6.16          U         1          0 igb0:4
              172.16.5.0           172.16.6.16          U         1         10 igb1
              172.16.6.16          172.16.6.16          UGH       1          0
              172.16.6.19          172.16.6.19          UGH       1         17

               

              That's it; the issue has been fixed.