3 Replies. Latest reply: Oct 3, 2013 5:31 AM by DBsync

Scan Listener is getting down frequently

DBsync Newbie

Hi,

 

I have a two-node cluster with 3 SCAN IPs. Recently I started facing a problem with the SCAN listener: it goes down frequently, and afterwards the local listener shuts down as well, so I need to start everything manually as follows.

 

oracle>crs_start ora.scan1.vip

oracle>crs_start ora.scan2.vip

oracle>crs_start ora.scan3.vip

 

oracle>srvctl start listener
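
The manual restart above could also be wrapped in a small helper. This is only a sketch under assumptions: the function name `start_scan_stack` and the `DRY_RUN` variable are made up for illustration, and it uses `srvctl start scan` and `srvctl start scan_listener` in place of `crs_start`, since the `crs_*` commands are, as far as I know, deprecated in 11.2. By default it only prints the commands.

```shell
# Hypothetical helper (not from the original post): start the SCAN VIPs,
# the SCAN listeners, and the node listeners in that order.
# DRY_RUN defaults to 1, in which case the commands are only printed.
start_scan_stack() {
    for cmd in "srvctl start scan" \
               "srvctl start scan_listener" \
               "srvctl start listener"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            # Report a failure but keep going with the next resource.
            $cmd || echo "failed: $cmd" >&2
        fi
    done
}
```

Running `start_scan_stack` prints the three commands; `DRY_RUN=0 start_scan_stack` would execute them as the Grid Infrastructure owner. A resource that is already running makes `srvctl` return an error, which the helper only reports.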

 

What are the ways to correct this, and which logs should I check to trace the issue?

  • 1. Re: Scan Listener is getting down frequently
    Sukrut Newbie


    Hi

     

    Was there any change, e.g. a patch installation, in the recent past?

    Could you please send the alert log from the time the system goes down?

     

    Sukrut

  • 2. Re: Scan Listener is getting down frequently
    DBsync Newbie

    Hi ,

     

    There have been no changes to the system for the last 7 months, but a few things have been observed in crsd.log and /var/adm/messages.

     

    crsd.log

     

    Failover cannot be completed for [ora.pr-oradb2.vip 1 1]. Stopping it and the resource tree

    CRS-5017: The resource action "ora.scan2.vip start" encountered the following error

    CRS-5008: Invalid attribute value: igb0 igb1 for the network interface

    CRSPE][51] {0:3:21341} Received reply to action [Start] message ID: 1084879

    CRSPE][51] {0:3:21341} Start action failed with error code: 2

    CRSRPT][52] {0:3:21341} Published to EVM CRS_ACTION_FAILURE for ora.scan2.vip
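
To pull only the relevant lines out of a large crsd.log, a filter along these lines may help. It is a minimal sketch: the function name `crs_vip_errors` is hypothetical, and the patterns are just the messages quoted above. On Solaris 10 the default `grep` may not accept `-E`, in which case `egrep` or `/usr/xpg4/bin/grep` would be needed.

```shell
# Hypothetical helper: print the CRS-5008 / CRS-5017 lines and the
# "Start action failed" lines from a crsd.log.
# Usage: crs_vip_errors /path/to/crsd.log
crs_vip_errors() {
    grep -E 'CRS-(5008|5017)|Start action failed' "$1"
}
```

Comparing the timestamps of these lines with the NIC messages in the OS log shows whether the resource failures line up with interface events.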

     

    /var/adm/messages

    The messages below appear continuously in the OS log.

     

    NIC failure detected on igb0 of group ipmp0

    Successfully failed over from NIC igb0 to NIC igb1

    NIC repair detected on igb0 of group ipmp0

    Successfully failed back to NIC igb0

     

    Is there any clue in this?

  • 3. Re: Scan Listener is getting down frequently
    DBsync Newbie

    Hi,

     

    This issue has been resolved by correcting the IPMP configuration on the server side.

     

    System configuration info

     

    Oracle 11gR2, two-node cluster

    Operating system: Sun Solaris 10

     

    Node 1 IP-172.16.6.16

    Node 2 IP-172.16.6.19

     

     

    Error

     

    /oracle/11.2.0/grid/log/node1/crsd/crsd.log

     

    CRS-5008: Invalid attribute value: igb0 igb1 for the network interface

     

     

    Impact on the system:

     

    Whenever we hit error CRS-5008, the SCAN, VIP, and listener services went down. It was confusing because CRS-5008 appeared on both nodes, so we dug further into the OS logs.

     

     

    Solution followed.

     

    To stop the NIC flapping, static routes were added to the routing table.

     

    # route -p add -host 172.16.6.16 172.16.6.16 -static

    # route -p add -host 172.16.6.19 172.16.6.19 -static

     

     

    Analysing the OS log.

     

    more /var/adm/messages

     

    NIC failure detected on igb1 of group ipmp0

    Successfully failed over from NIC igb1 to NIC igb0

    All Interfaces in group ipmp0 have failed

    NIC repair detected on igb0 of group ipmp0

    Successfully failed back to NIC igb0

    At least 1 interface (igb0) of group ipmp0 has repaired

    NIC repair detected on igb1 of group ipmp0

    Successfully failed back to NIC igb1

     

    Here is the actual catch: this log shows that our NICs flap frequently.
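
To see how often each interface flaps, the failure events can be counted per NIC. The following is a rough sketch: `count_nic_failures` is a made-up name, and it assumes the messages have the same wording as the lines quoted above. On Solaris 10, `nawk` or `/usr/xpg4/bin/awk` may be needed instead of the default `awk`.

```shell
# Hypothetical helper: count "NIC failure detected" events per interface.
# Usage: count_nic_failures /var/adm/messages
count_nic_failures() {
    awk '/NIC failure detected on/ {
             # The interface name is the word right after "on".
             for (i = 1; i < NF; i++)
                 if ($i == "on") { n[$(i + 1)]++; break }
         }
         END { for (nic in n) print nic, n[nic] }' "$1" | sort
}
```

Each output line is an interface name followed by its failure count. A count that keeps growing on both interfaces while the cabling and switch ports look healthy points at the probe targets rather than at the hardware.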

     

    Reason for the flapping.

     

    We have configured IPMP as probe-based active/active.

     

    Probe-based IPMP uses ICMP Echo packets to check network health status. The IPMP daemon (in.mpathd) can pick up the following probe targets:

     

          - default router if configured,

          - static routes if configured,

          - multicast subnet neighbours if nothing else is configured.

     

    Now we need to check whether any static routes exist. If there are none, we should add static routes to avoid flapping on the NIC interfaces.
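
That check can be scripted by looking for host routes (flag `H`) in the `netstat -rn` output. This is a minimal sketch: the function name `host_routes` is hypothetical, and it assumes the flags are in the third column, as in the Solaris IPv4 routing table shown in this thread.

```shell
# Hypothetical helper: read `netstat -rn` output on stdin and print the
# destination of every host route (flags column contains "H").
host_routes() {
    # Only all-uppercase values are treated as flags, which skips the
    # "Flags" header, the dashed separator, and the "Routing Table" line.
    awk '$3 ~ /^[A-Z]+$/ && $3 ~ /H/ { print $1 }'
}
```

Usage would be `netstat -rn | host_routes`. Empty output means no host routes are configured, so probing depends on the default router alone.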

     

    Here is the routing table. The worrying part is that we have only one default router as the gateway, so the default router is a single point of failure for the probes. If the router cannot answer the ICMP Echo packets, we will see link flaps. ICMP Echo traffic also has very low priority, so it is easily dropped when the network is busy.

     

    oracle>netstat -rn

     

    Routing Table: IPv4

      Destination           Gateway           Flags  Ref     Use     Interface

    -------------------- -------------------- ----- ----- ---------- ---------

    default              172.16.5.1           UG        1        829
    169.254.0.0          169.254.230.26       U         1         65 aggr1:1
    172.16.5.0           172.16.6.16          U         1     330956 igb0
    172.16.5.0           172.16.6.16          U         1          0 igb0:1
    172.16.5.0           172.16.6.16          U         1          0 igb0:2
    172.16.5.0           172.16.6.16          U         1          0 igb0:3
    172.16.5.0           172.16.6.16          U         1          0 igb0:4
    172.16.5.0           172.16.6.16          U         1         10 igb1

     

     

    Log in as the root user and add the static routes as follows.

     

    node1 # route -p add -host 172.16.6.16 172.16.6.16 -static

    node1 # route -p add -host 172.16.6.19 172.16.6.19 -static

     

     

    oracle>netstat -rn

     

    Routing Table: IPv4

      Destination           Gateway           Flags  Ref     Use     Interface

    -------------------- -------------------- ----- ----- ---------- ---------

    default              172.16.5.1           UG        1        829
    169.254.0.0          169.254.230.26       U         1         65 aggr1:1
    172.16.5.0           172.16.6.16          U         1     330956 igb0
    172.16.5.0           172.16.6.16          U         1          0 igb0:1
    172.16.5.0           172.16.6.16          U         1          0 igb0:2
    172.16.5.0           172.16.6.16          U         1          0 igb0:3
    172.16.5.0           172.16.6.16          U         1          0 igb0:4
    172.16.5.0           172.16.6.16          U         1         10 igb1
    172.16.6.16          172.16.6.16          UGH       1          0
    172.16.6.19          172.16.6.19          UGH       1         17

     

    That's it; the issue has been fixed.

     

     

