8 Replies Latest reply: Feb 28, 2014 1:44 AM by Rijo Roy RSS

No sessions in node1 in a 2 node RAC 10g Database

Rijo Roy Newbie
Currently Being Moderated

Dear Sir's

 

As mentioned in the Subject my environment Details are shown below Oracle 10g Enterprise Edition RAC 2node in RHEL 4.7 Itanium processor.

 

At times(mostly Monday morning), I am facing a problem where no sessions or only one session in node 1 where there are >1000 sessions in my node 2 but after a shutdown immediate the load gets balanced across the instances.

 

The output of netstat -s shows UDP packet receive errors. I am not able to find any errors in listener.log neither in alert log of the instance nor in the crs logs even..

 

Please help me to analyse the situation.

 

Warm Regards

 

Roy

  • 1. Re: No sessions in node1 in a 2 node RAC 10g Database
    ABOracle Explorer
    Currently Being Moderated

    Roy,

     

    Have few queries:

     

    1. do you know what is tns configuration being used by the application end users?

    2. do you use any Oracle RAC Services ? If yes, does the service is load balanced to both the nodes ?

     

    These might help you to understand the problem better.

     

    Run query like below:

     

    1/ Login into Linix server and run

     

    srvctl status service -d dbname < Enter>

     

    please change dbname with your database Name

     

    what is the output?

     

    2/ Run a query against gv$session where group SERVICE_NAME say

     

    select count(*), SERVICE_NAME from gv$session group by SERVICE_NAME

    /

    select count(*), MACHINE,SERVICE_NAME from gv$session group by MACHINE,,SERVICE_NAME

    /

    select count(*), INST_ID,MACHINE,SERVICE_NAME from gv$session group by INST_ID,MACHINE,,SERVICE_NAME

    /

     

    Provide the output.

     

    Thanks

  • 2. Re: No sessions in node1 in a 2 node RAC 10g Database
    Rijo Roy Newbie
    Currently Being Moderated

    Hi

     

    Thanks for the reply.

     

    These are the info that I can provide as of now:

    1. do you know what is tns configuration being used by the application end users? Yes, For SQL Developer, individual instances are pointed using the hostname of the DB machine. However the Data Source used by the application servers(Oracle 10g AS/ Weblogic 11g) has entries of the vip of the nodes


    2. do you use any Oracle RAC Services ? If yes, does the service is load balanced to both the nodes ? Sorry , I didn't understand. My DB is in Oracle 10g RAC with node so it is having Oracle RAC Services. The nodes are load balanced.


    Run query like below:

     

    1/ Login into Linix server and run

     

    srvctl status service -d dbname < Enter>

     

    please change dbname with your database Name

     

    what is the output?

     

    Sir, the above query is irrelevant to my query as I am the "sys"/dba who is shutting down the instance which is not receiving any sessions.

    and the services are online

     

    Rest of the queries can be run only when a similar situation comes up. Clients are complaining saying the DB is slow so I believe they are not able to make connections with the node(with problem). By this it doesn't mean that problem exists only with a particular node. Both nodes are having this issue but at a time only one gets afftected with 0 or no sessions and the other with the entire number of session making my DB extremely slow. But after executing

    shutdown immediate (in my problematic node) followed by a startup the sessions are restored.

     

    Thanks & regards

    Roy

  • 3. Re: No sessions in node1 in a 2 node RAC 10g Database
    Dude! Guru
    Currently Being Moderated

    Perhaps it might be a good idea to determine first whether this is a database configuration or OS/network related problem. Nobody can really guess how you configured your load balancing or where you are seeing network problems.

     

    For instance, are you using client side load balancing using tnsnames.ora or server side load balancing. What is the output of lsnrctl status, for example? Does each node know about the service?

     

    Where are you seeing the UDP problems? If this is on the RAC interconnect channel, or any network adapter as such, I suggest to verify you are using proper settings. A common configuration mistake, for instance, was to have a Switch port set to fixed-full duplex and the server NIC set to auto-negotiate, which is usually the default. The result are very low performance and communications errors. You can check setting on the server using the ethtool. Also, it often helps to replace the network cable or re-crimp the connections if you have the proper tool.

  • 4. Re: No sessions in node1 in a 2 node RAC 10g Database
    ABOracle Explorer
    Currently Being Moderated

    Can you please provide below:

     

    1/

    Login into Linux/Unix server where database is running and run below command from Unix/Linux prompt(below command is not an Oracle query):

    $ srvctl status service -d dbname < Enter>

    what is the output?

     

    2/ Login into Linux/Unix server where database running and run below command from Unix/Linux prompt: ( as Dude suggested above)

    $ lsnrctl status

    what is the output?


    [[

    $ ps -ef|grep inh

    might provide you listener name in case listener name is used.

    $ lsnrctl status <Listener_name> 

    ]]


    3/ what is being used by application or Data Source is used by the application servers(Oracle 10g AS/ Weblogic 11g) ?


    I suspect that due to issue in one node, Oracle service fails over to other node and all services started running in one node might lead to your issue.


    Considering that all queries are running perfectly fine in the database from applications side. There is no query or instance related performance issue present in the database.

    You might collect a global awr report for the database before and during problem time and upload to see what going on in the database.


    Thanks

  • 5. Re: No sessions in node1 in a 2 node RAC 10g Database
    Rijo Roy Newbie
    Currently Being Moderated

    Dear Sir

     

    Sorry for the delay in replying. Busy with discussions with my network team.. As asked these are the details(sorry names and path are protected)

     

    1.

    srvctl status service -d dbname

    Service dbnameSVC is running on instance(s) dbname2, dbname1

     

    2.

    lsnrctl status

     

     

    LSNRCTL for Linux IA64: Version 10.2.0.3.0 - Production on 22-JAN-2014 12:28:53

     

     

    Copyright (c) 1991, 2006, Oracle.  All rights reserved.

     

     

    Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))

    STATUS of the LISTENER

    ------------------------

    Alias                     LISTENER_inst1

    Version                   TNSLSNR for Linux IA64: Version 10.2.0.3.0 - Production

    Start Date                06-JAN-2014 09:14:17

    Uptime                    16 days 3 hr. 14 min. 36 sec

    Trace Level               off

    Security                  ON: Local OS Authentication

    SNMP                      OFF

    Listener Parameter File   /oracle/ora10g/product/1020/db_1/network/admin/listener.ora

    Listener Log File         /oracle/ora10g/product/1020/db_1/network/log/listener_inst1.log

    Listening Endpoints Summary...

      (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=vip of node 1)(PORT=1521)))

      (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=public ip of node 1)(PORT=1521)))

    Services Summary...

    Service "+ASM" has 1 instance(s).

      Instance "+ASM1", status BLOCKED, has 1 handler(s) for this service...

    Service "+ASM_XPT" has 1 instance(s).

      Instance "+ASM1", status BLOCKED, has 1 handler(s) for this service...

    Service "PLSExtProc" has 1 instance(s).

      Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...

    Service "dbname.sahaj.co.in" has 2 instance(s).

      Instance "dbname1", status READY, has 2 handler(s) for this service...

      Instance "dbname2", status READY, has 1 handler(s) for this service...

    Service "dbnameSVC.sahaj.co.in" has 2 instance(s).

      Instance "dbname1", status READY, has 2 handler(s) for this service...

      Instance "dbname2", status READY, has 1 handler(s) for this service...

    Service "dbnameXDB.sahaj.co.in" has 2 instance(s).

      Instance "dbname1", status READY, has 1 handler(s) for this service...

      Instance "dbname2", status READY, has 1 handler(s) for this service...

    Service "dbname_XPT.sahaj.co.in" has 2 instance(s).

      Instance "dbname1", status READY, has 2 handler(s) for this service...

      Instance "dbname2", status READY, has 1 handler(s) for this service...

    The command completed successfully

     

    3.

    ps -ef|grep inh

    oracle   14111 12506  0 12:29 pts/2    00:00:00 grep inh

    oracle   24052     1  0 Jan06 ?        01:11:02 /oracle/ora10g/product/1020/db_1/bin/tnslsnr LISTENER_inst1 -inherit

     

    4.

    from node 1

    lsnrctl status LISTENER_inst1

     

     

    LSNRCTL for Linux IA64: Version 10.2.0.3.0 - Production on 22-JAN-2014 12:30:28

     

     

    Copyright (c) 1991, 2006, Oracle.  All rights reserved.

     

     

    Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=inst1-vip)(PORT=1521)(IP=FIRST)))

    STATUS of the LISTENER

    ------------------------

    Alias                     LISTENER_inst1

    Version                   TNSLSNR for Linux IA64: Version 10.2.0.3.0 - Production

    Start Date                06-JAN-2014 09:14:17

    Uptime                    16 days 3 hr. 16 min. 11 sec

    Trace Level               off

    Security                  ON: Local OS Authentication

    SNMP                      OFF

    Listener Parameter File   /oracle/ora10g/product/1020/db_1/network/admin/listener.ora

    Listener Log File         /oracle/ora10g/product/1020/db_1/network/log/listener_inst1.log

    Listening Endpoints Summary...

      (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=vip-node1)(PORT=1521)))

      (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=publicip-node1(PORT=1521)))

    Services Summary...

    Service "+ASM" has 1 instance(s).

      Instance "+ASM1", status BLOCKED, has 1 handler(s) for this service...

    Service "+ASM_XPT" has 1 instance(s).

      Instance "+ASM1", status BLOCKED, has 1 handler(s) for this service...

    Service "PLSExtProc" has 1 instance(s).

      Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...

    Service "dbname.sahaj.co.in" has 2 instance(s).

      Instance "dbname1", status READY, has 2 handler(s) for this service...

      Instance "dbname2", status READY, has 1 handler(s) for this service...

    Service "dbnameSVC.sahaj.co.in" has 2 instance(s).

      Instance "dbname1", status READY, has 2 handler(s) for this service...

      Instance "dbname2", status READY, has 1 handler(s) for this service...

    Service "dbnameXDB.sahaj.co.in" has 2 instance(s).

      Instance "dbname1", status READY, has 1 handler(s) for this service...

      Instance "dbname2", status READY, has 1 handler(s) for this service...

    Service "dbname_XPT.sahaj.co.in" has 2 instance(s).

      Instance "dbname1", status READY, has 2 handler(s) for this service...

      Instance "dbname2", status READY, has 1 handler(s) for this service...

    The command completed successfully

     

     

    from node 2

    lsnrctl status LISTENER_inst2

     

     

    LSNRCTL for Linux IA64: Version 10.2.0.3.0 - Production on 22-JAN-2014 12:30:52

     

     

    Copyright (c) 1991, 2006, Oracle.  All rights reserved.

     

     

    Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=inst2-vip)(PORT=1521)(IP=FIRST)))

    STATUS of the LISTENER

    ------------------------

    Alias                     LISTENER_inst2

    Version                   TNSLSNR for Linux IA64: Version 10.2.0.3.0 - Production

    Start Date                20-JAN-2014 17:33:43

    Uptime                    1 days 18 hr. 57 min. 8 sec

    Trace Level               off

    Security                  ON: Local OS Authentication

    SNMP                      OFF

    Listener Parameter File   /oracle/ora10g/product/1020/db_1/network/admin/listener.ora

    Listener Log File         /oracle/ora10g/product/1020/db_1/network/log/listener_inst2.log

    Listening Endpoints Summary...

      (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=vip-node2)(PORT=1521)))

      (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=publicip-node2(PORT=1521)))

    Services Summary...

    Service "+ASM" has 1 instance(s).

      Instance "+ASM2", status BLOCKED, has 1 handler(s) for this service...

    Service "+ASM_XPT" has 1 instance(s).

      Instance "+ASM2", status BLOCKED, has 1 handler(s) for this service...

    Service "PLSExtProc" has 1 instance(s).

      Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...

    Service "dbname.sahaj.co.in" has 2 instance(s).

      Instance "dbname1", status READY, has 1 handler(s) for this service...

      Instance "dbname2", status READY, has 2 handler(s) for this service...

    Service "dbnameSVC.sahaj.co.in" has 2 instance(s).

      Instance "dbname1", status READY, has 1 handler(s) for this service...

      Instance "dbname2", status READY, has 2 handler(s) for this service...

    Service "dbnameXDB.sahaj.co.in" has 2 instance(s).

      Instance "dbname1", status READY, has 1 handler(s) for this service...

      Instance "dbname2", status READY, has 1 handler(s) for this service...

    Service "dbname_XPT.sahaj.co.in" has 2 instance(s).

      Instance "dbname1", status READY, has 1 handler(s) for this service...

      Instance "dbname2", status READY, has 2 handler(s) for this service...

    The command completed successfully

     

    3/ what is being used by application or Data Source is used by the application servers(Oracle 10g AS/ Weblogic 11g) ?

    Entry of Datasource connection pool in Weblogic 11g

    jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=vip-node1)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=vip-node2)(PORT=1521))(LOAD_BALANCE=yes)(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=dbname.co.in)(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES=180)(DELAY=5))))


    Entry of Datasource / Connection Pool in Oracle 10g AS

    <connection-pool name="xxxConPool" abandoned-connection-timeout="120" inactivity-timeout="120" initial-limit="20" max-connections="500" min-connections=

    "20" num-cached-statements="200" property-check-interval="600" time-to-live-timeout="60" used-connection-wait-timeout="120" validate-connection="true">

          <connection-factory factory-class="oracle.jdbc.pool.OracleDataSource" user="xxx" password="xxxx123" url="jdbc:oracle:thin:@(DESCRIPTION = (ADDRESS = (PR

    OTOCOL = TCP)(HOST = vip-node1)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = vip-node2)(PORT = 1521))  (LOAD_BALANCE = yes)  (CONNECT_DATA = (SERVER

    = DEDICATED) (SERVICE_NAME = dbname.co.in)  ) )" commit-record-table-name=""/>

       </connection-pool>

     

    Since I am working in Oracle 10g , I am not able to generate a global AWR report, but from the node I generated awr report during the problematic period and found no such big SQL's that slows my database.

     

    Kindly help.

     

    Regards

    Roy

  • 6. Re: No sessions in node1 in a 2 node RAC 10g Database
    ABOracle Explorer
    Currently Being Moderated

    No decision could be made with above details.

     

    would you be able to upload the awrs (total four) for both the instances during good time and bad time say for 1 hr duration.

     

    Thanks,

  • 7. Re: No sessions in node1 in a 2 node RAC 10g Database
    Rijo Roy Newbie
    Currently Being Moderated

    Dear All

     

    Thanks for all the suggestions and guidance's. After a long discussion with my network team they have done a workaround in the private-ip network and now the problem seems to be resolved by that. However, we are on rigorous monitoring.

     

    Regards

     

    Roy

  • 8. Re: No sessions in node1 in a 2 node RAC 10g Database
    Rijo Roy Newbie
    Currently Being Moderated

    Dear All

     

    As I said we were monitoring the issue after the work around applied but still affected with the same issue. " No sessions in a single node".

    Again we had to bounce the affected instance to resume the sessions.

     

    Today morning also it happened. While going through both the nodes I got this below error in crsd.log of the node which was receiving session:

     

    2014-02-28 09:29:07.518: [  CRSEVT][497332864]0CAAMonitorHandler :: 0:Could not join /oracle/crs/product/1020/crs_1/bin/racgwrap(check)

    category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child

     

     

    2014-02-28 09:29:07.735: [  CRSEVT][497332864]0CAAMonitorHandler :: 0:Action Script /oracle/crs/product/1020/crs_1/bin/racgwrap(check) timed out for ora.node2.vip! (timeout=60)

    2014-02-28 09:29:07.736: [  CRSAPP][497332864]0CheckResource error for ora.node2.vip error code = -2

     

    Does this trigger the problem?

    I have googled the same but found some saying its a bug and others saying bouncing will resolve the problem..

     

    Please help

     

    Thanks & Regards

     

    Roy

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points