3 Replies Latest reply on May 9, 2014 7:43 PM by budachst

    Virtual Server abruptly down with the Server Stopped Event, Description: <000> {SYSTEM} {SERVER STOPPED} Server has disconnected

    user12278081


      One of the four Virtual Servers in the pool went down abruptly. It happens without warning, and always at exactly 11:30 AM. I have not been able to trace which process is running at that time and crashing the system. Excerpts from /var/log/ovs-agent.log are below:


      [2014-05-03 16:22:06 4742] ERROR (notificationserver:124) Error sending stats notification: 'NoneType' object has no attribute 'strip'

      [2014-05-03 16:23:38 4747] ERROR (ha:104) Error in HA process: Lock file /poolfsmnt/0004fb00000500002e65000e8f79d644/db/server_pool failed: timeout occured.

      Traceback (most recent call last):

        File "/usr/lib64/python2.4/site-packages/agent/daemon/ha.py", line 99, in serve_forever

          if is_clustered() and is_master():

        File "/usr/lib64/python2.4/site-packages/agent/lib/settings.py", line 133, in is_master

          get_cluster_db_home())

        File "/usr/lib64/python2.4/site-packages/agent/lib/db.py", line 90, in read_item

          db = AgentDB(db_name, db_home)

        File "/usr/lib64/python2.4/site-packages/agent/lib/db.py", line 45, in __init__

          self.lock.acquire(wait=10, delay=0.1)

        File "/usr/lib64/python2.4/site-packages/agent/lib/filelock.py", line 56, in acquire

          raise LockError("Lock file %s failed: timeout occured." % self.filename)

      LockError: Lock file /poolfsmnt/0004fb00000500002e65000e8f79d644/db/server_pool failed: timeout occured.

      [2014-05-03 16:29:40 4743] ERROR (remaster:148) Error in remaster process: Lock file /poolfsmnt/0004fb00000500002e65000e8f79d644/db/server_pool failed: timeout occured.

      Traceback (most recent call last):

        File "/usr/lib64/python2.4/site-packages/agent/daemon/remaster.py", line 146, in serve_forever

          remaster()

        File "/usr/lib64/python2.4/site-packages/agent/daemon/remaster.py", line 114, in remaster

          cluster_remaster()

        File "/usr/lib64/python2.4/site-packages/agent/daemon/remaster.py", line 70, in cluster_remaster

          pool_vip = read_item("server_pool", "pool_virtual_ip", db_home)

        File "/usr/lib64/python2.4/site-packages/agent/lib/db.py", line 90, in read_item

          db = AgentDB(db_name, db_home)

        File "/usr/lib64/python2.4/site-packages/agent/lib/db.py", line 45, in __init__

          self.lock.acquire(wait=10, delay=0.1)

        File "/usr/lib64/python2.4/site-packages/agent/lib/filelock.py", line 56, in acquire

          raise LockError("Lock file %s failed: timeout occured." % self.filename)

      LockError: Lock file /poolfsmnt/0004fb00000500002e65000e8f79d644/db/server_pool failed: timeout occured.

      [2014-05-03 16:31:10 4747] ERROR (ha:104) Error in HA process: Lock file /poolfsmnt/0004fb00000500002e65000e8f79d644/db/server_pool failed: timeout occured.

      Traceback (most recent call last):

        File "/usr/lib64/python2.4/site-packages/agent/daemon/ha.py", line 99, in serve_forever

          if is_clustered() and is_master():

        File "/usr/lib64/python2.4/site-packages/agent/lib/settings.py", line 133, in is_master

          get_cluster_db_home())

        File "/usr/lib64/python2.4/site-packages/agent/lib/db.py", line 90, in read_item

          db = AgentDB(db_name, db_home)

        File "/usr/lib64/python2.4/site-packages/agent/lib/db.py", line 45, in __init__

          self.lock.acquire(wait=10, delay=0.1)

        File "/usr/lib64/python2.4/site-packages/agent/lib/filelock.py", line 56, in acquire

      *********************************************************** ***********************************************************  ***********************************************************
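      To line errors like these up against the time of an outage, it can help to pull out only the ERROR lines and see which agent module logged them and when. The following is a rough sketch, not an Oracle tool: the regular expression is inferred purely from the line format in the excerpt above, and the names `LINE_RE`, `parse_errors` and `count_by_module` are made up for illustration. It is written for Python 3 and meant to run on a workstation against a copy of the log.

```python
import re
from collections import Counter

# Assumed line format, inferred from the excerpt above:
# [YYYY-MM-DD HH:MM:SS <pid>] LEVEL (module:lineno) message
LINE_RE = re.compile(
    r"^\s*\[(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<pid>\d+)\]\s+"
    r"(?P<level>[A-Z]+)\s+\((?P<module>\w+):(?P<lineno>\d+)\)\s+(?P<message>.*)$"
)


def parse_errors(lines):
    """Return one dict per ERROR line; traceback and other lines are skipped."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level") == "ERROR":
            entries.append(match.groupdict())
    return entries


def count_by_module(entries):
    """Count ERROR entries per agent module (the name before the colon)."""
    return Counter(entry["module"] for entry in entries)


# Header lines copied from the excerpt above, plus one traceback line
# to show that non-matching lines are ignored.
sample = [
    "[2014-05-03 16:22:06 4742] ERROR (notificationserver:124) Error sending stats notification: 'NoneType' object has no attribute 'strip'",
    "[2014-05-03 16:23:38 4747] ERROR (ha:104) Error in HA process: Lock file /poolfsmnt/0004fb00000500002e65000e8f79d644/db/server_pool failed: timeout occured.",
    "Traceback (most recent call last):",
    "[2014-05-03 16:29:40 4743] ERROR (remaster:148) Error in remaster process: Lock file /poolfsmnt/0004fb00000500002e65000e8f79d644/db/server_pool failed: timeout occured.",
    "[2014-05-03 16:31:10 4747] ERROR (ha:104) Error in HA process: Lock file /poolfsmnt/0004fb00000500002e65000e8f79d644/db/server_pool failed: timeout occured.",
]

errors = parse_errors(sample)
for entry in errors:
    print(entry["date"], entry["time"], entry["module"])
```

      To use it on a real log, replace `sample` with the lines of a copied log file, for example `open("ovs-agent.log").read().splitlines()`, and compare the timestamps against the time the server drops out.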

      I suspect that something related to multipathing or HA is causing the issue.
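      For context on the error itself: every traceback above ends in `self.lock.acquire(wait=10, delay=0.1)` raising `LockError` for a file under /poolfsmnt. The agent's filelock.py is not shown in this thread, so the following is only a guess at the general pattern, a lock that is retried every `delay` seconds until `wait` seconds have passed. The `FileLock` class, the `.lock` suffix and the use of `O_EXCL` are all assumptions for illustration, and the sketch is Python 3 rather than the agent's Python 2.4.

```python
import os
import time


class LockError(Exception):
    """Raised when the lock cannot be taken within the wait period."""


class FileLock:
    """Hypothetical polling lock; NOT the agent's actual filelock.py."""

    def __init__(self, filename):
        self.filename = filename
        self.lockfile = filename + ".lock"  # assumed naming, for illustration
        self.fd = None

    def acquire(self, wait=10, delay=0.1):
        """Retry every `delay` seconds; give up after `wait` seconds."""
        deadline = time.monotonic() + wait
        while True:
            try:
                # O_CREAT | O_EXCL fails if the lock file already exists,
                # i.e. if some other holder has not released it yet.
                self.fd = os.open(
                    self.lockfile, os.O_CREAT | os.O_EXCL | os.O_WRONLY
                )
                return
            except FileExistsError:
                if time.monotonic() >= deadline:
                    # Message format copied from the traceback above.
                    raise LockError(
                        "Lock file %s failed: timeout occured." % self.filename
                    )
                time.sleep(delay)

    def release(self):
        """Close and remove the lock file if this object holds it."""
        if self.fd is not None:
            os.close(self.fd)
            os.unlink(self.lockfile)
            self.fd = None
```

      Under this pattern, a timeout only says that the lock could not be taken within the wait period. It does not say whether another holder kept the lock or whether the path itself was slow to respond, so the state of the pool file system around the time of the error is still worth checking separately.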

      A brief summary of the environment:

      OVM: 3.2.7

      OS: Oracle Enterprise Linux 5.8

      Storage: EMC VNX 5300


      Thanks in advance,

      SaiRam.J