6 Replies Latest reply: Jul 1, 2013 9:09 AM by wowwow

    Rediscovery fail after dropping ovs schema in OVM3.2.2

    wowwow

      Hi,

       

      I have a problem rediscovering 1 of the 3 servers in a pool in OVM3.2.2.

       

      It all started when the storage administrator removed the LUN from the disk array before I had deleted the storage repository. The OVM manager was left with a repository that I could not delete. A guru then suggested dropping the ovs schema on the manager and letting it rediscover the servers.

       

      The re-discovery went fine for 2 servers. Discovery of the third server failed with the following in the event log:

       

      Job Construction Phase

      ----------------------

      Job ID: 1371694497598

       

      begin()

      Appended operation 'Discover Manager Server Discover' to object 'OVM Foundry : Discover Manager'.

      commit()

      Completed Step: COMMIT

       

      Objects and Operations

      ----------------------

      Object (IN_USE): [DiscoverManager] OVM Foundry : Discover Manager

      Operation: Discover Manager Server Discover

       

      Job Running Phase at 2013-06-20 12:14:57,598

      ----------------------------------------------

      Job Participants: []

       

       

      Actioner

      --------

      12:14:57,901: Starting operation 'Discover Manager Server Discover' on object 'OVM Foundry : Discover Manager'

      Setting Context to model only in job with id=1371694497598

      ...

      Job Internal Error (Operation)com.oracle.ovm.mgr.api.exception.FailedOperationException: OVMAPI_4010E Attempt to send command: discover_repository_db to server: DCOVM4S failed. OVMAPI_4004E Server Failed Command: discover_repository_db , Status: org.apache.xmlrpc.XmlRpcException: agent.lib.filelock.LockError:Lock file /var/run/ovs-agent/discover_repository_db.lock failed: timeout occured. [Thu Jun 20 12:17:19 EST 2013] [Thu Jun 20 12:17:19 EST 2013]

      at com.oracle.ovm.mgr.action.ActionEngine.sendCommandToServer(ActionEngine.java:512)

      at com.oracle.ovm.mgr.action.ActionEngine.sendUndispatchedServerCommand(ActionEngine.java:464)

      at com.oracle.ovm.mgr.action.ActionEngine.sendServerCommand(ActionEngine.java:390)

      at com.oracle.ovm.mgr.action.ActionEngine.sendDiscoverCommand(ActionEngine.java:312)

      at com.oracle.ovm.mgr.action.RepositoryAction.getRepositories(RepositoryAction.java:39)

      at com.oracle.ovm.mgr.discover.ovm.RepositoryDbDiscoverHandler.query(RepositoryDbDiscoverHandler.java:42)

      at com.oracle.ovm.mgr.discover.ovm.RepositoryDbDiscoverHandler.query(RepositoryDbDiscoverHandler.java:27)

      at com.oracle.ovm.mgr.discover.ovm.DiscoverHandler.execute(DiscoverHandler.java:57)

      at com.oracle.ovm.mgr.discover.DiscoverEngine.handleDiscover(DiscoverEngine.java:400)

      at com.oracle.ovm.mgr.discover.DiscoverEngine.handleDiscover(DiscoverEngine.java:385)

      at com.oracle.ovm.mgr.discover.DiscoverEngine.handleDiscover(DiscoverEngine.java:367)

      at com.oracle.ovm.mgr.discover.DiscoverEngine.handleDefaultDiscover(DiscoverEngine.java:323)

      at com.oracle.ovm.mgr.discover.DiscoverEngine.discoverNewServer(DiscoverEngine.java:300)

      at com.oracle.ovm.mgr.discover.DiscoverEngine.discoverServer(DiscoverEngine.java:203)

      at com.oracle.ovm.mgr.op.manager.DiscoverManagerServerDiscover.action(DiscoverManagerServerDiscover.java:48)

      at com.oracle.ovm.mgr.api.collectable.ManagedObjectDbImpl.executeCurrentJobOperationAction(ManagedObjectDbImpl.java:1156)

      at com.oracle.odof.core.AbstractVessel.invokeMethod(AbstractVessel.java:356)

      at com.oracle.odof.core.AbstractVessel.invokeMethod(AbstractVessel.java:333)

      at com.oracle.odof.core.storage.Transaction.invokeMethod(Transaction.java:865)

      at com.oracle.odof.core.Exchange.invokeMethod(Exchange.java:244)

      at com.oracle.ovm.mgr.api.manager.DiscoverManagerProxy.executeCurrentJobOperationAction(Unknown Source)

      at com.oracle.ovm.mgr.api.job.JobEngine.operationActioner(JobEngine.java:230)

      at com.oracle.ovm.mgr.api.job.JobEngine.objectActioner(JobEngine.java:322)

      at com.oracle.ovm.mgr.api.job.InternalJobDbImpl.objectCommitter(InternalJobDbImpl.java:1340)

      at com.oracle.odof.core.AbstractVessel.invokeMethod(AbstractVessel.java:356)

      at com.oracle.odof.core.AbstractVessel.invokeMethod(AbstractVessel.java:333)

      at com.oracle.odof.core.BasicWork.invokeMethod(BasicWork.java:106)

      at com.oracle.odof.command.InvokeMethodCommand.process(InvokeMethodCommand.java:92)

      at com.oracle.odof.core.BasicWork.processCommand(BasicWork.java:81)

      at com.oracle.odof.core.TransactionManager.processCommand(TransactionManager.java:752)

      at com.oracle.odof.core.WorkflowManager.processCommand(WorkflowManager.java:467)

      at com.oracle.odof.core.WorkflowManager.processWork(WorkflowManager.java:525)

      at com.oracle.odof.io.AbstractClient.run(AbstractClient.java:42)

      at java.lang.Thread.run(Thread.java:662)

      Caused by: com.oracle.ovm.mgr.api.exception.IllegalOperationException: OVMAPI_4004E Server Failed Command: discover_repository_db , Status: org.apache.xmlrpc.XmlRpcException: agent.lib.filelock.LockError:Lock file /var/run/ovs-agent/discover_repository_db.lock failed: timeout occured. [Thu Jun 20 12:17:19 EST 2013]

      at com.oracle.ovm.mgr.action.ActionEngine.sendAction(ActionEngine.java:803)

      at com.oracle.ovm.mgr.action.ActionEngine.sendCommandToServer(ActionEngine.java:508)

      ... 40 more

      ...

       

      Anthony

        • 1. Re: Rediscovery fail after dropping ovs schema in OVM3.2.2
          WadhahDaouehi

          Hi,

          Can you post the contents of /var/log/ovs-agent.log and /var/log/messages from the third server (the one that was not discovered)?

           

          Best Regards

          • 2. Re: Rediscovery fail after dropping ovs schema in OVM3.2.2
            wowwow

            On a closer look at ovs-agent.log, the lock problem was already there before I made any changes.

             

            [2013-06-19 14:51:46 22316] DEBUG (service:74) call start: discover_physical_luns('3600601600aa02c0008e5f1ee0e7be211 36006

            01600aa02c00565158c8670be111 3600601600aa02c002a574726117be211 3600601600aa02c007e9b8c0a6b0be111 36006016011a02c000c7e4db2

            dd3ce111 36006016011a02c00d6e89bc5dd3ce111 36006016011a02c00a8b4e2e7ce55e111 3600601600aa02c0018cd2b8f670be111 3600601600a

            a02c002a574726117be211 3600601600aa02c0008e5f1ee0e7be211 36006016011a02c000c7e4db2dd3ce111 3600601600aa02c00565158c8670be1

            11 36006016011a02c00d6e89bc5dd3ce111 3600601600aa02c007e9b8c0a6b0be111 36006016011a02c00a8b4e2e7ce55e111 36006016011a02c00

            0c7e4db2dd3ce111 3600601600aa02c0018cd2b8f670be111 3600601600aa02c007e9b8c0a6b0be111 3600601600aa02c002a574726117be211 360

            06016011a02c00d6e89bc5dd3ce111 3600601600aa02c00565158c8670be111 36006016011a02c00a8b4e2e7ce55e111 3600601600aa02c0008e5f1

            ee0e7be211 3600601600aa02c0018cd2b8f670be111 3600601600aa02c0008e5f1ee0e7be211 36006016011a02c00a8b4e2e7ce55e111 360060160

            0aa02c0018cd2b8f670be111 3600601600aa02c007e9b8c0a6b0be111 36006016011a02c000c7e4db2dd3ce111 3600601600aa02c00565158c8670b

            e111 36006016011a02c00d6e89bc5dd3ce111 3600601600aa02c002a574726117be211',)

            [2013-06-19 14:51:47 22316] DEBUG (service:76) call complete: discover_physical_luns

            [2013-06-19 14:51:47 22382] DEBUG (service:76) call start: discover_repository_db

            [2013-06-19 15:00:19 23641] DEBUG (service:74) call start: discover_repositories(' 0004fb00000300005c933b8fbd2fd3b2 ',)

            [2013-06-19 15:02:19 23641] ERROR (service:96) catch_error: Lock file /var/run/ovs-agent/discover_repositories.lock failed

            : timeout occured.

            Traceback (most recent call last):

              File "/usr/lib64/python2.4/site-packages/agent/lib/service.py", line 94, in wrapper

                return func(*args)

              File "/usr/lib64/python2.4/site-packages/agent/api/repository.py", line 143, in discover_repositories

                lock.acquire(wait=120)

              File "/usr/lib64/python2.4/site-packages/agent/lib/filelock.py", line 90, in acquire

                raise LockError("Lock file %s failed: timeout occured." % self.filename)

            LockError: Lock file /var/run/ovs-agent/discover_repositories.lock failed: timeout occured.

             

            After I dropped the ovs schema, I kept getting the following:

             

            [2013-06-20 13:26:13 9193] ERROR (notification:44) Unable to send notification: (111, 'Connection refused')

            [2013-06-20 13:26:33 9193] ERROR (notification:44) Unable to send notification: (111, 'Connection refused')

            [2013-06-20 13:26:43 9181] DEBUG (notificationserver:237) Trying to connect to manager.

             

             

            and the lock problem still persists:

             

            [2013-06-20 14:13:29 5357] ERROR (service:96) catch_error: Lock file /var/run/ovs-agent/discover_repository_db.lock failed: timeout occured.

            Traceback (most recent call last):

              File "/usr/lib64/python2.4/site-packages/agent/lib/service.py", line 94, in wrapper

                return func(*args)

              File "/usr/lib64/python2.4/site-packages/agent/api/repository.py", line 265, in discover_repository_db

                lock.acquire(wait=120)

              File "/usr/lib64/python2.4/site-packages/agent/lib/filelock.py", line 90, in acquire

                raise LockError("Lock file %s failed: timeout occured." % self.filename)

            LockError: Lock file /var/run/ovs-agent/discover_repository_db.lock failed: timeout occured.
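The log excerpts above show the same lock timeout on two different lock files, a day apart. A minimal sketch for summarizing such entries from a saved copy of ovs-agent.log might look like the following. The regular expression is based only on the ERROR lines quoted in this thread, and the function name `summarize_lock_timeouts` is hypothetical; other agent versions may log a different format.

```python
import re

# Matches lines such as (format taken from the excerpts in this thread):
# [2013-06-20 14:13:29 5357] ERROR (service:96) catch_error: Lock file /path/x.lock failed: timeout occured.
LOCK_ERROR = re.compile(
    r"^\[(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+\] ERROR .*?"
    r"catch_error: Lock file (?P<lock>\S+) failed"
)


def summarize_lock_timeouts(log_text):
    """Return {lock_file: (count, first_seen, last_seen)} for lock timeout errors."""
    summary = {}
    for line in log_text.splitlines():
        match = LOCK_ERROR.match(line.strip())
        if not match:
            continue
        lock, when = match.group("lock"), match.group("when")
        count, first, _ = summary.get(lock, (0, when, when))
        summary[lock] = (count + 1, first, when)
    return summary


# Sample lines copied from the log excerpts in this thread.
SAMPLE = """\
[2013-06-19 14:51:47 22382] DEBUG (service:76) call start: discover_repository_db
[2013-06-19 15:02:19 23641] ERROR (service:96) catch_error: Lock file /var/run/ovs-agent/discover_repositories.lock failed
[2013-06-20 14:13:29 5357] ERROR (service:96) catch_error: Lock file /var/run/ovs-agent/discover_repository_db.lock failed: timeout occured.
"""

for lock_file, (count, first, last) in summarize_lock_timeouts(SAMPLE).items():
    print(lock_file, count, first, last)
```

Running this against the full log would show whether the timeouts began before or after a given change, which is the comparison made in this reply.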

            • 3. Re: Rediscovery fail after dropping ovs schema in OVM3.2.2
              WadhahDaouehi

              Hi,

              Your server has discovered the LUNs (so it can access the SAN storage), so the problem is that the server cannot discover the repository.

              I recommend that you:

              - Unpresent the storage repository in Oracle VM Manager

              - Rediscover your server

              - Create a new repository and present it to the Oracle VM Server.

               

               

              To get rid of the following errors that you keep getting:

               

              [2013-06-20 13:26:13 9193] ERROR (notification:44) Unable to send notification: (111, 'Connection refused')

              [2013-06-20 13:26:33 9193] ERROR (notification:44) Unable to send notification: (111, 'Connection refused')

              [2013-06-20 13:26:43 9181] DEBUG (notificationserver:237) Trying to connect to manager.

              you can restart your Oracle VM Manager:

                   # service ovmm restart

               

               

              I hope this helps.

              Best Regards

              • 4. Re: Rediscovery fail after dropping ovs schema in OVM3.2.2
                user12273962

                I've seen this issue before on 3.2 but not on 3.1. A reboot of the OVM server always cleared it up for me.

                • 5. Re: Rediscovery fail after dropping ovs schema in OVM3.2.2
                  user13361870

                  This happens because the operation is locked; your server may be busy with another job. You can select the server, open the event option, and then select Acknowledge. That should clear it. If it is still locked, you can delete the file /var/run/ovs-agent/discover_repository_db.lock.
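To illustrate why a job that is still holding the lock makes a second discover call fail with a timeout, here is a minimal sketch of a file lock with a wait period. It is not the agent's actual filelock.py, which is not shown in this thread; the class `FileLock`, the exception `LockTimeout`, and the use of `fcntl.flock` are all assumptions chosen for illustration.

```python
import fcntl
import os
import tempfile
import time


class LockTimeout(Exception):
    """Raised when the lock cannot be obtained within the wait period."""


class FileLock:
    """Hypothetical advisory lock on a file, acquired with a timeout."""

    def __init__(self, filename):
        self.filename = filename
        self._fd = None

    def acquire(self, wait=120, poll=0.1):
        fd = os.open(self.filename, os.O_CREAT | os.O_RDWR, 0o600)
        deadline = time.monotonic() + wait
        while True:
            try:
                # Non-blocking attempt; raises OSError while someone else holds the lock.
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                self._fd = fd
                return
            except OSError:
                if time.monotonic() >= deadline:
                    os.close(fd)
                    raise LockTimeout("Lock file %s failed: timeout" % self.filename)
                time.sleep(poll)

    def release(self):
        if self._fd is not None:
            fcntl.flock(self._fd, fcntl.LOCK_UN)
            os.close(self._fd)
            self._fd = None


def demo():
    """A second caller times out while the first caller still holds the lock."""
    path = os.path.join(tempfile.mkdtemp(), "discover_demo.lock")
    holder, waiter = FileLock(path), FileLock(path)
    holder.acquire(wait=1)
    try:
        waiter.acquire(wait=0.3)
        outcome = "acquired"
    except LockTimeout:
        outcome = "timed out"
    holder.release()
    # Once the holder lets go, the same call succeeds.
    waiter.acquire(wait=1)
    waiter.release()
    return outcome


print(demo())
```

In this sketch the waiter only succeeds after the holder releases the lock, so if the holder is stuck on something else (for example, unreachable storage), removing the lock file would treat the symptom rather than the stuck job. Whether the real agent behaves the same way is an assumption.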

                  • 6. Re: Rediscovery fail after dropping ovs schema in OVM3.2.2
                    wowwow

                    I got around the problem. Here is my theory:

                     

                    For some reason, one of the servers in my pool thought it still had the LUN that was removed from the disk array. multipath -ll showed the LUN as still existing although it had been removed. During rediscovery, the server tried to talk to this phantom LUN and timed out. This persisted even after restarting multipathd or rebooting. I had to flush it with multipath -F. After I flushed it, the rediscovery was fine.
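A check for such phantom maps could be sketched as below: it reads saved `multipath -ll` output and lists maps that have no path in the `active` state. The function name `maps_with_no_usable_path` is hypothetical, the sample text is an illustrative placeholder rather than output from this system, and the line layout assumed by the regular expressions varies between multipath-tools versions.

```python
import re

# Assumed map header layout: "<name> [(<wwid>)] dm-N <vendor>,<product>"
MAP_LINE = re.compile(r"^(?P<name>\S+)(?:\s+\((?P<wwid>\S+)\))?\s+dm-\d+")
# Assumed path line layout: "... H:C:T:L <dev> <major>:<minor> <dm_state> <checker_state> ..."
PATH_LINE = re.compile(
    r"\d+:\d+:\d+:\d+\s+(?P<dev>\S+)\s+\d+:\d+\s+(?P<dm_state>\S+)\s+(?P<chk_state>\S+)"
)


def maps_with_no_usable_path(multipath_ll_text):
    """Return the maps (by WWID if shown, else by name) with no 'active' path."""
    maps = {}
    current = None
    for line in multipath_ll_text.splitlines():
        header = MAP_LINE.match(line)
        if header:
            current = header.group("wwid") or header.group("name")
            maps[current] = []
            continue
        path = PATH_LINE.search(line)
        if path and current is not None:
            maps[current].append(path.group("dm_state"))
    # A map with no paths at all is also reported.
    return [name for name, states in maps.items() if "active" not in states]


# Illustrative placeholder output, NOT taken from the servers in this thread.
SAMPLE = """\
36000000000000000000000000000000a dm-2 DGC,RAID 5
size=100G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 1:0:0:0 sdb 8:16 active ready running
36000000000000000000000000000000b dm-3 DGC,RAID 5
size=100G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 1:0:0:1 sdc 8:32 failed faulty running
"""

print(maps_with_no_usable_path(SAMPLE))
```

Any map reported this way is only a candidate for flushing; it would still need to be confirmed against the storage array before running multipath -F, since that command flushes all unused maps.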