Today one of my SRs suddenly became "unowned". I tried to regain ownership of that SR, but since it's still mounted on a couple of VM servers, this fails.
Does anyone know how to resolve this?
Normally the storage repository will not be mounted on any Oracle VM Servers once the repository becomes "unowned". If the repository is still mounted on a subset of servers while the manager reports the repo as unowned, then something is very out-of-sync.
Of course, we need much more information to determine the root cause. Please post the output of the following command, run on the command line of your Oracle VM Manager host:
1. cat /etc/sysconfig/ovmm
Run the following command on each of your Oracle VM servers:
1. ovs-agent-db dump_db server
2. ovs-agent-db dump_db repository
3. df -h
4. ls -l /OVS/Repositories/<UUID of problem repo>
Edited by: grking on May 6, 2013 9:48 AM
Had the same thing happen in my 3.2.3 test environment. I ended up reinstalling the VM manager and rediscovering the environment. Then I had to run fsck.ocfs2 on the SR and let the check update the cluster info. I could then rediscover the SR and all its associated data. A royal pain in the beehind.
I'm sorry you had such a difficult time with the storage repository.
You should have been able to accomplish the "rediscovery" without having to re-install Oracle VM Manager and rediscover the servers. First, you would have to check the following two things to pin down the problem:
1. Whether the ocfs2 cluster ID on the disk matches the cluster ID in the /etc/ocfs2/cluster.conf file on each server
2. Whether the value for OVS_REPO_MGR_UUID in the .ovsrepo file on the repo matches the UUID found in the /etc/sysconfig/ovmm file on the Oracle VM management server (the server where Oracle VM Manager is installed)
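The second check can be scripted. This is only a minimal sketch, assuming that both files are plain KEY=VALUE text, that the manager UUID is stored under a key named UUID in /etc/sysconfig/ovmm, and that a copy of that file has been brought over to a host that can read the repository. The helper names are hypothetical:

```python
from pathlib import Path


def read_key_values(path):
    """Parse a simple KEY=VALUE file into a dict, skipping blanks and comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def manager_uuid_matches(ovsrepo_path, ovmm_path):
    """Return (matches, repo_value, manager_value) for the two UUIDs."""
    repo_value = read_key_values(ovsrepo_path).get("OVS_REPO_MGR_UUID")
    # Assumption: the manager UUID is stored under the key "UUID".
    manager_value = read_key_values(ovmm_path).get("UUID")
    matches = repo_value is not None and repo_value == manager_value
    return matches, repo_value, manager_value
```

It would be called with the path to the .ovsrepo file under /OVS/Repositories/<UUID of problem repo> and the path to the copied ovmm file. A result of False with two different values would point towards the ownership mismatch rather than the cluster ID.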
The solution would then depend on the results of checking those two things (a detailed recipe seems moot at this point). Either solution would only take a few minutes to accomplish and would not involve reinstalling the Oracle VM Manager. The high level explanation of the two solutions is:
1. Do as you did, run fsck.ocfs2 on the repo (which is probably not what the first user is experiencing since ownership is really the next issue)
2. Update the value of OVS_REPO_MGR_UUID in the .ovsrepo file and then refresh the repository
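For the second solution, the file edit itself could look roughly like the sketch below. It assumes the .ovsrepo file is plain KEY=VALUE text and that appending the key when it is missing is acceptable. The function name is hypothetical, and the refresh still has to be done from Oracle VM Manager afterwards:

```python
import shutil
from pathlib import Path


def set_repo_manager_uuid(ovsrepo_path, new_uuid):
    """Rewrite OVS_REPO_MGR_UUID in place; return the previous value (or None)."""
    path = Path(ovsrepo_path)
    # Keep a copy next to the original before touching it.
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    previous = None
    lines = []
    for line in path.read_text().splitlines():
        if line.startswith("OVS_REPO_MGR_UUID="):
            previous = line.split("=", 1)[1]
            line = "OVS_REPO_MGR_UUID=" + new_uuid
        lines.append(line)
    if previous is None:
        # Assumption: a missing key is simply appended.
        lines.append("OVS_REPO_MGR_UUID=" + new_uuid)
    path.write_text("\n".join(lines) + "\n")
    return previous
```

The backup copy (.ovsrepo.bak) is there so the original value can be restored if the refresh does not behave as hoped.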
Well… I actually ended up wiping the manager database and rediscovering the cluster again. I tried to avoid that, since I would lose all my custom edits of all my virtual disk files, but anyway.
While I was at it, I also migrated from Oracle SE to MySQL. In that process I discovered that the export I took from the old manager database before rediscovering the cluster contained 49k objects, while the same export after rediscovering the cluster yielded only 14.9k objects.
So, it seems that there have been objects piling up and at some point this had to happen, I guess.
As a side note, my SR for this issue has been silent since Apr. 26th, when I uploaded the requested output of vmpinfo…
I gave the solution I found. I didn't detail what caused it. Either way, I couldn't delete the repo. I couldn't add anything to the repo. I couldn't update ownership of the repo. I couldn't do ANYTHING to the repo. I couldn't even get my guests out of the repo even though I could see it mounted locally. I had a manager problem. It wasn't a problem with the server itself or the storage.
Edited by: user12273962 on May 7, 2013 12:07 PM
I don't know what is wrong with Oracle Support and them ignoring SRs. It's really troubling. I get mad every time I have to open an SR. The only advice I can give is to make it a Severity 1 and force them to drop the severity.
My problem was with 3.2.3 running on MySQL. I don't think the database type had anything to do with my issue. Just an FYI.
Again, sorry for all the problems you guys encountered with the repositories...
Just to make sure I don't drop the ball, you both figured out how to recover on your own and you don't need any additional help - right?
Well… unfortunately, not yet. It looked good after I deleted the manager db and rediscovered the server pools, but now something really weird happened: on the VM servers that had the "lost" storage repo mounted, the newly created SR gets mounted under the path of the old one upon rediscovery, in addition to its regular mount. What is really weird is that it gets mounted under its old UUID, but connected to the same device in /dev/mapper as the new SR.
I have no clue where that one comes from.
P.S. I finally sorted that out. It seems that the repository file inside the db folder had been messed up. That is, there were still references to the old SR in that file and, for whatever reason, OVS had decided to mount that old reference again. Since these LUNs are created by SCST, I reckon that the scsi id might have led to this…
Afaik, I do have it sorted out now.
Edited by: budachst on May 8, 2013 3:38 PM
I don't think you are out of the woods yet and I think you might be getting deeper and deeper into repositories that are out of sync between the manager and the servers.
The SCSI ID does not determine the name/WWID that udev assigns to the device special file for the disk (/dev/mapper/<WWID>) - udev uses the SCSI Vital Product Data (VPD) page83 ID to determine the WWID - the SCSI ID does not come into play at all.
I'm sort of flying blind without the information I was asking for in my first reply and I can't really help without the data. Just add the requested data to your next reply if you are still having problems with the repository.
Oh well… I actually missed your first post - sorry for that.
I will have a very close look at my VM servers, but so far it really looks good. For each VM server that exhibited this behaviour, I removed it from the cluster, deleted the server and removed everything in /etc/ovs-agent/db. Afterwards I rediscovered the servers and added them to the cluster again. After another rediscover, the strange second mount no longer shows up.
Also, I was referring to the VPD, but mixed that up with the scsi id…