Let me share some of my feelings about this mess. After many issues during implementation, the platform was finally set up, and it ran for only a week with a small number of errors. Then on Friday the motherboard in one of the two Oracle VM servers in the clustered pool failed, and HP replaced it. Fine. The server runs, but Oracle VM Manager remembers the old MAC addresses and I cannot get rid of them. Maintenance mode doesn't help, removing the server from the pool doesn't help, I can't rediscover it because of the repository (which is not presented to this server), restarting ovmm doesn't help, restarting that node doesn't help, I can't change the MAC in the web interface, and I can't remove the server from the pool.
So what can I do? If I unpresent the second node from the storage repository, remove it from the pool, destroy the pool, and create the pool again, will those virtual machines still be on the SAN disk as before, or will they be deleted as well? Which "clever" person decided everything has to go through the web interface only? How can I remove a server from the cluster via the CLI? And why does it remember the old MAC addresses?
I'm trying not to be ugly, but this is not an enterprise-ready product. It's a parody. Most garage-origin projects are much more enterprise-ready than this one :-(
Check out the "cleanup" hack I got as an answer to a missing repository. ( How to delete a non-existing Repository )
If that doesn't clean it up, you can reinstall the Manager pretty simply. To "claim" the repository, mount it somewhere (/mnt comes to mind), change the UUID in .ovsrepo to 000000..., then unmount it. Unown the server by deleting /etc/ovs-agent/db/server. Now, when you discover the server and discover the repository, all your VMs will be there. You'll lose the "pretty" names of the disk files, but that's a small price to pay. I have a pretty-name display routine I run nightly so I can rename them back if necessary.
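Roughly, the claim step above looks like the sketch below. It's only a sketch: the OVS_REPO_MGR_UUID key name and the all-zeros value are assumptions based on this thread, so check the actual .ovsrepo on your repository before changing anything. The demo runs against a scratch copy rather than a live mount:

```shell
# "Claiming" a repository: zero out the owning Manager's UUID in the
# .ovsrepo file at the repository root. On a real host you would first:
#   mount /dev/mapper/<repo_lun> /mnt
MNT=$(mktemp -d)                     # stand-in for /mnt in this demo
printf 'OVS_REPO_VERSION=3.0\nOVS_REPO_MGR_UUID=0004fb00000100...\n' \
    > "$MNT/.ovsrepo"                # fake file for demonstration only
# key name OVS_REPO_MGR_UUID is an assumption -- verify on your system
sed -i 's/^\(OVS_REPO_MGR_UUID=\).*/\100000000000000000000000000000000/' \
    "$MNT/.ovsrepo"
cat "$MNT/.ovsrepo"
# on a real host: umount /mnt, then unown the server too:
#   rm -f /etc/ovs-agent/db/server
```

After the edit, a freshly installed Manager should no longer see the repository as owned by the dead Manager instance.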
Pingable from where? I can get to the server directly, but the Manager has issues because of the old MAC addresses in its DB. Hardcoded MAC addresses where? The files in /etc/sysconfig/network-scripts/ contain the new MAC addresses, so why isn't the Manager reading them, and why can't I change them in the Manager?
Ok, cleanup doesn't help.
By repository do you mean the cluster repository or the storage repository? By "unown" do you mean killing the /opt/ovs-agent-3.0/OVSAgentServer.py process, or a different one?
Thanks a lot
By repository I meant Storage Repository. Isn't that what's not working? For "unown" I mean delete the file /etc/ovs-agent/db/server. However, if you reinstall the manager using the -u UUID parameters, you won't need to unown anything. It will all discover fine.
"pingable" means get on the console of the host with the new motherboard and verify the network is working. If not, clean up the files in /etc/sysconfig/network-scripts and restart the network. If you can't use the network, ovmm will not see the server to discover it. The network-scripts have to be correct/consistent for networking to start.
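The network-scripts consistency check above can be done quickly with a small helper like this (a sketch: HWADDR is the standard field in EL-style ifcfg files, and the paths shown in the comment are the usual locations):

```shell
# check_macs CFGDIR SYSDIR -> prints each device whose ifcfg HWADDR
# disagrees with the MAC the kernel reports for it.
check_macs() {
    dir=$1; sys=$2
    for cfg in "$dir"/ifcfg-eth*; do
        [ -e "$cfg" ] || continue
        dev=${cfg##*ifcfg-}
        want=$(awk -F= 'toupper($1)=="HWADDR"{print toupper($2)}' "$cfg")
        have=$(tr '[:lower:]' '[:upper:]' < "$sys/$dev/address")
        [ "$want" = "$have" ] || echo "$dev: ifcfg=$want hardware=$have"
    done
}
# on a real host:
#   check_macs /etc/sysconfig/network-scripts /sys/class/net
#   service network restart    # once the files are consistent
```

Any line of output means that ifcfg file still carries the pre-swap MAC and networking (and therefore discovery) will misbehave until it is corrected.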
I mentioned four items, possibly somewhat unclearly.
Try to "cleanup" the database you have.
Failing that, reinstall the Manager using the original UUID.
If reinstalling the manager won't allow you to claim the repository, unown the repository by editing the .ovsrepo file.
If reinstalling the manager won't let you own the server, unown the server by deleting a file and cycling the ovs-agent service.
None of these steps will lose the VMs, but some of them will lose the disks' simple_names.
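The fourth item, in shell form (the db path is the one quoted in this thread; the service cycle only applies on a live node, so the demo works on a scratch directory):

```shell
# "Unown" a server: delete the agent's ownership record, then cycle
# the agent so it re-registers with the Manager.
AGENTDB=$(mktemp -d)            # stand-in for /etc/ovs-agent/db
touch "$AGENTDB/server"         # ownership record written by the old Manager
rm -f "$AGENTDB/server"         # server is now unowned
# on the real node, afterwards:
#   service ovs-agent restart
```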
No. Oracle VM Manager itself is broken, because for some reason it still shows the old MAC addresses for the node that has the new motherboard. That means it shows the wrong MAC for the management interface and all other interfaces. Because of that I can't rediscover the server or delete it from Oracle VM Manager: it says the cluster filesystem is still connected, even though the server is shown as unregistered.
Changing the MACs in the scripts back to the old ones doesn't help, because then it complains that the MAC is not valid/different. If left alone, they are set correctly, and each script contains a comment saying it was set by the Manager, but the Manager still shows the old ones. So it's getting the old values from somewhere. But why, and from where?
Rebooting the Oracle VM Server or the Oracle VM Manager doesn't help.
BTW, this is the exact message when trying to delete the server from the Manager. The server is under unassigned servers.
OVMRU_002041E Cannot delete/remove pool file system: Poolfile system for POOLNAME. Server pool: POOLNAME, still has servers in it. (Of course there are servers in the pool, because we need that pool running :D)
In reviewing the thread, I don't see where you ever established that the network is working. We suggested you check the network-scripts; you say they were correct (new MAC addresses), and then you changed them to incorrect ones (the old MAC addresses) for some reason. Of course the network didn't work then. With the network-scripts containing the correct MAC addresses, you can troubleshoot the networking. Only after you know the network is functioning can the Manager discover that the MAC addresses have changed. The "discovery" process is by IP address, not by MAC address.
Is the network working? (I.e., can you ssh to the Oracle VM host from another machine, like the manager console?)
I'm working on the same case; I'll try to explain this once more, plus the additional steps we tried.
We had a fully functional cluster with two nodes, Node1 and Node2. Node1 overheated and shut down; all VMs migrated to Node2 and are working. The motherboard was replaced by technical personnel. When the server was brought up again, we saw that the new MAC addresses were in place in the config files.
We can ssh to that box with no issue at all; the network IS working properly.
On another server we have Oracle VM Manager installed. That "smart" Manager cannot grasp that Node1 now has a different MAC, even though that is the only thing that changed. We tried to rediscover Node1; it complains that the IP has changed and that we need to delete Node1 from the cluster and then rediscover the server (Job Internal Error (Operation)com.oracle.ovm.mgr.api.exception.IllegalOperationException: If the IP address of this server has changed, please delete the server: Node1, and re-discover.). Fine, we go to do that, but now it complains that we cannot, because "OVMRU_002041E Cannot delete/remove pool file system: Pool filesystem for POOLNAME. Server pool: POOLNAME, still has servers in it". What on earth is that? It asks us to remove the whole pool filesystem just to remove a server that we need to add back again.
I tried to fix up the environment with this method (sh ./ovm_upgrade.sh dbuser=ovs dbpass=<password> --fixup), but it did not work.
Then we tried to mess with the MAC addresses in the config files, setting them back to the OLD ones, but that attempt was unsuccessful. The question is how to resolve this. To me this seems like basic functionality: a server burns down, we replace it, we add it back, and it works. In the Oracle VM solution this does not work, which is sad.
Edited by: user12122836 on May 22, 2012 7:19 AM
Glad the network is working. Sorry about all the other pain. I feel it. I know you want to delete the server and don't want to reinstall OVMM with a blank database, but there are two possibilities I've not heard mentioned:
a) Open an SR with Oracle. They should be able to walk you through deleting the bad records. or
b) Change the hostname and IP and just discover that! Not ideal, but might buy you some time.
Best of luck!
Yep, that's exactly what is happening to us. I thought I had described it well enough, but I'm already tired of all the issues with this platform, so excuse me :-) We don't have an official support contract yet; it has been requested and we are waiting, so only after that can I open an SR. We are just building the environment for initial tests before production use, but of course an SR can't be opened without a support contract.
Technically, I'm curious where Oracle VM Manager is reading those old MAC addresses from. They must be stored somewhere in the DB schema (or on the cluster filesystem?). All the other steps don't work, and I don't think an upgrade/reinstall of Oracle VM Manager will help, because it will still be getting the old MAC addresses from somewhere.
I had an issue with not being able to delete OVM servers completely. This showed up on 3.0.3 build 150. I received the same notes about the SERVER POOL not being empty.
What I did to finally solve it, even after working through an SR, was simply to create a new server pool.
I removed all the VM guests one server at a time for the servers I wanted to keep, then migrated that server out of the old pool into the new pool, then moved the guests back.
Once I had the two servers I wanted to keep in the new pool, I removed the other two servers I wanted to destroy from the old pool, and finally the "delete" option worked from the menu without an error on the hardware tab.
It was a royal pain, but that seemed to finally clean everything up. ovm_shell could also show you what might still be "hooked" and preventing the delete, but it's better to use that with Oracle Support.
We've also experienced having to replace a motherboard in one of our physical servers. After not being able to rediscover it or delete it, we created an SR. Using tools provided by Oracle Support, they walked us through the manual deletion of the server from the database... HOWEVER, at the same time it also removed the repository from the database. We went from missing one server from the pool to missing the entire pool's repository. In the end we had to rebuild the entire database, which only takes us a few hours (we actually had to rebuild twice, because HP support installed the wrong motherboard the first time). Here are the steps we used to rebuild the database:
As root on the OVM server run:
service ovmm stop
/u01/app/oracle/ovm-manager-3/bin/ovm_upgrade.sh dbuser=<user> dbpass=<pass> dbhost=<host> dbport=<port> dbsid=<sid> deletedb
service ovmm start
1. Discover Server(s) -> Pool/OVM Server(s) will be visible again
2. Register Storage Array -> Storage Array Repositories and FS's will be visible again
3. Present Repository to Server(s) -> Repository will be visible again
4. Refresh Repository in Home section
5. Rediscover Server(s) -> VM's will reappear under the OVM Server(s). VM's that are not running will appear under "Unassigned Virtual Machines"
Note: these steps are probably verbatim from a previous post by Avi Miller...I just found them in my notes.
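For reference, here is the CLI half of those steps as a single script. It's a sketch: the DB credentials are placeholders, the install path is the default quoted above, and it defaults to a dry run that only echoes the commands (clear DRYRUN to actually execute them):

```shell
#!/bin/sh
# Rebuild the Manager database per the steps above. Defaults to a dry
# run that prints the commands; set DRYRUN= (empty) to really run them.
DRYRUN=${DRYRUN-1}
DBUSER=${DBUSER:-ovs}; DBPASS=${DBPASS:-changeme}          # placeholders
DBHOST=${DBHOST:-localhost}; DBPORT=${DBPORT:-1521}; DBSID=${DBSID:-XE}
run() { if [ -n "$DRYRUN" ]; then echo "$*"; else "$@"; fi; }

run service ovmm stop
run /u01/app/oracle/ovm-manager-3/bin/ovm_upgrade.sh \
    dbuser="$DBUSER" dbpass="$DBPASS" dbhost="$DBHOST" \
    dbport="$DBPORT" dbsid="$DBSID" deletedb
run service ovmm start
# Then, in the Manager UI: discover servers, register the storage array,
# present and refresh the repository, and rediscover servers.
```

The dry-run default lets you review the exact argument list (especially the credentials) before wiping anything.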
This seems like the most usable way to repair it, and we will probably try that one. Do we need to shut down the virtual machines on the functional node first, or will this procedure work even with the virtual machines running, with nothing happening to them?
The next option is changing the hostname/IP and discovering it as a new server, but then of course the old one will still be in the pool and must somehow be deleted from it.