Oracle support never did help with this. I eventually figured out what was going wrong. Here is my explanation and workaround:
The Oracle server identifies itself to the manager via a UUID. It creates the UUID during the discovery process, and on the initial discovery, the UUID is stored away by the manager so it can identify the server on future discoveries. The kinky part is:
It re-creates the UUID on every discovery, that is, after every reboot. If it doesn't get the same UUID every time, you're screwed. And in my case, it DOESN'T.
The algorithm for computing the UUID is:
1. Ask the BIOS for a UUID. Apparently, SOME BIOSes have one, and if yours does, that becomes the UUID presented to the manager.
2. If the BIOS doesn't have one (my old test servers DON'T), then it enumerates the first few Ethernet ports, concatenates their MAC addresses together, and THIS becomes the UUID. Again, you're okay, as long as you get the same MAC addresses, in the same order, every time.
(Imagine what happens when you replace a motherboard, or a NIC.)
In my case, it turns out, for some reason unknown to me, the enumeration of the Ethernet ports was NOT always returning them in the same order. If it happened to return them in the same order as it did the first time the server was discovered, then all is well. Otherwise, the UUID doesn't match and the server hangs in that "starting" state. You can reboot the server again and again, and eventually, you'll get the UUID that matches what is stored in the manager, and you're okay again.
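To make the failure mode concrete, here is a minimal Python sketch of the two-step fallback as I described it. This is NOT the agent's actual code: the function name `derive_uuid`, the way the MACs are joined, and the sample MAC addresses are all made up for illustration. It only shows why a different enumeration order produces a different UUID.

```python
def derive_uuid(bios_uuid, mac_addresses):
    """Hypothetical stand-in for the agent's UUID logic (not the real code)."""
    # Step 1: a UUID from the BIOS wins if there is one.
    if bios_uuid:
        return bios_uuid
    # Step 2: otherwise join the MAC addresses in whatever order
    # the ports happened to be enumerated on this boot.
    return "".join(mac.replace(":", "").lower() for mac in mac_addresses)


# Same two NICs (made-up MACs), enumerated in a different order on two boots.
first_boot = derive_uuid(None, ["00:11:22:33:44:55", "66:77:88:99:aa:bb"])
later_boot = derive_uuid(None, ["66:77:88:99:aa:bb", "00:11:22:33:44:55"])

print(first_boot == later_boot)  # False: the manager no longer recognizes the server
```

With a BIOS UUID present, the MAC order never comes into play, which would explain why some machines never show the problem.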
The workaround: The file /etc/ovs-agent/agent.ini has a parameter called "fakeuuid". I discovered this from reading the Python code that creates the UUID. If the fakeuuid parameter is set to something, that value is the official UUID, and the BIOS and MAC addresses are not looked at. I simply looked in the manager, found the UUIDs for the two servers, hard-coded these into the agent.ini file on each server, rebooted them, and haven't had a problem since.
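Here is a rough Python sketch of how such an override behaves, for anyone who wants to see the idea spelled out. Only the parameter name "fakeuuid" comes from the real file; the `[server]` section name, the sample UUID value, and the helper names `read_fake_uuid` and `effective_uuid` are my assumptions for illustration, so check your own agent.ini for where the parameter actually lives.

```python
import configparser

# Assumed layout: the section name and the UUID value are placeholders.
SAMPLE_INI = """
[server]
fakeuuid = 0004fb00000600001234567890abcdef
"""


def read_fake_uuid(ini_text):
    """Return the first non-empty fakeuuid found in any section, else None."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    for section in parser.sections():
        value = parser.get(section, "fakeuuid", fallback="").strip()
        if value:
            return value
    return None


def effective_uuid(ini_text, detected_uuid):
    """A configured fakeuuid takes priority over the BIOS/MAC-derived value."""
    return read_fake_uuid(ini_text) or detected_uuid


print(effective_uuid(SAMPLE_INI, "66778899aabb001122334455"))
```

The point is simply that once fakeuuid is set, whatever the BIOS or the port enumeration returns on a given boot no longer matters.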
One can wonder why Oracle designed it to create a new UUID every time and hope it never changes, instead of computing it once during the installation and storing it.
I sent the above explanation to Oracle as an attachment to my unresolved SR, which I'm about to close. Maybe the information will get to someone who cares.
Thanks for posting your solution to this. I actually stumbled across this issue today, when I was applying updates to my VM servers. Ironically, only the Oracle-branded server, my Sun Fire X4170 M2, showed this behaviour, while neither my Dell nor my HP servers did.
Luckily I was able to apply the workaround you discovered, and now I am still struggling with myself whether to open an SR and file a bug, or not. ;)
Anyway, your post was extremely helpful.