This sounds very familiar; see my [recent blog post|http://www.leidinger.net/blog/2013/01/15/complete-network-loss-on-solaris-10u10-cpu-2012-10-on-virtualized-t4-2/] about it.
You didn't say whether you lost the network connection when this happened. We lost all network connectivity on the entire machine, and only a reboot solved the problem.
In short: it is a known bug. I was told we are the only ones (apart from internal Oracle testing in a lab) who have stumbled over this problem.
If you are able to reproduce this problem, I would be very interested to know how. We don't know how to reproduce it, and Oracle didn't want to tell me how ("only reproducible in the Oracle lab").
Sorry, I guess I did a poor job of fully describing my issue. The answer to your question is yes, we completely lost our connection to the network. But later in the day yesterday, I think I figured out what happened. I'm about 75% sure I know what caused this error message :)
At the time, I was under the impression that I could create an aggr using the vnets in my guest ldoms. I have a dual (split) io domain configuration and wanted to have redundant NICs. So I asked the network folks to enable LACP on the switch for the 2 ports fed to the guest ldoms by the two io domains. I have subsequently learned that this isn't possible. In fact, I'm not even sure how to have redundant NICs in the ldoms...? IPMP? Redundant in the sense that I want to be able to reboot an io domain without impacting the guest ldoms. So these error messages only occurred when the ldoms booted and found the switch ports in the LACP configuration (which isn't supported).
Does this make sense? I don't know if your systems were temporarily configured like this. But if you are really curious to know what caused this, try this in your environment.
Now, regarding redundant NICs in our guest ldoms. According to:
I can set up an aggregation at the io / service domain level, which is fine. But when I reboot my control/primary/service domain, my guest loses network. It still has its disk from the other io domain. BTW, I used http://seriousbirder.com/blogs/solaris-ldom-split-io-domain-configuration-example as a configuration and it worked great. It's just that now I want redundant NICs in my guest ldom, one NIC per io domain.
I noticed on your blog that you used phys-state link properties. I wonder if that allows for link-state IPMP at the vnet level in the guest ldoms...? Short of plumbing individual vnet NICs in the guest, I don't know how else to do it.
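For reference, `ldm add-vnet` accepts `linkprop=phys-state`, which makes the vnet report its link status based on the physical NIC behind its virtual switch. A minimal sketch of giving one guest a vnet from each io domain could look like the following. The names `ldg1`, `primary-vsw0` and `alternate-vsw0` are made up for illustration, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 to really run them (needs root on the control domain).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Add one vnet per service domain to the guest given as $1.
# The vswitch names are hypothetical; use the ones from your own setup.
add_redundant_vnets() {
    run ldm add-vnet linkprop=phys-state vnet0 primary-vsw0 "$1"
    run ldm add-vnet linkprop=phys-state vnet1 alternate-vsw0 "$1"
}

add_redundant_vnets ldg1
```

With the two vnets backed by different io domains, the guest can then group them with IPMP instead of an aggr.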
Sorry, I entered a lot of information here.
You haven't seen these errors before, have you?
Jan 14 08:02:27 mybox.ca ip: vnet0: DL_UNBIND_REQ failed: DL_OUTSTATE
SIOCSLIFNAME for ip: vnet0: Device busy
I have one guest ldom where vnet0 and vnet1 won't come up at all, and I get this error.
I should probably start a new thread for this one. I've even opened a ticket with Oracle, 4 days ago and no word from them :(
It seems your systems were configured like
phys NICs | HV | (vswitch) - vnets | guest LDOM | aggr
Mine were configured like
4 phys NICs - 2 aggr | HV | 2 vswitch - 2 vnets + vlan_tagging | guest LDOM or primary | IPMP
I was told our above scenario was verified by Oracle (this was before I joined this project).
Now they are configured like
4 phys NICs - 2 aggr | HV | 2 vswitch - 2 vnets + vlan_tagging | guest LDOM | IPMP
4 phys NICs - 2 aggr | HV | 2 vswitch - 2 vsw | primary | iface with vlan_tagging - IPMP
We use link-based IPMP in the guest LDOMs and in the primary domain.
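As a rough illustration of what link-based IPMP means here (no test addresses, failure detection relies on link state only), this is a sketch in Solaris 10 `ifconfig` syntax. The group name `ipmp0`, the address `192.0.2.10` and the interface names are assumptions for the example, not the values from our systems, and the commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Put two interfaces into one IPMP group without test addresses.
# $1 = group name, $2 = data address, $3/$4 = the two interfaces.
configure_link_ipmp() {
    group=$1
    addr=$2
    first=$3
    second=$4
    run ifconfig "$first" plumb
    run ifconfig "$first" "$addr" netmask + broadcast + group "$group" up
    run ifconfig "$second" plumb
    run ifconfig "$second" group "$group" up
}

configure_link_ipmp ipmp0 192.0.2.10 vnet0 vnet1
```

To make this persistent on Solaris 10, the same keywords would go into the `/etc/hostname.<interface>` files.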
In the previous setup, where we used vlanX instead of vswX, we only had the phys-state linkprop on the vnets. We verified that IPMP was working in this setup for the guest LDOMs and in the primary domain.
Now, with the vsw setup for the primary domain, we also added the phys-state linkprop to the vswitches. I verified that link-based IPMP is working in this setup for the primary.
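Setting the property on existing vswitches can be sketched with `ldm set-vsw`. The vswitch names `primary-vsw0` and `primary-vsw1` below are placeholders, not necessarily the names we use, and the script prints the commands instead of running them unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Enable physical link state reporting on every vswitch passed in.
set_phys_state_on_vswitches() {
    for vsw in "$@"; do
        run ldm set-vsw linkprop=phys-state "$vsw"
    done
}

set_phys_state_on_vswitches primary-vsw0 primary-vsw1
```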