6 Replies Latest reply on Sep 12, 2019 3:00 PM by Jens

    NFS problems after migrate of ldom to new host


      As we all know, NFS-mounted filesystems can cause all sorts of problems when listed in vfstab.


      We've migrated an ldom to a new host. It worked fine on the old host with no problems; all NFS mounts worked fine. On the new host, not so: it's completely broken, and will only boot with every NFS entry in vfstab commented out.

      Even then, when trying to mount manually, some work some of the time and some time out.


      Questions are:


      1. What options in vfstab ensure a failing NFS mount does not hang the entire system?

      2. What options are advisable for flaky NFS mounts in general?

      3. I gather TCP is the default; is it worth trying UDP?
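      For illustration, a hedged sketch of a vfstab entry with boot-safe options. The hostname filer and the paths are placeholders, not from the actual setup, and the right soft/hard trade-off depends on the workload:

```
# bg   - retry a failed mount in the background so boot can continue
# soft - return an error after the retries are exhausted instead of
#        hanging forever (risky for writable data; hard,intr with bg
#        is the more conservative combination)
# intr - allow processes blocked on the mount to be interrupted
#
# device to mount    device to fsck  mount point  FS type  fsck pass  mount at boot  options
filer:/export/data   -               /data        nfs      -          yes            bg,soft,intr
```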

        • 1. Re: NFS problems after migrate of ldom to new host


          1.  You need to troubleshoot the NFS problem.

               - Check for network errors, logs, etc.

          2.  You can add the bg option to avoid hangs at boot time when the server is unavailable.

          3.  If the NFS share is not mandatory for the server, you can use the automounter (auto_direct).

               This can resolve some NFS availability problems (but can introduce other ones).
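          As a hedged sketch of the automounter approach: the share filer:/export/tools and the mount point /tools are made up for illustration, not taken from the setup discussed here.

```
# /etc/auto_master -- enable a direct map
/-      auto_direct

# /etc/auto_direct -- the filesystem is mounted on first access
# and unmounted again after an idle timeout, so an unreachable
# server only hurts processes that actually touch the path
/tools  -rw,bg,intr   filer:/export/tools
```

          After editing the maps, running automount -v picks up the changes without a reboot.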




          • 2. Re: NFS problems after migrate of ldom to new host

            Thanks Nic. Yes, bg has at least allowed us to get the server to boot.


            BUT some NFS mounts still won't mount, seemingly at random. Sometimes they work, sometimes they don't, even when trying to mount manually (mount <mountpoint>).


            Are there other things to try? UDP maybe instead of TCP?

            • 3. Re: NFS problems after migrate of ldom to new host


              If at all possible, you should not mask the problem; you should troubleshoot it.

                1. Check error logs on both the NFS server and client side.

                2. Analyse network statistics.

                3. Work out what changed.
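              On Solaris, the checks above might look roughly like this; filer is a placeholder for the NFS server, and the interface name net0 is an assumption:

```
# Client-side NFS call and retransmission counters
nfsstat -c

# Per-interface packet and error counters
netstat -i

# Recent kernel messages ("NFS server filer not responding" etc.)
dmesg | grep -i nfs

# Watch the NFS traffic itself on the wire
snoop -d net0 host filer port 2049
```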


              I see no reason why UDP would resolve your problem.





              • 4. Re: NFS problems after migrate of ldom to new host

                Did you actually migrate the ldom (ldm migrate-domain ...) or did you take a more unusual route, migrating the contents (flar archive, ...)?

                If you did the ldm migrate, I'd wonder what might have changed from primary-old to primary-new. Do you have any kind of firewalling or tagged interfaces in play, or maybe some static config (static ARP entries, routing, MTU sizes, ...)?

                In most cases, those things should give you a rather consistent yet unwanted experience.

                Tackling NFS: which version of NFS are you using, and do you have it e.g. kerberized?

                Do you get proper output from rpcinfo -p <filer> and showmount -e <filer>?
                Do you have multiple filer addresses/names or just one, and do you have structurally different entries/mounts or basically just multiples of the same type?
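                For reference, the two sanity checks mentioned, with filer standing in for the real NFS server name, and what one would typically expect to see:

```
# Should list at least: portmapper, mountd, nfs, nlockmgr, status
rpcinfo -p filer

# Should list the exported paths and which clients may mount them
showmount -e filer
```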


                By chance, do you have multiple IP addresses 'up' without constraining their use for outbound traffic (e.g. all but one interface marked 'deprecated', an elaborate routing table)?

                As I happened to trip over it some time ago: if there is some firewall involved and you did a cold migration, you might also want to check whether you ended up with stale data in the connection/flow tables. Since source and destination are typically identical (src+port), you might hit outdated state and see e.g. SYN packets being dropped because they arrive unexpectedly. If this happens for multiple filers, sequence and state might give you rather puzzling results.

                • 5. Re: NFS problems after migrate of ldom to new host

                  Did ldm migrate...

                  • 6. Re: NFS problems after migrate of ldom to new host

                    Live migration between similar gear? I vaguely recall NTP drift being a source of issues when doing live migration for domains in compat mode across different CPUs/speeds. That's where Kerberos might make a crucial difference after a while ;-)

                    A suggestion, to help you faster and better: rather than letting us come up with lots of questions, you might give a more detailed overview of your setup to reduce the degrees of freedom.

                    For what reason are you positive that the issue is within your instance rather than the surrounding infrastructure/filer? Do you have the capability and opportunity to switch back to the former host system?