
    ha zone

    1031554

      Hi,

I have a two-node Solaris Cluster 4.3 on Solaris 11 nodes. I followed, step by step, everything in How to Create a Failover Zone in a Cluster.

I have:

      root@test6:~# /usr/cluster/bin/clrs status

       

      === Cluster Resources ===

       

Resource Name        Node Name     State        Status Message

-------------        ---------     -----        --------------

solarisfz1-rs        test5         Starting     Unknown - Starting

                     test6         Offline      Offline

 

ha-zones-hasp-rs     test5         Online       Online

                     test6         Offline      Offline

       

       

      # /usr/cluster/bin/clrg status

       

      === Cluster Resource Groups ===

       

Group Name       Node Name       Suspended      Status

----------       ---------       ---------      ------

zone-rg          test5           No             Offline

                 test6           No             Pending_offline

       

      Resource ha-zones-hasp-rs is online on test5:

      root@test5:~# zpool list

      NAME       SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT

      ha-zones  49.8G   719M  49.0G   1%  1.00x  ONLINE  /

      rpool     15.6G  12.9G  2.74G  82%  1.00x  ONLINE  -

       

Resource solarisfz1-rs is online on test5:

      root@test5:~# zoneadm list -cv

        ID NAME             STATUS      PATH                         BRAND      IP

         0 global           running     /                            solaris    shared

         - solarisfz1       installed   /ha-zones/solaris/solarisfz1 solaris    shared

       

If I try it:

      root@test5:~# /usr/cluster/bin/clrg switch -n test6 zone-rg

      clrg:  (C667636) zone-rg: resource group is undergoing a reconfiguration, try again later

       

      tnx.

        • 1. Re: ha zone
          Juergens-Oracle

          Hello 1031554,

           

step 9 (Assign the UUID) is no longer necessary for SC 4.3 on 11.3.

           

To install the failover zone, please follow

How to Install a solaris Branded Zone and Perform the Initial Internal Zone Configuration

of the Oracle Solaris Cluster 4.3 Data Service for Oracle Solaris Zones Guide.

           

I would suggest deleting the resource after it has gone into the offline state because no primary node was found, and then recreating it using the mentioned guide.
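For example, the delete and re-register sequence could look like this (a sketch using the resource name from this thread):

# clrs disable solarisfz1-rs

# clrs delete solarisfz1-rs

# cd /opt/SUNWsczone/sczbt/util

# ./sczbt_register -f ./sczbt_config.solarisfz1-rs

# clrs enable solarisfz1-rs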

           

          Hth,

             Juergen

          • 2. Re: ha zone
            1031554

            Hi Juergens,

I did what you suggested. The zpool resource fails over fine, but when I enable the zone failover resource I have:

            /usr/cluster/bin/clrs status

             

            === Cluster Resources ===

             

Resource Name        Node Name     State        Status Message

-------------        ---------     -----        --------------

solarisfz1-rs        test5         Offline      Offline

                     test6         Starting     Unknown - Starting

 

ha-zones-hasp-rs     test5         Offline      Offline

                     test6         Online       Online

             

The sczbt_config.solarisfz1-rs file is:

            RS=solarisfz1-rs

            RG=zone-rg

            PARAMETERDIR=/ha-zones/solaris/solarisfz1/params

            SC_NETWORK=false

            SC_LH=

            FAILOVER=true

            HAS_RS=ha-zones-hasp-rs

            Zonename="solarisfz1"

            Zonebrand="solaris"

            Zonebootopt=""

            Milestone="svc:/milestone/multi-user-server"

            LXrunlevel="3"

            SLrunlevel="3"

            Mounts=""

            Migrationtype="cold"

             

I ran ./sczbt_register -f ./sczbt_config.solarisfz1-rs
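To double-check what sczbt_register actually created, the resource's standard and extension properties can be listed, for example:

root@test6:~# /usr/cluster/bin/clrs show -v solarisfz1-rs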

             

             

            /usr/cluster/bin/cluster status -t node

             

            === Cluster Nodes ===

             

            --- Node Status ---

             

            Node Name                                       Status

            ---------                                       ------

            test5                                           Online

            test6                                           Online

             

             

            tnx

            • 3. Re: ha zone
              Juergens-Oracle

              Hello 1031554,

               

a couple of things to check:

              - Look into the /var/adm/messages files of the nodes to get more details about the issue.

- Does step 7, "For a failover configuration, verify that the resource group can switch over.", in the mentioned link work? That is, when switching the HAStoragePlus resource between the nodes, can every node start the failover zone?

              - If yes, try the following config file:

              RS=solarisfz1-rs

              RG=zone-rg

              PARAMETERDIR=

              SC_NETWORK=false

              SC_LH=

              FAILOVER=true

              HAS_RS=ha-zones-hasp-rs

              Zonename=solarisfz1

              Zonebrand=solaris

              Zonebootopt=

              Milestone=multi-user-server

              LXrunlevel=

              SLrunlevel=

              Mounts=

              Migrationtype=cold

               

              and monitor the consoles and the /var/adm/messages file of both nodes to get more details about the issue.

               

              Hth,

                 Juergen

              • 4. Re: ha zone
                1031554

                Hi,

                 

                 

I tried with your settings after I deleted solarisfz1-rs.

Next I ran ./sczbt_register -f /opt/SUNWsczone/sczbt/util/sczbt_config.solarisfz1-rs and clrs enable solarisfz1-rs.

The output in /var/adm/messages:

                Feb  2 15:40:15 test6 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <../../SUNWsczone/sczbt/bin/validate_ha-zone_sczbt> for resource <solarisfz1-rs>, resource group <zone-rg>, node <test6>, timeout <300> seconds

                Feb  2 15:40:16 test6 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <../../SUNWsczone/sczbt/bin/validate_ha-zone_sczbt> completed successfully for resource <solarisfz1-rs>, resource group <zone-rg>, node <test6>, time used: 0% of timeout <300 seconds>

                Feb  2 15:40:16 test6 Cluster.CCR: [ID 973933 daemon.notice] resource solarisfz1-rs added.

                Feb  2 15:41:36 test6 Cluster.RGM.global.rgmd: [ID 314356 daemon.notice] resource solarisfz1-rs enabled.

                 

                /usr/cluster/bin/clrs status

                 

                === Cluster Resources ===

                 

                Resource Name         Node Name     State       Status Message

                -------------         ---------     -----       --------------

                solarisfz1-rs         test5         Offline     Offline

                                      test6         Offline     Offline

                 

                ha-zones-hasp-rs      test5         Offline     Offline

                                      test6         Offline     Offline

                 

                root@test6:/opt/SUNWsczone/sczbt/util# /usr/cluster/bin/clrg status

                 

                === Cluster Resource Groups ===

                 

                Group Name       Node Name       Suspended      Status

                ----------       ---------       ---------      ------

                zone-rg          test5           No             Offline

                 test6           No             Offline

                /usr/cluster/bin/cluster status -t node

                 

                === Cluster Nodes ===

                 

                --- Node Status ---

                 

                Node Name                                       Status

                ---------                                       ------

                test5                                           Online

                test6                                           Online

                 

Next I ran clrg online -eM zone-rg.

On test5 I have:

                 

                /usr/cluster/bin/clrs status

                 

                === Cluster Resources ===

                 

                Resource Name        Node Name     State        Status Message

                -------------        ---------     -----        --------------

                solarisfz1-rs        test5         Starting     Unknown - Starting

                                     test6         Offline      Offline

                 

                ha-zones-hasp-rs     test5         Online       Online

                                     test6         Offline      Offline

                 

                root@test5:~# zpool list

                NAME       SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT

                ha-zones  49.8G   720M  49.0G   1%  1.00x  ONLINE  /

                rpool     15.6G  13.6G  1.99G  87%  1.00x  ONLINE  -

                root@test5:~# zoneadm list -cv

                  ID NAME             STATUS      PATH                         BRAND      IP

                   0 global           running     /                            solaris    shared

                   2 solarisfz1       running     /ha-zones/solaris/solarisfz1 solaris    shared

                 

The overall behavior is that the zpool resource and the HA zone resource fail over automatically, and the status of solarisfz1-rs is Unknown - Starting on that node. The zone is in the running state, but after a while the status of solarisfz1-rs changes automatically to Unknown - Stopping. Then the zone and zpool resources are brought up on the other node with the same status, Unknown - Starting, and after a while the status changes to Unknown - Stopping. Then everything goes offline.

                /usr/cluster/bin/clrs status

                 

                === Cluster Resources ===

                 

                Resource Name         Node Name     State       Status Message

                -------------         ---------     -----       --------------

                solarisfz1-rs         test5         Offline     Offline

                                      test6         Offline     Offline

                 

                ha-zones-hasp-rs      test5         Offline     Offline

                                      test6         Offline     Offline

                 

                /usr/cluster/bin/clrg status

                 

                === Cluster Resource Groups ===

                 

                Group Name       Node Name       Suspended      Status

                ----------       ---------       ---------      ------

                zone-rg          test5           No             Offline

                                 test6           No             Offline

                 

                 

                tnx,

                Marius

                • 5. Re: ha zone
                  Juergens-Oracle

                  Hello Marius,

now check /var/adm/messages on test5 and test6. There should be some messages indicating why solarisfz1-rs does not come online. By default, Solaris Cluster tries to start the resource on each node twice.

                   

If messages does not give enough information, you can enable debugging for the HA Zone/Container agent as described in:

                  Solaris Cluster Resource How to Enable Data Service Debug Mode (Doc ID 1010497.1)

                   

                  Hth,

                    Juergen

                  • 6. Re: ha zone
                    1031554

                    Hi Juergens,

Just a quick update.

Initially I have:

                    root@test5:~# /usr/cluster/bin/clrs status

                     

                    === Cluster Resources ===

                     

                    Resource Name         Node Name     State       Status Message

                    -------------         ---------     -----       --------------

                    solarisfz1-rs         test5         Offline     Offline

                                          test6         Offline     Offline

                     

                    ha-zones-hasp-rs      test5         Offline     Offline

                                          test6         Offline     Offline

                     

                    root@test5:~# zpool list

                    NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT

                    rpool  15.6G  13.9G  1.72G  88%  1.00x  ONLINE  -

I brought zone-rg online, and in /var/adm/messages:

                    Feb  2 16:20:35 test5 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_prenet_start> for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, timeout <1800> seconds

                    Feb  2 16:20:39 test5 zfs: [ID 249136 kern.info] imported version 37 pool ha-zones using 37

                    Feb  2 16:20:42 test5 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <hastorageplus_prenet_start> completed successfully for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, time used: 0% of timeout <1800 seconds>

                    Feb  2 16:20:42 test5 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_monitor_start> for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, timeout <90> seconds

                    Feb  2 16:20:42 test5 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <gds_svc_start> for resource <solarisfz1-rs>, resource group <zone-rg>, node <test5>, timeout <300> seconds

                    Feb  2 16:20:42 test5 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <hastorageplus_monitor_start> completed successfully for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, time used: 0% of timeout <90 seconds>

                    Feb  2 16:20:48 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.error] start_sczbt rc<1> - zoneadm: zone 'solarisfz1': Overlay mount detected at /ha-zones/solaris/solarisfz1

                    Feb  2 16:20:48 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.error] start_sczbt rc<1> - zoneadm: zone 'solarisfz1': zonepath is not a mountpoint for a zfs file system

                    Feb  2 16:20:48 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.error] start_sczbt rc<1> - zoneadm: zone 'solarisfz1': could not verify zonepath /ha-zones/solaris/solarisfz1 because of the above errors.

                    Feb  2 16:22:25 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.error] start_sczbt rc<1> - zoneadm: zone 'solarisfz1': could not verify zonepath /ha-zones/solaris/solarisfz1 because of the above errors.

                    Feb  2 16:22:25 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.error] start_sczbt rc<1> - zoneadm: zone 'solarisfz1': must be installed before boot.

                    Feb  2 16:22:25 test5 Cluster.PMF.pmfd: [ID 887656 daemon.notice] Process: tag="zone-rg,solarisfz1-rs,0.svc", cmd="/bin/ksh -c /opt/SUNWsczone/bin/control_ha-zone sczbt start -R solarisfz1-rs -G zone-rg -T ORCL.ha-zone_sczbt:2", Failed to stay up.

                    Feb  2 16:22:26 test5 Cluster.PMF.pmfd: [ID 534408 daemon.notice] "zone-rg,solarisfz1-rs,0.svc" restarting too often ... sleeping 30 seconds.

                    Feb  2 16:23:04 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.error] start_sczbt rc<1> - zoneadm: zone 'solarisfz1': Overlay mount detected at /ha-zones/solaris/solarisfz1

                    Feb  2 16:23:04 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.error] start_sczbt rc<1> - zoneadm: zone 'solarisfz1': zonepath is not a mountpoint for a zfs file system

                    Feb  2 16:23:04 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.error] start_sczbt rc<1> - zoneadm: zone 'solarisfz1': could not verify zonepath /ha-zones/solaris/solarisfz1 because of the above errors.

                    Feb  2 16:23:04 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.error] start_sczbt rc<1> - zoneadm: zone 'solarisfz1': must be installed before boot.

                    Feb  2 16:23:04 test5 Cluster.PMF.pmfd: [ID 887656 daemon.notice] Process: tag="zone-rg,solarisfz1-rs,0.svc", cmd="/bin/ksh -c /opt/SUNWsczone/bin/control_ha-zone sczbt start -R solarisfz1-rs -G zone-rg -T ORCL.ha-zone_sczbt:2", Failed to stay up.

                    Feb  2 16:23:04 test5 Cluster.PMF.pmfd: [ID 534408 daemon.notice] "zone-rg,solarisfz1-rs,0.svc" restarting too often ... sleeping 30 seconds.

                    Feb  2 16:23:43 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.error] start_sczbt rc<1> - zoneadm: zone 'solarisfz1': Overlay mount detected at /ha-zones/solaris/solarisfz1

                    Feb  2 16:23:43 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.error] start_sczbt rc<1> - zoneadm: zone 'solarisfz1': zonepath is not a mountpoint for a zfs file system

                    Feb  2 16:23:43 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.error] start_sczbt rc<1> - zoneadm: zone 'solarisfz1': could not verify zonepath /ha-zones/solaris/solarisfz1 because of the above errors.

                    Feb  2 16:23:43 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.error] start_sczbt rc<1> - zoneadm: zone 'solarisfz1': must be installed before boot.

                    Feb  2 16:23:43 test5 Cluster.PMF.pmfd: [ID 887656 daemon.notice] Process: tag="zone-rg,solarisfz1-rs,0.svc", cmd="/bin/ksh -c /opt/SUNWsczone/bin/control_ha-zone sczbt start -R solarisfz1-rs -G zone-rg -T ORCL.ha-zone_sczbt:2", Failed to stay up.

                    Feb  2 16:23:44 test5 Cluster.PMF.pmfd: [ID 534408 daemon.notice] "zone-rg,solarisfz1-rs,0.svc" restarting too often ... sleeping 30 seconds.

                     

If I try to do things manually:

                    root@test5:~# zpool import ha-zones

                    cannot mount 'ha-zones' on '/ha-zones': directory is not empty

                    cannot mount 'ha-zones' on '/ha-zones': directory is not empty

                    cannot mount 'ha-zones/solaris' on '/ha-zones/solaris': failure mounting parent dataset

                    cannot mount 'ha-zones' on '/ha-zones': directory is not empty

                    cannot mount 'ha-zones/solaris' on '/ha-zones/solaris': failure mounting parent dataset

                    cannot mount 'ha-zones/solaris/solarisfz1' on '/ha-zones/solaris/solarisfz1': failure mounting parent dataset

                    root@test5:~# zfs mount -O rpool/ha-zones

                    root@test5:~# zfs mount -O rpool/ha-zones/solaris

                    root@test5:~# zfs mount -O rpool/ha-zones/solaris/solarisfz1

                     

Then I have:

                    root@test5:~# zpool list

                    NAME       SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT

                    ha-zones  49.8G   719M  49.0G   1%  1.00x  ONLINE  -

                    rpool     15.6G  13.9G  1.72G  88%  1.00x  ONLINE  -

                    root@test5:~# zoneadm list -cv

                      ID NAME             STATUS      PATH                         BRAND      IP

                       0 global           running     /                            solaris    shared

                       - solarisfz1       configured  /ha-zones/solaris/solarisfz1 solaris    shared

                     

                    /usr/cluster/bin/clrs status

                     

                    === Cluster Resources ===

                     

                    Resource Name        Node Name     State        Status Message

                    -------------        ---------     -----        --------------

                    solarisfz1-rs        test5         Starting     Unknown - Starting

                                         test6         Offline      Offline

                     

                    ha-zones-hasp-rs     test5         Online       Online

                                         test6         Offline      Offline

                     

                     

Now zone solarisfz1 is running on test5, and /var/adm/messages on test5 has:

Feb  2 16:32:24 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.notice] start_sczbt rc<0> - zone 'solarisfz1': warning: net0: no matching subnet found in netmasks(4): 192.168.114.88; using default of 255.255.255.0.

Feb  2 16:36:47 test5 Cluster.RGM.global.rgmd: [ID 764140 daemon.error] Method <gds_svc_start> on resource <solarisfz1-rs>, resource group <zone-rg>, node <test5>: Timeout.

Feb  2 16:36:47 test5 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_monitor_stop> for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, timeout <90> seconds

Feb  2 16:36:47 test5 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <gds_svc_stop> for resource <solarisfz1-rs>, resource group <zone-rg>, node <test5>, timeout <300> seconds

Feb  2 16:36:47 test5 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <hastorageplus_monitor_stop> completed successfully for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, time used: 0% of timeout <90 seconds>

Feb  2 16:39:53 test5 Cluster.RGM.fed: [ID 605976 daemon.notice] SCSLM zone <solarisfz1> down

Feb  2 16:39:56 test5 SC[SUNWsczone.stop_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.notice] stop_sczbt rc<0> - Forcibly detach the non-global zone solarisfz1

Feb  2 16:39:56 test5 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <gds_svc_stop> completed successfully for resource <solarisfz1-rs>, resource group <zone-rg>, node <test5>, time used: 62% of timeout <300 seconds>

Feb  2 16:39:56 test5 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_postnet_stop> for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, timeout <1800> seconds

Feb  2 16:39:56 test5 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <hastorageplus_postnet_stop> completed successfully for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, time used: 0% of timeout <1800 seconds>

                     

Then zone-rg fails over to test6, where it is brought online, and after a while it is brought offline for good.

                     

                     

                    tnx,

                    Marius

                    • 7. Re: ha zone
                      Juergens-Oracle

                      Hello Marius,

                      it seems you mixed up the mount points somehow:

                       

                      Feb  2 16:20:48 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.error] start_sczbt rc<1> - zoneadm: zone 'solarisfz1': Overlay mount detected at /ha-zones/solaris/solarisfz1

                      Feb  2 16:20:48 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.error] start_sczbt rc<1> - zoneadm: zone 'solarisfz1': zonepath is not a mountpoint for a zfs file system

                       

                      The problem is with your zonepath:

                      zone 'solarisfz1': zonepath is not a mountpoint for a zfs file system

                       

                      Check your zonepath and correct it.
Otherwise, check whether you can start the zone solarisfz1 manually, as already mentioned, with step 7 "For a failover configuration, verify that the resource group can switch over."
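For example (a sketch using the names from this thread), you could verify on the active node that the zonepath really is the mountpoint of a mounted ZFS dataset from the failover pool:

# zfs list -o name,mountpoint,mounted -r ha-zones

# zfs mount | grep /ha-zones/solaris/solarisfz1

The zonepath /ha-zones/solaris/solarisfz1 should appear exactly once, and only for a dataset of the ha-zones pool.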

                       

                      Hth,

                      Juergen

                      • 8. Re: ha zone
                        1031554

                        Hi Juergens,

                        I found this:

                        root@test6:~# zfs mount

                        rpool/ROOT/solaris-4            /

                        rpool/ROOT/solaris-4/var        /var

                        rpool/VARSHARE                  /var/share

                        rpool/export                    /export

                        rpool/export/home               /export/home

                        rpool/ha-zones                  /ha-zones

                        rpool/ha-zones/solaris          /ha-zones/solaris

                        rpool/ha-zones/solaris/solarisfz1  /ha-zones/solaris/solarisfz1

                        rpool                           /rpool

                        rpool/VARSHARE/zones            /system/zones

                        rpool/VARSHARE/pkg              /var/share/pkg

                        rpool/VARSHARE/pkg/repositories  /var/share/pkg/repositories

                        ha-zones                        /ha-zones

                        ha-zones/solaris                /ha-zones/solaris

                        ha-zones/solaris/solarisfz1     /ha-zones/solaris/solarisfz1

                        rpool/ha-zones                       95K  6.21G    32K  /ha-zones

                        rpool/ha-zones/solaris               63K  6.21G    32K  /ha-zones/solaris

                        rpool/ha-zones/solaris/solarisfz1    31K  6.21G    31K  /ha-zones/solaris/solarisfz1

So in rpool I have a /ha-zones dataset tree with the same mountpoints as in the ha-zones pool.

root@test5:~# zfs destroy -r rpool/ha-zones

After that, nothing was shadowing the /ha-zones mount point anymore.
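In hindsight, such a collision can be spotted in advance (a sketch using the dataset names from this thread) by listing, on both nodes and before importing the pool, every dataset whose mountpoint lives under /ha-zones:

root@test5:~# zfs list -H -o name,mountpoint | grep '/ha-zones'

Nothing should show up while the failover pool is exported; any dataset in that list (here, the rpool/ha-zones tree) will shadow the imported pool's mountpoints and trigger the overlay-mount errors above.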

In /var/adm/messages, after I rebooted the two nodes, I have:

Feb  3 13:07:26 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group zone-rg state on node test5 change to RG_OFF_BOOTED

Feb  3 13:07:26 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group zone-rg state on node test5 change to RG_OFFLINE

Feb  3 13:07:26 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group zone-rg state on node test5 change to RG_PENDING_ONLINE

Feb  3 13:07:26 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test5 change to R_FM_UNKNOWN

Feb  3 13:07:26 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test5 change to <Starting>

Feb  3 13:07:27 test5 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_prenet_start> for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, timeout <1800> seconds

Feb  3 13:07:28 test5 mac: [ID 736570 kern.info] NOTICE: e1000g2 unregistered

Feb  3 13:07:32 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_prenet_start]: [ID 843100 daemon.error] Failed to import zpool ha-zones: Cannot import 'ha-zones' : pool may be in use from other system, it was last accessed by 'test6'.

Feb  3 13:07:32 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_prenet_start]: [ID 246563 daemon.error] Failed to import:ha-zones

Feb  3 13:07:32 test5 Cluster.RGM.global.rgmd: [ID 938318 daemon.error] Method <hastorageplus_prenet_start> failed on resource <ha-zones-hasp-rs> in resource group <zone-rg> [exit code <1>, time used: 0% of timeout <1800 seconds>]

Feb  3 13:07:32 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.error] resource ha-zones-hasp-rs state on node test5 change to R_START_FAILED

Feb  3 13:07:32 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.error] resource group zone-rg state on node test5 change to RG_PENDING_OFF_START_FAILED

Feb  3 13:07:32 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test5 change to R_FM_FAULTED

Feb  3 13:07:32 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test5 change to <>

Feb  3 13:07:32 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test5 change to R_FM_UNKNOWN

Feb  3 13:07:32 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test5 change to <Stopping>

Feb  3 13:07:32 test5 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_postnet_stop> for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, timeout <1800> seconds

Feb  3 13:07:32 test5 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <hastorageplus_postnet_stop> completed successfully for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, time used: 0% of timeout <1800 seconds>

Feb  3 13:07:32 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource ha-zones-hasp-rs state on node test5 change to R_OFFLINE

Feb  3 13:07:32 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test5 change to R_FM_OFFLINE

Feb  3 13:07:32 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test5 change to <>

Feb  3 13:07:32 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.error] resource group zone-rg state on node test5 change to RG_OFFLINE_START_FAILED

Feb  3 13:07:32 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group zone-rg state on node test5 change to RG_OFFLINE

Feb  3 13:07:32 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group zone-rg state on node test5 change to RG_PENDING_ONLINE

Feb  3 13:07:32 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test5 change to R_FM_UNKNOWN

Feb  3 13:07:32 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test5 change to <Starting>

Feb  3 13:07:32 test5 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_prenet_start> for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, timeout <1800> seconds

Feb  3 13:07:32 test5 Cluster.RGM.global.rgmd: [ID 663692 daemon.error] failback attempt failed on resource group <zone-rg> with error <resource group failed to start on chosen node; it may end up failing over to other node(s)>

Feb  3 13:07:35 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group zone-rg state on node test6 change to RG_OFF_PENDING_BOOT

Feb  3 13:07:35 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group zone-rg state on node test6 change to RG_OFF_BOOTED

Feb  3 13:07:35 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group zone-rg state on node test6 change to RG_OFFLINE

Feb  3 13:07:37 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_prenet_start]: [ID 843100 daemon.error] Failed to import zpool ha-zones: Cannot import 'ha-zones' : pool may be in use from other system, it was last accessed by 'test6'.

Feb  3 13:07:37 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_prenet_start]: [ID 246563 daemon.error] Failed to import:ha-zones

Feb  3 13:07:37 test5 Cluster.RGM.global.rgmd: [ID 938318 daemon.error] Method <hastorageplus_prenet_start> failed on resource <ha-zones-hasp-rs> in resource group <zone-rg> [exit code <1>, time used: 0% of timeout <1800 seconds>]

Feb  3 13:07:37 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.error] resource ha-zones-hasp-rs state on node test5 change to R_START_FAILED

Feb  3 13:07:37 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.error] resource group zone-rg state on node test5 change to RG_PENDING_OFF_START_FAILED

Feb  3 13:07:37 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test5 change to R_FM_FAULTED

Feb  3 13:07:37 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test5 change to <>

Feb  3 13:07:37 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test5 change to R_FM_UNKNOWN

Feb  3 13:07:37 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test5 change to <Stopping>

Feb  3 13:07:37 test5 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_postnet_stop> for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, timeout <1800> seconds

Feb  3 13:07:37 test5 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <hastorageplus_postnet_stop> completed successfully for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, time used: 0% of timeout <1800 seconds>

Feb  3 13:07:37 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource ha-zones-hasp-rs state on node test5 change to R_OFFLINE

Feb  3 13:07:37 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test5 change to R_FM_OFFLINE

Feb  3 13:07:37 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test5 change to <>

Feb  3 13:07:37 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.error] resource group zone-rg state on node test5 change to RG_OFFLINE_START_FAILED

Feb  3 13:07:37 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group zone-rg state on node test5 change to RG_OFFLINE

Feb  3 13:07:37 test5 Cluster.RGM.global.rgmd: [ID 447451 daemon.notice] Not attempting to start resource group <zone-rg> on node <test5> because this resource group has already failed to start on this node 2 or more times in the past 3600 seconds

Feb  3 13:07:37 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group zone-rg state on node test6 change to RG_PENDING_ONLINE

Feb  3 13:07:37 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test6 change to R_FM_UNKNOWN

Feb  3 13:07:37 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test6 change to <Starting>

Feb  3 13:07:44 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource ha-zones-hasp-rs state on node test6 change to R_ONLINE_UNMON

Feb  3 13:07:44 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test6 change to R_FM_ONLINE

Feb  3 13:07:44 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test6 change to <>

Feb  3 13:07:44 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource solarisfz1-rs status on node test6 change to R_FM_UNKNOWN

Feb  3 13:07:44 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource solarisfz1-rs status msg on node test6 change to <Starting>

Feb  3 13:07:44 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource solarisfz1-rs state on node test6 change to R_STARTING

Feb  3 13:07:45 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource ha-zones-hasp-rs state on node test6 change to R_ONLINE

Feb  3 13:07:45 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test6 change to <>

Feb  3 13:07:59 test5 pseudo: [ID 129642 kern.info] pseudo-device: devinfo0

Feb  3 13:07:59 test5 genunix: [ID 936769 kern.info] devinfo0 is /pseudo/devinfo@0

Feb  3 13:09:14 test5 sendmail[830]: [ID 702911 mail.alert] unable to qualify my own domain name (test5) -- using short name

Feb  3 13:12:47 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.error] resource solarisfz1-rs state on node test6 change to R_START_FAILED

Feb  3 13:12:47 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource solarisfz1-rs status on node test6 change to R_FM_FAULTED

Feb  3 13:12:47 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource solarisfz1-rs status msg on node test6 change to <>

Feb  3 13:12:47 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.error] resource group zone-rg state on node test6 change to RG_PENDING_OFF_START_FAILED

Feb  3 13:12:47 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource solarisfz1-rs status on node test6 change to R_FM_UNKNOWN

Feb  3 13:12:47 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource solarisfz1-rs status msg on node test6 change to <Stopping>

Feb  3 13:12:48 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource solarisfz1-rs state on node test6 change to R_STOPPING

Feb  3 13:12:49 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource ha-zones-hasp-rs state on node test6 change to R_ONLINE_UNMON

Feb  3 13:16:10 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource solarisfz1-rs state on node test6 change to R_OFFLINE

Feb  3 13:16:10 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource solarisfz1-rs status on node test6 change to R_FM_OFFLINE

Feb  3 13:16:10 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource solarisfz1-rs status msg on node test6 change to <>

Feb  3 13:16:10 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test6 change to R_FM_UNKNOWN

Feb  3 13:16:10 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test6 change to <Stopping>

Feb  3 13:16:12 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource ha-zones-hasp-rs state on node test6 change to R_OFFLINE

Feb  3 13:16:12 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test6 change to R_FM_OFFLINE

Feb  3 13:16:12 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test6 change to <>

Feb  3 13:16:12 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.error] resource group zone-rg state on node test6 change to RG_OFFLINE_START_FAILED

Feb  3 13:16:12 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group zone-rg state on node test6 change to RG_OFFLINE

Feb  3 13:16:12 test5 Cluster.RGM.global.rgmd: [ID 447451 daemon.notice] Not attempting to start resource group <zone-rg> on node <test5> because this resource group has already failed to start on this node 2 or more times in the past 3600 seconds

Feb  3 13:16:12 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group zone-rg state on node test6 change to RG_PENDING_ONLINE

Feb  3 13:16:12 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test6 change to R_FM_UNKNOWN

Feb  3 13:16:12 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test6 change to <Starting>

Feb  3 13:16:14 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource ha-zones-hasp-rs state on node test6 change to R_ONLINE_UNMON

Feb  3 13:16:14 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test6 change to R_FM_ONLINE

Feb  3 13:16:14 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test6 change to <>

Feb  3 13:16:14 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource solarisfz1-rs status on node test6 change to R_FM_UNKNOWN

Feb  3 13:16:14 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource solarisfz1-rs status msg on node test6 change to <Starting>

Feb  3 13:16:14 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource solarisfz1-rs state on node test6 change to R_STARTING

Feb  3 13:16:14 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource ha-zones-hasp-rs state on node test6 change to R_ONLINE

Feb  3 13:16:14 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test6 change to <>

                        Feb  3 13:21:18 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.error] resource solarisfz1-rs state on node test6 change to R_START_FAILED

                        Feb  3 13:21:18 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource solarisfz1-rs status on node test6 change to R_FM_FAULTED

                        Feb  3 13:21:18 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource solarisfz1-rs status msg on node test6 change to <>

                        Feb  3 13:21:18 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.error] resource group zone-rg state on node test6 change to RG_PENDING_OFF_START_FAILED

                        Feb  3 13:21:18 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource solarisfz1-rs status on node test6 change to R_FM_UNKNOWN

                        Feb  3 13:21:18 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource solarisfz1-rs status msg on node test6 change to <Stopping>

                        Feb  3 13:21:18 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource solarisfz1-rs state on node test6 change to R_STOPPING

                        Feb  3 13:21:18 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource ha-zones-hasp-rs state on node test6 change to R_ONLINE_UNMON

                        Feb  3 13:24:26 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource solarisfz1-rs state on node test6 change to R_OFFLINE

                        Feb  3 13:24:26 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource solarisfz1-rs status on node test6 change to R_FM_OFFLINE

                        Feb  3 13:24:26 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource solarisfz1-rs status msg on node test6 change to <>

                        Feb  3 13:24:26 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test6 change to R_FM_UNKNOWN

                        Feb  3 13:24:26 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test6 change to <Stopping>

                        Feb  3 13:24:28 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource ha-zones-hasp-rs state on node test6 change to R_OFFLINE

                        Feb  3 13:24:28 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test6 change to R_FM_OFFLINE

                        Feb  3 13:24:28 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test6 change to <>

                        Feb  3 13:24:28 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.error] resource group zone-rg state on node test6 change to RG_OFFLINE_START_FAILED

                        Feb  3 13:24:28 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group zone-rg state on node test6 change to RG_OFFLINE

                        Feb  3 13:24:28 test5 Cluster.RGM.global.rgmd: [ID 447451 daemon.notice] Not attempting to start resource group <zone-rg> on node <test5> because this resource group has already failed to start on this node 2 or more times in the past 3600 seconds

                        Feb  3 13:24:28 test5 Cluster.RGM.global.rgmd: [ID 447451 daemon.notice] Not attempting to start resource group <zone-rg> on node <test6> because this resource group has already failed to start on this node 2 or more times in the past 3600 seconds

                        Feb  3 13:24:28 test5 Cluster.RGM.global.rgmd: [ID 674214 daemon.notice] rebalance: no primary node is currently found for resource group <zone-rg>.

                         

                         

The zone was up and running on test6 for a while, which means that solarisfz1-rs and ha-zones-hasp-rs were brought online.

I then did a manual failover and everything was fine.

                        tnx,

Marius

                        • 9. Re: ha zone
                          Juergens-Oracle

                          Hello Marius,

                           

                          in case of start failure on test5:

Feb  3 13:07:32 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_prenet_start]: [ID 843100 daemon.error] Failed to import zpool ha-zones: Cannot import 'ha-zones' : pool may be in use from other system, it was last accessed by 'test6'.

                           

                          the issue is with the ha-zones-hasp-rs and the ha-zones zpool.

Are you able to switch the ha-zones pool between test5 and test6?

                           

                          Test if you can export and import the ha-zones zpool on test5 and test6:

Set the zone-rg to unmanaged by doing:

                          # clrs disable solarisfz1-rs

                          # clrs disable ha-zones-hasp-rs 

                          # clrg offline zone-rg

                          # clrg unmanage zone-rg

Now use 'zpool export' and 'zpool import' on test5 and test6 to verify that the ha-zones zpool can be hosted on both nodes. If this works, ensure that the ha-zones zpool is exported on both nodes.
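For example, the manual test could look like this (import and export on one node at a time, never on both at once):

root@test5:~# zpool import ha-zones

root@test5:~# zpool export ha-zones

and then on test6:

root@test6:~# zpool import ha-zones

root@test6:~# zpool export ha-zones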

                          Afterwards:

                          # clrg manage zone-rg

                          # clrg online zone-rg

                          # clrs enable ha-zones-hasp-rs 

                          This should import the zpool on one host.

                          Then try to switch the zone-rg between test5 and test6.

                          # clrg switch -n test6 zone-rg  (verify that ha-zones pool is imported on test6)

                          # clrg switch -n test5 zone-rg  (verify that ha-zones pool is imported on test5)

                           

When all of the above has worked successfully, we can continue with the failed startup of solarisfz1-rs on test6.

Feb  3 13:07:37 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test6 change to <Starting>

Feb  3 13:12:47 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.error] resource solarisfz1-rs state on node test6 change to R_START_FAILED

Feb  3 13:12:47 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource solarisfz1-rs status on node test6 change to R_FM_FAULTED

                           

The zone is not starting within 5 min., which is the default start timeout. Did the zone start within 5 min. when you started it manually?
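If the zone itself simply needs longer than that to boot, the start timeout of the resource could be raised, for example (a suggestion; pick a value that fits your environment):

# clrs set -p Start_timeout=900 solarisfz1-rs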

                           

When the switch of the ha-zones zpool has worked on test5 and test6 as described above, try to enable solarisfz1-rs with

                          # clrs enable solarisfz1-rs

                           

If this still fails, you should enable debugging for the HA Zone agent as described in

Solaris Cluster Resource How to Enable Data Service Debug Mode (Doc ID 1010497.1)

to identify why the zone does not start within 5 min.

                           

                          Hth,

                            Juergen

                          • 10. Re: ha zone
                            1031554

                            Hi Juergens,

I have done every step you suggested, including switching zone-rg between test5 and test6, and everything works. Just a question here: if I put zone-rg in unmanaged status, it means the cluster has no control over zone-rg, right? The zone starts immediately when I start it manually.

                            root@test5:~# zoneadm list -cv

                              ID NAME             STATUS      PATH                         BRAND      IP

                               0 global           running     /                            solaris    shared

                               - solarisfz1       configured  /ha-zones/solaris/solarisfz1 solaris    shared

                            root@test5:~# zoneadm -z solarisfz1 attach

                            Progress being logged to /var/log/zones/zoneadm.20170204T101323Z.solarisfz1.attach

                                Installing: Using existing zone boot environment

                                  Zone BE root dataset: ha-zones/solaris/solarisfz1/rpool/ROOT/solaris

                                                 Cache: Using /var/pkg/publisher.

                              Updating non-global zone: Linking to image /.

                            Processing linked: 1/1 done

                              Updating non-global zone: Auditing packages.

                            No updates necessary for this image. (zone:solarisfz1)

                             

                              Updating non-global zone: Zone updated.

                                                Result: Attach Succeeded.

                            Log saved in non-global zone as /ha-zones/solaris/solarisfz1/root/var/log/zones/zoneadm.20170204T101323Z.solarisfz1.attach

                            root@test5:~# /usr/cluster/bin/clrs status

                             

                            === Cluster Resources ===

                             

                            Resource Name         Node Name     State       Status Message

                            -------------         ---------     -----       --------------

                            solarisfz1-rs         test5         Offline     Offline

                                                  test6         Offline     Offline

                             

                            ha-zones-hasp-rs      test5         Online      Online

                                                  test6         Offline     Offline

                             

                            root@test5:~# /usr/cluster/bin/clrg status

                             

                            === Cluster Resource Groups ===

                             

                            Group Name       Node Name       Suspended      Status

                            ----------       ---------       ---------      ------

                            zone-rg          test5           No             Online

                                             test6           No             Offline

                             

                            Next:

                            root@test5:~# /usr/cluster/bin/clrs status

                             

                            === Cluster Resources ===

                             

                            Resource Name        Node Name     State        Status Message

                            -------------        ---------     -----        --------------

                            solarisfz1-rs        test5         Offline      Offline

                                                 test6         Starting     Unknown - Starting

                             

                            ha-zones-hasp-rs     test5         Offline      Offline

                                                 test6         Online       Online

                             

                            I enabled debug on the HA zone resource.

                            Next I enabled zone-rg, and I see:

                            root@test5:~# tail /var/adm/clusterlog

                            Feb  4 12:20:44 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.debug] resource ha-zones-hasp-rs state on node test6 change to R_JUST_STARTED

                            Feb  4 12:20:44 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource ha-zones-hasp-rs state on node test6 change to R_ONLINE_UNMON

                            Feb  4 12:20:44 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test6 change to R_FM_ONLINE

                            Feb  4 12:20:44 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test6 change to <>

                            Feb  4 12:20:44 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource solarisfz1-rs status on node test6 change to R_FM_UNKNOWN

                            Feb  4 12:20:44 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource solarisfz1-rs status msg on node test6 change to <Starting>

                            Feb  4 12:20:44 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource solarisfz1-rs state on node test6 change to R_STARTING

                            Feb  4 12:20:45 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource ha-zones-hasp-rs state on node test6 change to R_ONLINE

                            Feb  4 12:20:45 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test6 change to <>

                            Feb  4 12:21:00 test5 Cluster.scdpmd: [ID 290165 daemon.debug] returned from wait_for_dpm_timeout_io_tasks

                            root@test5:~# tail -f /var/adm/clusterlog

                            Feb  4 12:26:42 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.debug] resource ha-zones-hasp-rs state on node test6 change to R_POSTNET_STOPPING

                            Feb  4 12:26:42 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource ha-zones-hasp-rs state on node test6 change to R_OFFLINE

                            Feb  4 12:26:42 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test6 change to R_FM_OFFLINE

                            Feb  4 12:26:42 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test6 change to <>

                            Feb  4 12:26:42 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.error] resource group zone-rg state on node test6 change to RG_OFFLINE_START_FAILED

                            Feb  4 12:26:42 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group zone-rg state on node test6 change to RG_OFFLINE

                            Feb  4 12:26:42 test5 Cluster.RGM.global.rgmd: [ID 447451 daemon.notice] Not attempting to start resource group <zone-rg> on node <test5> because this resource group has already failed to start on this node 2 or more times in the past 3600 seconds

                            Feb  4 12:26:42 test5 Cluster.RGM.global.rgmd: [ID 447451 daemon.notice] Not attempting to start resource group <zone-rg> on node <test6> because this resource group has already failed to start on this node 2 or more times in the past 3600 seconds

                            Feb  4 12:26:42 test5 Cluster.RGM.global.rgmd: [ID 674214 daemon.notice] rebalance: no primary node is currently found for resource group <zone-rg>.

                            Feb  4 12:27:00 test5 Cluster.scdpmd: [ID 290165 daemon.debug] returned from wait_for_dpm_timeout_io_tasks

                            Feb  4 12:27:00 test5 last message repeated 3 times

                            Feb  4 12:27:30 test5 Cluster.scdpmd: [ID 871842 daemon.debug] dpm_device_io: path = /dev/did/rdsk/d12s0, status = 1

                            Feb  4 12:27:30 test5 Cluster.scdpmd: [ID 532212 daemon.debug] dpm_timeout_io: path = /dev/did/rdsk/d12s0, status = 1

                            Feb  4 12:27:30 test5 Cluster.scdpmd: [ID 290165 daemon.debug] returned from wait_for_dpm_timeout_io_tasks

                            ^C

                             

                            root@test5:~# tail -f /var/adm/clusterlog

                            Feb  4 12:28:30 test5 last message repeated 4 times

                            Feb  4 12:28:30 test5 Cluster.scdpmd: [ID 871842 daemon.debug] dpm_device_io: path = /dev/did/rdsk/d2s0, status = 1

                            Feb  4 12:28:30 test5 Cluster.scdpmd: [ID 532212 daemon.debug] dpm_timeout_io: path = /dev/did/rdsk/d2s0, status = 1

                            Feb  4 12:28:30 test5 Cluster.scdpmd: [ID 290165 daemon.debug] returned from wait_for_dpm_timeout_io_tasks

                            Feb  4 12:29:00 test5 last message repeated 6 times

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group zone-rg state on node test5 change to RG_PENDING_ONLINE

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.debug] resource ha-zones-hasp-rs state on node test5 change to R_PRENET_STARTING

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test5 change to R_FM_UNKNOWN

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test5 change to <Starting>

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_prenet_start> for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, timeout <1800> seconds

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 944595 daemon.debug] 39 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hastorageplus/hastorageplus_prenet_start>:tag=<zone-rg.ha-zones-hasp-rs.10>: Calling security_clnt_connect(..., host=<test5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)

                            Feb  4 12:29:05 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_prenet_start]: [ID 474256 daemon.info] Validations of all specified global device services complete.

                            Feb  4 12:29:05 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_prenet_start]: [ID 123984 daemon.info] All specified global device services are available.

                            Feb  4 12:29:05 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_prenet_start]: [ID 702911 daemon.debug] fs_publish_start: /ha-zones:3:1

                            Feb  4 12:29:05 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_prenet_start]: [ID 703969 daemon.debug] smb_door_call[share_publish_admin]: No such file or directory

                            Feb  4 12:29:05 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_prenet_start]: [ID 483273 daemon.debug] smb_share_publish_admin: No such file or directory

                            Feb  4 12:29:05 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_prenet_start]: [ID 702911 daemon.debug] fs_publish_stop: /ha-zones: ok

                            Feb  4 12:29:05 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_prenet_start]: [ID 702911 daemon.debug] fs_publish_start: /ha-zones/solaris:3:1

                            Feb  4 12:29:05 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_prenet_start]: [ID 703969 daemon.debug] smb_door_call[share_publish_admin]: No such file or directory

                            Feb  4 12:29:05 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_prenet_start]: [ID 483273 daemon.debug] smb_share_publish_admin: No such file or directory

                            Feb  4 12:29:05 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_prenet_start]: [ID 702911 daemon.debug] fs_publish_stop: /ha-zones/solaris: ok

                            Feb  4 12:29:05 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_prenet_start]: [ID 702911 daemon.debug] fs_publish_start: /ha-zones/solaris/solarisfz1:3:1

                            Feb  4 12:29:05 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_prenet_start]: [ID 703969 daemon.debug] smb_door_call[share_publish_admin]: No such file or directory

                            Feb  4 12:29:05 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_prenet_start]: [ID 483273 daemon.debug] smb_share_publish_admin: No such file or directory

                            Feb  4 12:29:05 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_prenet_start]: [ID 702911 daemon.debug] fs_publish_stop: /ha-zones/solaris/solarisfz1: ok

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <hastorageplus_prenet_start> completed successfully for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, time used: 0% of timeout <1800 seconds>

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.debug] resource ha-zones-hasp-rs state on node test5 change to R_JUST_STARTED

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource ha-zones-hasp-rs state on node test5 change to R_ONLINE_UNMON

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test5 change to R_FM_ONLINE

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test5 change to <>

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource solarisfz1-rs state on node test5 change to R_STARTING

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.debug] resource ha-zones-hasp-rs state on node test5 change to R_MON_STARTING

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource solarisfz1-rs status on node test5 change to R_FM_UNKNOWN

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource solarisfz1-rs status msg on node test5 change to <Starting>

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <gds_svc_start> for resource <solarisfz1-rs>, resource group <zone-rg>, node <test5>, timeout <300> seconds

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_monitor_start> for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, timeout <90> seconds

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 944595 daemon.debug] 39 fe_rpc_command: cmd_type(enum):<1>:cmd=</opt/SUNWscgds/bin/gds_svc_start>:tag=<zone-rg.solarisfz1-rs.0>: Calling security_clnt_connect(..., host=<test5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 703477 daemon.debug] 38 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hastorageplus/hastorageplus_monitor_start>:tag=<zone-rg.ha-zones-hasp-rs.7>: Calling security_clnt_connect(..., host=<test5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)

                            Feb  4 12:29:05 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_monitor_start]: [ID 354821 daemon.info] Attempting to start the fault monitor under process monitor facility.

                            Feb  4 12:29:05 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_monitor_start]: [ID 425271 daemon.info] Started the monitor-method successfully

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <hastorageplus_monitor_start> completed successfully for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, time used: 0% of timeout <90 seconds>

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource ha-zones-hasp-rs state on node test5 change to R_ONLINE

                            Feb  4 12:29:05 test5 SC[,ORCL.ha-zone_sczbt:2,zone-rg,solarisfz1-rs,gds_svc_start]: [ID 661560 daemon.info] All the SUNW.HAStoragePlus resources that this resource depends on are online on the local node. Proceeding with the checks for the existence and permissions of the start/stop/probe commands.

                            Feb  4 12:29:05 test5 SC[,ORCL.ha-zone_sczbt:2,zone-rg,solarisfz1-rs,gds_svc_start]: [ID 268646 daemon.info] Extension property <network_aware> has a value of <0>

                            Feb  4 12:29:05 test5 SC[,ORCL.ha-zone_sczbt:2,zone-rg,solarisfz1-rs,gds_svc_start]: [ID 887138 daemon.info] Extension property <Child_mon_level> has a value of <-1>

                            Feb  4 12:29:05 test5 SC[,ORCL.ha-zone_sczbt:2,zone-rg,solarisfz1-rs,gds_svc_start]: [ID 833212 daemon.info] Attempting to start the data service under process monitor facility.

                            Feb  4 12:29:05 test5 SC[,ORCL.ha-zone_sczbt:2,zone-rg,solarisfz1-rs,gds_svc_start]: [ID 569559 daemon.info] Start of /opt/SUNWsczone/bin/control_ha-zone sczbt start -R solarisfz1-rs -G zone-rg -T ORCL.ha-zone_sczbt:2 completed successfully.

                            Feb  4 12:29:05 test5 SC[,ORCL.ha-zone_sczbt:2,zone-rg,solarisfz1-rs,gds_svc_start]: [ID 268646 daemon.info] Extension property <network_aware> has a value of <0>

                            Feb  4 12:29:05 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test5 change to <>

                            Feb  4 12:29:23 test5 Cluster.RGM.zonesd: [ID 893639 daemon.info] Invalid zone state: initialized

                            Feb  4 12:29:29 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.notice] start_sczbt rc<0> - Progress being logged to /var/log/zones/zoneadm.20170204T122909Z.solarisfz1.attach

                            Feb  4 12:29:29 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.notice] start_sczbt rc<0> - Installing: Using existing zone boot environment

                            Feb  4 12:29:29 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.notice] start_sczbt rc<0> - Zone BE root dataset: ha-zones/solaris/solarisfz1/rpool/ROOT/solaris

                            Feb  4 12:29:29 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.notice] start_sczbt rc<0> - Cache: Using /var/pkg/publisher.

                            Feb  4 12:29:29 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.notice] start_sczbt rc<0> - Updating non-global zone: Linking to image /.

                            Feb  4 12:29:29 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.notice] start_sczbt rc<0> - Finished processing linked images.

                            Feb  4 12:29:29 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.notice] start_sczbt rc<0> - Updating non-global zone: Auditing packages.

                            Feb  4 12:29:29 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.notice] start_sczbt rc<0> - Planning: Consolidating action changes ... Done

                            Feb  4 12:29:29 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.notice] start_sczbt rc<0> - Planning: Evaluating mediators ... Done

                            Feb  4 12:29:29 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.notice] start_sczbt rc<0> - Planning: Planning completed in 0.10 seconds

                            Feb  4 12:29:29 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.notice] start_sczbt rc<0> - No updates necessary for this image. (zone:solarisfz1)

                            Feb  4 12:29:29 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.notice] start_sczbt rc<0> - Updating non-global zone: Zone updated.

                            Feb  4 12:29:29 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.notice] start_sczbt rc<0> - Result: Attach Succeeded.

                            Feb  4 12:29:29 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.notice] start_sczbt rc<0> - Log saved in non-global zone as /ha-zones/solaris/solarisfz1/root/var/log/zones/zoneadm.20170204T122909Z.solarisfz1.attach

                            Feb  4 12:29:29 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.notice] start_sczbt rc<0> - zone 'solarisfz1': warning: net0: no matching subnet found in netmasks(4): 192.168.114.88; using default of 255.255.255.0.

                            Feb  4 12:29:30 test5 Cluster.scdpmd: [ID 290165 daemon.debug] returned from wait_for_dpm_timeout_io_tasks

                             

                             

                             

                             

                            root@test6:/opt/SUNWsczone/sczbt/util# /usr/cluster/bin/clrs status

                             

                            === Cluster Resources ===

                             

                            Resource Name        Node Name     State        Status Message

                            -------------        ---------     -----        --------------

                            solarisfz1-rs        test5         Starting     Unknown - Starting

                                                 test6         Offline      Offline

                             

                            ha-zones-hasp-rs     test5         Online       Online

                                                 test6         Offline      Offline

                             

                            root@test6:/opt/SUNWsczone/sczbt/util# /usr/cluster/bin/clrs status

                             

                             

                             

                            Now the resource solarisfz1-rs is stopping; the status output and the corresponding /var/adm/clusterlog content follow:

                             

                            === Cluster Resources ===

                             

                            Resource Name      Node Name   State                  Status Message

                            -------------      ---------   -----                  --------------

                            solarisfz1-rs      test5       Stopping               Unknown - Stopping

                                               test6       Offline                Offline

                             

                            ha-zones-hasp-rs   test5       Online_not_monitored   Online_not_monitored

                                               test6       Offline                Offline

                            Feb  4 12:30:30 test5 last message repeated 11 times

                            Feb  4 12:31:00 test5 Cluster.scdpmd: [ID 290165 daemon.debug] returned from wait_for_dpm_timeout_io_tasks

                            Feb  4 12:34:00 test5 last message repeated 27 times

                            Feb  4 12:34:05 test5 Cluster.RGM.global.rgmd: [ID 764140 daemon.error] Method <gds_svc_start> on resource <solarisfz1-rs>, resource group <zone-rg>, node <test5>: Timeout.

                            Feb  4 12:34:05 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.error] resource solarisfz1-rs state on node test5 change to R_START_FAILED

                            Feb  4 12:34:05 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.error] resource group zone-rg state on node test5 change to RG_PENDING_OFF_START_FAILED

                            Feb  4 12:34:05 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource solarisfz1-rs status on node test5 change to R_FM_FAULTED

                            Feb  4 12:34:05 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource solarisfz1-rs status msg on node test5 change to <>

                            Feb  4 12:34:05 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource solarisfz1-rs state on node test5 change to R_STOPPING

                            Feb  4 12:34:05 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.debug] resource ha-zones-hasp-rs state on node test5 change to R_MON_STOPPING

                            Feb  4 12:34:05 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource solarisfz1-rs status on node test5 change to R_FM_UNKNOWN

                            Feb  4 12:34:05 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource solarisfz1-rs status msg on node test5 change to <Stopping>

                            Feb  4 12:34:05 test5 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <gds_svc_stop> for resource <solarisfz1-rs>, resource group <zone-rg>, node <test5>, timeout <300> seconds

                            Feb  4 12:34:05 test5 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_monitor_stop> for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, timeout <90> seconds

                            Feb  4 12:34:05 test5 Cluster.RGM.global.rgmd: [ID 944595 daemon.debug] 39 fe_rpc_command: cmd_type(enum):<1>:cmd=</opt/SUNWscgds/bin/gds_svc_stop>:tag=<zone-rg.solarisfz1-rs.1>: Calling security_clnt_connect(..., host=<test5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)

                            Feb  4 12:34:05 test5 Cluster.RGM.global.rgmd: [ID 703477 daemon.debug] 38 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hastorageplus/hastorageplus_monitor_stop>:tag=<zone-rg.ha-zones-hasp-rs.8>: Calling security_clnt_connect(..., host=<test5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)

                            Feb  4 12:34:05 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_monitor_stop]: [ID 227820 daemon.info] Attempting to stop the data service running under process monitor facility.

                            Feb  4 12:34:05 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_monitor_stop]: [ID 542894 daemon.info] Stopped the monitor-method successfully

                            Feb  4 12:34:05 test5 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <hastorageplus_monitor_stop> completed successfully for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, time used: 0% of timeout <90 seconds>

                            Feb  4 12:34:05 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource ha-zones-hasp-rs state on node test5 change to R_ONLINE_UNMON

                            Feb  4 12:34:05 test5 SC[,ORCL.ha-zone_sczbt:2,zone-rg,solarisfz1-rs,gds_svc_stop]: [ID 721263 daemon.info] Extension property <stop_signal> has a value of <9>

                            Feb  4 12:34:30 test5 Cluster.scdpmd: [ID 290165 daemon.debug] returned from wait_for_dpm_timeout_io_tasks

                             

                            Here the resources are offline; the status output and the corresponding /var/adm/clusterlog content follow:

                            root@test6:~# /usr/cluster/bin/clrs status

                             

                            === Cluster Resources ===

                             

                            Resource Name         Node Name     State       Status Message

                            -------------         ---------     -----       --------------

                            solarisfz1-rs         test5         Offline     Offline

                                                  test6         Offline     Offline

                             

                            ha-zones-hasp-rs      test5         Offline     Offline

                                                  test6         Offline     Offline

                            Feb  4 12:37:10 test5 ibmgmtd[47]: [ID 702911 daemon.debug] open /devices/ib:devctl: No such file or directory

                            Feb  4 12:37:11 test5 Cluster.RGM.fed: [ID 605976 daemon.notice] SCSLM zone <solarisfz1> down

                            Feb  4 12:37:14 test5 SC[SUNWsczone.stop_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.notice] stop_sczbt rc<0> - Forcibly detach the non-global zone solarisfz1

                            Feb  4 12:37:14 test5 SC[,ORCL.ha-zone_sczbt:2,zone-rg,solarisfz1-rs,gds_svc_stop]: [ID 401400 daemon.info] Successfully stopped the application

                            Feb  4 12:37:14 test5 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <gds_svc_stop> completed successfully for resource <solarisfz1-rs>, resource group <zone-rg>, node <test5>, time used: 62% of timeout <300 seconds>

                            Feb  4 12:37:14 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource solarisfz1-rs state on node test5 change to R_OFFLINE

                            Feb  4 12:37:14 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource solarisfz1-rs status on node test5 change to R_FM_OFFLINE

                            Feb  4 12:37:14 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource solarisfz1-rs status msg on node test5 change to <>

                            Feb  4 12:37:14 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.debug] resource ha-zones-hasp-rs state on node test5 change to R_POSTNET_STOPPING

                            Feb  4 12:37:14 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test5 change to R_FM_UNKNOWN

                            Feb  4 12:37:14 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test5 change to <Stopping>

                            Feb  4 12:37:14 test5 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_postnet_stop> for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, timeout <1800> seconds

                            Feb  4 12:37:14 test5 Cluster.RGM.global.rgmd: [ID 944595 daemon.debug] 39 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hastorageplus/hastorageplus_postnet_stop>:tag=<zone-rg.ha-zones-hasp-rs.11>: Calling security_clnt_connect(..., host=<test5>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)

                            Feb  4 12:37:14 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_postnet_stop]: [ID 702911 daemon.debug] fs_unpublish_start: /ha-zones/solaris/solarisfz1/root/var:3:1

                            Feb  4 12:37:14 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_postnet_stop]: [ID 702911 daemon.debug] fs_unpublish_stop: /ha-zones/solaris/solarisfz1/root/var: ok

                            Feb  4 12:37:14 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_postnet_stop]: [ID 702911 daemon.debug] fs_unpublish_start: /ha-zones/solaris/solarisfz1/root:3:1

                            Feb  4 12:37:14 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_postnet_stop]: [ID 702911 daemon.debug] fs_unpublish_stop: /ha-zones/solaris/solarisfz1/root: ok

                            Feb  4 12:37:14 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_postnet_stop]: [ID 702911 daemon.debug] fs_unpublish_start: /ha-zones/solaris/solarisfz1:3:1

                            Feb  4 12:37:14 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_postnet_stop]: [ID 702911 daemon.debug] fs_unpublish_stop: /ha-zones/solaris/solarisfz1: ok

                            Feb  4 12:37:14 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_postnet_stop]: [ID 702911 daemon.debug] fs_unpublish_start: /ha-zones/solaris:3:1

                            Feb  4 12:37:14 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_postnet_stop]: [ID 702911 daemon.debug] fs_unpublish_stop: /ha-zones/solaris: ok

                            Feb  4 12:37:14 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_postnet_stop]: [ID 702911 daemon.debug] fs_unpublish_start: /ha-zones:3:1

                            Feb  4 12:37:14 test5 SC[,SUNW.HAStoragePlus:11,zone-rg,ha-zones-hasp-rs,hastorageplus_postnet_stop]: [ID 702911 daemon.debug] fs_unpublish_stop: /ha-zones: ok

                            Feb  4 12:37:14 test5 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <hastorageplus_postnet_stop> completed successfully for resource <ha-zones-hasp-rs>, resource group <zone-rg>, node <test5>, time used: 0% of timeout <1800 seconds>

                            Feb  4 12:37:14 test5 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource ha-zones-hasp-rs state on node test5 change to R_OFFLINE

                            Feb  4 12:37:14 test5 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ha-zones-hasp-rs status on node test5 change to R_FM_OFFLINE

                            Feb  4 12:37:14 test5 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ha-zones-hasp-rs status msg on node test5 change to <>

                            Feb  4 12:37:14 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.error] resource group zone-rg state on node test5 change to RG_OFFLINE_START_FAILED

                            Feb  4 12:37:14 test5 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group zone-rg state on node test5 change to RG_OFFLINE

                            Feb  4 12:37:14 test5 Cluster.RGM.global.rgmd: [ID 447451 daemon.notice] Not attempting to start resource group <zone-rg> on node <test5> because this resource group has already failed to start on this node 2 or more times in the past 3600 seconds

                            Feb  4 12:37:14 test5 Cluster.RGM.global.rgmd: [ID 447451 daemon.notice] Not attempting to start resource group <zone-rg> on node <test6> because this resource group has already failed to start on this node 2 or more times in the past 3600 seconds

                            Feb  4 12:37:14 test5 Cluster.RGM.global.rgmd: [ID 674214 daemon.notice] rebalance: no primary node is currently found for resource group <zone-rg>.

                            Feb  4 12:37:30 test5 Cluster.scdpmd: [ID 871842 daemon.debug] dpm_device_io: path = /dev/did/rdsk/d12s0, status = 1

                            Feb  4 12:37:30 test5 Cluster.scdpmd: [ID 532212 daemon.debug] dpm_timeout_io: path = /dev/did/rdsk/d12s0, status = 1

                            Feb  4 12:37:30 test5 Cluster.scdpmd: [ID 290165 daemon.debug] returned from wait_for_dpm_timeout_io_tasks

                             

                            Feb  4 12:38:00 test5 last message repeated 4 times

                            Feb  4 12:38:00 test5 Cluster.scdpmd: [ID 871842 daemon.debug] dpm_device_io: path = /dev/did/rdsk/d4s0, status = 1

                            Feb  4 12:38:00 test5 Cluster.scdpmd: [ID 532212 daemon.debug] dpm_timeout_io: path = /dev/did/rdsk/d4s0, status = 1

                            Feb  4 12:38:00 test5 Cluster.scdpmd: [ID 290165 daemon.debug] returned from wait_for_dpm_timeout_io_tasks

                             

                            tnx,

                            Marius

                            • 11. Re: ha zone
                              Juergens-Oracle

                              Hello Marius,

                               

                              To your question:

                              if I put zone-rg into the unmanaged state, does that mean the cluster has no control over zone-rg?

                              Correct.
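
                              While unmanaged, the RGM ignores the group completely: no monitoring, no restarts, no failover. A minimal sketch of the round trip, using the standard clrg subcommands:

                              # /usr/cluster/bin/clrg offline zone-rg
                              # /usr/cluster/bin/clrg unmanage zone-rg
                              (while unmanaged, the cluster does not touch zone-rg at all)
                              # /usr/cluster/bin/clrg manage zone-rg
                              # /usr/cluster/bin/clrg online zone-rg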

                               

                               

                              I noticed in the logs:

                              Feb  4 12:29:29 test5 SC[SUNWsczone.start_sczbt]:zone-rg:solarisfz1-rs: [ID 567783 daemon.notice] start_sczbt rc<0> - zone 'solarisfz1': warning: net0: no matching subnet found in netmasks(4): 192.168.114.88; using default of 255.255.255.0.

                              This appeared just before the start timeout occurred.

                               

                              Please add the missing netmask entry to the /etc/netmasks file.

                              Then check whether solarisfz1 starts correctly.
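
                              For example, assuming the zone address 192.168.114.88 really is on a /24 network (adjust the mask if yours differs), the netmasks(4) entry is the network number followed by its mask:

                              # echo "192.168.114.0   255.255.255.0" >> /etc/netmasks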

                               

                              Hth,

                                 Juergen

                              • 12. Re: ha zone
                                1031554

                                Hi Juergens,

                                 

                                # /usr/cluster/bin/clrs status

                                 

                                === Cluster Resources ===

                                 

                                Resource Name         Node Name     State       Status Message

                                -------------         ---------     -----       --------------

                                solarisfz1-rs         test5         Offline     Offline

                                                       test6         Online      Online - Service is online.

                                 

                                ha-zones-hasp-rs      test5         Offline     Offline

                                                      test6         Online      Online

                                 

                                root@test6:~# /usr/cluster/bin/clrg status

                                 

                                === Cluster Resource Groups ===

                                 

                                Group Name       Node Name       Suspended      Status

                                ----------       ---------       ---------      ------

                                zone-rg          test5           No             Offline

                                                  test6           No             Online

                                 

                                Is the above right?

                                tnx

                                • 13. Re: ha zone
                                  Juergens-Oracle

                                  Hello Marius,

                                   

                                   Yes, this looks good. Everything is online on test6.
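
                                   As a final check you could switch the group back and forth once more, to confirm failover works in both directions:

                                   # /usr/cluster/bin/clrg switch -n test5 zone-rg
                                   # /usr/cluster/bin/clrg status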

                                   

                                  Thanks,

                                     Juergen

                                  • 14. Re: ha zone
                                    1031554

                                    tnx
