4 Replies Latest reply on Sep 5, 2019 12:38 PM by user6387903

    Add node doesn't work when OOP patching strategy enabled

    user6387903

      We have a 12.2.0.1 RAC (SE2) on Oracle Linux 7.

      Over the past months, the history of the cluster has been:

      - initial configuration with two nodes

      - delete one node and add another one

      - implement Out Of Place (OOP) patching, so that we initially had one ORACLE_HOME path, which became a new ORACLE_HOME_1 path during one patching phase and then a new ORACLE_HOME_2 path

       

      As a hardware replacement / refresh strategy, we are going to delete a node and then add a new one.

      Node deletion completed successfully, following the official docs:

      - oracle rdbms https://docs.oracle.com/database/122/RACAD/adding-and-deleting-oracle-rac-from-nodes-on-linux-and-unix-systems.htm#RACAD0072

      - oracle grid https://docs.oracle.com/database/122/CWADD/adding-and-deleting-cluster-nodes.htm#CWADD90992

      Now we are trying to add a new node racsvi2 with the golden image strategy as described in official docs here:

      https://docs.oracle.com/database/122/CWADD/cloning-oracle-clusterware.htm#CWADD92137

      But at the root.sh step on the new node racsvi2, the CRS stack is not started.

      Something doesn't seem to work as expected.

       

      The add node command output was:

      [grid@testrac1 grid_home_12201_1]$ /opt/oracle/grid_home_12201_1/addnode/addnode.sh -silent -noCopy "CLUSTER_NEW_NODES={racsvi2}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={clracsvi2.mydomain}" "CLUSTER_NEW_NODE_ROLES={HUB}"

       

      Prepare Configuration in progress.

       

      Prepare Configuration successful.

      .................................................. 29% Done.

       

      Setup Oracle Base in progress.

       

      Setup Oracle Base successful.

      .................................................. 57% Done.

       

      Update Inventory in progress.

       

      Update Inventory successful.

      .................................................. 86% Done.

       

      As a root user, execute the following script(s):

        1. /opt/oracle/grid_home_12201_1/root.sh

       

      Execute /opt/oracle/grid_home_12201_1/root.sh on the following nodes: 

      [racsvi2]

       

      The scripts can be executed in parallel on all the nodes.

       

      .................................................. 100% Done.

      Successfully Setup Software.

      [grid@testrac1 grid_home_12201_1]$ 

       

      The problem is that root.sh on racsvi2 returns immediately:

      [root@racsvi2 ~]#  /opt/oracle/grid_home_12201_1/root.sh

      Check /opt/oracle/grid_home_12201_1/install/root_racsvi2.mydomain_2019-09-02_16-09-42-952793359.log for the output of root script

      [root@racsvi2 ~]#

       

      and the log file contains only:

       

      Performing root user operation.

       

       

      The following environment variables are set as:

          ORACLE_OWNER= grid

          ORACLE_HOME=  /opt/oracle/grid_home_12201_1

         Copying dbhome to /usr/local/bin ...

         Copying oraenv to /usr/local/bin ...

         Copying coraenv to /usr/local/bin ...

       

       

      Entries will be added to the /etc/oratab file as needed by

      Database Configuration Assistant when a database is created

      Finished running generic part of root script.

      Now product-specific root actions will be performed.

       

       

      To configure Grid Infrastructure for a Cluster or Grid Infrastructure for a Stand-Alone Server execute the following command as grid user:

      /opt/oracle/grid_home_12201_1/gridSetup.sh

      This command launches the Grid Infrastructure Setup Wizard. The wizard also supports silent operation, and the parameters can be passed through the response file that is available in the installation media.

       

      In a previous add node operation, by contrast, I saw the CRS stack start, and so on...

       

      Posting here to get any suggestion to verify or crosscheck.

      Thanks in advance,

      Gianluca

        • 1. Re: Add node doesn't work when OOP patching strategy enabled
          user6387903

          During another "add node" activity (the first one chronologically, executed in the workflow explained above, so before implementing OOP) with the same golden image strategy, root.sh on the node to be added instead produced this kind of content in the log file:

           

          Performing root user operation.

           

           

          The following environment variables are set as:

              ORACLE_OWNER= grid

              ORACLE_HOME=  /opt/oracle/grid_home_12201

             Copying dbhome to /usr/local/bin ...

             Copying oraenv to /usr/local/bin ...

             Copying coraenv to /usr/local/bin ...

           

           

           

           

          Creating /etc/oratab file...

          Entries will be added to the /etc/oratab file as needed by

          Database Configuration Assistant when a database is created

          Finished running generic part of root script.

          Now product-specific root actions will be performed.

          Relinking oracle with rac_on option

          Using configuration parameter file: /opt/oracle/grid_home_12201/crs/install/crsconfig_params

          The log of current session can be found at:

            /opt/oracle/grid_base_12201/crsdata/testrac3/crsconfig/rootcrs_testrac3_2017-08-24_05-45-47PM.log

          2017/08/24 17:45:51 CLSRSC-594: Executing installation step 1 of 19: 'SetupTFA'.

          . . .

          CRS-4123: Oracle High Availability Services has been started.

          2017/08/24 17:51:18 CLSRSC-343: Successfully started Oracle Clusterware stack

          .....
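
          The difference between the two root.sh logs can be checked mechanically. This is only a minimal sketch: the marker strings are copied from the two excerpts in this thread and may not be the only possible ones, and classify_root_log is a hypothetical helper name, not an Oracle tool.

```shell
#!/bin/sh
# classify_root_log FILE  (hypothetical helper, not part of Grid Infrastructure)
# Prints:
#   configured     - root.sh went on to the cluster configuration steps
#   software-only  - root.sh stopped at the gridSetup.sh hint
#   unknown        - neither marker string was found
# Assumption: the marker strings below, taken from the logs quoted in this
# thread, are representative of the two outcomes.
classify_root_log() {
    log_file=$1
    if grep -q 'Using configuration parameter file' "$log_file"; then
        echo configured
    elif grep -q 'gridSetup.sh' "$log_file"; then
        echo software-only
    else
        echo unknown
    fi
}
```

          Run against the log quoted in the first post, this should print software-only, since that log ends with the gridSetup.sh hint and never reaches the "Using configuration parameter file" line.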

           

          I think there is some problem in how Oracle internally manages these files:

          $GRID_HOME/crs/install/crsconfig_params --> that still contains original two nodes information and seems not to be updated

          $GRID_HOME/crs/install/crsconfig_addparams --> that initially doesn't exist and it is instantiated after the first add node operation
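
          To cross-check which node names each of these two files actually mentions, something like the following could be used. It is only a sketch: report_node_mentions is a hypothetical helper, it searches for the plain host names and makes no assumption about the parameter names inside the files, and the commented usage assumes GRID_HOME is set in the environment.

```shell
#!/bin/sh
# report_node_mentions FILE NODE...  (hypothetical helper)
# For each NODE, reports whether FILE contains it as a whole word.
# Returns 1 if FILE does not exist (e.g. crsconfig_addparams before the
# first add node operation).
report_node_mentions() {
    params_file=$1
    shift
    if [ ! -f "$params_file" ]; then
        echo "$params_file: missing"
        return 1
    fi
    for node in "$@"; do
        if grep -qwF -- "$node" "$params_file"; then
            echo "$params_file: mentions $node"
        else
            echo "$params_file: no mention of $node"
        fi
    done
}

# Possible usage on the first node (assumes GRID_HOME is set):
#   report_node_mentions "$GRID_HOME/crs/install/crsconfig_params" testrac1 racsvi2
#   report_node_mentions "$GRID_HOME/crs/install/crsconfig_addparams" testrac1 racsvi2
```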

           

          On the first node, the pre-add verification stage passes:

           

          [grid@testrac1 grid_home_12201_1]$ cluvfy stage -pre nodeadd -n racsvi2

           

           

          Verifying Physical Memory ...PASSED

          Verifying Available Physical Memory ...PASSED

          Verifying Swap Size ...PASSED

          Verifying Free Space: testrac1:/usr,testrac1:/var,testrac1:/etc,testrac1:/sbin,testrac1:/tmp ...PASSED

          Verifying Free Space: testrac1:/opt/oracle/grid_home_12201_1 ...PASSED

          Verifying Free Space: racsvi2:/usr,racsvi2:/sbin ...PASSED

          Verifying Free Space: racsvi2:/var ...PASSED

          Verifying Free Space: racsvi2:/etc ...PASSED

          Verifying Free Space: racsvi2:/opt/oracle/grid_home_12201_1 ...PASSED

          Verifying Free Space: racsvi2:/tmp ...PASSED

          Verifying User Existence: grid ...

            Verifying Users With Same UID: 1030 ...PASSED

          Verifying User Existence: grid ...PASSED

          Verifying User Existence: root ...

            Verifying Users With Same UID: 0 ...PASSED

          Verifying User Existence: root ...PASSED

          Verifying User Existence: oracle12 ...

            Verifying Users With Same UID: 1023 ...PASSED

          Verifying User Existence: oracle12 ...PASSED

          Verifying Group Existence: asmadmin ...PASSED

          Verifying Group Existence: asmoper ...PASSED

          Verifying Group Existence: asmdba ...PASSED

          Verifying Group Existence: oinstall ...PASSED

          Verifying Group Membership: oinstall ...PASSED

          Verifying Group Membership: asmdba ...PASSED

          Verifying Group Membership: asmadmin ...PASSED

          Verifying Group Membership: asmoper ...PASSED

          Verifying Run Level ...PASSED

          Verifying Hard Limit: maximum open file descriptors ...PASSED

          Verifying Soft Limit: maximum open file descriptors ...PASSED

          Verifying Hard Limit: maximum user processes ...PASSED

          Verifying Soft Limit: maximum user processes ...PASSED

          Verifying Soft Limit: maximum stack size ...PASSED

          Verifying Architecture ...PASSED

          Verifying OS Kernel Version ...PASSED

          Verifying OS Kernel Parameter: semmsl ...PASSED

          Verifying OS Kernel Parameter: semmns ...PASSED

          Verifying OS Kernel Parameter: semopm ...PASSED

          Verifying OS Kernel Parameter: semmni ...PASSED

          Verifying OS Kernel Parameter: shmmax ...PASSED

          Verifying OS Kernel Parameter: shmmni ...PASSED

          Verifying OS Kernel Parameter: shmall ...PASSED

          Verifying OS Kernel Parameter: file-max ...PASSED

          Verifying OS Kernel Parameter: ip_local_port_range ...PASSED

          Verifying OS Kernel Parameter: rmem_default ...PASSED

          Verifying OS Kernel Parameter: rmem_max ...PASSED

          Verifying OS Kernel Parameter: wmem_default ...PASSED

          Verifying OS Kernel Parameter: wmem_max ...PASSED

          Verifying OS Kernel Parameter: aio-max-nr ...PASSED

          Verifying OS Kernel Parameter: panic_on_oops ...PASSED

          Verifying Package: binutils-2.23.52.0.1 ...PASSED

          Verifying Package: compat-libcap1-1.10 ...PASSED

          Verifying Package: libgcc-4.8.2 (x86_64) ...PASSED

          Verifying Package: libstdc++-4.8.2 (x86_64) ...PASSED

          Verifying Package: libstdc++-devel-4.8.2 (x86_64) ...PASSED

          Verifying Package: sysstat-10.1.5 ...PASSED

          Verifying Package: ksh ...PASSED

          Verifying Package: make-3.82 ...PASSED

          Verifying Package: glibc-2.17 (x86_64) ...PASSED

          Verifying Package: glibc-devel-2.17 (x86_64) ...PASSED

          Verifying Package: libaio-0.3.109 (x86_64) ...PASSED

          Verifying Package: libaio-devel-0.3.109 (x86_64) ...PASSED

          Verifying Package: nfs-utils-1.2.3-15 ...PASSED

          Verifying Package: smartmontools-6.2-4 ...PASSED

          Verifying Package: net-tools-2.0-0.17 ...PASSED

          Verifying Users With Same UID: 0 ...PASSED

          Verifying Current Group ID ...PASSED

          Verifying Root user consistency ...PASSED

          Verifying Package: cvuqdisk-1.0.10-1 ...PASSED

          Verifying Node Addition ...

            Verifying CRS Integrity ...PASSED

            Verifying Clusterware Version Consistency ...PASSED

            Verifying '/opt/oracle/grid_home_12201_1' ...PASSED

          Verifying Node Addition ...PASSED

          Verifying Node Connectivity ...

            Verifying Hosts File ...PASSED

            Verifying Check that maximum (MTU) size packet goes through subnet ...PASSED

            Verifying subnet mask consistency for subnet "10.4.4.0" ...PASSED

            Verifying subnet mask consistency for subnet "192.168.15.0" ...PASSED

          Verifying Node Connectivity ...PASSED

          Verifying Multicast check ...PASSED

          Verifying ASM Integrity ...

            Verifying Node Connectivity ...

              Verifying Hosts File ...PASSED

              Verifying Check that maximum (MTU) size packet goes through subnet ...PASSED

              Verifying subnet mask consistency for subnet "10.4.4.0" ...PASSED

              Verifying subnet mask consistency for subnet "192.168.15.0" ...PASSED

            Verifying Node Connectivity ...PASSED

          Verifying ASM Integrity ...PASSED

          Verifying Device Checks for ASM ...PASSED

          Verifying Database home availability ...PASSED

          Verifying ASMLib installation and configuration verification. ...

            Verifying '/etc/init.d/oracleasm' ...PASSED

            Verifying '/dev/oracleasm' ...PASSED

            Verifying '/etc/sysconfig/oracleasm' ...PASSED

          Verifying ASMLib installation and configuration verification. ...PASSED

          Verifying OCR Integrity ...PASSED

          Verifying Time zone consistency ...PASSED

          Verifying Network Time Protocol (NTP) ...

            Verifying '/etc/chrony.conf' ...PASSED

            Verifying '/var/run/chronyd.pid' ...PASSED

            Verifying Daemon 'chronyd' ...PASSED

            Verifying NTP daemon or service using UDP port 123 ...PASSED

            Verifying chrony daemon is synchronized with at least one external time source ...PASSED

          Verifying Network Time Protocol (NTP) ...PASSED

          Verifying User Not In Group "root": grid ...PASSED

          Verifying resolv.conf Integrity ...PASSED

          Verifying DNS/NIS name service ...PASSED

          Verifying User Equivalence ...PASSED

          Verifying /dev/shm mounted as temporary file system ...PASSED

          Verifying /boot mount ...PASSED

          Verifying zeroconf check ...PASSED

           

           

          Pre-check for node addition was successful.

           

           

          CVU operation performed:      stage -pre nodeadd

          Date:                         Sep 2, 2019 4:11:21 PM

          CVU home:                     /opt/oracle/grid_home_12201_1/

          User:                         grid

          [grid@testrac1 grid_home_12201_1]$

           

          As expected, the post-add verification for the node to be added fails instead:

           

          [grid@testrac1 grid_home_12201_1]$ cluvfy stage -post nodeadd -n racsvi2

           

           

          Shared resources check for node addition failed

           

           

          Post-check for node addition was unsuccessful on all the nodes.

           

           

          CVU operation performed:      stage -post nodeadd

          Date:                         Sep 2, 2019 4:15:16 PM

          CVU home:                     /opt/oracle/grid_home_12201_1/

          User:                         grid

          [grid@testrac1 grid_home_12201_1]$

           

          And the olsnodes command lists only the first node in its output.

          I am posting in the hope that someone else has had a similar problem.

          I have already opened an SR, which unfortunately seems stuck even though it has already been escalated ;-(
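
          The olsnodes check can be scripted so it is easy to repeat after each attempt. A minimal sketch, assuming the node name is the first column of the olsnodes output; node_in_cluster is a hypothetical helper name.

```shell
#!/bin/sh
# node_in_cluster NODE  (hypothetical helper)
# Reads olsnodes-style output on stdin, with the node name in the first
# column, and succeeds (exit 0) only if NODE is listed.
node_in_cluster() {
    wanted=$1
    awk -v wanted="$wanted" '$1 == wanted { found = 1 } END { exit !found }'
}

# Possible usage as the grid user on an existing node:
#   if olsnodes -n -s | node_in_cluster racsvi2; then
#       echo "racsvi2 is a cluster member"
#   else
#       echo "racsvi2 is not listed by olsnodes"
#   fi
```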

           

          Thanks in advance,

          Gianluca

          • 2. Re: Add node doesn't work when OOP patching strategy enabled
            Dude!

            But in the step of root.sh of new node racsvi2 we have the CRS stack that is not started

            What is the actual problem? If the above is the problem, that's not much to work with. Instead of showing all that works, can you show what doesn't work?

            • 3. Re: Add node doesn't work when OOP patching strategy enabled
              Dude!

              If Oracle Clusterware isn't running, perhaps you disabled it.

               

              What do you see when typing "systemctl status oracle-ohasd" or "journalctl -xb" as root?

              • 4. Re: Add node doesn't work when OOP patching strategy enabled
                user6387903

                The problem is that root.sh, executed on the new node, is responsible for setting up and starting its CRS stack for the first time.

                Instead, judging from its log, it seems to think it is in a new cluster rather than a new member of an existing one...

                Anyway, there seems to be this bug, currently targeted only at 18.2:

                Bug: 27786699 - 18.2:Add node script failed while adding the cluster node after Out Of Place Patching (NOTE 27786699.8)

                 

                I think I'm hitting it, and so far no one has verified it in 12.2.