4 Replies Latest reply on Feb 20, 2012 9:14 AM by 915071

    the status of SN is always "STARTING"

  I ran the script to start the SNs, but the status of the SNs is always "STARTING".

      Verify Information:
      Verify: Storage Node [sn1] on datanode1:5000 Datacenter: Beijing [dc1] Status: RUNNING Ver: 11gR2.1.2.123
      Verify: Admin [admin1]     Status: RUNNING
      Verify: Rep Node [rg1-rn1]     Status: RUNNING,UNKNOWN at sequence number: 861 haPort: 5006
      Verify: == checking storage node sn2 ==
      Verify: Storage Node [sn2] on datanode2:5000 Datacenter: Beijing [dc1] Status: RUNNING Ver: 11gR2.1.2.123
      Verify: rg1-rn2: Expected status RUNNING but was STARTING
      Verify: Rep Node [rg1-rn2]     Status: STARTING,DETACHED at sequence number: 0 haPort: 5005
      Verify: == checking storage node sn3 ==
      Verify: rg1-rn2:     Expected status RUNNING but was STARTING
      Verify: sn3:     Mismatch between metadata in admin service and sn3: Expected these parameter collections: Global, StorageNode but instead, see these collections: storageNodeParams globalParams repNodeParams
      Verify: rg1-rn4:     Expected status RUNNING but was STARTING
      Verify: rg1-rn3:     Expected status RUNNING but was STARTING
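(For readers who want to reproduce this check: output like the above typically comes from the `verify` command in the administrative CLI. The host, port, and KVHOME below are examples taken from this thread's topology, not a definitive invocation:)

```
$ java -jar $KVHOME/lib/kvstore.jar runadmin -port 5000 -host datanode1
kv-> verify
```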

      Logs are below:

      02-16-12 18:48:47:98 UTC+8 INFO [rg1-rn2] JE: Request for unknown Service: Acceptor Registered services: [Learner, LogFileFeeder, LDiff, NodeState, BinaryNodeState]
        • 1. Re: the status of SN is always "STARTING"
          Linda Lee-Oracle
          The verify command tells us that there is something invalid with your deployment, perhaps due to earlier testing experiments.

          - SN1 is up, is hosting an Admin service, and is hosting an RN named rg1-rn1
          - SN2 is up, and is hosting an RN named rg1-rn2
          - SN3 is up, and is hosting three RNs, named rg1-rn2, rg1-rn3, and rg1-rn4

          There are two things wrong:

          1. There are two RNs in your deployment with the same name (rg1-rn2). This should never happen.
          2. The admin service believes that SN3 should not host any RNs at all, so SN3 has some sort of information left over from a different deployment.

          Based on that, I wonder if you had a previous deployment where you set up three RNs on SN3. Then perhaps you attempted a second deployment with a different configuration, using the same root directories and port numbers, and did not completely clean up the first experiment?

          When you are trying different configurations, you will want to remove all files from your KVROOT directories, and kill all processes. You can find these processes with jps -m.

          This is different from experimenting with failover. You may kill processes intentionally, in order to test that NoSQL DB automatically restarts different components, as it is supposed to do. In those cases, you do not need to remove any files, and doing so would have unexpected consequences.
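The clean-up steps above might be scripted roughly like this on each host. This is a sketch, not from the thread: the `KVROOT` path is a placeholder, and the PIDs to kill must be read from the `jps -m` output by hand.

```shell
# Clean-slate reset before re-deploying (run on EVERY host).
# KVROOT is a hypothetical example path; point it at your actual root.
KVROOT="${KVROOT:-/tmp/kvroot-example}"

# 1. List any leftover NoSQL DB Java processes (SNA, RepNode, Admin).
if command -v jps >/dev/null 2>&1; then
    jps -m || true
fi

# 2. Kill the PIDs printed above, e.g.:
#    kill <pid>

# 3. Remove all state under KVROOT so the next deployment starts clean.
mkdir -p "$KVROOT"            # ensure the directory exists before globbing
rm -rf "${KVROOT:?}"/*        # :? guards against an unset or empty KVROOT
```

The `${KVROOT:?}` expansion aborts the script if the variable is empty, which prevents the `rm` from ever expanding to `/*`.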
          • 2. Re: the status of SN is always "STARTING"
            Thanks for your answers. But I have some questions:
            When you say "this should never happen", do you mean this situation is a bug?

            Yes, I deployed 9 rep nodes on 3 hosts (three on each host), grouped four of the nine together, and ran some tests involving removing data and then recovering it.
            Given this situation, can I conclude that the rep nodes in the group cannot elect a master because some metadata, such as versions, is incomplete?
            • 3. Re: the status of SN is always "STARTING"
              I would not say that it is a bug. The configuration is mixed up; you appear to have storage nodes that were configured for different stores participating in the same store. In this situation, the system can't be expected to work properly. Please remove all the files in the KVROOT directory on ALL the hosts, and start over with a clean state.