8 Replies Latest reply: Jun 27, 2012 5:53 AM by 945795

    State of data nodes

    896774
      Hi all,

      Does the client (and the other nodes) maintain the current state of all nodes?
      If so, what information do they keep, and how do they obtain it?
      Thanks
        • 1. Re: State of data nodes
          Charles Lamb
          Yes, the client knows about the topology of the rep groups and rep nodes in the rep groups. That information is initialized when the client first makes contact with one (any) of the nodes in the system on the first request. It is kept updated through the values returned from any calls the client makes to any rep nodes.

          This information includes (among other things): topology (partition -> Rep Groups), rep node state (e.g. master, replica), replica lag behind the master, average response time.
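To make the list above concrete, here is a minimal sketch of what such a client-side cache could look like. It is not the product's implementation: the class names, field names, node names, and the two lookup methods are all hypothetical, and the idea that a client would use lag and response time to pick a node for reads is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical client-side view of one rep node (all names are illustrative). */
final class RepNodeInfo {
    final String name;               // made-up naming, e.g. "rg1-rn2"
    final int repGroupId;            // rep group this node belongs to
    final boolean master;            // rep node state: master or replica
    final long lagMillis;            // replica lag behind the master
    final double avgResponseMillis;  // average response time seen by the client

    RepNodeInfo(String name, int repGroupId, boolean master,
                long lagMillis, double avgResponseMillis) {
        this.name = name;
        this.repGroupId = repGroupId;
        this.master = master;
        this.lagMillis = lagMillis;
        this.avgResponseMillis = avgResponseMillis;
    }
}

/** Hypothetical cache: partition -> rep group, and rep group -> its nodes. */
final class ClientStateCache {
    private final Map<Integer, Integer> partitionToGroup = new HashMap<>();
    private final Map<Integer, List<RepNodeInfo>> groupToNodes = new HashMap<>();

    void assignPartition(int partitionId, int repGroupId) {
        partitionToGroup.put(partitionId, repGroupId);
    }

    void addNode(RepNodeInfo node) {
        groupToNodes.computeIfAbsent(node.repGroupId, k -> new ArrayList<>()).add(node);
    }

    private List<RepNodeInfo> nodesFor(int partitionId) {
        Integer group = partitionToGroup.get(partitionId);
        if (group == null) {
            throw new IllegalArgumentException("unknown partition " + partitionId);
        }
        return groupToNodes.getOrDefault(group, Collections.emptyList());
    }

    /** Returns the master of the group owning the partition, or null if none is known. */
    RepNodeInfo masterFor(int partitionId) {
        for (RepNodeInfo n : nodesFor(partitionId)) {
            if (n.master) {
                return n;
            }
        }
        return null;
    }

    /** Assumed read policy: fastest node whose lag is within maxLagMillis, or null. */
    RepNodeInfo bestReaderFor(int partitionId, long maxLagMillis) {
        RepNodeInfo best = null;
        for (RepNodeInfo n : nodesFor(partitionId)) {
            if (n.lagMillis > maxLagMillis) {
                continue; // too far behind the master for this read
            }
            if (best == null || n.avgResponseMillis < best.avgResponseMillis) {
                best = n;
            }
        }
        return best;
    }
}
```

With one group holding a master and two replicas, `masterFor` returns the master, while `bestReaderFor` may return a replica when its lag is within the caller's tolerance.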

          Charles Lamb
          • 2. Re: State of data nodes
            906230
            Hi Charles,
            > That information is initialized when the client first makes contact with one (any) of the nodes in the system on the first request.
            1) If that information is only initialized when the client first contacts a node, how does the client reach that node before it knows anything about it?
            > It is kept updated through the values returned from any calls the client makes to any rep nodes.
            2) Does the node send that information along with the answer to every request? Is that good for latency?

            3) Is that information centralized anywhere? Who is responsible for keeping the topology current? How can we be sure all nodes have the same information?
            > This information includes (among other things): topology (partition -> Rep Groups), rep node state (e.g. master, replica), replica lag behind the master, average response time.
            4) Is "replica lag behind the master" calculated per record, per node, or per primary key? How is it calculated?
            Thanks.
            • 3. Re: State of data nodes
              Charles Lamb
              user962305 wrote:
              > Hi Charles,
              > > That information is initialized when the client first makes contact with one (any) of the nodes in the system on the first request.
              > 1) If that information is only initialized when the client first contacts a node, how does the client reach that node before it knows anything about it?
              It must know the ipaddr/name:port of at least one node in the system. It only has to make contact with some node to start up.
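As a rough illustration of that bootstrap step, the sketch below parses a list of `name:port` strings and returns the first one that answers. It uses only the Java standard library and is not the NoSQL Database client API: `HelperHost`, `Bootstrap`, and the `probe` callback are hypothetical names, and the example host names and port are made up.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical holder for one "ipaddr/name:port" entry known to the client. */
final class HelperHost {
    final String host;
    final int port;

    HelperHost(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /** Parses "name:port" or "ipaddr:port"; rejects anything else. */
    static HelperHost parse(String spec) {
        int colon = spec.lastIndexOf(':');
        if (colon <= 0 || colon == spec.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got " + spec);
        }
        int port = Integer.parseInt(spec.substring(colon + 1));
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range in " + spec);
        }
        return new HelperHost(spec.substring(0, colon), port);
    }
}

final class Bootstrap {
    /**
     * Tries each configured node in order and returns the first one the probe
     * reports as reachable, or null if none responds. The probe stands in for
     * whatever "make first contact" means in a real client.
     */
    static HelperHost firstReachable(List<String> specs, Predicate<HelperHost> probe) {
        for (String spec : specs) {
            HelperHost candidate = HelperHost.parse(spec);
            if (probe.test(candidate)) {
                return candidate;
            }
        }
        return null;
    }
}
```

Listing more than one node only helps the first contact succeed when some node is down; once any node answers, the client can learn about the rest from it.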

              > > It is kept updated through the values returned from any calls the client makes to any rep nodes.
              > 2) Does the node send that information along with the answer to every request? Is that good for latency?
              Correct. It is fine for latency. Changes to the topology are very infrequent, and each change is relatively small, so latency is not an issue. If the changes were large (and they are not), throughput might be affected, but that is not the case.
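One way to picture why this costs so little is a response that carries a small topology delta only when something has changed. The sketch below is an assumption about how such piggybacking could work, not a description of the actual wire protocol: the sequence number, the `Response` and `CachedTopology` classes, and the `partitionMoves` field are all hypothetical, and the response is assumed to contain every change the client has not yet seen.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reply: the requested value plus piggybacked topology info. */
final class Response {
    final String value;                          // the answer to the request
    final int topologySeq;                       // assumed topology version at the node
    final Map<Integer, Integer> partitionMoves;  // partition -> new rep group; usually empty

    Response(String value, int topologySeq, Map<Integer, Integer> partitionMoves) {
        this.value = value;
        this.topologySeq = topologySeq;
        this.partitionMoves = Collections.unmodifiableMap(new HashMap<>(partitionMoves));
    }
}

/** Hypothetical client-side topology cache refreshed from responses. */
final class CachedTopology {
    private int seq;  // version of the topology the client currently holds
    private final Map<Integer, Integer> partitionToGroup = new HashMap<>();

    int seq() {
        return seq;
    }

    Integer groupOf(int partitionId) {
        return partitionToGroup.get(partitionId);
    }

    /** Applies the piggybacked changes if they are newer; returns true if the cache changed. */
    boolean absorb(Response r) {
        if (r.topologySeq <= seq) {
            return false;  // nothing new: the common case, so no extra work
        }
        partitionToGroup.putAll(r.partitionMoves);
        seq = r.topologySeq;
        return true;
    }
}
```

Because topology changes are rare, almost every response takes the early-return path, which is consistent with the point that latency is unaffected.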
              > 3) Is that information centralized anywhere? Who is responsible for keeping the topology current? How can we be sure all nodes have the same information?
              Yes. Have you read about the "admin" process?

              > > This information includes (among other things): topology (partition -> Rep Groups), rep node state (e.g. master, replica), replica lag behind the master, average response time.
              > 4) Is "replica lag behind the master" calculated per record, per node, or per primary key? How is it calculated?
              It is per node in the rep group. It is calculated from VLSNs, a topic that is too complex for me to go into here.
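Without going into the real mechanism, the per-node idea can be illustrated with a deliberately simplified sketch. It assumes only that a VLSN behaves like a monotonically increasing sequence number over the replication stream, and it measures lag as the number of entries a replica is behind the master. The class, method, and node names are hypothetical, and the actual calculation may differ substantially.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ReplicaLag {
    /**
     * Simplified lag: how many sequence numbers each replica is behind the master.
     * masterSeq is the latest sequence number on the master; replicaSeqs maps a
     * replica name to the latest sequence number it has applied.
     */
    static Map<String, Long> lagBehindMaster(long masterSeq, Map<String, Long> replicaSeqs) {
        Map<String, Long> lag = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : replicaSeqs.entrySet()) {
            // Clamp at zero in case a replica's number was sampled after the master's.
            lag.put(e.getKey(), Math.max(0L, masterSeq - e.getValue()));
        }
        return lag;
    }
}
```

The result has one entry per replica in the rep group, not per record or per key, which matches the granularity described above.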

              Charles
              • 4. Re: State of data nodes
                896774
                Charles,

                You said the topology information is centralized. Where is it centralized? At the client? Only at the master of each replication group? Where does the admin process reside? I have never heard of that process.
                • 5. Re: State of data nodes
                  Charles Lamb
                  893771 wrote:
                  > You said the topology information is centralized. Where is it centralized? At the client? Only at the master of each replication group? Where does the admin process reside? I have never heard of that process.
                  The admin is a replicated process backed by a replicated database (residing on Rep Nodes). Yes, it contains the topology. The topology is cached on the client and each of the rep nodes.

                  Charles Lamb
                  • 6. how to remotely connect YCSB to nosqldb
                    945795
                    Hi,

                    I created a kvstore on 3 nodes, with 2 replicas and 200 partitions. Could you please tell me how to connect YCSB remotely to this 3-node setup?
                    • 7. Re: how to remotely connect YCSB to nosqldb
                      Charles Lamb
                      Abu Taiyyab wrote:
                      > I created a kvstore on 3 nodes, with 2 replicas and 200 partitions. Could you please tell me how to connect YCSB remotely to this 3-node setup?
                      I'm not sure I understand the question. You always connect to any NoSQL Database cluster the same way: you supply the name of the KVStore and one or more hostname:port pairs for any of the underlying rep nodes.

                      Charles Lamb
                      • 8. Re: how to remotely connect YCSB to nosqldb
                        945795
                        Hi Charles,

                        We have set up a kvstore with 3 storage nodes (replication factor 3, one replication group). We want to run the YCSB tool against it from an external node that is not part of the setup. Could you guide us on how to achieve this, e.g. any changes needed to the config file?