8 Replies. Latest reply: Jun 27, 2012 3:53 AM by 945795

State of data nodes

896774 Newbie
Hi all,

Do the client and the other nodes maintain the current state of all nodes?
If so, what information do they keep, and how do they obtain it?
Thanks
  • 1. Re: State of data nodes
    Charles Lamb Pro
    Yes, the client knows about the topology of the rep groups and rep nodes in the rep groups. That information is initialized when the client first makes contact with one (any) of the nodes in the system on the first request. It is kept updated through the values returned from any calls the client makes to any rep nodes.

    This information includes (among other things): topology (partition -> Rep Groups), rep node state (e.g. master, replica), replica lag behind the master, average response time.

    Charles Lamb
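
To make the cached state listed in the reply above more concrete, here is a minimal Python sketch of how a client could hold per-node state and use it to route a request. It is an illustration only, not the Oracle NoSQL Database client code: the names `RepNodeState`, `pick_node`, and `max_lag_ms`, and the routing policy itself, are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RepNodeState:
    """Hypothetical per-node record cached by the client."""
    name: str
    rep_group: str
    is_master: bool
    lag_ms: float            # replica lag behind the master (0 for the master)
    avg_response_ms: float   # average response time seen by this client


def pick_node(nodes: List[RepNodeState], rep_group: str,
              write: bool, max_lag_ms: float = 1000.0) -> RepNodeState:
    """Pick the master for writes; for reads, pick the fastest node
    whose lag is within max_lag_ms (assumed policy)."""
    group = [n for n in nodes if n.rep_group == rep_group]
    if not group:
        raise LookupError(f"no cached nodes for rep group {rep_group!r}")
    if write:
        masters = [n for n in group if n.is_master]
        if not masters:
            raise LookupError(f"no known master for rep group {rep_group!r}")
        return masters[0]
    fresh = [n for n in group if n.is_master or n.lag_ms <= max_lag_ms]
    if not fresh:
        # Every known node is too far behind: fall back to the whole group.
        fresh = group
    return min(fresh, key=lambda n: n.avg_response_ms)
```

With one master and two replicas cached, a write goes to the master, while a read goes to the fastest replica that is not lagging beyond the threshold.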
  • 2. Re: State of data nodes
    906230 Newbie
    Hi Charles,
    > That information is initialized when the client first makes contact with one (any) of the nodes in the system on the first request.
    1) If that information is only initialized when the client first contacts one of the nodes, how does the client contact that node before it knows anything about it?
    > It is kept updated through the values returned from any calls the client makes to any rep nodes.
    2) Does the node send this information along with the answer every time it responds to a request? Is that good for latency?

    3) Is that information centralized anywhere? Who is responsible for keeping the topology current? How can we be sure all nodes have the same information?
    > This information includes (among other things): topology (partition -> Rep Groups), rep node state (e.g. master, replica), replica lag behind the master, average response time.
    4) Is "replica lag behind the master" calculated per record, per node, or per primary key? How is it calculated?
    Thanks.
  • 3. Re: State of data nodes
    Charles Lamb Pro
    user962305 wrote:
    > Hi Charles,
    > > That information is initialized when the client first makes contact with one (any) of the nodes in the system on the first request.
    > 1) If that information is only initialized when the client first contacts one of the nodes, how does the client contact that node before it knows anything about it?
    The client must know the host name (or IP address) and port of at least one node in the system. It only has to make contact with one node to start up.

    > > It is kept updated through the values returned from any calls the client makes to any rep nodes.
    > 2) Does the node send this information along with the answer every time it responds to a request? Is that good for latency?
    Correct. It is fine for latency. Changes to the topology are very infrequent, and the changes themselves are relatively small, so latency is not an issue. If they were large (and they are not), throughput might suffer, but in practice it does not.
    > 3) Is that information centralized anywhere? Who is responsible for keeping the topology current? How can we be sure all nodes have the same information?
    Yes. Have you read about the "admin" process?

    > > This information includes (among other things): topology (partition -> Rep Groups), rep node state (e.g. master, replica), replica lag behind the master, average response time.
    > 4) Is "replica lag behind the master" calculated per record, per node, or per primary key? How is it calculated?
    It is per node in the rep group. It is calculated by VLSNs, a topic which is too complex for me to go into here.

    Charles
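
The latency point in the reply above can be illustrated with a small Python sketch in which a response carries topology data only when the client's cached copy is out of date. This is a simplified model, not the actual wire protocol: the sequence-number comparison, the class names (`Topology`, `Response`, `RepNode`, `Client`), and sending the whole topology rather than just the changes are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Topology:
    seq_num: int                                   # bumped on every change
    partition_to_group: Dict[int, str] = field(default_factory=dict)


@dataclass
class Response:
    value: Optional[bytes]
    # Present only when the node's topology is newer than the client's.
    new_topology: Optional[Topology] = None


class RepNode:
    """Hypothetical server side: answers a read and piggybacks topology."""

    def __init__(self, topology: Topology, data: Dict[str, bytes]):
        self.topology = topology
        self.data = data

    def get(self, key: str, client_seq_num: int) -> Response:
        stale = self.topology.seq_num > client_seq_num
        return Response(self.data.get(key), self.topology if stale else None)


class Client:
    """Hypothetical client side: caches the topology between requests."""

    def __init__(self) -> None:
        self.topology = Topology(seq_num=0)        # empty until first contact

    def get(self, node: RepNode, key: str) -> Response:
        resp = node.get(key, self.topology.seq_num)
        if resp.new_topology is not None:
            self.topology = resp.new_topology      # refresh the cache
        return resp
```

In this model the first request pays for the topology transfer, and later requests carry no extra payload until the topology changes again, which is why infrequent changes keep the overhead low.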
  • 4. Re: State of data nodes
    896774 Newbie
    Charles,

    You said the information about the topology is centralized. Where is it centralized? At the client? Only at the master of each replication group? Where does the admin process reside? I have never heard of that process.
  • 5. Re: State of data nodes
    Charles Lamb Pro
    893771 wrote:
    > You said the information about the topology is centralized. Where is it centralized? At the client? Only at the master of each replication group? Where does the admin process reside? I have never heard of that process.
    The admin is a replicated process backed by a replicated database (residing on Rep Nodes). Yes, it contains the topology. The topology is cached on the client and each of the rep nodes.

    Charles Lamb
  • 6. how to remotely connect YCSB to nosqldb
    945795 Newbie
    Hi,

    I created a KVStore on 3 nodes, with 2 replicas and 200 partitions. Can you please tell me how to connect YCSB remotely to this 3-node setup?
  • 7. Re: how to remotely connect YCSB to nosqldb
    Charles Lamb Pro
    Abu Taiyyab wrote:
    > I created a KVStore on 3 nodes, with 2 replicas and 200 partitions. Can you please tell me how to connect YCSB remotely to this 3-node setup?
    I'm not sure I understand the question. You always connect to any NoSQL Database cluster the same way: you supply the name of the KVStore and one or more hostname:port pairs for any of the underlying rep nodes.

    Charles Lamb
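
The two inputs named in the reply above (a store name plus one or more hostname:port pairs) can be sketched in Python as follows. This only models the bootstrap idea and is not the real driver: `parse_helper_hosts`, `bootstrap`, and the `connect` callback are hypothetical names, and trying the hosts in order until one answers is an assumed policy.

```python
from typing import Any, Callable, Iterable, List, Tuple


def parse_helper_hosts(specs: Iterable[str]) -> List[Tuple[str, int]]:
    """Turn 'hostname:port' strings into (hostname, port) tuples."""
    hosts = []
    for spec in specs:
        host, sep, port = spec.strip().rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected hostname:port, got {spec!r}")
        hosts.append((host, int(port)))
    return hosts


def bootstrap(store_name: str, specs: Iterable[str],
              connect: Callable[[str, str, int], Any]) -> Any:
    """Try each helper host in order and return the first handle obtained.

    `connect` stands in for whatever opens a connection to one node; it is
    expected to raise OSError when that node cannot be reached.
    """
    errors = []
    for host, port in parse_helper_hosts(specs):
        try:
            return connect(store_name, host, port)
        except OSError as exc:
            errors.append(f"{host}:{port}: {exc}")
    raise ConnectionError("no helper host reachable: " + "; ".join(errors))
```

Listing more than one pair only adds resilience at startup: any single reachable node is enough. The Java driver takes the same two inputs through `KVStoreConfig`, and the resulting handle is obtained from `KVStoreFactory.getStore`.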
  • 8. Re: how to remotely connect YCSB to nosqldb
    945795 Newbie
    Hi Charles,

    We have set up a KVStore with 3 storage nodes (replication factor 3, one replication group). We wish to run the YCSB tool against this setup from an external node that is not part of it. Could you guide us on how to achieve this, e.g. any changes needed to the config file?
