15 Replies. Latest reply: Feb 1, 2012 5:14 AM by Charles Lamb

Network Partitions

user12003335 Newbie
I am unclear on how ONoSQL should behave in the event of a network partition.
Assume a replica group is split between two data centers that have ceased to communicate. Does the master replica stay where it is (even if it is on a network partition with only a minority of the replicas)? How does the "other" network partition "know" that it shouldn't elect a new master replica?

-- Gwen
  • 1. Re: Network Partitions
    Charles Lamb Pro
    user12003335 wrote:
    I am unclear on how ONoSQL should behave in the event of a network partition.
    Assume a replica group is split between two data centers that have ceased to communicate. Does the master replica stay where it is (even if it is on a network partition with only a minority of the replicas)? How does the "other" network partition "know" that it shouldn't elect a new master replica?
    When a network partition occurs, a new election is held. A master is elected only if a majority of the nodes is available to elect one. Hence, on the minority side of a network partition, no master will be elected. The replicas on the minority side may continue to service read requests as long as the consistency properties the client passes with the requests can be satisfied. For example, Consistency.NONE_REQUIRED requests could be satisfied, but Consistency.ABSOLUTE requests could not.

    Charles Lamb
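
The majority rule in the reply above can be illustrated with a small toy model. This is only a sketch and not Oracle NoSQL Database code: the function name `has_majority` and the 5-node group are assumptions made up for the example.

```python
def has_majority(reachable_nodes: int, group_size: int) -> bool:
    """True when the reachable nodes form a strict majority of the group."""
    return reachable_nodes > group_size // 2


# Hypothetical 5-node replica group split 3/2 across two data centers.
group_size = 5
for side, reachable in (("majority side", 3), ("minority side", 2)):
    print(side, "can elect a master:", has_majority(reachable, group_size))
```

Note that under a strict-majority rule an even split (for example 2/2 in a 4-node group) leaves neither side able to elect a master.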
  • 2. Re: Network Partitions
    Charles Lamb Pro
    A colleague has added more details to what I said above:

    "If as a result of a network partition the "minority node partition" had a node in the master state, it will continue to remain in the master state but not be able to process durable writes (since it's not in communication with a simple majority) until the partition is resolved and it notices the presence of the new master. The master on the minority side does not call for an election. This is because the master cannot distinguish between a temporary disconnect of a node and a true network partition.

    A downside of having the node on the minority side continue to think that it is in the master state is that it believes it is absolutely consistent and, as a result, may respond incorrectly to read requests with time-based or absolute consistency requirements.

    So it is desirable for the master to relinquish mastership and call for an election when it is not in touch with a simple majority (that is, when it is not authoritative). However, we want to avoid doing so on temporary network disconnects, since there is a cost associated with holding an election.

    A solution we have discussed in the past is for a non-authoritative master to call for an election when it has been in this state continuously for some configurable amount of time."

    Charles Lamb
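
The idea discussed in the quoted text (call for an election only after the master has been non-authoritative for a configurable time) could look roughly like the sketch below. It illustrates the proposal only, not behavior the product is known to implement; the class `MasterAuthorityMonitor`, its fields, and the 30-second threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MasterAuthorityMonitor:
    """Tracks how long a master has been out of touch with a simple majority."""
    group_size: int
    election_timeout_s: float  # hypothetical configurable threshold
    non_authoritative_since: Optional[float] = None

    def observe(self, reachable_nodes: int, now_s: float) -> bool:
        """Record the current view; return True if an election should be called.

        reachable_nodes counts the master itself.
        """
        if reachable_nodes > self.group_size // 2:
            # Authoritative again: forget any earlier disconnect.
            self.non_authoritative_since = None
            return False
        if self.non_authoritative_since is None:
            # First observation without a majority: start the clock.
            self.non_authoritative_since = now_s
        return now_s - self.non_authoritative_since >= self.election_timeout_s


monitor = MasterAuthorityMonitor(group_size=5, election_timeout_s=30.0)
print(monitor.observe(reachable_nodes=2, now_s=0.0))   # disconnect just started
print(monitor.observe(reachable_nodes=4, now_s=10.0))  # majority is back
print(monitor.observe(reachable_nodes=2, now_s=20.0))  # disconnected again
print(monitor.observe(reachable_nodes=2, now_s=55.0))  # persisted past the timeout
```

Resetting the clock whenever a majority is reachable is what keeps a brief disconnect from triggering a costly election.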
  • 3. Re: Network Partitions
    user12003335 Newbie
    To make sure I understand:

    If I have a 7 node replication group, and 4 crash, the remaining nodes will not elect a new master and will not be able to accept writes until I bring the crashed nodes back?
  • 4. Re: Network Partitions
    Charles Lamb Pro
    user12003335 wrote:
    To make sure I understand:

    If I have a 7 node replication group, and 4 crash, the remaining nodes will not elect a new master and will not be able to accept writes until I bring the crashed nodes back?
    That's correct. Break the above into two scenarios: (1) the master was one of the 4 nodes that crashed, and (2) the master was one of the 3 surviving nodes. In case (1), there is no surviving master, so there is no node to accept writes. Further, since there is not a majority, no master can be elected. In case (2), a master survives, but it will not be able to commit any write requests sent to it, assuming that all write requests specify a durability with a replica ack policy of Simple Majority or All. The earlier post points out that in case (2) an election will not be held, and the mastership will remain in the minority.

    I hope this helps.

    Charles Lamb
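
Case (2) above can be checked with a little arithmetic. The sketch below is a toy model, not the product's API: the function names are made up, the policy names are plain strings, and it assumes the master counts itself toward the majority, so a simple majority of a 7-node group needs 3 replica acknowledgements.

```python
def required_replica_acks(policy: str, group_size: int) -> int:
    """Replica acknowledgements a master needs before it can commit a write.

    Assumption: the master counts itself toward the majority.
    """
    if policy == "NONE":
        return 0
    if policy == "SIMPLE_MAJORITY":
        return group_size // 2  # majority size minus the master itself
    if policy == "ALL":
        return group_size - 1
    raise ValueError(f"unknown ack policy: {policy}")


def can_commit(policy: str, group_size: int, reachable_replicas: int) -> bool:
    """True when enough replicas are reachable to satisfy the ack policy."""
    return reachable_replicas >= required_replica_acks(policy, group_size)


# Case (2): 7-node group, 4 nodes crashed, the master and 2 replicas survive.
for policy in ("NONE", "SIMPLE_MAJORITY", "ALL"):
    print(policy, can_commit(policy, group_size=7, reachable_replicas=2))
```

With only 2 reachable replicas, the Simple Majority and All policies cannot be satisfied in this model, which matches the description of case (2).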
  • 5. Re: Network Partitions
    524761 Journeyer
    user12003335 wrote:

    If I have a 7 node replication group, and 4 crash, the remaining nodes will not elect a new master and will not be able to accept writes until I bring the crashed nodes back?
    Or, bring back at least one of the crashed nodes. As long as the total number of nodes that are up and communicating forms a majority (assuming the usual default "majority" configuration), you're back in business.
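
The recovery condition in the reply above reduces to a one-line calculation. A minimal sketch, assuming the default strict-majority rule; `nodes_to_restart` is a hypothetical helper, not part of any product API.

```python
def nodes_to_restart(group_size: int, nodes_up: int) -> int:
    """Smallest number of additional nodes needed to reach a strict majority."""
    majority = group_size // 2 + 1
    return max(0, majority - nodes_up)


# 7-node group with 3 survivors: how many crashed nodes must come back?
print(nodes_to_restart(group_size=7, nodes_up=3))  # prints 1
```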
  • 6. Re: Network Partitions
    896774 Newbie
    Hi Charles,

    Does that mean that, for this release, no election will be held for a non-authoritative master?
    Thanks
  • 7. Re: Network Partitions
    896774 Newbie
    Hi all,

    In such a case, what do all the other nodes and clients know about the master for that RG?
    Do they have the same master for that RG?
    Can the client know that the master on the minority side is not the true master?
    I am wondering whether a time limit is sufficient to trigger a new election.
    An election should be held when the other partition is back online, or when the communicating nodes form a majority. I think defining just a time limit will not be sufficient.
    Can you clarify?
    Thanks
  • 8. Re: Network Partitions
    Charles Lamb Pro
    893771 wrote:
    Does that mean that, for this release, no election will be held for a non-authoritative master?
    Correct. We have an SR open to provide for an election when the nodes notice they are in this state after some configurable amount of time.

    >
    In such a case, what do all the other nodes and clients know about the master for that RG?
    They know which node was, and still is, the master. If the master is a node in the minority group, they know which one is the master. Assuming clients can reach that node (i.e. they are not on the wrong side of the network split), then they will continue to send write requests to that node. Assuming the clients specify a durability of simple majority, and assuming that the master can still not reach a majority of the nodes, these write requests will be rejected (because the durability constraints can't be satisfied).
    Do they have the same master for that RG?
    There is only one master at any given time.
    Can the client know that the master on the minority is not the true master?
    Assuming the client specifies a durability of simple majority, and assuming that the master can not reach a majority of the nodes to commit the transaction, the write requests will be rejected. The client will still think that the node is a master because, after all, at some point the other nodes in the group may reappear and write requests will then succeed. Until there is an election, the node that is the master is the only master in the system that any other nodes know about.
    I am wondering whether a time limit is sufficient to trigger a new election.
    An election should be held when the other partition is back online, or when the communicating nodes form a majority. I think defining just a time limit will not be sufficient.
    If other nodes come online, and if those nodes form a majority, then writes will proceed and there will be no need for an election. The "fix" we are considering is that when the system notices that there is not a majority, an election will be called after some user-configurable time. This is different from the case you mention, where some of the missing nodes come back up. In that case, no election is necessary.

    Charles Lamb
  • 9. Re: Network Partitions
    906230 Newbie
    Hi Charles,

    I am missing something. I don't understand why there will be no elections.
    The scenario is the following:
    The master is in the minority partition. Why can't the nodes in the majority partition ask for an election?
    They will see the absence of a heartbeat. So why can't we have one master on each partition?
    And if it is possible to have a master on each partition, can you please answer the preceding questions again?
    Thanks for your answer.
  • 10. Re: Network Partitions
    Charles Lamb Pro
    user962305 wrote:
    Hi Charles,

    I am missing something. I don't understand why there will be no elections.
    The scenario is the following:
    The master is in the minority partition. Why can't the nodes in the majority partition ask for an election?
    They will see the absence of a heartbeat. So why can't we have one master on each partition?
    And if it is possible to have a master on each partition, can you please answer the preceding questions again?
    Yes, the nodes in the majority partition will hold an election. Yes, there would be two masters. However, assuming the clients request simple_majority for all requests, the master in the minority partition will not acknowledge commits because the replica ack policy can't be met.

    The issue we mentioned earlier is that the nodes in the minority partition should call for an election, but do not. We believe that our fix would be to have them call for an election when they notice that they are in this state after a configurable amount of time.

    Charles Lamb
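
The two-master situation described above can be sketched with a toy model that shows which side of the split could commit a write under each ack policy. This is an illustration under stated assumptions, not the product's implementation: `sides_that_commit` is a made-up helper, the policy names are plain strings, and the master is assumed to count itself toward the acknowledgement total.

```python
def sides_that_commit(policy: str, partition: dict, group_size: int) -> list:
    """Return the partition sides whose master could commit a write.

    Each partition value is the node count on that side, master included.
    Assumption: the master counts itself toward the acknowledgement total.
    """
    nodes_needed = {
        "NONE": 1,                               # the master alone suffices
        "SIMPLE_MAJORITY": group_size // 2 + 1,  # strict majority of the group
        "ALL": group_size,                       # every node must be reachable
    }[policy]
    return [side for side, nodes in partition.items() if nodes >= nodes_needed]


# 5-node group split 2/3; each side currently has a node in the master state.
partition = {"old master, minority side": 2, "new master, majority side": 3}
for policy in ("SIMPLE_MAJORITY", "ALL", "NONE"):
    print(policy, sides_that_commit(policy, partition, group_size=5))
```

In this model only the majority side commits under a simple-majority policy, so the two masters cannot both acknowledge the same kind of write. Under a policy that requires no replica acknowledgements, both sides would accept writes in the model; how the real system reconciles such writes is outside the scope of this sketch.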
  • 11. Re: Network Partitions
    896774 Newbie
    Thanks Charles.

    That means you will have to deal with conflict resolution in some cases (depending on the configuration).
    From your point of view, will the client know about the new master on the majority partition or the previous master on the minority partition?
  • 12. Re: Network Partitions
    Charles Lamb Pro
    893771 wrote:
    Thanks Charles.

    From your point of view, will the client know the new master on the majority partition or the previous master on the minority partition?
    It would depend on how the partitioning affected the client, wouldn't it?

    Charles Lamb
  • 13. Re: Network Partitions
    906230 Newbie
    I agree with you.
  • 14. Re: Network Partitions
    906230 Newbie
    Hi Charles

    => The client will still think that the node is a master because, after all, at some point the other nodes in the group may reappear and write requests will then succeed.

    Is that certain? The master in the minority partition will roll back data that was not sent to a replica.

    => That's mean you will have to deal with conflict resolution in some cases (depending of configuration).
    Is that certain? The master in the minority partition will roll back data that was not sent to a replica.

    Thanks for your update