Redis Cluster Specification-23
Case 2: When only a minority of masters have flagged a node as FAIL, the slave
promotion will not happen (since promotion uses a more formal algorithm that makes
sure everybody knows about the promotion eventually), and every node will clear
the FAIL state as per the FAIL state clearing rules above (i.e. no promotion after N
times the NODE_TIMEOUT has elapsed).
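
The timeout-based clearing rule mentioned in Case 2 can be sketched roughly as follows. This is a minimal illustration, not the Redis implementation: the `FailedNodeView` structure, the `should_clear_fail` function, and the `reachable` field are hypothetical names introduced here, and the multiplier `n` is left as a parameter because the text above does not fix its value.

```python
from dataclasses import dataclass


@dataclass
class FailedNodeView:
    """Local view of a node currently flagged FAIL (hypothetical structure)."""
    fail_flagged_at: float    # time the FAIL flag was set, in seconds
    reachable: bool           # assumption: the node is answering again
    promotion_observed: bool  # True if a slave promotion was detected


def should_clear_fail(view: FailedNodeView, now: float,
                      node_timeout: float, n: int) -> bool:
    """Return True if the timeout-based clearing rule applies.

    Only the "no promotion after N times the NODE_TIMEOUT" rule is
    modelled; any other clearing rule is out of scope for this sketch.
    """
    if view.promotion_observed:
        # A promotion happened, so this particular rule does not apply.
        return False
    if not view.reachable:
        # Assumption: the flag is only cleared for a node that is reachable.
        return False
    return (now - view.fail_flagged_at) > n * node_timeout


# Illustrative values only: n=2 and a 15 second timeout are arbitrary here.
view = FailedNodeView(fail_flagged_at=100.0, reachable=True,
                      promotion_observed=False)
early = should_clear_fail(view, now=120.0, node_timeout=15.0, n=2)
late = should_clear_fail(view, now=131.0, node_timeout=15.0, n=2)
```

With these example values, `early` is false because only 20 seconds have elapsed against a 30 second threshold, while `late` is true.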
The FAIL flag is only used as a trigger to run the safe part of the algorithm for the
slave promotion. In theory, a slave could act independently and start a slave promotion
when its master is not reachable, then wait for the masters to refuse to provide the
acknowledgment if the master is actually reachable by the majority. However, the added
complexity of the PFAIL -> FAIL state transition, the weak agreement, and the FAIL message
forcing the propagation of the state in the shortest amount of time in the reachable part
of the cluster has practical advantages. Because of these mechanisms, usually all the
nodes will stop accepting writes at about the same time if the cluster is in an error state.
This is a desirable feature from the point of view of applications using Redis Cluster.
Also, erroneous election attempts initiated by slaves that can't reach their master due to
local problems (the master is otherwise reachable by the majority of other master
nodes) are avoided.
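
The role of the FAIL flag as a trigger can be illustrated with a small sketch. It assumes a simplified model in which a node is escalated from PFAIL to FAIL once a majority of masters report it as failing, and in which a slave only starts a promotion attempt after its master is flagged FAIL. All function names are hypothetical, and details such as the time window in which failure reports remain valid are deliberately left out.

```python
def masters_needed_for_fail(total_masters: int) -> int:
    """Smallest number of masters that forms a majority."""
    return total_masters // 2 + 1


def is_fail(reporting_masters: set, total_masters: int) -> bool:
    """PFAIL -> FAIL escalation: a majority of masters must agree."""
    return len(reporting_masters) >= masters_needed_for_fail(total_masters)


def slave_may_start_promotion(master_is_fail: bool) -> bool:
    """The FAIL flag is only a trigger; the promotion itself still needs
    acknowledgment from the masters, which is not modelled here."""
    return master_is_fail


# A slave with a local network problem: only one master (hypothetical
# name "B") agrees that the slave's master is down, out of five masters.
local_problem = is_fail({"B"}, total_masters=5)

# A real failure: three of five masters report the node as failing.
real_failure = is_fail({"B", "C", "D"}, total_masters=5)

attempts = [slave_may_start_promotion(local_problem),
            slave_may_start_promotion(real_failure)]
```

In the first scenario no promotion attempt is started, which is the erroneous election that the mechanism avoids. In the second, the slave is allowed to proceed to the promotion algorithm.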