Patterns of Distributed Systems reading notes: Overview of the Patterns
Overview of the Patterns
1 Keeping Data Resilient on a Single Server
1.1 Replaying the Write-Ahead Log rebuilds the state. This property is important for resilience in a single-node scenario but, as we'll see, it's also very valuable for replication. If multiple nodes start at the same state, and they all play the same log entries, we know they will end up at the same state too.
1.2 Databases use a Write-Ahead Log, as discussed in the book's example, to implement transactions.
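A minimal Java sketch of the Write-Ahead Log idea, assuming a simple key-value state; class and file names such as WriteAheadLogSketch and wal.log are my own, not the book's:

```java
// Every update is appended and flushed to the log file before it is applied to
// the in-memory state, so a restart can rebuild the state by replaying the log.
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class WriteAheadLogSketch {
    private final Path logFile;
    private final Map<String, String> kv = new HashMap<>();

    public WriteAheadLogSketch(Path logFile) throws IOException {
        this.logFile = logFile;
        if (Files.exists(logFile)) {
            replay();                      // rebuild state from existing entries
        } else {
            Files.createFile(logFile);
        }
    }

    public void set(String key, String value) throws IOException {
        String entry = key + "=" + value + System.lineSeparator();
        // 1. durably append the entry first ...
        Files.writeString(logFile, entry, StandardOpenOption.APPEND, StandardOpenOption.SYNC);
        // 2. ... only then apply it to the in-memory state
        kv.put(key, value);
    }

    public String get(String key) { return kv.get(key); }

    private void replay() throws IOException {
        for (String line : Files.readAllLines(logFile)) {
            int i = line.indexOf('=');
            if (i > 0) kv.put(line.substring(0, i), line.substring(i + 1));
        }
    }

    public static void main(String[] args) throws IOException {
        WriteAheadLogSketch store = new WriteAheadLogSketch(Path.of("wal.log"));
        store.set("title", "Patterns of Distributed Systems");
        System.out.println(store.get("title"));   // survives a restart via replay
    }
}
```

The important ordering is that the entry reaches disk before the in-memory state changes, so a crash between the two steps loses nothing that was acknowledged.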
2 Competing Updates
2.1 One of the most straightforward approaches is Leader and Followers, where one of the nodes is marked as the leader, and the others are considered followers.
3 Dealing with the Leader Failing
3.1 A heartbeat is a regular message sent between nodes, just to indicate they are alive and communicating.
3.2 Heartbeat does not necessarily require a distinct message type. When cluster nodes are already engaged in communication, such as when replicating data, the existing messages can serve the purpose of heartbeats.
3.3 As long as a Majority Quorum (that is, a majority) of the nodes in the cluster have successfully replicated the log messages, the leader can treat them as committed without waiting for every follower.
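A rough sketch of how a leader might combine HeartBeat with a Majority Quorum check; LeaderSketch, the 100 ms interval, and the printed heartbeat are illustrative assumptions, not the book's code:

```java
import java.util.*;
import java.util.concurrent.*;

public class LeaderSketch {
    private final int clusterSize;
    private final Map<Long, Set<String>> acksPerIndex = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public LeaderSketch(int clusterSize) {
        this.clusterSize = clusterSize;
        // Heartbeat: a small periodic message so followers know the leader is alive.
        scheduler.scheduleAtFixedRate(this::sendHeartbeats, 0, 100, TimeUnit.MILLISECONDS);
    }

    private void sendHeartbeats() {
        // A real leader would send this over the network to every follower.
        System.out.println("heartbeat to followers");
    }

    // Record that a node (the leader itself or a follower) has stored the entry.
    public void onAck(long logIndex, String nodeId) {
        acksPerIndex.computeIfAbsent(logIndex, i -> ConcurrentHashMap.newKeySet()).add(nodeId);
    }

    // An entry is committed once a majority of the cluster has acknowledged it.
    public boolean isCommitted(long logIndex) {
        int acks = acksPerIndex.getOrDefault(logIndex, Set.of()).size();
        return acks >= clusterSize / 2 + 1;   // e.g. 3 of 5
    }

    public static void main(String[] args) {
        LeaderSketch leader = new LeaderSketch(5);
        leader.onAck(1, "leader");
        leader.onAck(1, "follower-1");
        leader.onAck(1, "follower-2");
        System.out.println("committed: " + leader.isCommitted(1));  // true: 3 of 5 nodes
        leader.scheduler.shutdown();
    }
}
```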
4 Multiple Failures Need a Generation Clock
4.1 Every request is sent to cluster nodes along with the generation clock, so every node can accept requests carrying the highest generation it has seen and reject the ones with a lower generation.
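A small sketch of the generation check described in 4.1; GenerationCheckSketch is an assumed name, and a real node would also persist the generation durably:

```java
// A node remembers the highest generation it has seen and rejects any request
// stamped with an older generation, so a deposed leader's requests are ignored.
public class GenerationCheckSketch {
    private long currentGeneration = 0;

    // Returns true if the request should be processed, false if it is stale.
    public synchronized boolean accept(long requestGeneration) {
        if (requestGeneration < currentGeneration) {
            return false;                      // from an old leader: reject
        }
        currentGeneration = requestGeneration; // remember the newest generation
        return true;
    }

    public static void main(String[] args) {
        GenerationCheckSketch node = new GenerationCheckSketch();
        System.out.println(node.accept(1));  // true  (first leader)
        System.out.println(node.accept(2));  // true  (newly elected leader)
        System.out.println(node.accept(1));  // false (old leader, rejected)
    }
}
```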
5 Log Entries Cannot Be Committed until They Are Accepted by a Majority Quorum
5.1 When the log is large, moving the log across nodes for leader election can be costly.
5.2 The most commonly used algorithm for Replicated Log, Raft [Ongaro2014], optimizes this by electing the leader with the most up-to-date log.
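A sketch of the "at least as up-to-date" comparison that Raft voters apply before granting a vote, as described in the Raft paper; the class and field names are my own:

```java
// A voter grants its vote only if the candidate's log is not behind its own,
// comparing last log term first and last log index as a tie-breaker.
public class LogUpToDateCheckSketch {
    private final long myLastLogTerm;
    private final long myLastLogIndex;

    public LogUpToDateCheckSketch(long lastLogTerm, long lastLogIndex) {
        this.myLastLogTerm = lastLogTerm;
        this.myLastLogIndex = lastLogIndex;
    }

    public boolean candidateLogIsUpToDate(long candidateLastTerm, long candidateLastIndex) {
        if (candidateLastTerm != myLastLogTerm) {
            return candidateLastTerm > myLastLogTerm;   // newer term wins
        }
        return candidateLastIndex >= myLastLogIndex;    // same term: longer log wins
    }

    public static void main(String[] args) {
        LogUpToDateCheckSketch voter = new LogUpToDateCheckSketch(3, 10);
        System.out.println(voter.candidateLogIsUpToDate(3, 12)); // true
        System.out.println(voter.candidateLogIsUpToDate(2, 99)); // false: older term
    }
}
```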
6 Followers Commit Based on a High-Water Mark
6.1 The High-Water Mark is maintained by the leader and is equal to the index of the latest update to be committed.
6.2 The leader then adds the High-Water Mark to its HeartBeat.
6.3 Whenever a follower receives a HeartBeat, it knows it can commit all its log entries up to the High-Water Mark.
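A sketch of a follower applying entries only up to the High-Water Mark it receives on a HeartBeat; FollowerSketch and its string entries are assumptions for illustration:

```java
import java.util.*;

public class FollowerSketch {
    private final List<String> log = new ArrayList<>();  // replicated entries
    private long appliedUpTo = -1;                        // last applied index

    public void appendEntry(String entry) { log.add(entry); }

    // Called when a HeartBeat arrives carrying the leader's High-Water Mark.
    public void onHeartbeat(long highWaterMark) {
        while (appliedUpTo < highWaterMark && appliedUpTo < log.size() - 1) {
            appliedUpTo++;
            apply(log.get((int) appliedUpTo));
        }
    }

    private void apply(String entry) {
        System.out.println("committing entry: " + entry);
    }

    public static void main(String[] args) {
        FollowerSketch f = new FollowerSketch();
        f.appendEntry("set x=1");
        f.appendEntry("set y=2");
        f.onHeartbeat(0);   // only the first entry is known to be committed
    }
}
```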
7 Leaders Use a Series of Queues to Remain Responsive to Many Clients
7.1 Each entry on the Write-Ahead Log needs to be written and processed in full before we start to write another.
7.2 To serialize these writes without blocking client threads, use a Singular Update Queue. Most programming languages these days have some form of in-memory queue object that accepts requests from multiple threads.
7.3 When the leader receives a request, it adds a callback to the Request Waiting List; the callback notifies the client (or node) when the request succeeds or fails.
7.4 Idempotency: the server can use the client ID and a unique request number to store the responses of already-executed requests, so retries do not execute an update twice.
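A sketch of a Singular Update Queue, with a CompletableFuture per request as a minimal stand-in for the Request Waiting List callback; all names here are illustrative, not the book's code, and the idempotency bookkeeping is left out:

```java
// Client threads enqueue requests; a single consumer thread applies them one at
// a time, so log writes never interleave, and completes a future per request.
import java.util.concurrent.*;

public class SingularUpdateQueueSketch {
    record Update(String entry, CompletableFuture<Boolean> done) {}

    private final BlockingQueue<Update> queue = new ArrayBlockingQueue<>(1024);
    private final Thread consumer = new Thread(this::drain);

    public SingularUpdateQueueSketch() { consumer.setDaemon(true); consumer.start(); }

    // Called concurrently by many client-handling threads.
    public CompletableFuture<Boolean> submit(String entry) throws InterruptedException {
        CompletableFuture<Boolean> done = new CompletableFuture<>();
        queue.put(new Update(entry, done));
        return done;
    }

    private void drain() {
        try {
            while (true) {
                Update u = queue.take();          // one update at a time
                writeToLog(u.entry());            // full write + process before the next
                u.done().complete(true);          // notify the waiting client
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void writeToLog(String entry) {
        System.out.println("appended: " + entry);
    }

    public static void main(String[] args) throws Exception {
        SingularUpdateQueueSketch q = new SingularUpdateQueueSketch();
        System.out.println(q.submit("set x=1").get());   // true once processed
    }
}
```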
8 Followers Can Handle Read Requests to Reduce Load on the Leader
8.1 One way to keep follower reads consistent is to use Versioned Value, storing a version number with each stored record.
8.2 Distributed databases such as MongoDB and CockroachDB use a Hybrid Clock to set the version in the Versioned Value and provide this consistency.
8.3 In Raft, followers all need to ensure their High-Water Mark is the same as the leader's before replying to requests.
8.4 Kafka: if the read request is handled by a follower, it needs to check that it has that log index, similar to the use of Versioned Value discussed above.
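A sketch of a Versioned Value store that only serves a read once the replica has caught up to the version the client asks for; the names and the simple lag check are assumptions for illustration:

```java
import java.util.*;
import java.util.concurrent.*;

public class VersionedValueSketch {
    record Versioned(long version, String value) {}

    private final ConcurrentMap<String, NavigableMap<Long, String>> store = new ConcurrentHashMap<>();
    private volatile long latestVersion = 0;

    // Writes record the version (e.g. the log index) alongside the value.
    public void put(String key, String value, long version) {
        store.computeIfAbsent(key, k -> new ConcurrentSkipListMap<>()).put(version, value);
        latestVersion = Math.max(latestVersion, version);
    }

    // Returns the value as of `atVersion`, or empty if this replica is still behind.
    public Optional<Versioned> readAt(String key, long atVersion) {
        if (latestVersion < atVersion) return Optional.empty();   // follower lagging
        NavigableMap<Long, String> versions = store.getOrDefault(key, new TreeMap<>());
        Map.Entry<Long, String> e = versions.floorEntry(atVersion);
        return e == null ? Optional.empty() : Optional.of(new Versioned(e.getKey(), e.getValue()));
    }

    public static void main(String[] args) {
        VersionedValueSketch v = new VersionedValueSketch();
        v.put("x", "1", 5);
        System.out.println(v.readAt("x", 5));   // value written at version 5
        System.out.println(v.readAt("x", 9));   // empty: replica has not seen version 9 yet
    }
}
```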
9 A Large Amount of Data Can Be Partitioned over Multiple Nodes
9.1 Key requirements for any partitioning scheme:
9.1.1 Data should be evenly distributed across all the cluster nodes.
9.1.2 It should be possible to know which cluster node stores a particular data record, without making a request to all the nodes.
9.1.3 It should be quick and easy to move part of the data to the new nodes.
9.2 A typical partitioned cluster can have hundreds or thousands of logical partitions.
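A sketch of the usual scheme: hash keys onto a fixed number of logical partitions and keep a separate partition-to-node table, so lookups need no broadcast and rebalancing moves whole partitions; the names and counts are illustrative:

```java
import java.util.*;

public class PartitioningSketch {
    private final int numPartitions;                 // e.g. 1024 logical partitions
    private final Map<Integer, String> partitionToNode = new HashMap<>();

    public PartitioningSketch(int numPartitions, List<String> nodes) {
        this.numPartitions = numPartitions;
        for (int p = 0; p < numPartitions; p++) {
            partitionToNode.put(p, nodes.get(p % nodes.size()));  // even spread
        }
    }

    public int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public String nodeFor(String key) {
        return partitionToNode.get(partitionFor(key));   // no broadcast needed
    }

    public void movePartition(int partition, String newNode) {
        partitionToNode.put(partition, newNode);          // move the data, then repoint
    }

    public static void main(String[] args) {
        PartitioningSketch p = new PartitioningSketch(8, List.of("node-a", "node-b", "node-c"));
        System.out.println("alice -> " + p.nodeFor("alice"));
        p.movePartition(p.partitionFor("alice"), "node-d");
        System.out.println("alice -> " + p.nodeFor("alice"));
    }
}
```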
10 Partitions Can Be Replicated for Resilience
10.1 Replicating a partition is just like replicating unpartitioned data, so we use the same patterns for replication that we discussed earlier, centered around Replicated Log.
10.2 Three or five replicas strike a good balance between failure tolerance and performance.
11 A Minimum of Two Phases Are Needed to Maintain Consistency across Partitions
11.1 We have to maintain consistency not just between multiple replicas of the same data, but also between the different partitions.
11.2 Consistency across partitions can be achieved by using Two-Phase Commit.
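A sketch of a Two-Phase Commit coordinator over a hypothetical Participant interface; timeouts, persistence of the coordinator's decision, and recovery are deliberately left out:

```java
// Phase one asks every participating partition to prepare (durably record the
// intended change and vote); only if all vote yes does phase two tell them to
// commit, otherwise everyone is told to abort.
import java.util.*;

public class TwoPhaseCommitSketch {
    interface Participant {
        boolean prepare(String txId);   // vote yes/no after persisting intent
        void commit(String txId);
        void abort(String txId);
    }

    public boolean execute(String txId, List<Participant> participants) {
        // Phase 1: prepare
        for (Participant p : participants) {
            if (!p.prepare(txId)) {
                for (Participant q : participants) q.abort(txId);   // any "no" aborts all
                return false;
            }
        }
        // Phase 2: commit
        for (Participant p : participants) p.commit(txId);
        return true;
    }

    public static void main(String[] args) {
        Participant alwaysYes = new Participant() {
            public boolean prepare(String txId) { System.out.println("prepared " + txId); return true; }
            public void commit(String txId) { System.out.println("committed " + txId); }
            public void abort(String txId) { System.out.println("aborted " + txId); }
        };
        boolean ok = new TwoPhaseCommitSketch().execute("tx-1", List.of(alwaysYes, alwaysYes));
        System.out.println("transaction committed: " + ok);
    }
}
```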
12 In Distributed Systems, Ordering Cannot Depend on System Timestamps
12.1 We can use a Lamport Clock to track the order of requests across cluster nodes without relying on system timestamps. The trick is to use a simple integer counter per node, but pass it along in requests and responses from and to clients.
12.1.1 One of the issues with the basic Lamport Clock is that versions are tracked by integers with no relation to actual timestamps.
12.2 A Hybrid Clock addresses this by combining clock time with a Lamport Clock.
12.3 Clock-Bound Wait waits, before storing a value, long enough for all of the cluster nodes' clocks to advance beyond the timestamp assigned to the record.
12.3.1 Most open source databases use a technique called "read restart," as discussed in the "Read Restart" section of the Clock-Bound Wait pattern, to avoid waiting in write operations.
12.3.2 Google developed TrueTime in its data centers to guarantee that the clock skew is no more than 7 ms.
12.3.3 AWS has a library called ClockBound with a similar API for reporting the clock skew across cluster nodes.
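A sketch of the Lamport Clock rule from 12.1: tick on local events, and on receipt take the maximum of the local and received values plus one; the class name is my own:

```java
// Causally related events end up with increasing counter values even though
// no node consults its system clock.
public class LamportClockSketch {
    private long counter = 0;

    public synchronized long tick() {               // local event or send
        return ++counter;
    }

    public synchronized long onReceive(long remoteTimestamp) {
        counter = Math.max(counter, remoteTimestamp) + 1;
        return counter;
    }

    public static void main(String[] args) {
        LamportClockSketch a = new LamportClockSketch();
        LamportClockSketch b = new LamportClockSketch();
        long t1 = a.tick();            // a: 1
        long t2 = b.onReceive(t1);     // b: 2, ordered after a's event
        System.out.println(t1 + " happened before " + t2);
    }
}
```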
13 A Consistent Core Can Manage the Membership of a Data Cluster
13.1 The management of a large cluster is therefore given to a Consistent Core: a small set of nodes whose responsibility is to manage a larger data cluster.
13.2 The Consistent Core tracks the data cluster membership with Lease and State Watch.
13.3 Examples of such products are Apache ZooKeeper and etcd, which are often utilized as Consistent Core components.
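A sketch of how a Consistent Core might track data-cluster membership with expiring leases; LeaseTableSketch and the TTL numbers are assumptions, and State Watch notifications are only hinted at in the comments:

```java
// Each data node registers with a time-to-live and refreshes its lease with
// heartbeats; a lease that is not refreshed in time marks the node as failed,
// at which point watchers (State Watch) could be notified.
import java.util.*;
import java.util.concurrent.*;

public class LeaseTableSketch {
    private final long ttlMillis;
    private final ConcurrentMap<String, Long> expiryByNode = new ConcurrentHashMap<>();

    public LeaseTableSketch(long ttlMillis) { this.ttlMillis = ttlMillis; }

    public void registerOrRefresh(String nodeId) {            // called on heartbeat
        expiryByNode.put(nodeId, System.currentTimeMillis() + ttlMillis);
    }

    public List<String> liveNodes() {
        long now = System.currentTimeMillis();
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, Long> e : expiryByNode.entrySet()) {
            if (e.getValue() > now) live.add(e.getKey());      // lease still valid
        }
        return live;
    }

    public static void main(String[] args) throws InterruptedException {
        LeaseTableSketch leases = new LeaseTableSketch(200);
        leases.registerOrRefresh("data-node-1");
        System.out.println(leases.liveNodes());   // [data-node-1]
        Thread.sleep(300);                        // no refresh: the lease expires
        System.out.println(leases.liveNodes());   // []
    }
}
```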
14 Gossip Dissemination for Decentralized Cluster Management
14.1 A decentralized cluster can tolerate some inconsistency in the cluster metadata, provided it converges quickly.
14.2 A disease spreads very quickly even if each person comes into contact with only a few individuals at random. An entire population can become infected from very few interactions. Gossip Dissemination is based on these mathematical models. If nodes send information to a few other nodes regularly, even a large cluster will get that information quickly.
14.3 Gossip Dissemination is a commonly used technique for information dissemination in large clusters. Products like Cassandra, Akka, or Consul use it.
14.4 One major limitation of gossip protocols is that they only provide eventual consistency.
14.5 Similar to a Consistent Core, an Emergent Leader makes decisions on behalf of the cluster.
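A sketch of versioned-metadata gossip: each round a node contacts a few random peers and both sides keep the higher-versioned entry per key; the fanout, round count, and merge rule are illustrative assumptions rather than any specific product's protocol:

```java
import java.util.*;

public class GossipSketch {
    record Entry(long version, String value) {}

    final String nodeId;
    final Map<String, Entry> metadata = new HashMap<>();
    private final Random random = new Random();

    GossipSketch(String nodeId) { this.nodeId = nodeId; }

    // A local change bumps the version of that key.
    void update(String key, String value) {
        Entry old = metadata.get(key);
        metadata.put(key, new Entry(old == null ? 1 : old.version() + 1, value));
    }

    // One gossip round: exchange metadata with a few randomly chosen peers.
    void gossipRound(List<GossipSketch> cluster, int fanout) {
        for (int i = 0; i < fanout; i++) {
            GossipSketch peer = cluster.get(random.nextInt(cluster.size()));
            if (peer != this) mergeBothWays(peer);
        }
    }

    private void mergeBothWays(GossipSketch peer) {
        mergeInto(peer.metadata, metadata);   // pull
        mergeInto(metadata, peer.metadata);   // push
    }

    private static void mergeInto(Map<String, Entry> from, Map<String, Entry> into) {
        from.forEach((key, entry) -> into.merge(key, entry,
                (mine, theirs) -> theirs.version() > mine.version() ? theirs : mine));
    }

    public static void main(String[] args) {
        List<GossipSketch> cluster = new ArrayList<>();
        for (int i = 0; i < 8; i++) cluster.add(new GossipSketch("node-" + i));
        cluster.get(0).update("schema", "v2");            // one node learns something new
        for (int round = 0; round < 5; round++) {
            for (GossipSketch node : cluster) node.gossipRound(cluster, 2);
        }
        long informed = cluster.stream().filter(n -> n.metadata.containsKey("schema")).count();
        System.out.println(informed + " of " + cluster.size() + " nodes have the update");
    }
}
```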