Failure and Consensus Introduction
Consensus is among the most fundamental problems in distributed systems.
As an example, consider Dropbox and other file sharing services. Their goal is for the files on my laptop, their cloud server, and any other machines I own to be the same. That is, they want consensus on what files are there and what contents they have.
There are various ways to define consensus and lots of consensus algorithms, some tuned to specific problems. Consensus is easy to achieve when everything works perfectly. Consequently, good algorithms focus on consensus in the presence of failure.
In this lesson, we’ll first consider transactions which allow closely-coordinated processes to achieve consensus on values about failure itself and maintain a consistent state.
Transactions are too expensive for most distributed systems where recognizing failure and handling it correctly requires different techniques.
Beyond outright failure, synchronizing values is an interesting problem. We’ll look at Paxos as an example of a consensus algorithm.
Synchronization when some actors are malicious is an important problem known as the Byzantine Generals Problem. Bitcoin is an example of how Byzantine faults can be overcome.
Lesson Objectives
After completing this lesson, you should be able to
- Describe the relationship between failure and consistency.
- Understand ACID properties and the levels of isolation in transactions.
- Explain the limitations of classic transactions in distributed systems and the options for handling failure.
- Describe the Paxos algorithm, understand how it achieves consensus, and how it might fail.
- Describe Byzantine failure. Understand how the blockchain is designed to overcome byzantine failure.
Required Reading/Viewing
- Failure and Consensus Links to an external site. (Lesson Slidedoc) (PDF Download PDF)
- ACID Links to an external site. from Wikipedia
- Isolation Links to an external site. from Wikipedia
- Starbucks Does Not Use Two-Phase Commit Download Starbucks Does Not Use Two-Phase Commit (PDF) by Gregor Hohpe
- Paxos Made Simple Download Paxos Made Simple (PDF) by Leslie Lamport
- Paxos By Example Links to an external site. by Angus MacDonald
- Byzantine fault tolerance Links to an external site. from Wikipedia
- How the Bitcoin protocol actually works Links to an external site. by Michael Nielsen
Additional Resources
- The Transaction Concept: Virtues and Limitations Links to an external site. (PDF) by Jim Gray
- How to GET a Cup of Coffee Links to an external site. - also assigned in Lesson 5: APIs
- The Part-Time Parliament Links to an external site. (PDF) by Leslie Lamport
-
Lecture 10. Unit 2 Paxos-Algorithm
Links to an external site.
by Seif Haridi
- Bitcoin: A Peer-to-Peer Electronic Cash System Links to an external site. (PDF) by Satoshi Nakamoto
- A Social Operating System Links to an external site. by Stephan Tual