I recently finished watching a tremendous free course on distributed systems. The lecturer is Dr. Martin Kleppmann, author of the well-known book with the red wild boar on the cover (Designing Data-Intensive Applications) 😄
This course includes 8 lectures split into 23 videos:
- Introduction
- Models of distributed systems
- Time, clocks, and ordering of events
- Broadcast protocols and logical time
- Replication
- Consensus
- Replica consistency
- Case studies
Martin gives a great introduction to distributed systems. He starts with basic concepts and definitions, then explains why it's important to understand how these systems work today and where they're used.
Then he moves on to the main challenges of distributed systems and describes two classic problems:
- The two generals problem
- The Byzantine generals problem
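A result covered in the lecture is that, in the system model used there, tolerating f malicious generals requires at least 3f + 1 generals in total (fewer than a third may be malicious). Here is a tiny sketch of that arithmetic; the function names are hypothetical helpers of mine, not from the course:

```python
def min_nodes_for_byzantine(f: int) -> int:
    """Smallest number of generals that can tolerate f malicious ones (n >= 3f + 1)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def max_byzantine_faults(n: int) -> int:
    """Largest f with n >= 3f + 1, i.e. strictly fewer than a third of nodes faulty."""
    if n < 1:
        raise ValueError("n must be positive")
    return (n - 1) // 3


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, "nodes tolerate", max_byzantine_faults(n), "Byzantine node(s)")
```

Note that 3 generals tolerate zero traitors, which is exactly the classic three-generals impossibility.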
Martin also explains what a system model is and how it relates to the faults that can occur, which distributed algorithms are designed to tolerate. He concludes that, to describe a distributed system, we need to pick one assumption from each category:
- Network: reliable, fair-loss, or arbitrary
- Nodes: crash-stop, crash-recovery, or Byzantine
- Timing: synchronous, partially synchronous, or asynchronous
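The course also shows that one network model can be built on top of another; for example, a reliable link can be obtained from a fair-loss link by retrying and deduplicating. Below is a minimal in-process simulation of that idea, assuming messages carry a unique ID and that loss is random (seeded for reproducibility). All class and function names are hypothetical:

```python
import random
from typing import Callable, List, Set, Tuple

Message = Tuple[int, str]  # (message ID, payload)


class FairLossLink:
    """Simulated fair-loss link: any single send may be lost, but retrying
    often enough eventually gets a message (and its ack) through."""

    def __init__(self, drop_probability: float, seed: int = 0) -> None:
        self.drop_probability = drop_probability
        self.rng = random.Random(seed)  # seeded so runs are reproducible

    def transmit(self, deliver: Callable[[Message], None], message: Message) -> bool:
        """Send one copy; return True only if the message arrived AND the ack came back."""
        if self.rng.random() < self.drop_probability:
            return False  # message lost in transit
        deliver(message)
        if self.rng.random() < self.drop_probability:
            return False  # delivered, but the ack was lost, so the sender will retry
        return True


class Receiver:
    """Deduplicates by message ID, so retransmitted copies are processed once."""

    def __init__(self) -> None:
        self.seen_ids: Set[int] = set()
        self.delivered: List[str] = []

    def on_message(self, message: Message) -> None:
        msg_id, payload = message
        if msg_id in self.seen_ids:
            return  # duplicate caused by a retry
        self.seen_ids.add(msg_id)
        self.delivered.append(payload)


def reliable_send(link: FairLossLink, receiver: Receiver, msg_id: int,
                  payload: str, max_attempts: int = 1000) -> int:
    """Retransmit until acknowledged; return the number of attempts used.
    A real reliable link retries indefinitely; the cap just keeps the demo finite."""
    for attempt in range(1, max_attempts + 1):
        if link.transmit(receiver.on_message, (msg_id, payload)):
            return attempt
    raise RuntimeError("no ack after max_attempts")
```

Because a lost ack makes the sender retransmit a message that already arrived, retrying alone would cause duplicates; the receiver-side deduplication is what turns "at least once" into "exactly once" delivery here.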
In the "Time, clocks, and ordering of events" lecture, Martin explains physical clocks, including monotonic clocks, and how clocks are synchronized (e.g. with NTP). He also shows why we cannot rely on physical clocks to order events in distributed systems.
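To make clock synchronization concrete, here is a sketch of the usual NTP-style estimate: the client records t1 (request sent) and t4 (response received) on its own clock, and the server reports t2 (request received) and t3 (response sent) on its clock. Assuming the network delay is the same in both directions, the skew can be estimated from those four timestamps. The dataclass and function names are my own, hypothetical choices; `time.monotonic()` is Python's standard monotonic clock:

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class NtpSample:
    t1: float  # client clock: request sent
    t2: float  # server clock: request received
    t3: float  # server clock: response sent
    t4: float  # client clock: response received


def round_trip_delay(s: NtpSample) -> float:
    """Time spent on the network: total client-side time minus server processing time."""
    return (s.t4 - s.t1) - (s.t3 - s.t2)


def clock_offset(s: NtpSample) -> float:
    """Estimated server-minus-client skew, assuming the delay is split evenly both ways.
    Equivalent to t3 + delay/2 - t4."""
    return (s.t2 - s.t1 + s.t3 - s.t4) / 2


def elapsed(fn: Callable[[], None]) -> float:
    """Measure a duration with the monotonic clock, which never jumps backwards
    when the time-of-day clock is adjusted."""
    start = time.monotonic()
    fn()
    return time.monotonic() - start
```

For example, with t1 = 100000, t2 = 105100, t3 = 105200, t4 = 100300 (milliseconds), the network delay is 200 ms and the estimated offset is 5000 ms, so the server is about 5 s ahead. The estimate is only as good as the symmetric-delay assumption, which is one reason physical timestamps can't be trusted to order events.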
The "Broadcast protocols and logical time" lecture is about the difference between physical and logical clocks (Lamport clocks and vector clocks), and about broadcast algorithms (FIFO broadcast, causal broadcast, total order broadcast, and FIFO-total order broadcast).
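A minimal sketch of both kinds of logical clock, assuming a fixed set of nodes numbered 0..n-1 for the vector clock. Lamport clocks take the maximum and add one on receipt; vector clocks merge element-wise and then increment the node's own entry, which lets us tell "happened before" apart from "concurrent". Class and function names are hypothetical:

```python
from typing import List


class LamportClock:
    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event or message send: advance the counter."""
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        """On receipt, jump past the sender's timestamp."""
        self.time = max(self.time, remote_time) + 1
        return self.time


class VectorClock:
    def __init__(self, node_index: int, num_nodes: int) -> None:
        self.i = node_index
        self.v = [0] * num_nodes

    def tick(self) -> List[int]:
        """Local event or message send: bump this node's own entry."""
        self.v[self.i] += 1
        return list(self.v)  # copy, e.g. to attach to an outgoing message

    def receive(self, remote: List[int]) -> List[int]:
        """Merge with the sender's vector (element-wise max), then count the receive event."""
        self.v = [max(a, b) for a, b in zip(self.v, remote)]
        self.v[self.i] += 1
        return list(self.v)


def happened_before(a: List[int], b: List[int]) -> bool:
    """a -> b iff every entry of a is <= the one in b and the vectors differ."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a: List[int], b: List[int]) -> bool:
    """Neither event happened before the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)
```

With three nodes, if node 0 sends a message stamped [1, 0, 0] and node 1 receives it, node 1's clock becomes [1, 1, 0]; an unrelated event on node 2 stamped [0, 0, 1] is concurrent with it. A single Lamport timestamp can't express that distinction.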
The fifth lecture covers "Replication". Martin shows what idempotence means in distributed systems and how to make operations idempotent so that retries are safe. He also describes quorums and quorum-based replication.
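Here is a toy, single-process sketch of quorum reads and writes, assuming n replicas, a write quorum w and a read quorum r with r + w > n, and unique client-supplied timestamps so that the highest timestamp wins. It leaves out read repair and real networking; every name is hypothetical:

```python
from typing import Dict, List, Optional, Tuple

Versioned = Tuple[int, str]  # (timestamp, value); the higher timestamp wins


class Replica:
    def __init__(self) -> None:
        self.store: Dict[str, Versioned] = {}
        self.up = True

    def write(self, key: str, ts: int, value: str) -> None:
        if not self.up:
            raise ConnectionError("replica unavailable")
        current = self.store.get(key)
        # Keep only the newest timestamp: re-applying the same write is a no-op,
        # which makes retries safe (idempotent).
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key: str) -> Optional[Versioned]:
        if not self.up:
            raise ConnectionError("replica unavailable")
        return self.store.get(key)


def quorum_write(replicas: List[Replica], key: str, ts: int, value: str, w: int) -> bool:
    """Succeeds if at least w replicas acknowledged the write."""
    acks = 0
    for rep in replicas:
        try:
            rep.write(key, ts, value)
            acks += 1
        except ConnectionError:
            pass
    return acks >= w


def quorum_read(replicas: List[Replica], key: str, r: int) -> Optional[str]:
    """Read from the first r reachable replicas and return the newest value."""
    responses: List[Optional[Versioned]] = []
    for rep in replicas:
        if len(responses) == r:
            break
        try:
            responses.append(rep.read(key))
        except ConnectionError:
            continue
    if len(responses) < r:
        raise RuntimeError("read quorum not reached")
    found = [v for v in responses if v is not None]
    return max(found)[1] if found else None
```

With n = 3 and w = r = 2, any read quorum overlaps any write quorum in at least one replica, so a read still sees the latest successful write even if a replica that missed that write is one of the two it contacts.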
The next part is all about "Consensus" and related algorithms.
The two best-known consensus algorithms are Paxos and Raft. In its original formulation, Paxos provides only consensus on a single value, and the MultiPaxos algorithm is a generalisation of Paxos that provides FIFO-total order broadcast. On the other hand, Raft is designed to provide FIFO-total order broadcast “out of the box”.
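To give a flavour of Raft, here is a simplified sketch of just its vote-granting rule: a node adopts a newer term, then votes for a candidate only if it hasn't voted for someone else in that term and the candidate's log is at least as up to date as its own (compare the last log term first, then log length). Persistence, timers, and log replication are omitted, and the names are my own:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    node_id: str
    current_term: int = 0
    voted_for: Optional[str] = None
    log_terms: List[int] = field(default_factory=list)  # term of each log entry

    def handle_vote_request(self, c_id: str, c_term: int,
                            c_log_length: int, c_log_term: int) -> bool:
        """Return True if this node grants its vote to candidate c_id."""
        if c_term > self.current_term:
            # A newer term: adopt it and forget any vote cast in the old term.
            self.current_term = c_term
            self.voted_for = None
        last_term = self.log_terms[-1] if self.log_terms else 0
        # The candidate's log must be at least as up to date as ours.
        log_ok = (c_log_term > last_term) or (
            c_log_term == last_term and c_log_length >= len(self.log_terms))
        if c_term == self.current_term and log_ok and self.voted_for in (None, c_id):
            self.voted_for = c_id
            return True
        return False


def is_majority(votes: int, cluster_size: int) -> bool:
    """A candidate becomes leader once a strict majority of nodes voted for it."""
    return votes > cluster_size // 2
```

The log check is what prevents a node with a stale log from becoming leader and overwriting committed entries; the one-vote-per-term rule plus the majority requirement means at most one leader can be elected per term.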
The "Replica consistency" lecture covers two-phase commit (2PC), linearizability, and eventual consistency. Martin explains the related algorithms and the limitations implied by the CAP theorem.
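A bare-bones sketch of the two-phase commit decision logic, assuming an in-process coordinator and participants that never crash. A real coordinator also writes its decision to durable storage before phase 2, and participants that voted yes are blocked if the coordinator crashes; none of that is modelled here. Names are hypothetical:

```python
from enum import Enum
from typing import List


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    def __init__(self, name: str, can_commit: bool) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self) -> Vote:
        # Phase 1: a YES vote is a promise to commit if the coordinator decides so.
        if self.can_commit:
            self.state = "prepared"
            return Vote.YES
        self.state = "aborted"
        return Vote.NO

    def finish(self, commit: bool) -> None:
        # Phase 2: apply the coordinator's decision.
        self.state = "committed" if commit else "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Commit only if every participant voted YES; otherwise abort everywhere."""
    votes = [p.prepare() for p in participants]
    decision = all(v is Vote.YES for v in votes)
    for p in participants:
        p.finish(decision)
    return decision
```

A single NO vote is enough to abort the whole transaction, which is what gives 2PC its atomicity.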
The last part ("Case studies") is important because it shows real examples of distributed systems and how they deal with the issues discussed earlier. Martin describes conflict-free replicated data types (CRDTs), which handle the case where several concurrent writes to the same object need to be merged into a single final state. He also shows the ideas behind Google’s Spanner database.
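To show what "merging into a single final state" means, here is a grow-only counter (G-Counter), a classic state-based CRDT. It's a deliberately simple example of my own choosing, not necessarily one of the CRDTs from the lecture, and the class name is hypothetical:

```python
from typing import Dict


class GCounter:
    """Grow-only counter: a classic state-based CRDT with one slot per replica."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)
```

If replica "a" increments twice and replica "b" increments by 3 concurrently, then after exchanging states both report 5, and merging the same state again changes nothing.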
It's a real gem. Martin explains everything in detail, so anyone can understand the major algorithms and principles behind distributed systems.
Of course, it's not easy to remember all of these concepts, but after finishing the course you will have a good understanding of distributed systems and how they work. You can always come back and refresh your knowledge when needed. The great thing is that this knowledge (as well as Martin's most popular book) is fundamental. Thanks, Martin, for making it publicly available ❤️!
Lecture videos: https://www.youtube.com/playlist?list=PLeKd45zvjcDFUEv_ohr_HdUFe97RItdiB
P.S. Interestingly, Martin recommends reading the “Distributed Systems” book by van Steen & Tanenbaum, which is exactly what I'm reading now. I recently joined the "Code of Architecture" book reading club organized by the architecture team of one of the biggest tech banks. Every week they host a call with guests or internal architects to discuss a chapter of the book they're currently reading. Alexander Polomodov (Director of their digital ecosystem development department) also runs a tech blog on Medium where he publishes his book reviews. For me, it's a great format: it helps me keep up my reading pace and stay motivated, and it helps me understand the material better through real examples and discussions.