Backend In the Frontend: Implementing Raft in JS

Background

Lately I’ve been studying consensus algorithms to bolster my understanding of distributed systems. Consensus algorithms achieve agreement on data that is replicated across many nodes.

Consider an online store. At any time, the store has a state that is defined by all of its transactions. The store keeps a transaction log that can be used to recreate that state. The log is replicated across multiple servers so that the store is robust against server failures. A single machine may go down at any time. But when it comes back online, it can use the log to recreate the store’s state.

Before new transactions can be added to the log, the machines must reach consensus on them. If each machine didn’t check with the others before adding something to the log, then their copies of the log would drift apart. Eventually, this would result in adverse behavior like the same widget being sold twice or a successful sale getting forgotten.

There are many consensus algorithms out there. I started with Paxos, since I knew that it powered Spanner. After reading through a whitepaper on implementing Paxos, I found it to be powerful but also relatively enigmatic for a distributed systems layperson.

As it turns out, I was not alone in my confusion. Researchers at Stanford who shared my frustration with Paxos’s complexity created a simpler consensus algorithm called Raft. Raft breaks the consensus problem down into more discrete steps that can be easily understood. I decided that it would be fun, and helpful for my understanding, to implement Raft in JavaScript using their whitepaper. Note that the Raft page has a different implementation made by the paper’s author.

This post doesn’t go into depth on how Raft works, but if you’re interested, I recommend checking out the Raft site. Instead, it displays the simulation I made and describes the underlying code.

TL;DR: The Product

Three nodes sit inside of an SVG graphic. Even though Raft would usually be implemented with five nodes, I chose three to optimize simplicity over robustness (remember folks, it’s not real). Raft can operate with just two nodes, but that assumes neither will ever go down.

At initialization, the nodes have no leader. When a leader is finally chosen via a timer-based election, it will remain in power unless manually turned down (i.e. clicked). In the absence of new data, the leader reaffirms its reign by periodically sending heartbeats to the other nodes.

Meanwhile, a client perpetually fades in and out of the edges of the graphic, periodically sending new data to the system. The client only speaks to the leader in this simulation, but Raft allows the client to speak to any node and be redirected to the leader.

When the client sends new data, the leader persists the data and attempts to replicate it on the other nodes. Once a majority of nodes report back that they have replicated the data, the leader will commit it. At this point, the leader could report success back to the client, but that is not implemented.

Clicking a node will turn it down until it is clicked again. This allows you to interact with the simulation and explore some specific states. You can also pause the simulation to drop messages by clicking on them.

What I Learned

While reading the white paper can give a decent understanding of Raft, implementing the algorithm crystallizes its nuances. In addition, faking components of the networking stack made me think a lot more about building blocks that I normally take for granted, like request-response protocols.

Observing Values Purged From a Replica’s Log

One of the most complex situations in Raft is when a replica’s log contains entries that differ from the committed entries at the same indices. This happens in the following scenario:

  1. The leader receives some new values but is not able to replicate them on a majority of replicas. So, the values are uncommitted.
  2. The leader goes down.
  3. A new leader is chosen, and it commits a handful of values. Now, at some indices i through k, the old leader has bogus values.
Replica 2 recorded a few values as leader but was not able to commit them. After it was turned down, Replica 0 committed a bunch of new values.

When the old leader comes back online, it will attempt to resume its reign. Instead, it will learn about the new leader, purge all of its outdated values, and replace them with the committed ones. Watching this in action is surprisingly satisfying!

Replica 2 has been corrected. It purged the outdated values and replaced them with Replica 0’s values.
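
In code, this correction follows the log-reconciliation rule from the Raft paper: an entry that conflicts with the leader’s entry at the same index is purged along with everything after it. Here is a rough sketch of how a replica’s AppendEntries handler might apply that rule (the field and method names are illustrative, not lifted verbatim from my implementation):

// Illustrative sketch: purge conflicting entries, then append the leader's.
handleAppendEntriesRequest(msg) {
    if (msg.term < this.currentTerm) {
        return; // Stale leader; ignore the request.
    }
    msg.entries.forEach(function(entry, i) {
        var index = msg.prevLogIndex + 1 + i;
        var existing = this.log[index];
        if (existing && existing.term !== entry.term) {
            this.log.length = index; // Purge the outdated suffix.
        }
        if (!this.log[index]) {
            this.log[index] = entry;
        }
    }.bind(this));
}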

Raft Is For Fault-Tolerance, Not Performance

At first glance, replacing a single machine with a Raft consensus network may seem like a way to boost an application’s performance. In fact, it is the exact opposite: Raft adds more latency, CPU, and storage in exchange for robustness. In the Raft algorithm, only the leader can serve reads and writes. And, since the leader must commit a value before returning success to the client, responses are slower than a single machine’s would be. The extra CPU and storage come from the additional machines at work.

Raft Only Works Because of Trust

As I was implementing the algorithm, I thought about how odd it was that the leader would step down from power as soon as it had heard about a leader with a more recent election term. The leader respects this information no matter who tells it. Raft actually guarantees that the leader with the most recent election term (i.e. highest term number) is the true leader.
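
The rule itself is tiny. As a rough sketch (the names are assumptions of mine, not taken from the paper), every incoming message runs through something like:

// If any message carries a higher term, adopt it and revert to follower,
// no matter who the sender is.
maybeStepDown(msg) {
    if (msg.term > this.currentTerm) {
        this.currentTerm = msg.term;
        this.votedFor = null;
        this.role = 'follower';
    }
}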

Raft won’t work without assurance that all of the replicas can be trusted. Algorithms for networks that may include rogue nodes are discussed under the Byzantine Generals Problem. Interestingly, both Paxos (the precursor to Raft) and the Byzantine Generals Problem were introduced by Leslie Lamport. Consensus algorithms without trust are considerably more complex than Raft, and they power peer-to-peer systems like the famous Bitcoin blockchain. (For those interested, Bitcoin is powered more specifically by a Proof-of-Work system called Hashcash.)

Difficulties in Testing Correctness

After just a few hours of work, I had an implementation of Raft that appeared to be “mostly correct”. But I quickly realized that, without extensive unit test coverage (which I was not interested in writing for a learning exercise), it was nearly impossible to determine whether I had achieved real correctness. As I watched the simulation, I would sometimes observe odd behavior and note it for later investigation. After checking many of the edge cases in the system using a combination of the pause button and the turn-down feature, I believe I have handled most or all of them correctly.

Avoiding Derivative State

Thankfully, the Raft paper describes the necessary state that each replica must hold to function. To avoid a messy soup of instance variables, I derived many other “states” from these variables. For example, a flag like isLeader can be derived from that bare-bones information.
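
As a sketch of the idea (the names here are illustrative, not my exact code), the replica stores only the core state from the paper and computes everything else on demand:

class Replica {
    constructor(id) {
        this.id = id;
        // Core Raft state from the paper.
        this.currentTerm = 0;
        this.votedFor = null;
        this.log = [];
        this.commitIndex = 0;
        // Single source of truth for the node's role.
        this.role = 'follower'; // 'follower' | 'candidate' | 'leader'
    }

    // Derived "state": no separate isLeader flag to keep in sync.
    isLeader() {
        return this.role === 'leader';
    }

    lastLogIndex() {
        return this.log.length - 1;
    }
}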

High-Level: The Architecture

The simulation uses object-oriented programming to represent the distributed system. The Replica class handles all logic related to a single node, and hence the bulk of the algorithm. Replicas are responsible for rendering themselves, handling incoming messages, and sending out messages. They also push updates to the TableUpdater class, which reflects those updates in the table.

The Message class is responsible for rendering a message, determining its component velocities, and marking itself dropped when clicked. Messages are created using factories to avoid passing repetitive parameters into each message constructor. The parameters are dynamically determined by simulation-wide constants (FPS, simulation size, etc).
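
For illustration, a factory might capture the simulation-wide constants once so that callers only supply the per-message details (the names below are hypothetical):

class MessageFactory {
    constructor(fps, speed, jitter) {
        // Simulation-wide constants captured once.
        this.fps = fps;
        this.speed = speed;
        this.jitter = jitter;
    }

    // Callers pass only the sender, receiver, and payload.
    appendEntriesRequest(sender, receiver, entries) {
        return new AppendEntriesRequest(sender, receiver, entries,
                                        this.speed, this.jitter);
    }
}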

Once a message is in-flight, it is handled by a singleton MessageManager. The MessageManager is responsible for moving messages, delivering messages, and removing dropped messages.
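
Its per-frame work boils down to a small pipeline, roughly like this sketch (method names are assumed; delivery is covered in detail below):

handleFrame() {
    // Remove messages that were clicked (dropped).
    this.messages = this.messages.filter(function(msg) { return !msg.dropped; });
    // Advance every in-flight message by one frame.
    this.messages.forEach(function(msg) { msg.move(); });
    // Hand arrivals to their recipients (see "Delivering Messages").
    this.deliverArrivals();
}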

A singleton ClientManager creates and destroys Clients. The ClientManager is given an average client lifetime, a rate of client creation, and a maximum number of clients at any given time. I actually determined that one client was enough for the simulation, so some of these features are unused. Once a client is created, it periodically sends new data to the Raft replicas.
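
A sketch of the idea, with hypothetical parameter and method names:

class ClientManager {
    constructor(avgLifetimeMs, creationRate, maxClients) {
        this.avgLifetimeMs = avgLifetimeMs;
        this.creationRate = creationRate; // Expected clients created per second.
        this.maxClients = maxClients;
        this.clients = [];
    }

    handleFrame() {
        // Occasionally spawn a new client, up to the cap.
        if (this.clients.length < this.maxClients &&
            Math.random() < this.creationRate / FPS) {
            this.clients.push(new Client(this.avgLifetimeMs));
        }
        // Retire clients whose lifetime has expired.
        this.clients = this.clients.filter(function(c) { return !c.expired(); });
    }
}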

The Animation Loop

The simulation uses a standard animation loop [1]. It is updated frame-by-frame at a specified FPS. The simulation continuously polls the browser for new animation frames and handles them at the correct rate. Code (see original):

var FPS = 60;
var interval = 1000 / FPS;
var time = 0;

function draw() {
    window.requestAnimationFrame(draw);

    // Only handle the frame if enough time has passed since the last one.
    var now = new Date().getTime();
    if (now - time > interval) {
        time = now;
        // DO STUFF HERE (ex: myObj.handleFrame())
    }
}

$(document).ready(function() {
    window.requestAnimationFrame(draw);
});

The Messaging Layer

In the real world, the nodes in a distributed system would communicate over an established messaging protocol like RPC. And, they would sit on top of networking technologies that would route requests and responses to the right addresses. In the simulation, this all needs to be faked. The entities (clients or nodes) in the animation need to have unique addresses, and they need to send messages over an agreed-upon protocol.

Establishing Addresses

Each node’s “address” is its coordinates in the graphic. When an entity sends a message to another, the message determines its direction by breaking a predetermined speed (scalar) into x and y component vectors. Code (original here):

getComponentVelocities(sender, receiver) {
    var dx = receiver.x - sender.x;
    var dy = receiver.y - sender.y;
    // Angle of travel between the sender and the receiver.
    var theta = Math.atan(Math.abs(dy / dx));

    // Jitter keeps messages from all arriving on the same frame.
    var v = this.v + (Math.random() * this.jitter);
    var vx = v * Math.cos(theta);
    var vy = v * Math.sin(theta);
    if (dx < 0) { vx *= -1; }
    if (dy < 0) { vy *= -1; }
    return [vx, vy];
}

In order to add realistic randomness, jitter is added to each message’s velocity. This reduces the chances that messages arrive in the exact same frame.

Delivering Messages

On every frame, the MessageManager deletes dropped messages, moves in-flight messages forward, and delivers arrivals. A message has arrived when it overlaps with the recipient.
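
The overlap test can be a simple distance check. A hypothetical containsMessage (the name appears in the delivery code below; this body is just a sketch):

// A message has "arrived" once its center falls inside the recipient's circle.
containsMessage(msg) {
    var dx = msg.x - this.x;
    var dy = msg.y - this.y;
    return Math.sqrt(dx * dx + dy * dy) <= this.radius;
}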

One interesting quirk here is that delivering a message can result in a new in-flight message. A naive message manager would iterate over its messages and remove arrivals as it goes. That will not work, since the message list can be altered while it is being iterated. In a real, multi-threaded backend, the messages would be a global variable protected by a mutex, and this naive implementation would result in deadlock.

Fortunately, we can get around the deadlock by storing the current messages in a temporary variable before handling them. Code (see original):

var tmpMessages = this.messages.slice(0);
var inFlightMessages = [];
this.messages = [];
tmpMessages.forEach(function(msg) {
    var receiver = this.receivers[msg.receiver];
    if (receiver.containsMessage(msg)) {
        receiver.handleMessage(msg);
        msg.cleanup();
    } else {
        inFlightMessages.push(msg);
    }
}.bind(this));

// Concatenation is essential, since this.messages may have
// been appended to since the reset above.
this.messages = this.messages.concat(inFlightMessages);

Handling Arrivals

Instead of inventing a new protocol, we can fake one with JS features. Each message has its own class that extends the base Message class (ex: AppendEntriesRequest). When an entity receives a message, it can use the message’s constructor to determine how to handle it. Code:

handleMessage(msg) {
    switch (msg.constructor) {
        case RequestVoteRequest:
            this.handleRequestVoteRequest(msg);
            break;
        // More cases left out for brevity.
        default:
            break;
    }
}

Correlating Responses With Their Respective Requests

Another messaging challenge I faced was that responses were totally disconnected from their respective requests. In most messaging frameworks (RPC, AJAX, etc.), the caller has request information handy when processing the response. My solution was hacky but sufficient: each node stores a map from request ID to request, and responses contain the request ID. Code:

this.pendingRequests = {};

sendRequest() {
  var msg = new Message();
  this.pendingRequests[msg.id] = msg;
  this.messageManager.schedule(msg);
}

handleResponse(msg) {
  var request = this.pendingRequests[msg.requestId];
  // Do some processing...
  delete this.pendingRequests[msg.requestId];
}

Deployment

The simplest way to share this project was via an iframe. I exposed the simulation through GitHub Pages, and you can find the iframe’s link here.

The code uses ES6 modules for organization. I learned that many minifiers can’t operate on multiple JS files or a single file made from modules. Ultimately, I used an npm module called rollup to combine the JS files and terser to minify the result. Before pushing a new version, I update the minified JS with a small bash script:

#!/bin/bash
rollup main.js --file bundle.min.js --format iife
terser bundle.min.js -m -c -o bundle.min.js

Further Work

A number of further features could be added to the simulation:

  • Implementing group membership changes (nodes leaving and entering)
  • Responding to the client when its data has been committed
  • Giving the client the ability to perform reads and not just writes
  • Implementing snapshotting

Conclusion

After spending over a year writing exclusively C++, it was fun to combine thinking about the backend with coding in the frontend. Implementing from a whitepaper was also new for me, and the work helped me understand the material on a deeper level. Hopefully this visualization will help some folks learn about consensus!

[1] I learned about the easiest way to make a JS animation loop here.
