Eliminating the data ping-pong for simple, efficient data transforms.

ByAlexander GallegoonJune 24, 2021

Understanding the Redpanda Wasm engine architecture

Heads up: Some of the information in this post may now be inaccurate or out of date. While we update it, check out what's new with Wasm.

Introduction

~60% of stream processing is spent doing mundane transformation tasks like format unification for ML workloads, filtering for privacy, simple enrichments like geo-ip translations, etc.

Yet the baseline complexity to stand up something “simple” often involves three or four distributed systems, multiple nights reading configurations and man-pages, and a few too many shots of espresso to start seeing the future. And once you are done, you end up ping-ponging the data back and forth between storage and compute, when all you had to do was to remove a field out of a JSON object.

To the data engineer, it feels like an endless game of system whack-a-mole just to start to do the interesting work of actually understanding the data.

classic ping pong

For the last year we have been quietly building Redpanda Transforms to eliminate the data ping-pong for simple scrubbing, cleaning, normalizations, etc. Essentially, it’s a way to tie Policy to Code and Data, for scalable, super low-latency transformations. Redpanda Transforms build on the V8 engine from Chrome, the same engine that powers NodeJS, added to the Redpanda distribution in two forms - embedded and sidecar - to provide an alternative to the classical data back-and-forth that happens in the enterprise when trying to make sense of real-time streaming data.

detailed architecure

What JavaScript did for the web in the late 90’s is what WebAssembly, aka Wasm, can do for server-side applications. JavaScript allowed developers to turn static content into the immersive web experiences of today, fundamentally changing the web by shipping code to the user’s computer. Similarly, Wasm empowers the engineer to transform Redpanda, by shipping computational guarantees (code) to the storage engine. Codifying business practices like GDPR compliance or custom on-disk encryption mechanics, with near native-level performance at runtime.

In brief, Redpanda Transforms allows the developer to tell Redpanda what piece of code (.wasm or .js files) can run on what data (topics), on what location, at a specific time (on event triggers) in an auditable fashion. That is, the Redpanda controller can tell you exactly what code is touching what piece of data, on what computers, and at what offsets. The most exciting part of us sharing this feature is that Redpanda Transforms are largely complementary to the existing tooling and frameworks you already have in the enterprise. It simply inverts the relation of shipping data to compute by shipping code to data.

code to data

Architecture overview

The architecture of Redpanda Transforms is composed of 3 important components:

Client side tooling & frameworks
Storage and distribution of scripts
Execution engines

1. Client-side tooling

Redpanda Transforms and the materialized streams they create are all just standard Kafka-API level constructs (compacted topics for storage, Produce-Request for deploying a new script, etc), which means all of your standard tooling just works out of the box! To get started however, there are a few helpers that we put inside rpk to make the developer experience easy.

rpk wasm generate tiktok will generate a getting started template. You don’t need to use the scaffolding, but rpk makes it trivial to deploy and update arbitrary computations.

The most important mental model for the programmer is the idea of a batch-in-batch-out API:

// convert all a-z to upper case A-Z
//
engine.processRecord((recordBatch) => {
  const result = new Map();
  const transformedRecord = recordBatch.map(({ header, records }) => {
    return {
      header,
      records: records.map(uppercase),
    };
  });

  // destination topic
  //
  result.set("uppercase_result_topic", transformedRecord);
  return Promise.resolve(result);
});

2. Storage and distribution of scripts

Redpanda at its core is a linearizable, transactional storage engine that uses the Raft protocol for data replication. It has a Kafka API translation layer that we codegen at compile time, from a set of manifest files to be able to decode user requests. The storage engine supports the Kafka API in full, which also includes compacted topics.

The storage mechanism for all .wasm and .js scripts are simply a compacted topic. Each script has a name that you give it through an argument as part of the deployment via rpk, for example rpk wasm deploy --name=”allcaps” main.js. A compacted topic will store all the scripts deployed with a last-writer-wins conflict resolution, as is common for most compacted topics. To delete a script you simply emit the same key (--name=allcaps) with an empty value to the compacted topic. This is exposed to the user via rpk wasm remove --name=allcaps as per the example above.

For those of us new to compacted topics, think of them as a table of key-value pairs. After a compaction run, only the last copy of the key-value pair will be kept. Others will be thrown away. This is typically used in Wall St. where you continually ask the value for a particular ticker symbol and you don’t care about the historical prices. You just want the value right now (last known).

compaction topic

The storage mechanics not only give us a distribution of content, but also auditability. At any point in time we can say exactly what scripts are running in the system, on what computers, for what input topics, and we can revoke them and inspect them at runtime! Imagine pairing this mechanism with an out-of-band attestation framework that asserts that the code that touches your most critical data is in fact the actual code that is running against your data. This is typically where the CIO says, "Hold on a second…"

3. The execution engine(s)

There are 2 types of engines inside Redpanda. The first is for synchronous functions that are in the hot request-response path. The other is for stateful transformations that are always asynchronous.

side by side engine comparison

The synchronous transformation engine is exactly what you would expect. A record comes in and, before it hits disk, you apply a function and the result is stored on disk. These kinds of transformations are typically done for pure function mapping or enrichments. Examples include GDPR compliance that involves simple masking rules, custom at-rest encryption, and other general functions typically considered pure - records in & records out, but no external API or state maintained.

Note: Inline engine is not merged in tree yet. Only the async engine below.

sync engine

While the synchronous transformations are critical, they largely allow the framework developers to change the behavior of Redpanda, like custom topic compaction strategy at runtime. The bulk of transformations, however, will typically include either some state (e.g.: geo-ip translations), or some API for enrichments (e.g.: credit score information, fraud detection, budget accounting) that are better suited for the async engine. This is also the engine that is available for you to try today, so we’ll focus on it.

async engine

For stateful transformations, there is an exact 1-1 number of partitions with the parent topic. Imagine this scenario rpk topic create tiktok -p 16 -r 3 will create 16 partitions with 3-way replication. To be precise, that is 16 raft groups. Each raft group will have 3 nodes in the group.

If you deploy a tiktok filter that makes everyone’s stream ALL-CAPS, it too would have exactly 16 partitions, would be called tiktok-$allcaps$, and would be stored in the exact same core, disk, and machine as the parent partition. That is, our Stateful transforms execute at the lowest level of the storage engine - the partition - and inherit the same partition scalability of the parent stream. This is an important mental model, because all transformations happen on every machine locally. There is no back-and-forth between Redpanda and a Spark or a Flink job.

CPU Time - execution engine recommendations

Redpanda Transforms (sync)	Redpanda Transforms (async)	Other tools
microseconds-9ms	10-50ms	50-100ms+
V8::Isolate (Q3 release)	Sidecar - avail today!	Apache Flink / Apache Spark

The multiple execution engines (sync and async) are glued together by a component we call Pacemaker. Pacemaker actively listens for the registration of new scripts (.wasm + .js) via rpk wasm deploy and starts a watch on all new incoming data for the registered topics. The pacemaker is critical for crash recovery. In essence the pacemaker keeps a map of Wasm scripts to offset state. The offset state is a list of partitions on this core and how far into the log is the script at. The most important context for the pacemaker is that it executes only on core-local information. That is, it understands the requests for new script deployments and, after initialization, it only sends data to the execution engine based on the local storage information, making it very low latency as there is no additional coordination on steady state. The only coordinating moment is during new script registration. The rest is all core-local information.

pacemaker for wasm

WebAssembly will change server-side software permanently. Allowing injection of code inside Redpanda at strategic points fundamentally changes what the storage engine can do for you - while giving you isolation, high function density, granularity of execution, ~native performance, multi-tenant by design, all in your favorite programming language.

If you like hacking on this kind of low-latency storage systems, check out our open engineering spots (basically hiring for all eng roles :D ). Let us know what you think on Twitter or join our community slack channel for questions. Can’t wait to see what you all build next!