Web Assembly Primer

A large red C pointing to the Web Assembly Logo with a

Intended Audience

Me, from a week ago!

This will be pretty technical, and maybe not that interesting unless you’ve spent a lot of time in the Web coding space.

But damn, it would’ve saved me a bunch of time if I’d found something like this!

I’ve been working on reproducing a cool research paper on computational life and ran into some performance problems with my initial Javascript implementation. It was too darn slow!

The simulation described in the paper consists of creating 2 ^ 17 random brainfuck programs, pairing them up, running each pair, and then splitting them up again. 1

Unfortunately, my JS code, even when running in parallel, took over thirty minutes to run a full simulation!

I wanted it to go faster and remembered that WebAssembly was a thing! I’d never used it before, but I’d heard that it was much faster than Javascript.

My code was pretty simple; surely it couldn’t be that hard to port over?

A picture from Spongebob Square Pants of the text 'One Week Later...' on top of a green background.

Oh my god, this was so much harder than I expected.

The documentation for WebAssembly is fragmented across the internet, and what information exists is often out of date. The specification itself is a WIP, with the tooling ecosystem constantly changing to keep up.

You’ll find StackOverflow posts from 2021 saying that something is impossible, only to learn that it’s now possible in 2025, but only if you use these undocumented compiler flags.

Also, fun fact: running WebAssembly in parallel was disabled in browsers between 2018 and 2020 because of Spectre, which put a damper on both the community and the documentation.

In 2025, getting a simple C program compiled to WebAssembly and then running it in parallel is possible, but it requires you to learn about a whole bunch of different web technologies and tools.

I couldn’t find any primers that had everything you need to know in one place, so I figured I’d give back and write one myself!


Primer

Introduction

Our goal here is to take a C program, compile it into a WebAssembly module, and then run that module in the browser, in parallel, working against a shared set of memory.

We’ll be working with a very simple C program that just squares some numbers.

The hard part of this exercise isn’t the code itself, it’s all the nonsense required to get the code running, so best to keep the code as simple as possible.

The primer is designed to be read in order – each section builds on the last. That said, if you want to jump to the final code to get a sense of where we’ll end up, more power to you!

Getting Started

Our examples are going to be written in HTML, JS, and C.

I’ve set up demo pages for each HTML example, so you don’t need to run the code on your own machine if you don’t want to/can’t (e.g. you’re reading this on a phone or a locked-down computer).

But I do think it’s helpful to get it running on your own, especially if you want to use any of this stuff in a real project.

To compile the C code locally, you’re going to need to install llvm and wasm-ld – depending on your OS, you may already have them. If you’re on MacOS, see these extra tips:

MacOS Installation Tips

I had some trouble with installing these on my M1 Macbook.

Firstly, you’ll need the XCode Developer Tools, but the version of Clang they install doesn’t have the wasm32 target.

So to get that, you’ll need to install llvm again using homebrew.

Next, the internet says that llvm normally includes wasm-ld but the version I installed didn’t, so you’ll have to install it directly.

Finally, make sure your PATH is set up to point to the homebrew version of llvm, not the default XCode one.

  brew install llvm;
  brew install wasm-ld;

  # put in .bashrc
  export PATH="/opt/homebrew/opt/llvm/bin:$PATH"

To run HTML that uses threaded WebAssembly locally, you’re going to need a static web server that serves the files with the cross-origin isolation security headers, Cross-Origin-Opener-Policy and Cross-Origin-Embedder-Policy (see MDN).

Unfortunately, this means that just opening the HTML file in your browser won’t work, and the tried and true Python http.server won’t work either, since it doesn’t set those headers.

I’ve written a simple Go server here that serves the files with the proper headers:

// server.go
package main

import (
    "log"
    "net/http"
)

func addCORS(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
        w.Header().Set("Cross-Origin-Embedder-Policy", "require-corp")
        w.Header().Set("Cross-Origin-Opener-Policy", "same-origin")
        next.ServeHTTP(w, req)
    })
}

func main() {
    port := ":5555"
    handler := addCORS(http.FileServer(http.Dir(".")))
    
    log.Print("Server started at localhost" + port)
    
    // ListenAndServe only returns when the server fails to start or stops.
    if err := http.ListenAndServe(port, handler); err != nil {
        log.Fatal(err)
    }
}

Put that in the same directory as the HTML files you want to serve, run go build server.go, then launch the server with ./server.

Head to localhost:5555 in your browser, and you’ll see your list of files. Click on one, and you’re good to go!
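
If you want to confirm that the headers actually took effect, here’s a minimal sketch of a check you could paste into the dev console. The helper name canShareMemory is my own invention; it relies on the crossOriginIsolated global that browsers expose on pages and workers, and on SharedArrayBuffer being defined.

```javascript
// Hypothetical helper: reports whether a global scope is set up for shared memory.
// `scope` defaults to the current global (window in a page, self in a worker).
function canShareMemory(scope = globalThis) {
    return typeof scope.SharedArrayBuffer === "function"
        && scope.crossOriginIsolated === true;
}

// In a page served with both headers this should log true.
// In Node, crossOriginIsolated is not defined, so it logs false there.
console.log(canShareMemory());
```

If it logs false in the browser, the first thing I’d check is the response headers in the Network tab.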

If all that feels daunting, please feel free to just follow along with the demo pages below each HTML code example.


On WebAssembly

WebAssembly (wasm) is a compiled assembly-like language that can be run by browsers at “near-native” speed.

Now, I feel I need to say that most of the time you probably don’t need to use WebAssembly. The V8 Javascript engine is incredible and the days of Javascript being 100 times slower than native code are over.

My JS code for the computational life simulator only ran ~3.5 times slower than the WebAssembly, and that’s with a use-case that’s pretty much perfect for WebAssembly, with no I/O or server requests.

Additionally, two of the most common use-cases for super fast code on the Web are probably graphics and cryptographic functions. With graphics you’d probably be better off using WebGL, and for crypto code… don’t roll your own crypto!

But hey, there are some use-cases for WebAssembly, so that’s all I’ll say about that. On to using it!


Writing WebAssembly

WebAssembly is a stack-based assembly language and while you can write it directly using the WebAssembly text format (wat), most people write their code in another higher-level language and then compile it down into wasm.

I chose C, but from my digging around in the past week, it looks like there’s a lot more discussion/community around using Rust. So maybe give that a go.

The big difference between compiling from C to native OS machine code vs WebAssembly is that many of the standard functions don’t make sense without an operating system. What should printf do when called by code running in the browser? Print to the dev console? Create a new DOM element? What about malloc?

The standard2 (MDN recommended) way to compile C to wasm is to use the compiler toolchain Emscripten, which solves this problem by simulating an entire POSIX OS in Javascript.

With Emscripten, a lot of existing C/C++ code can be compiled straight to WebAssembly, at the cost of a whole bunch of extra Javascript glue code being added. It probably works great, but it’s a complicated dependency and I wanted to see if I could do without it.

The big downside then is that you can’t use the C standard library. For my project, that’s fine, but it may be a non-starter in other situations. I did find this nice blog post about linking parts of the C standard library without using Emscripten; give that a read if you’re interested.

In the next section, we’ll go over compiling a very simple C program into WebAssembly, without Emscripten.


Compiling C to WebAssembly without Emscripten

The C program we’ll be using is very simple, it just squares a given number:

// square.c

__attribute__((export_name("square"))) int square(int number) {
    return number * number;
}

First thing to note is that this is the full program; we don’t have a main function. Since the point of this code is to be called by our JS code, we don’t need it to do anything when it first starts up.

Then there’s the __attribute__((export_name("square"))) bit. This is a Clang attribute which declares that the function square will be exported in the resulting compiled WebAssembly, with no additional compiler flags needed.

We’ll add more to the program later to support shared memory, but we’ll start with this for now.

The Clang command to compile this is:

clang \
  --target=wasm32 \
  -nostdlib \
  -O3 \
  -Wl,--no-entry \
  -o square.wasm \
  square.c

In order to run this, you need llvm clang and wasm-ld installed (see the Getting Started section).

Breaking this down we have:

  • --target=wasm32 declares that we want to compile to WebAssembly and not the local environment’s machine code
  • -nostdlib states that the code uses no standard library, so the compiler doesn’t try to link to one
  • -O3 makes the resulting code as performant as possible
  • -Wl,--no-entry tells the linker that there is no entry function (e.g. int main() {})

The -Wl, syntax passes the following flag to the linker, in our case wasm-ld. You can see all the wasm-ld flags here, and we’ll be using some more of them later.

If all that works you should have a WebAssembly binary called square.wasm in your file system! Ok, so what exactly is a .wasm file?


wasm, wat, and wabt

.wasm files are compact binary representations of WebAssembly code, usually called a “module”. They’re not intended to be directly written or read by humans. Instead, they’re compiled from some higher level language (C in our case).
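
To make the “compact binary” bit a little more concrete, here’s a small sketch that checks the 8-byte header every module starts with: the magic bytes "\0asm" followed by the version number 1. The helper looksLikeWasm is my own, not part of any API. It should run as-is in Node or a browser console.

```javascript
// Every .wasm binary starts with "\0asm" and a 4-byte little-endian version (currently 1).
const WASM_HEADER = [0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00];

// Hypothetical helper: returns true if the bytes start with the wasm header.
function looksLikeWasm(bytes) {
    if (bytes.length < WASM_HEADER.length) return false;
    return WASM_HEADER.every((byte, i) => bytes[i] === byte);
}

// The header on its own is the smallest valid module: no functions, memory, or exports.
const emptyModule = new Uint8Array(WASM_HEADER);

console.log(looksLikeWasm(emptyModule));        // true
console.log(WebAssembly.validate(emptyModule)); // true

// An ELF header, for comparison.
console.log(looksLikeWasm(new Uint8Array([0x7f, 0x45, 0x4c, 0x46]))); // false
```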

But is there a way to still read a .wasm file? Kinda! You can convert your wasm into a textual representation of the binary format called WebAssembly text format (wat).

wat is designed to be read and written by humans, and it ends up looking like a cross between lisp and the x86 assembly language.

The WebAssembly team maintains a set of tools called the WebAssembly Binary Toolkit (wabt) that allow you to debug wasm code, including the wasm2wat tool that converts wasm to wat.

You can download the wabt toolkit from their Github and run it locally, or you can upload a .wasm file to their online demo page here.

This is the wabt command that I’m using to convert our square.wasm file:

/wabt/bin/wasm2wat --generate-names square.wasm > square.wat

The --generate-names flag gives auto-generated names to any unnamed variables in our code (i.e. all local variables). It’s not necessary, but it makes the resulting .wat easier to read.

And here’s the generated square.wat file:

(module $square.wasm
  (type $t0 (func (param i32) (result i32)))
  (func $square (export "square") (type $t0) (param $p0 i32) (result i32)
    (i32.mul
      (local.get $p0)
      (local.get $p0)))
  (table $T0 1 1 funcref)
  (memory $memory (export "memory") 2)
  (global $__stack_pointer (mut i32) (i32.const 66560)))

You can see our square func there that’s calling mul with our parameter $p0 passed in twice.

The square function is exported, because of the export_name attribute we used in square.c. This means that it’ll be callable from our Javascript environment.

You can also see that we export a memory object – we’ll talk about that in depth later in the tutorial.

We won’t go over the table or the __stack_pointer bits in this primer, so don’t worry about them for now. If you want to learn more about WebAssembly tables, this blog post by Dennis is good!

We’ll be using the wasm2wat tool a few more times – it’s really useful for debugging your compiler commands.

So, now we have a .wasm file and know how to read it. How can we run it?


Running WebAssembly from JS

You can’t just run a wasm file directly like you can a native executable – WebAssembly modules need a host environment to run against.

For us, that host environment will be the browser, but you can also run it with Node.js and in other environments. See this page on WebAssembly Portability for more.

The way it works is the host receives the .wasm file, and then it compiles it again into whatever machine code is appropriate for the environment.

It can then pass that compiled machine code whatever memory or other imports it needs, and then it can run it!

In the browser, there’s a JS API that does this process for you.

const module = await WebAssembly.compileStreaming(fetch("square.wasm"));

// This holds whatever imports the wasm module needs.
// Right now we don't use this, but later on we'll be passing
// in a shared memory object here.
const importObject = {};

const instance = await WebAssembly.instantiate(module, importObject);

const four = instance.exports.square(2);

First, we fetch the square.wasm file and pass it to compileStreaming. This then compiles our WebAssembly code into machine code, and gives us a WebAssembly.Module object representation of it.

This module object can’t be used directly – it’s not stateful. But it can be passed around, e.g. to a Web Worker, which can then “instantiate” it.

The WebAssembly.instantiate function takes a module and an importObject and returns a WebAssembly.Instance object, which is now a stateful representation of our wasm code, with some chunk of your browser’s memory reserved for its use.

This instance has an exports property that contains Javascript representations of anything the wasm code exports – in our case, the square function.

By calling instance.exports.square(2), we run the underlying wasm code, and get back the number 4!
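
If you don’t have a compiler or web server handy, here’s a sketch of the same compile, instantiate, call flow that needs neither. The byte array is my own hand-assembled encoding of just the square function from the .wat above (no table, memory, or stack pointer), so treat it as illustrative and not as what clang produces. It uses the synchronous WebAssembly.Module and WebAssembly.Instance constructors, which are fine for a module this tiny.

```javascript
// Hand-assembled: (func (export "square") (param i32) (result i32) ...)
const squareBytes = new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic "\0asm" + version 1
    0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f, // type section: (i32) -> i32
    0x03, 0x02, 0x01, 0x00,                         // function section: one function of type 0
    0x07, 0x0a, 0x01, 0x06, 0x73, 0x71, 0x75, 0x61, // export section: "square" ...
    0x72, 0x65, 0x00, 0x00,                         // ... refers to function 0
    0x0a, 0x09, 0x01, 0x07, 0x00,                   // code section: one body, no extra locals
    0x20, 0x00, 0x20, 0x00, 0x6c, 0x0b,             // local.get 0, local.get 0, i32.mul, end
]);

// Compile step: bytes -> stateless Module.
const squareModule = new WebAssembly.Module(squareBytes);

// Instantiate step: Module + imports -> stateful Instance.
const squareInstance = new WebAssembly.Instance(squareModule, {});

console.log(squareInstance.exports.square(2)); // 4
```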

If you don’t need to pass the module around, you can do compiling and instantiation steps in one go with the instantiateStreaming function:

const importObject = {};
const {module, instance} = await WebAssembly.instantiateStreaming(
    fetch("square.wasm"),
    importObject
);

We’ve now run our WebAssembly code from Javascript! Wahoo!

Now, our goal is to run our wasm code in parallel, with each module having access to a shared set of memory.

To that end, below we’ll take a step back from WebAssembly to talk about how parallelism works in Javascript in general.


Web Workers

Web Workers are the JS way of running code in multiple CPU threads, i.e. in parallel.

By default, JS code runs in the so-called “main thread”. The main thread is what has access to the DOM and user input, and is blocking.

Blocking means that if you have a long running operation as a result of some user interaction, no other Javascript will run while the code is still going.

Web Workers, on the other hand, get run in a different thread than the main JS code.

This means you can run your slow code in a worker and asynchronously send a message back to the main thread when it’s done. Meanwhile, the main thread will remain free to handle additional user interaction.

The API to use Web Workers is pretty simple, with Javascript handling a lot of the tricky details that you often have to manage with other libraries.

All you need is another JS file with the code that you want to run in the worker. Once it’s started, you call the postMessage function to pass objects between the main thread and the worker.

Here’s an example where the main thread passes a Web Worker a list of numbers to be squared.

<!-- webworker.html -->

<!DOCTYPE html>

<script type="module">
    const worker = new Worker('example_worker.js');
    worker.postMessage({
        squareThese: [1, 2, 3, 4]
    });
    
    worker.addEventListener('message', (e) => {
        console.log("Main thread - Result");
        console.log(e.data.result);
    });
</script>

// example_worker.js

self.addEventListener('message', (e) => {
    const result = [];
    // The object from postMessage gets set as e.data
    console.log("Web Worker thread - squareThese");
    console.log(e.data.squareThese);

    for(const num of e.data.squareThese) {
        result.push(num * num);
    }
    postMessage({result});
});

Demo page

We use the worker’s message event to pass messages between the main thread and the worker. We listen for the event with an EventListener, and send the event with the postMessage function. The MDN docs have a great page on how all this works if you want to read more.

If you give the above a try, you’ll see that the main thread will print out [1, 4, 9, 16] as expected!

So now that we’ve got standard Javascript running in a Web Worker, let’s change the above code to call our WebAssembly from before. We’ll keep the webworker.html file the same, and only change the example_worker.js file.

// example_worker.js

self.addEventListener('message', async (e) => { // note that we added async here    
    const importObject = {};

    // Fetch and compile our square.wasm code
    const {module, instance} = await WebAssembly.instantiateStreaming(fetch("square.wasm"), importObject);

    const result = [];
    for(const num of e.data.squareThese) {
        // Call our wasm function
        console.log(`Worker thread - calling wasm square with num: ${num}`);
        result.push(instance.exports.square(num));
    }
    postMessage({result});
});

Demo page

Bam! Now our wasm code is running on a different CPU core than the main JS thread.

You’ll note that we instantiate the wasm module inside the worker – you can’t pass wasm instances from the main thread to a worker.

You can pass modules, which would potentially speed things up, but for simplicity we’ll just be doing the whole instantiation in the worker.

Right now we have only one Web Worker, but we can create multiple, and they’ll get assigned to the different CPU cores of your machine. On my laptop that means that I can run four different Web Workers in parallel. If you make more workers than you have cores, that’s fine, just some will get assigned to the same core and take turns executing.

Let’s change our code above one more time, to use multiple Web Workers. New lines are commented!

<!-- webworker.html -->
<!DOCTYPE html>

<script type="module">
    const squareThese = [1, 2, 3, 4];

    // We'll create a worker for each number in squareThese, and wrap its creation in a promise.
    // The promise will get resolved when the main thread gets a message back from the worker.
    // Then we'll wait for all the promises to finish down below.
    const promises = [];
    for(let i = 0; i < squareThese.length; i++) {
        promises.push(new Promise((resolve, reject) => {
            const worker = new Worker('example_worker.js');

            worker.addEventListener('message', (e) => {
                // Mutate squareThese with the new number.
                squareThese[i] = e.data.result;
                
                // Resolve the promise, so we know that this worker has finished.
                resolve();
            }, {once: true})

            worker.postMessage({
                // We only pass one number to the worker now, instead of the whole list.
                squareThis: squareThese[i],
                // Passing an id for console logging purposes.
                workerId: i
            });
        }));
    }

    // Wait for all the workers to be done.
    const results = await Promise.all(promises);

    // Every number in this array should now be squared!
    console.log(squareThese);
</script>

// example_worker.js

self.addEventListener('message', async (e) => {
    const importObject = {};
    const {instance, module} = await WebAssembly.instantiateStreaming(fetch("square.wasm"), importObject);
    
    console.log(`In worker #${e.data.workerId}, going to square ${e.data.squareThis}`);
    // Call the wasm function with our single passed number, and pass it back to the main thread.
    postMessage({
        result: instance.exports.square(e.data.squareThis)
    });
});

Demo page

The big change we made was using Promises to orchestrate creating and calling multiple different workers. Hopefully it’s easy enough to read!

In my console log, I see the following output.

In worker #2, going to square 3

In worker #1, going to square 2

In worker #0, going to square 1

In worker #3, going to square 4

› (4) [1, 4, 9, 16]

Note that the Web Workers are running in parallel and may finish at different times between runs.

Therefore, your output will likely be in a different order than mine, and will be different each time you run it.

Now we have multiple workers running our wasm code in parallel, we can talk about how to share memory between them!


Web Assembly’s Linear Memory

WebAssembly code is backed by a chunk of memory represented by a linear range of addresses.

When you create a WebAssembly instance in Javascript, the browser reserves some chunk of your computer’s memory for that instance. Any memory your wasm code reads or writes (variables, data passed in from Javascript, function stacks, etc) exists within that reserved chunk.

The WASM module itself defines the initial amount of memory that it needs.

In our wat representation of our square.wasm file, we can see this line:

  (memory $memory (export "memory") 2)

This declares that our linear memory is two pages long. Each page is 64 KiB, as defined by the WebAssembly spec.

Where does that number, 2 pages, come from? After all, we don’t define it anywhere in our C code. Turns out, it’s the default that wasm-ld, our linker, assigns.

wasm-ld ends up controlling a lot of the structure of the resulting web assembly code, from how much memory it uses, to what things get exported. You can change a lot of this using wasm-ld’s flags – the full list can be found here.

In this case, we can set the size of the linear memory by using the --initial-memory flag, which takes the size in bytes.

If we wanted to change our program above to use 3 pages of memory, we could do so by adding this to our compiler command:

-Wl,--initial-memory=196608

Where 196608 bytes is 3 64KiB pages (64 * 1024 * 3 == 196608).
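
Since --initial-memory wants bytes but everything else talks in pages, here’s a tiny sketch of the conversion. pagesToBytes is a name I made up; the only real API used is WebAssembly.Memory, whose initial property is measured in pages.

```javascript
const WASM_PAGE_SIZE = 64 * 1024; // 65536 bytes per page, fixed by the WebAssembly spec

// Hypothetical helper: the byte count to pass to --initial-memory for a page count.
function pagesToBytes(pages) {
    return pages * WASM_PAGE_SIZE;
}

console.log(pagesToBytes(2)); // 131072
console.log(pagesToBytes(3)); // 196608

// The JS API counts in pages too: a 3-page Memory has a 196608-byte buffer.
const threePages = new WebAssembly.Memory({ initial: 3 });
console.log(threePages.buffer.byteLength === pagesToBytes(3)); // true
```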

Exporting memory

Now, if you look at the .wat again, you’ll see that the memory is marked as an export. This is the default for wasm-ld, and it means that if you instantiate this binary in Javascript, you can access the linear memory through the instance’s exports object.

For example, here’s a simple HTML file that console logs the exported memory:

<!DOCTYPE html>

<html>
<script type="module">
    const { instance, module } = await WebAssembly.instantiateStreaming(fetch("square.wasm"));

    console.log(instance.exports.memory);
</script>

</html>

Demo page

The output I get is as follows:

⌄ Memory(2)

› buffer: ArrayBuffer(131072)

This shows that the Memory object has two pages (131072 bytes). Looking at the object’s properties, you see that it has one called buffer, which is an ArrayBuffer with, as expected, 131072 bytes.

You can then use the DataView interface to read and write to that buffer from your JS code – we’ll be doing this later.
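
As a quick taste, here’s a minimal sketch of writing to and reading from a Memory’s buffer. The offset 1024 is an arbitrary address I picked for illustration. WebAssembly stores numbers in little-endian order, which is why the DataView calls pass true as their last argument.

```javascript
const demoMemory = new WebAssembly.Memory({ initial: 2 });
const view = new DataView(demoMemory.buffer);

// An arbitrary byte offset, chosen for illustration.
const address = 1024;

// Write a 32-bit unsigned integer, little-endian, then read it back.
view.setUint32(address, 123456, true);
console.log(view.getUint32(address, true)); // 123456

// A typed array over the same buffer sees the same bytes.
// Uint32Array indexes in 4-byte elements, so byte 1024 is element 256.
// Typed arrays use the platform's byte order, which is little-endian on
// virtually every machine that runs a browser.
const words = new Uint32Array(demoMemory.buffer);
console.log(words[address / 4]); // 123456
```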

So that’s how exported memory works. But WebAssembly also allows you to import the module’s linear memory from the host environment.

This will let us create a set of memory and import it into multiple instances of the same module.

Importing memory

To import memory, we need to pass a different set of flags to wasm-ld. Nothing in the higher level C code needs to change.

In addition to the --initial-memory flag above, we need to pass the --import-memory flag.

The full clang command will be:

clang \
  --target=wasm32 \
  -nostdlib \
  -O3 \
  -Wl,--no-entry \
  -Wl,--initial-memory=131072 \
  -Wl,--import-memory `# this flag is new` \
  -o square.wasm \
  square.c

If we then convert the new square.wasm into a .wat, we get:

(module $square.wasm
  (type $t0 (func (param i32) (result i32)))
  (import "env" "memory" (memory $env.memory 2))
  (func $square (type $t0) (param $p0 i32) (result i32)
    local.get $p0
    local.get $p0
    i32.mul)
  (table $T0 1 1 funcref)
  (global $__stack_pointer (mut i32) (i32.const 66560))
  (export "square" (func $square)))

You’ll see that the export memory line is gone, replaced with a new import:

(import "env" "memory" (memory $env.memory 2))

This states that the program expects a memory object, 2 pages long, to be passed to it, under the namespace env.memory.

If you then reload the HTML from above, with the new square.wasm, you’ll see that you get an error when calling instantiateStreaming, since we’re not passing in that memory:

Uncaught TypeError: WebAssembly.instantiate(): Imports argument must be present and must be an object

So let’s change our Javascript to pass in the correct import object:

<!DOCTYPE html>

<html>
<script type="module">
    const linearMemory = new WebAssembly.Memory({
        initial: 2,
    });

    const importObject = {
        env: {
            memory: linearMemory
        }
    };

    const { instance, module } = await WebAssembly.instantiateStreaming(
        fetch("square.wasm"),
        importObject
    );

    console.log(instance.exports);
</script>

</html>

Demo page

Console log output:

⌄ Object

› square: f $square()

You see that we create a Memory object with an initial 2 pages which we pass in as env.memory.

If you do less than 2 pages (e.g. only one), you’ll get an error like this:

Uncaught LinkError: WebAssembly.instantiate(): Import #0 "env" "memory": memory import has 1 pages which is smaller than the declared initial of 2

If you pass more than 2 pages it works – the wasm spec allows for growing your memory at runtime.

And then if you change the namespace from env.memory to foo.memory, you get the somewhat confusing:

Uncaught TypeError: WebAssembly.instantiate(): Import #0 "env": module is not an object or function

Finally, our console.log statement prints out the module’s exports. You’ll notice that memory isn’t there anymore – it’s no longer being exported, so it’s not on the export object.
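
If you’d like to poke at those import errors without recompiling anything, here’s a sketch using a hand-assembled module that contains nothing but the memory import. The bytes and the tryInstantiate helper are mine, so take them as an approximation of what the real square.wasm does at link time. The exact error messages vary by browser, so the sketch only reports the error type.

```javascript
// Hand-assembled module containing only: (import "env" "memory" (memory 2))
const importsMemoryBytes = new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
    0x02, 0x0f, 0x01,                               // import section, 15 bytes, one import
    0x03, 0x65, 0x6e, 0x76,                         // module name "env"
    0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79,       // field name "memory"
    0x02, 0x00, 0x02,                               // kind: memory, no maximum, minimum 2 pages
]);
const importsMemoryModule = new WebAssembly.Module(importsMemoryBytes);

// Hypothetical helper: try to instantiate and report the error type, if any.
function tryInstantiate(importObject) {
    try {
        new WebAssembly.Instance(importsMemoryModule, importObject);
        return "ok";
    } catch (err) {
        return err.constructor.name;
    }
}

const pages = (n) => new WebAssembly.Memory({ initial: n });

console.log(tryInstantiate({ env: { memory: pages(2) } })); // ok
console.log(tryInstantiate({ env: { memory: pages(3) } })); // ok
console.log(tryInstantiate({ env: { memory: pages(1) } })); // LinkError
console.log(tryInstantiate({ foo: { memory: pages(2) } })); // TypeError
```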

Shared memory

Ok, so we’re able to create a Memory object in the JS main thread and pass it to a WebAssembly module, but can we use that Memory object with our Web Worker example above?

Not yet! Let’s try to pass our linear memory object into a Web Worker using postMessage:

<!DOCTYPE html>

<html>
<script type="module">
    const linearMemory = new WebAssembly.Memory({
        initial: 2,
    });

    // You can use any JS file to create the worker, 
    // because this example is going to throw an error 
    // when we call postMessage.
    const worker = new Worker('example_worker.js');

    worker.postMessage({linearMemory});
</script>
</html>

Demo page

If you run this code, you should get the error:

Uncaught DataCloneError: Failed to execute 'postMessage' on 'Worker': #<Memory> could not be cloned.

Normal WebAssembly.Memory objects can’t be passed into Web Workers. I assume this is because their underlying data store is an ArrayBuffer, which is one of the confusingly named transferable objects. This means that, when transferred to a web worker, it gets detached at its original reference and is only usable from the web worker.

A whiteboard comic of three men in Star Trek outfits. The man on the left says to the men on the right: "So you dematerialized and were rebuilt atom by atom in a new location. What's the big deal?" The men on the right, especially one wearing a purple shirt, look terrified.
Art by Math with Bad Drawings. 3

Fortunately for Mr. Purple Shirt, there’s a concept of a SharedArrayBuffer which allows for multiple views to the same underlying buffer.

We can create a WebAssembly.Memory that’s backed by a SharedArrayBuffer by setting the shared and maximum properties, like so:

const sharedMemory = new WebAssembly.Memory({
    shared: true,
    initial: 2,
    maximum: 2
});

console.log(sharedMemory);

Console log output:

⌄ Memory(2)

› buffer: SharedArrayBuffer(131072)

We set the maximum to be the same as our initial because we don’t need to grow our memory at runtime.

I’m not 100% sure why maximum is a required property. Based on this design rationale, I think it’s so that the browser never has to move memory around that multiple threads might be touching (though given that realloc is thread-safe, I’m not quite sure why that’s a problem). Regardless, it’s part of the spec and you’ll get an error if you don’t set it.
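
Here’s a small sketch that checks both of those claims: that a shared Memory is backed by a SharedArrayBuffer, and that leaving out maximum gets rejected. I compare constructor names instead of using instanceof because the SharedArrayBuffer global may be hidden on pages that aren’t cross-origin isolated. The variable names are just for this example.

```javascript
const sharedDemo = new WebAssembly.Memory({ shared: true, initial: 2, maximum: 2 });
console.log(sharedDemo.buffer.constructor.name); // SharedArrayBuffer
console.log(sharedDemo.buffer.byteLength);       // 131072

const plainDemo = new WebAssembly.Memory({ initial: 2 });
console.log(plainDemo.buffer.constructor.name);  // ArrayBuffer

// Leaving out `maximum` on a shared memory is rejected by the constructor.
let rejected = false;
try {
    new WebAssembly.Memory({ shared: true, initial: 2 });
} catch (err) {
    rejected = true;
}
console.log(rejected); // true
```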

If you now try to pass your sharedMemory to a worker using postMessage, you’ll see that it succeeds!

But if you then try to call instantiateStreaming in the Web Worker, with your sharedMemory, you’ll get a new error.

// main thread
const sharedMemory = new WebAssembly.Memory({
    shared: true,
    initial: 2,
    maximum: 2
});

worker.postMessage({sharedMemory});

// inside worker
onmessage = async (e) => { // async is required because we use await below

    const importObject = { env: { memory: e.data.sharedMemory } };

    // will currently throw an error
    const { instance, module } = await WebAssembly.instantiateStreaming(
        fetch('square.wasm'), 
        importObject
    );
}

Error:

LinkError: WebAssembly.instantiate(): Import #0 "env" "memory": mismatch in shared state of memory, declared = 0, imported = 1

This is because, as well as us setting shared: true in the WebAssembly.Memory constructor, the wasm module itself needs to mark its memory import as shared.

We can do this by using different wasm-ld flags, which we’ll go over in the next section!
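
In the meantime, here’s a sketch that shows the mismatch from both directions, using two hand-assembled modules that differ only in the shared bit of their memory import. The bytes and helper names are my own, and unlike our real square.wasm both modules declare a maximum of 2 pages so that they’re otherwise identical.

```javascript
// Builds a module containing only a memory import from "env" "memory".
// Limits flags: 0x01 = has maximum, 0x03 = has maximum + shared.
function memoryImportModule(shared) {
    return new WebAssembly.Module(new Uint8Array([
        0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
        0x02, 0x10, 0x01,                               // import section, 16 bytes, one import
        0x03, 0x65, 0x6e, 0x76,                         // "env"
        0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79,       // "memory"
        0x02, shared ? 0x03 : 0x01, 0x02, 0x02,         // kind: memory, flags, min 2, max 2
    ]));
}

// Hypothetical helper: try to link a module against a memory, report the error type.
function linkResult(mod, memory) {
    try {
        new WebAssembly.Instance(mod, { env: { memory } });
        return "ok";
    } catch (err) {
        return err.constructor.name;
    }
}

const sharedMem = new WebAssembly.Memory({ shared: true, initial: 2, maximum: 2 });
const plainMem = new WebAssembly.Memory({ initial: 2, maximum: 2 });

console.log(linkResult(memoryImportModule(false), plainMem));  // ok
console.log(linkResult(memoryImportModule(false), sharedMem)); // LinkError
console.log(linkResult(memoryImportModule(true), sharedMem));  // ok
console.log(linkResult(memoryImportModule(true), plainMem));   // LinkError
```

So the shared flag has to agree on both sides: the module’s import declaration and the Memory object you hand it.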

Memory Summary

This section was a lot, so here’s a quick summary:

  • Memory is declared by the WebAssembly Module, which defines whether or not it imports or exports memory.
  • Your module needs to declare the initial size of its memory, and optionally, the maximum.
  • You use wasm-ld flags to declare which type and how much memory you want in your Module.
  • WebAssembly Memories that are created in Javascript cannot be sent to a Web Worker unless that memory is marked as shared.

Compiling a WebAssembly binary that uses shared imported memory

Say that ten times fast!

Before we get into the actual compiling, we’re going to use a new C program. After all, our old square.c doesn’t really have any use for shared memory – it just squares the number passed to it.

Our new program, shared_square.c will have an array of numbers that are read and then squared by multiple WebWorkers at once.

// shared_square.c

#include <stdint.h>

__attribute__((visibility("default"))) uint32_t numbers[100];

__attribute__((visibility("default"))) void initNumbers() {
    for(uint32_t i = 0; i < 100; i++) {
        numbers[i] = i;
    }
}

__attribute__((visibility("default")))
void square(uint32_t start, uint32_t end) {
    for(uint32_t i = start; i < end; i++) {
        numbers[i] *= numbers[i];
    }
}

This program defines a fixed uint32 array called numbers, an initNumbers function to populate that array, and a square function that squares all numbers in the array between start and end.
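
Because square takes a half-open [start, end) range, each worker can be handed its own slice of the array. Here’s a sketch of one way to carve up the indices on the JS side. chunkRanges is a helper I made up, and I’m assuming each worker gets a contiguous, non-overlapping slice; the final code may well split the work differently.

```javascript
// Hypothetical helper: split [0, length) into contiguous [start, end) ranges,
// one per worker. The last range is shorter when length doesn't divide evenly.
function chunkRanges(length, workerCount) {
    const ranges = [];
    const chunkSize = Math.ceil(length / workerCount);
    for (let start = 0; start < length; start += chunkSize) {
        ranges.push([start, Math.min(start + chunkSize, length)]);
    }
    return ranges;
}

console.log(chunkRanges(100, 4)); // [[0, 25], [25, 50], [50, 75], [75, 100]]
console.log(chunkRanges(100, 3)); // [[0, 34], [34, 68], [68, 100]]
```

Each pair would then be sent to a worker, which calls square(start, end) on its own instance. Since the slices don’t overlap, no two workers ever touch the same element.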

The new __attribute__((visibility("default"))) line is another Clang attribute that, when combined with the --export-dynamic linker flag, exports a symbol.

At the end of .wat for this module, you’ll see that numbers, initNumbers and square are all exported:

  (func $initNumbers (type $t0)
  ...
  )
  (func $square (type $t1) (param $p0 i32) (param $p1 i32)
  ...
  )
  (global $numbers i32 (i32.const 1024))
  (export "initNumbers" (func $initNumbers))
  (export "numbers" (global $numbers))
  (export "square" (func $square))

We use visibility instead of the earlier export_name attribute because the latter only works with functions. See the wasm-ld page for more information about how to export things.
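
One thing worth noticing in that .wat is that the numbers export is a global holding the constant 1024. That’s the array’s address in linear memory, not the array itself. Here’s a sketch of how you might turn that address into a typed array on the JS side. I’m using stand-ins for the real instance and memory so the snippet runs by itself, and I emulate initNumbers and square in plain JS, so this is an illustration of the idea and not the module’s actual behaviour.

```javascript
// Stand-ins for what the real module would give us (assumptions for this sketch):
// - numbersGlobal plays the role of instance.exports.numbers
// - sketchMemory plays the role of the shared memory we import into the module
const numbersGlobal = new WebAssembly.Global({ value: "i32", mutable: false }, 1024);
const sketchMemory = new WebAssembly.Memory({ shared: true, initial: 2, maximum: 2 });

// The global's value is a byte offset into linear memory.
// View 100 uint32 values starting at that offset.
const numbers = new Uint32Array(sketchMemory.buffer, numbersGlobal.value, 100);

// Emulate initNumbers() and square(0, 100) in JS, just to have data to look at.
for (let i = 0; i < numbers.length; i++) numbers[i] = i;
for (let i = 0; i < numbers.length; i++) numbers[i] *= numbers[i];

console.log(numbers.byteOffset); // 1024
console.log(numbers[9]);         // 81
```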

Compiling with shared memory

The command to compile the above shared_square.c program is:

clang  \
--target=wasm32 \
-nostdlib \
-O3 \
-Wl,--no-entry \
-Wl,--import-memory \
-Wl,--initial-memory=131072  \
`# START_NEW_FLAGS` \
-Wl,--export-dynamic  \
-Wl,--shared-memory \
-mbulk-memory \
-matomics \
`# END_NEW_FLAGS` \
-o shared_square.wasm \
shared_square.c

Breaking down the new flags, we have:

  • -Wl,--export-dynamic: as discussed above, this exports any symbols marked with “default” visibility
  • -Wl,--shared-memory: sets the shared attribute on the memory line in the wasm code
  • -mbulk-memory: Required compiler feature flag to use shared memory
  • -matomics: Required compiler feature flag to use shared memory

Finally, now that we’re using shared-memory, if we want to generate our .wat file so we can read the compiled code, we need to pass in an extra --enable-threads flag to wabt:

/wabt/bin/wasm2wat --generate-names --enable-threads shared_square.wasm > shared_square.wat

Without that extra flag, you’ll get an error: memory may not be shared: threads not allowed message.

With all these flags set, the wat for our memory will end up being:

(import "env" "memory" (memory $env.memory 2 2 shared))

As you can see, the shared attribute is set. This allows us to, in our Javascript, pass in a WebAssembly.Memory backed by a SharedArrayBuffer without getting an error!
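To see what the shared attribute changes on the Javascript side, here's a minimal sketch that doesn't need the compiled module at all. It assumes an environment where SharedArrayBuffer is available (Node, or a cross-origin isolated page), and the variable names are mine, not from the demo.

```javascript
// Shared memory: the underlying buffer is a SharedArrayBuffer.
// 2 pages of 64 KiB each, matching the (memory 2 2 shared) import.
const sharedMemory = new WebAssembly.Memory({ shared: true, initial: 2, maximum: 2 });
const sharedIsSAB = sharedMemory.buffer instanceof SharedArrayBuffer;

// Non-shared memory: the underlying buffer is a plain ArrayBuffer.
const plainMemory = new WebAssembly.Memory({ initial: 2, maximum: 2 });
const plainIsSAB = plainMemory.buffer instanceof SharedArrayBuffer;

// Prints: true false 131072
console.log(sharedIsSAB, plainIsSAB, sharedMemory.buffer.byteLength);
```

Only the first kind of memory can be handed to several Web Workers and still point at the same bytes.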


Reading our exported numbers array

wasm-ld Memory Layout

We have our shared memory, and we have the numbers array somewhere inside it. But where exactly is it?

As far as WebAssembly is concerned, memory is just a big linear block, with nothing to differentiate one section of memory from another. But wasm-ld creates its own structured layout, to consistently map our variables and other C concepts (like the heap) onto the linear memory.

Before diving into the layout, I’ll say that I’m not certain that I’ve got this section entirely correct. I can’t find any official documentation for the memory layout – the best one I got was this page on DynamicLinking, which is a WIP and may be emscripten specific. What I say below comes from that page, Surma’s blog from 2020, and my own reverse engineering. You’ve been warned!

With that disclaimer, here’s the layout for our shared_square.wasm when it’s been compiled with an initial memory of 2 pages (131072 bytes).

Note: you can see all of these wasm-ld defined variables for yourself by compiling the program with the --export-all flag instead of --export-dynamic.

We start at 0, and then wasm-ld reserves the first 1024 bytes for its own use.

At 1024 bytes we have the start of the globals section – the fixed memory that we assign at compile time.

For us, since we only have one global variable, that’s where our numbers array gets stored, which you can see in the wat file with this line:

(global $numbers i32 (i32.const 1024))

The numbers array needs space for 100 4-byte uint32_t elements, or a total of 400 (100 * 4) bytes.

So, the numbers array starts at 1024 and ends 400 bytes later at 1424 – which is marked by data_end.

This is where the stack region begins (stack_low), and it extends up to heap_base, at 66960 bytes. Then the heap gets the remaining bytes, up to the maximum of 131072 (heap_end).
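The offsets above can be reproduced with a bit of arithmetic. This is only a sketch: the 65536-byte stack size is my assumption, inferred from the gap between 1424 and 66960 rather than taken from any documentation, and the constant names are made up for illustration.

```javascript
const WASM_PAGE_SIZE = 65536; // bytes in one WebAssembly page
const GLOBAL_BASE = 1024;     // where wasm-ld placed the numbers array
const STACK_SIZE = 65536;     // assumption: inferred from the offsets in the text

const numbersBytes = 100 * 4;               // 100 uint32_t elements
const dataEnd = GLOBAL_BASE + numbersBytes; // 1424
const heapBase = dataEnd + STACK_SIZE;      // 66960
const heapEnd = 2 * WASM_PAGE_SIZE;         // 131072 (2 pages)
const heapBytes = heapEnd - heapBase;       // 64112 bytes left for the heap

console.log({ dataEnd, heapBase, heapEnd, heapBytes });
```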

That’s the memory layout and where the numbers array fits within it! For more information (and for a 5 line malloc implementation that lets you write data to the heap!) I recommend reading Surma’s blog post.

Reading the numbers array from Javascript

Now that we know the offset in linear memory for our numbers array, we can read it with Javascript!

Of course, we don’t want to hardcode that 1024 number if we can avoid it – which is where exporting the numbers array comes in.

When you export an array in WebAssembly, it’s not the contents that get exported – it’s the offset!

So, in our Javascript, the instance.exports.numbers property will be a WebAssembly.Global object whose value is the offset, 1024. Putting this together, we can write Javascript code to read the numbers array from the shared memory object like so:

const sharedMemory = new WebAssembly.Memory({
    shared: true,
    initial: 2,
    maximum: 2
});

const importObject = {
    env: { 
        memory: sharedMemory 
    }
};

const { instance, module } = await WebAssembly.instantiateStreaming(
    fetch('shared_square.wasm'), 
    importObject
);

// Call initNumbers so that our Array has values,
// instead of being set to all zeros.
instance.exports.initNumbers();

// instance.exports.numbers is a WebAssembly.Global; its value
// should be 1024, the offset in linear memory where the
// numbers array starts.
const numbersOffset = instance.exports.numbers.value;

// Create a DataView from our memory buffer, starting at the 
// offset, that's 400 bytes long.
// Best practice would be to export the array length from our C
// program as well -- an exercise for the reader!
const view = new DataView(sharedMemory.buffer, numbersOffset, 400);

// Should print out: 0 1 2
console.log(
    view.getUint32(0 * 4, true), 
    view.getUint32(1 * 4, true), 
    view.getUint32(2 * 4, true)
);

Demo page
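One detail worth knowing about the exported offset: an exported wasm global reaches Javascript as a WebAssembly.Global object rather than a bare number. The sketch below builds a stand-in global by hand (so it runs without the compiled module; numbersExport is a hypothetical name) and shows that both .value and ordinary numeric coercion give the offset.

```javascript
// Stand-in for instance.exports.numbers: an immutable i32 global holding 1024.
const numbersExport = new WebAssembly.Global({ value: 'i32', mutable: false }, 1024);

const isGlobal = numbersExport instanceof WebAssembly.Global; // true
const offset = numbersExport.value;                           // 1024
const coerced = Number(numbersExport);                        // 1024, via valueOf()

// DataView coerces its byteOffset argument the same way,
// so passing the global directly also lands on offset 1024.
const buffer = new ArrayBuffer(2048);
const view = new DataView(buffer, numbersExport, 400);

// Prints: true 1024 1024 1024
console.log(isGlobal, offset, coerced, view.byteOffset);
```

Reading .value explicitly is a little more typing, but it makes it obvious that a number is what you're passing around.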

We use the DataView interface to read our imported memory.

DataView takes a buffer, plus an optional offset and length within that buffer, and returns an object with methods you can call to read from and write to the buffer.

We use the getUint32 method to read the memory, which, given an offset, will return the next four bytes as a Javascript number.

We pass true to the second parameter (the littleEndian parameter), because all memory in WebAssembly is stored as little-endian.
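To make the littleEndian parameter concrete, here's a small standalone sketch. It lays out the number 4 by hand, least significant byte first, which is how WebAssembly stores a 32-bit integer, and then reads it back both ways.

```javascript
const buffer = new ArrayBuffer(4);
const bytes = new Uint8Array(buffer);

// The number 4 as a little-endian 32-bit integer:
// least significant byte first.
bytes.set([0x04, 0x00, 0x00, 0x00]);

const view = new DataView(buffer);
const asLittleEndian = view.getUint32(0, true); // 4
const asBigEndian = view.getUint32(0, false);   // 0x04000000 = 67108864

// Prints: 4 67108864
console.log(asLittleEndian, asBigEndian);
```

Forgetting the true doesn't throw, it just quietly gives you the wrong number, which is why it's worth being careful here.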

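Strictly speaking, typed arrays like Uint32Array don't assume a fixed byte order: they read memory in the host platform's byte order. That's little-endian on virtually every machine a browser runs on, so Uint32Array would normally work here too, but DataView lets us state the byte order explicitly instead of relying on the host.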

I find the DataView interface fiddly and easy to get wrong, so I like to create a little helper class to call the methods for me, something like:

class ExportedArray {
    constructor(buffer, offset, { numOfElements, bytesPerElement }) {
        this.bytesPerElement = bytesPerElement;
        this.view = new DataView(buffer, offset, numOfElements * bytesPerElement);
    }

    get(index) {
        return this.view.getUint32(this.bytesPerElement * index, true);
    }
}

const numbers = new ExportedArray(sharedMemory.buffer, instance.exports.numbers, {
    numOfElements: 100,
    bytesPerElement: 4
});
console.log(numbers.get(0), numbers.get(1), numbers.get(2));

This is especially nice if you have lots of exported arrays. You can even make a C macro on the other end to export the array along with its size (bytesPerElement) and length (numOfElements).
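If you also want to write to the array or loop over it from Javascript, the helper extends naturally. This is a sketch of one possible variant, not the code from the demo: ExportedUint32Array, its set method, and the iterator are all my own additions, and a plain SharedArrayBuffer stands in for the wasm memory.

```javascript
// Variant of the helper, hard-coded to 4-byte unsigned elements.
class ExportedUint32Array {
    constructor(buffer, offset, numOfElements) {
        this.length = numOfElements;
        this.view = new DataView(buffer, offset, numOfElements * 4);
    }

    get(index) {
        return this.view.getUint32(index * 4, true);
    }

    set(index, value) {
        this.view.setUint32(index * 4, value, true);
    }

    // Lets you use for...of and spread syntax on the array.
    *[Symbol.iterator]() {
        for (let i = 0; i < this.length; i++) {
            yield this.get(i);
        }
    }
}

// Stand-in for the wasm memory: 2048 bytes, with the array at offset 1024.
const buffer = new SharedArrayBuffer(2048);
const numbers = new ExportedUint32Array(buffer, 1024, 100);

// Same effect as initNumbers(), but done from Javascript.
for (let i = 0; i < numbers.length; i++) {
    numbers.set(i, i);
}

const firstThree = [...numbers].slice(0, 3);

// Prints: [ 0, 1, 2 ] 99
console.log(firstThree, numbers.get(99));
```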


Putting it all together

So, we finally have all the pieces we need to write some Javascript that squares the numbers array in parallel! If you made it this far, congratulations!

Below is all the Javascript we need. To sum it up, we’re going to instantiate the wasm module in the main thread, call initNumbers to get our starting data, create four Web Workers, and then have each of them instantiate the module and call square against a different range of numbers. Finally, we’ll read out a few numbers to confirm that they’ve been squared.

<!-- shared_square.html -->
<!DOCTYPE html>

<html>
<script type="module">
    /** ================ Helpers =================== */ 

    // Call postMessage for each worker,
    // and wait for them all to send a message back.
    async function postToEachWorker(workers, getMessage) {
        const promises = [];
        for (let i = 0; i < workers.length; i++) {
            const worker = workers[i];
            promises.push(new Promise((resolve, reject) => {
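                // { once: true } removes the listener after the first reply,
                // so listeners don't pile up across calls.
                worker.addEventListener('message', (e) => {
                    resolve();
                }, { once: true });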

                worker.postMessage(getMessage(worker, i));
            }));
        }
        return Promise.all(promises);
    }

    // Make it easier to read a Uint32 array in 
    // WebAssembly memory.
    class ExportedArray {
        constructor(buffer, offset, { numOfElements, bytesPerElement }) {
            this.bytesPerElement = bytesPerElement;
            this.view = new DataView(buffer, offset, numOfElements * bytesPerElement);
        }

        get(index) {
            return this.view.getUint32(this.bytesPerElement * index, true);
        }
    }

    /** ================ Main ================== */ 
    const sharedMemory = new WebAssembly.Memory({
        shared: true,
        initial: 2,
        maximum: 2
    });

    const { instance, module } = await WebAssembly.instantiateStreaming(fetch("shared_square.wasm"), {
        env: {
            memory: sharedMemory
        }
    });

    const lengthOfNumbers = 100;
    // numWorkers must be a factor of lengthOfNumbers
    const numWorkers = 4;
    const rangeLength = lengthOfNumbers / numWorkers;

    // Create the Web Workers
    const workers = [];
    for (let i = 0; i < numWorkers; i++) {
        workers.push(new Worker('shared_square_worker.js'));
    }

    // Tell each Web Worker to create its own
    // shared_square.wasm module
    await postToEachWorker(workers, (worker, i) => {
        return {
            type: 'instantiate',
            workerId: i, // used for console logging
            memory: sharedMemory,
        };
    });

    // Initialize the array with numbers 0 to 99.
    instance.exports.initNumbers();

    // Helper object to read the numbers from the shared memory.
    const numbers = new ExportedArray(sharedMemory.buffer, instance.exports.numbers, {
        numOfElements: 100,
        bytesPerElement: 4
    });

    // Verify initial values
    console.log("initial values");
    console.log("numbers[0]: ", numbers.get(0)); // Should be 0
    console.log("numbers[1]: ", numbers.get(1)); // Should be 1
    console.log("numbers[2]: ", numbers.get(2)); // Should be 2
    console.log("numbers[98]: ", numbers.get(98)); // Should be 98
    console.log("numbers[99]: ", numbers.get(99)); // Should be 99

    // Tell each worker to call the WebAssembly "square" function
    // against a different range of numbers.
    console.log("squaring");
    await postToEachWorker(workers, (worker, i) => {
        return {
            type: 'square',
            start: i * rangeLength,
            end: (i+1) * rangeLength
        };
    });

    // Print the new values.
    console.log("values after squaring");
    console.log("numbers[0]: ", numbers.get(0)); // Should be 0² = 0
    console.log("numbers[1]: ", numbers.get(1)); // Should be 1² = 1
    console.log("numbers[2]: ", numbers.get(2)); // Should be 2² = 4
    console.log("numbers[98]: ", numbers.get(98)); // Should be 98² = 9604
    console.log("numbers[99]: ", numbers.get(99)); // Should be 99² = 9801
</script>

</html>

// shared_square_worker.js

let wasmInstance, workerId;

onmessage = (e) => {
    const { type } = e.data;
    switch (type) {
        case 'instantiate':
            const memory = e.data.memory;
            workerId = e.data.workerId;

            WebAssembly.instantiateStreaming(fetch("shared_square.wasm"), {
                env: { memory: memory }
            }).then(
                (obj) => {
                    wasmInstance = obj.instance;
                    postMessage(null);
                }
            );
            break;
        case 'square':
            const start = e.data.start;
            const end = e.data.end;

            wasmInstance.exports.square(start, end);

            console.log(`worker #${workerId}: squared numbers between ${start} and ${end}`);
            
            postMessage(null);    
            break;
        default:
            throw new Error(`unexpected type: ${type}`);
    }
}

Demo page

The only new code we added was a helper, postToEachWorker, to abstract away the code that posts messages to each worker, and waits for them to complete.
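The listing above requires numWorkers to divide lengthOfNumbers evenly. If you'd rather not have that restriction, here's a sketch of a hypothetical splitRanges helper (not part of the demo) that hands the leftover elements to the first few workers. It returns the same { start, end } pairs that the 'square' message expects.

```javascript
// Split [0, length) into numWorkers contiguous ranges.
// The first (length % numWorkers) ranges get one extra element.
function splitRanges(length, numWorkers) {
    const base = Math.floor(length / numWorkers);
    const remainder = length % numWorkers;
    const ranges = [];
    let start = 0;
    for (let i = 0; i < numWorkers; i++) {
        const size = base + (i < remainder ? 1 : 0);
        ranges.push({ start, end: start + size });
        start += size;
    }
    return ranges;
}

// Same ranges as the demo: 0-25, 25-50, 50-75, 75-100
const even = splitRanges(100, 4);

// Uneven split: 0-34, 34-67, 67-100
const uneven = splitRanges(100, 3);

console.log(even, uneven);
```

Since every range is disjoint and together they cover the whole array, each element still gets squared exactly once.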

When I load the HTML, I get the following in my console:

initial values

numbers[0]: 0

numbers[1]: 1

numbers[2]: 2

numbers[98]: 98

numbers[99]: 99

squaring

worker #2: squared numbers between 50 and 75

worker #3: squared numbers between 75 and 100

worker #0: squared numbers between 0 and 25

worker #1: squared numbers between 25 and 50

values after squaring

numbers[0]: 0

numbers[1]: 1

numbers[2]: 4

numbers[98]: 9604

numbers[99]: 9801

Note that the order the Web Workers work in is non-deterministic – your squaring output might be different than mine.

We’ve successfully compiled our C code into WebAssembly and are running multiple instances of it, in parallel, with a shared set of memory!

And with that… we’re done!

🥳
😮‍💨

Conclusion

If you made it this far, thank you! :D

I will note that we’re doing things the hard way here, with no C standard library, etc. If you’re planning on using C/C++ and WebAssembly I do recommend checking out Emscripten, as it may well save you a lot of effort.

But I’m hoping that with this primer, you’ll have a bit more of a foundational understanding of how WebAssembly works, which you can bring to everything else you learn in the future!

This was my first time writing such a long technical primer, so if you have any feedback, I’d love to hear it! Feel free to contact me with questions or feedback at:

Below I’ve got a few more mini-sections that didn’t fit in elsewhere in the primer, and then a list of suggested readings if you want to learn more about this topic.

Addendum

Atomic operations

In this primer, we don’t risk any of the normal parallel memory access problems (e.g. two threads trying to write the same number at the same time) because we make it so each Web Worker works against a different range of the shared memory.

WebAssembly does have support for atomic operations, but I haven’t tried to get the C code compiled to use them.

I suspect the Clang atomic attributes would work to generate the appropriate WebAssembly atomic operations – but again, I haven’t tried it.

It also looks like Emscripten has an implementation of pthreads (which isn’t quite what we want) – but I bet you could get Emscripten to generate atomic operations.

If anybody has an answer to this, I’d love to hear it!
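This doesn't answer the C question, but on the Javascript side the built-in Atomics object does work on integer typed arrays over shared memory. Here's a minimal single-threaded sketch of the API. Using the first four bytes of a fresh memory as a counter is just for illustration; in a real module you'd want an offset that your C code knows about.

```javascript
const memory = new WebAssembly.Memory({ shared: true, initial: 1, maximum: 1 });

// Treat the first 4 bytes of the shared memory as a counter.
// (Illustrative offset only; not a location used by shared_square.c.)
const counter = new Int32Array(memory.buffer, 0, 1);

Atomics.store(counter, 0, 0);

// Atomics.add returns the value from before the addition.
const before = Atomics.add(counter, 0, 5); // 0
const after = Atomics.load(counter, 0);    // 5

// compareExchange only writes if the current value matches the expected one,
// and returns the value it found.
const found = Atomics.compareExchange(counter, 0, 5, 42); // 5
const final = Atomics.load(counter, 0);                   // 42

// Prints: 0 5 5 42
console.log(before, after, found, final);
```

The point of these calls only really shows up when several Web Workers hit the same index at once, which this sketch doesn't attempt.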

Zig

When I was learning to compile C code into WebAssembly, I went down the path of trying to use the Zig compiler toolchain.

Zig is a programming language, but the community that maintains it also provides a compiler toolchain that is really nice for doing cross-platform Zig, C, and C++.

A lot of the internet resources recommend using it to compile to wasm, and for simple cases it does work out of the box.

Unfortunately, it didn’t work for me for the more complicated cases where you need to pass in lots of wasm-ld flags (e.g. shared imported memory).

The Zig toolchain treats the linker as a black-box implementation detail.

Rather than set linker flags directly, you set Zig CLI flags which Zig will then internally translate into linker commands.

If they don’t have a path to translate “zig flag” to “the linker flag you care about”, you’re out of luck (as is detailed in this Github issue).

Anyway, maybe there is a way to get it to work, but I found it easier to just use Clang directly. ¯\(ツ)/¯

Debugging

I find that wasm isn’t the easiest thing to debug.

The best tooling I found is actually Chrome DevTools, which lets you set breakpoints in your executing wasm code and will give you a stack trace on errors.

While you’re actively writing your C code, I recommend compiling your C to both wasm and a native executable, and then doing your debugging with normal C debuggers (e.g. gdb).

Then hopefully you won’t have to do too much debugging on the wasm end.

As for tests… no idea! Best I found is integration tests where you stand up the JS environment and transitively test the wasm code.


Additional Reading

If you want to learn more about this topic, here are some interesting articles I read while trying to figure all this out.

  • All of Surma’s blog posts about WebAssembly: https://surma.dev/
    • Seriously, this primer wouldn’t exist without his blog!
  • This breakdown of the WASM memory management story by a Unity developer.
    • I think some of this has improved since 2021, but the author juj goes into a ton of great detail about how WebAssembly is being used in the gaming space.
  • nullprogram’s blog post about learning WebAssembly to compile his video game.
    • He goes over a lot of the same topics as this primer, and uses a much more realistic example than my shared_square.c!
    • I only found this post while I was writing this primer – wish I’d found it earlier!

  1. The researchers discovered that, around 60% of the time, after 16,000 iterations of this, self-replicating programs would appear and spread through the ecosystem. Life! Seriously, this is really cool, go read the paper, or for lighter reading, Peter Watts’s breakdown of it.↩︎

  2. Though here’s a Reddit comment from a year ago stating that emscripten is old, so who the hell knows. In WebAssembly land, a year in the past might as well be the dark ages.↩︎

  3. At time of posting, this art was licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. See the sidebar of the page. Web Archive link.↩︎