Web Assembly Primer

Intended Audience
Me, from a week ago!
This will be pretty technical, and maybe not that interesting unless you've spent a lot of time in the Web coding space.
But damn, it would've saved me a bunch of time if I'd found something like this!
I've been working on reproducing a cool research paper on computational life and ran into some performance problems with my initial Javascript implementation. It was too darn slow!
The simulation described in the paper consists of creating 2^17 random brainfuck programs, pairing them up, running each pair, and then splitting them up again. [1]
Unfortunately, my JS code, even when running in parallel, took over thirty minutes to run a full simulation!
I wanted it to go faster and remembered that WebAssembly was a thing! I'd never used it before, but I'd heard that it was much faster than Javascript.
My code was pretty simple; surely it couldn't be that hard to port over?

Oh my god, this was so much harder than I expected.
The documentation for WebAssembly is fragmented across the internet, and what information exists is often out of date. The specification itself is a WIP, with the tooling ecosystem constantly changing to keep up.
You'll find StackOverflow posts from 2021 saying that something is impossible, only to learn that it's now possible in 2025, but only if you use these undocumented compiler flags.
Also, fun fact: running WebAssembly in parallel was disabled in browsers between 2018 and 2020 because of Spectre, which put a damper on both the community and the documentation.
In 2025, getting a simple C program compiled to WebAssembly and then running it in parallel is possible, but it requires you to learn about a whole bunch of different web technologies and tools.
I couldn't find any primers that had everything you need to know in one place, so I figured I'd give back and write one myself!
Primer
Table of Contents
- Introduction
- On WebAssembly
- Writing WebAssembly
- Compiling C to WebAssembly without Emscripten
- wasm, wat, and wabt
- Running WebAssembly from JS
- Web Workers
- Web Assembly's Linear Memory
- Compiling a WebAssembly binary that uses shared imported memory
- Reading our exported numbers array
- Putting it all together
- Conclusion
- Addendum
- Additional Reading
Introduction
Our goal here is to take a C program, compile it into a WebAssembly module, and then run that module in the browser, in parallel, working against a shared set of memory.
We'll be working with a very simple C program that just squares some numbers.
The hard part of this exercise isn't the code itself, it's all the nonsense required to get the code running, so best to keep the code as simple as possible.
The primer is designed to be read in order, since each section builds on the last. That said, if you want to jump to the final code to get a sense of where we'll end up, more power to you!
Getting Started
Our examples are going to be written in HTML, JS, and C.
I've set up demo pages for each HTML example, so you don't need to run the code on your own machine if you don't want to/can't (e.g. you're reading this on a phone or a locked-down computer).
But I do think it's helpful to get it running on your own, especially if you want to use any of this stuff in a real project.
To compile the C code locally, you're going to need to install llvm and wasm-ld; depending on your OS, you may already have these. If you're on MacOS, see these extra tips:
MacOS Installation Tips
I had some trouble with installing these on my M1 Macbook.
Firstly, you'll need the XCode Developer Tools, but the version of Clang they install doesn't have the wasm32 target.
So to get that, you'll need to install llvm again using homebrew.
Next, the internet says that llvm normally includes wasm-ld but the version I installed didn't, so you'll have to install it directly.
Finally, make sure your PATH is set up to point to the homebrew version of llvm, not the default XCode one.
brew install llvm;
brew install wasm-ld;
# put in .bashrc
export PATH="/opt/homebrew/opt/llvm/bin:$PATH"
To run HTML that uses threaded WebAssembly locally, you're going to need a static web server that serves the files with CORS security headers, specifically Cross-Origin-Embedder-Policy and Cross-Origin-Opener-Policy (see MDN).
Unfortunately, this means that just opening the HTML file in your browser won't work, and the tried and true Python http.server won't work either, since it doesn't set those headers.
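A quick way to confirm that the headers took effect is the browser's crossOriginIsolated flag. The sketch below wraps that check in a hypothetical helper, canUseSharedMemory (my name, not a browser API). It takes the global scope as a parameter so it can also be exercised outside a browser, and it is a heuristic rather than a guarantee.

```javascript
// Hypothetical helper: reports whether a global scope looks ready for
// shared WebAssembly memory. "scope" would be globalThis in a page or worker.
function canUseSharedMemory(scope) {
  // Browsers set crossOriginIsolated to true only when the page was served
  // with both the COOP and COEP headers.
  const isolated = scope.crossOriginIsolated === true;
  const hasSharedArrayBuffer = typeof scope.SharedArrayBuffer === "function";
  return isolated && hasSharedArrayBuffer;
}

// Simulated scopes, so the sketch runs anywhere:
console.log(canUseSharedMemory({ crossOriginIsolated: true, SharedArrayBuffer })); // true
console.log(canUseSharedMemory({ crossOriginIsolated: false }));                   // false
```

In a real page you would call canUseSharedMemory(globalThis) before spinning up any workers.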
I've written a simple Go server here that serves the files with the proper headers:
// server.go
package main

import (
    "log"
    "net/http"
)

func addCORS(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
        w.Header().Set("Cross-Origin-Embedder-Policy", "require-corp")
        w.Header().Set("Cross-Origin-Opener-Policy", "same-origin")
        next.ServeHTTP(w, req)
    })
}

func main() {
    port := ":5555"
    handler := addCORS(http.FileServer(http.Dir(".")))
    log.Print("Server started at localhost" + port)
    err := http.ListenAndServe(port, handler)
    if err != nil {
        log.Fatal(err)
    }
}
Put that in the same directory as the HTML files you want to serve, run go build server.go, then launch the server with ./server.
Head to localhost:5555 in your browser, and you'll see your list of files. Click on one, and you're good to go!
If all that feels daunting, please feel free to just follow along with the demo pages below each HTML code example.
On WebAssembly
WebAssembly (wasm) is a compiled assembly-like language that can be run by browsers at "near-native" speed.
Now, I feel I need to say that most of the time you probably don't need to use WebAssembly. The V8 Javascript engine is incredible and the days of Javascript being 100 times slower than native code are over.
My JS code for the computational life simulator only ran ~3.5 times slower than the WebAssembly, and that's with a use-case that's pretty much perfect for WebAssembly, with no I/O or server requests.
Additionally, two of the most common use-cases for super fast code on the Web are probably graphics and cryptographic functions. With graphics you'd probably be better off using WebGL, and for crypto code… don't roll your own crypto!
But hey, there are some use-cases for WebAssembly, so that's all I'll say about that. On to using it!
Writing WebAssembly
WebAssembly is a stack-based assembly language and while you can write it directly using the WebAssembly text format (wat), most people write their code in another higher-level language and then compile it down into wasm.
I chose C, but from my digging around in the past week, it looks like there's a lot more discussion/community around using Rust. So maybe give that a go.
The big difference between compiling from C to native OS machine code vs WebAssembly is that many of the standard functions don't make sense without an operating system. What should printf do when called by code running in the browser? Print to the dev console? Create a new DOM element? What about malloc?
The standard [2] (MDN recommended) way to compile C to wasm is to use the compiler toolchain Emscripten, which solves this problem by simulating an entire POSIX OS in Javascript.
With Emscripten, a lot of existing C/C++ code can be compiled straight to WebAssembly, at the cost of a whole bunch of extra Javascript glue code being added. It probably works great, but it's a complicated dependency and I wanted to see if I could do without it.
The big downside then is that you can't use the C standard library. For my project, that's fine, but it may be a non-starter in other situations. I did find this nice blog post about linking parts of the C standard library without using Emscripten; give that a read if you're interested.
In the next section, we'll go over compiling a very simple C program into WebAssembly, without Emscripten.
Compiling C to WebAssembly without Emscripten
The C program we'll be using is very simple; it just squares a given number:
// square.c
__attribute__((export_name("square"))) int square(int number) {
  return number * number;
}
First thing to note is that this is the full program; we don't have a main function. Since the point of this code is to be called by our JS code, we don't need it to do anything when it first starts up.
Then there's the __attribute__((export_name("square"))) bit. This is a Clang attribute which declares that the function square will be exported in the resulting compiled WebAssembly, with no additional compiler flags needed.
We'll add more to the program later to support shared memory, but we'll start with this for now.
The Clang command to compile this is:
clang \
--target=wasm32 \
-nostdlib \
-O3 \
-Wl,--no-entry \
-o square.wasm \
square.c
In order to run this, you need llvm clang and wasm-ld installed (see the Getting Started section).
Breaking this down we have:
- --target=wasm32 declares that we want to compile to WebAssembly and not the local environment's machine code
- -nostdlib states that the code uses no standard library, so the compiler doesn't try to link to one
- -O3 makes the resulting code as performant as possible
- -Wl,--no-entry tells the linker that there is no entry function (e.g. int main() {})
The -Wl, syntax passes the following flag to the linker, in our case wasm-ld. You can see all the wasm-ld flags here, and we'll be using some more of them later.
If all that works you should have a WebAssembly binary called square.wasm in your file system! Ok, so what exactly is a .wasm file?
wasm, wat, and wabt
.wasm files are compact binary representations of WebAssembly code, usually called a "module". They're not intended to be directly written or read by humans. Instead, they're compiled from some higher level language (C in our case).
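To make "compact binary" concrete, here is a sketch that spells out a tiny module byte by byte. The bytes are hand-assembled by me as a stand-in for square.wasm (just the square function and its export, without the memory, table, and global that wasm-ld adds), and looksLikeWasm is a made-up helper name.

```javascript
// Hand-assembled stand-in for square.wasm (assumption: not the clang output).
const squareBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d,                         // magic number: "\0asm"
  0x01, 0x00, 0x00, 0x00,                         // binary format version 1
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f, // type section: (i32) -> i32
  0x03, 0x02, 0x01, 0x00,                         // function section: one function of type 0
  0x07, 0x0a, 0x01, 0x06, 0x73, 0x71, 0x75, 0x61, // export section: "square" ...
  0x72, 0x65, 0x00, 0x00,                         // ... is function 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                   // code section: one body, no locals
  0x20, 0x00, 0x20, 0x00, 0x6c, 0x0b,             // local.get 0, local.get 0, i32.mul, end
]);

// Hypothetical helper: cheap check of the 4-byte magic number only.
function looksLikeWasm(bytes) {
  const magic = [0x00, 0x61, 0x73, 0x6d];
  return bytes.length >= 8 && magic.every((b, i) => bytes[i] === b);
}

console.log(looksLikeWasm(squareBytes));        // true
console.log(WebAssembly.validate(squareBytes)); // true: the engine accepts the whole module
```

WebAssembly.validate is the real check; the magic number test is only handy for spotting a server that returned an HTML error page instead of your binary.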
But is there a way to still read a .wasm file? Kinda!
You can convert your wasm into a textual representation of the binary format called WebAssembly text format (wat).
wat is designed to be read and written by humans, and it ends up looking like a cross between lisp and the x86 assembly language.
The WebAssembly team maintains a set of tools called The WebAssembly Binary Toolkit (wabt) that allow you to debug wasm code, including the wasm2wat tool that converts wasm 2 wat.
You can download the wabt toolkit from their Github and run it locally, or you can upload a .wasm file to their online demo page here.
This is the wabt command that I'm using to convert our square.wasm file:
/wabt/bin/wasm2wat --generate-names square.wasm > square.wat
The --generate-names flag gives auto-generated names to any unnamed variables in our code (i.e. all local variables). It's not necessary, but it makes the resulting .wat easier to read.
And here's the generated square.wat file:
(module $square.wasm
  (type $t0 (func (param i32) (result i32)))
  (func $square (export "square") (type $t0) (param $p0 i32) (result i32)
    (i32.mul
      (local.get $p0)
      (local.get $p0)))
  (table $T0 1 1 funcref)
  (memory $memory (export "memory") 2)
  (global $__stack_pointer (mut i32) (i32.const 66560)))
You can see our square func there that's calling mul with our parameter $p0 passed in twice.
The square function is exported, because of the export_name attribute we used in square.c.
This means that it'll be callable from our Javascript environment.
You can also see that we export a memory object; we'll talk about that in depth later in the tutorial.
We won't go over the table or the __stack_pointer bits in this primer, so don't worry about them for now. If you want to learn more about WebAssembly tables, this blog post by Dennis is good!
We'll be using the wasm2wat tool a few more times, since it's really useful for debugging your compiler commands.
So, now we have a .wasm file and know how to read it.
How can we run it?
Running WebAssembly from JS
You can't just run a wasm file directly like you can a native executable. WebAssembly modules need a host environment to run against.
For us, that host environment will be the browser, but you can also run it with Node JS and on other environments. See this page on WebAssembly Portability for more.
The way it works is the host receives the .wasm file, and then it compiles it again into whatever machine code is appropriate for the environment.
It can then pass that compiled machine code whatever memory or other imports it needs, and then it can run it!
In the browser, there's a JS API that does this process for you.
const module = await WebAssembly.compileStreaming(fetch("square.wasm"));
// This holds whatever imports the wasm module needs.
// Right now we don't use this, but later on we'll be passing
// in a shared memory object here.
const importObject = {};
const instance = await WebAssembly.instantiate(module, importObject);
const four = instance.exports.square(2);
First, we fetch the square.wasm file and pass it to compileStreaming. This then compiles our WebAssembly code into machine code, and gives us a WebAssembly.Module object representation of it.
This module object can't be used directly, because it's not stateful. But it can be passed around, e.g. to a Web Worker, which can then "instantiate" it.
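The module/instance split is easier to see with one module and two instances. This is a sketch under a few assumptions: the bytes are my hand-assembled stand-in for square.wasm (function and export only), and it uses the synchronous WebAssembly.Module and WebAssembly.Instance constructors so it runs without a server. Browsers only allow the synchronous constructors on the main thread for small modules, so prefer the streaming API shown in this section for real files.

```javascript
// Hand-assembled stand-in for square.wasm (assumption: not the clang output).
const squareBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm", version 1
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f, // type: (i32) -> i32
  0x03, 0x02, 0x01, 0x00,                         // one function of type 0
  0x07, 0x0a, 0x01, 0x06, 0x73, 0x71, 0x75, 0x61, // export "square" ...
  0x72, 0x65, 0x00, 0x00,                         // ... as function 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                   // code: one body, no locals
  0x20, 0x00, 0x20, 0x00, 0x6c, 0x0b,             // local.get 0, local.get 0, i32.mul, end
]);

// Compile once: the Module is just compiled code, with no state of its own.
const wasmModule = new WebAssembly.Module(squareBytes);
const exportList = WebAssembly.Module.exports(wasmModule);
console.log(exportList[0].name, exportList[0].kind); // square function

// Instantiate twice: each Instance is a separate, stateful copy.
const instanceA = new WebAssembly.Instance(wasmModule, {});
const instanceB = new WebAssembly.Instance(wasmModule, {});
console.log(instanceA.exports.square(2)); // 4
console.log(instanceB.exports.square(3)); // 9
```

The variable is called wasmModule rather than module only so that the snippet also runs under Node's CommonJS loader, where module is already taken.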
The WebAssembly.instantiate function takes a module and an importObject and returns a WebAssembly.Instance object, which is now a stateful representation of our wasm code, with some chunk of your browser's memory reserved for its use.
This instance has an exports property that contains Javascript representations of anything the wasm code exports; in our case, the square function.
By calling instance.exports.square(2), we run the underlying wasm code, and get back the number 4!
If you don't need to pass the module around, you can do the compiling and instantiation steps in one go with the instantiateStreaming function:
const importObject = {};
const {module, instance} = await WebAssembly.instantiateStreaming(
  fetch("square.wasm"),
  importObject
);
We've now run our WebAssembly code from Javascript! Wahoo!
Now, our goal is to run our wasm code in parallel, with each module having access to a shared set of memory.
To that end, below we'll take a step back from WebAssembly to talk about how parallelism works in Javascript in general.
Web Workers
Web Workers are the JS way of running code in multiple CPU threads, i.e. in parallel.
By default, JS code runs in the so-called "main thread". The main thread is what has access to the DOM and user input, and is blocking.
Blocking means that if you have a long running operation as a result of some user interaction, no other Javascript will run while the code is still going.
Web Workers, on the other hand, get run in a different thread than the main JS code.
This means you can run your slow code in a worker and asynchronously send a message back to the main thread when it's done. Meanwhile, the main thread will remain free to handle additional user interaction.
The API to use Web Workers is pretty simple, with Javascript handling a lot of the tricky details that you often have to manage with other libraries.
All you need is another JS file with the code that you want to run in the worker. Once it's started, you call the postMessage function to pass objects between the main thread and the worker.
Here's an example where the main thread passes a Web Worker a list of numbers to be squared.
<!-- webworker.html -->
<!DOCTYPE html>
<script type="module">
  const worker = new Worker('example_worker.js');
  worker.postMessage({
    squareThese: [1, 2, 3, 4]
  });
  worker.addEventListener('message', (e) => {
    console.log("Main thread - Result");
    console.log(e.data.result);
  });
</script>
// example_worker.js
self.addEventListener('message', (e) => {
  const result = [];
  // The object from postMessage gets set as e.data
  console.log("Web Worker thread - squareThese");
  console.log(e.data.squareThese);
  for(const num of e.data.squareThese) {
    result.push(num * num);
  }
  postMessage({result});
});
We use the worker's message event to pass messages between the main thread and the worker. We listen for the event with an EventListener, and send the event with the postMessage function. The MDN docs have a great page on how all this works if you want to read more.
If you give the above a try, you'll see that the main thread will print out [1, 4, 9, 16] as expected!
So now that we've got standard Javascript running in a Web Worker, let's change the above code to call our WebAssembly from before. We'll keep the webworker.html file the same, and only change the example_worker.js file.
// example_worker.js
self.addEventListener('message', async (e) => { // note that we added async here
  const importObject = {};
  // Fetch and compile our square.wasm code
  const {module, instance} = await WebAssembly.instantiateStreaming(fetch("square.wasm"), importObject);
  const result = [];
  for(const num of e.data.squareThese) {
    // Call our wasm function
    console.log(`Worker thread - calling wasm square with num: ${num}`);
    result.push(instance.exports.square(num));
  }
  postMessage({result});
});
Bam! Now our wasm code is running on a different CPU core than the main JS thread.
You'll note that we instantiate the wasm module inside the worker, because you can't pass wasm instances from the main thread to a worker.
You can pass modules, which would potentially speed things up, but for simplicity we'll just be doing the whole instantiation in the worker.
Right now we have only one Web Worker, but we can create multiple, and they'll get assigned to the different CPU cores of your machine. On my laptop that means that I can run four different Web Workers in parallel. If you make more workers than you have cores, that's fine; some will just get assigned to the same core and take turns executing.
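If you'd rather size the pool to the machine than hardcode four, browsers report the logical core count as navigator.hardwareConcurrency. Here is a minimal sketch of dividing work between that many workers; splitIntoRanges is a hypothetical helper of mine, and the fallback of 4 is an arbitrary assumption for environments without navigator.

```javascript
// Hypothetical helper: split the indexes [0, total) into at most
// workerCount contiguous ranges, each described as {start, end} with
// "end" exclusive.
function splitIntoRanges(total, workerCount) {
  const ranges = [];
  const chunk = Math.ceil(total / workerCount);
  for (let start = 0; start < total; start += chunk) {
    ranges.push({ start, end: Math.min(start + chunk, total) });
  }
  return ranges;
}

// Logical core count in browsers; fall back to 4 where it isn't available.
const workerCount =
  (typeof navigator !== "undefined" && navigator.hardwareConcurrency) || 4;
console.log(`Would start ${workerCount} workers`);

console.log(splitIntoRanges(100, 4));
// [ {start: 0, end: 25}, {start: 25, end: 50}, {start: 50, end: 75}, {start: 75, end: 100} ]
console.log(splitIntoRanges(10, 3));
// [ {start: 0, end: 4}, {start: 4, end: 8}, {start: 8, end: 10} ]
```

Each range would go to one worker in a postMessage call, in place of the single number we pass per worker in this section.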
Let's change our code above one more time, to use multiple Web Workers. New lines are commented!
<!-- webworker.html -->
<!DOCTYPE html>
<script type="module">
  const squareThese = [1, 2, 3, 4];
  // We'll create a worker for each number in squareThese, and wrap its creation in a promise.
  // The promise will get resolved when the main thread gets a message back from the worker.
  // Then we'll wait for all the promises to finish down below.
  const promises = [];
  for(let i = 0; i < squareThese.length; i++) {
    promises.push(new Promise((resolve, reject) => {
      const worker = new Worker('example_worker.js');
      worker.addEventListener('message', (e) => {
        // Mutate squareThese with the new number.
        squareThese[i] = e.data.result;
        // Resolve the promise, so we know that this worker has finished.
        resolve();
      }, {once: true});
      worker.postMessage({
        // We only pass one number to the worker now, instead of the whole list.
        squareThis: squareThese[i],
        // Passing an id for console logging purposes.
        workerId: i
      });
    }));
  }
  // Wait for all the workers to be done.
  const results = await Promise.all(promises);
  // Every number in this array should now be squared!
  console.log(squareThese);
</script>
// example_worker.js
self.addEventListener('message', async (e) => {
  const importObject = {};
  const {instance, module} = await WebAssembly.instantiateStreaming(fetch("square.wasm"), importObject);
  console.log(`In worker #${e.data.workerId}, going to square ${e.data.squareThis}`);
  // Call the wasm function with our single passed number, and pass it back to the main thread.
  postMessage({
    result: instance.exports.square(e.data.squareThis)
  });
});
The big change we made was using Promises to orchestrate creating and calling multiple different workers. Hopefully it's easy enough to read!
In my console log, I see the following output.
In worker #2, going to square 3
In worker #1, going to square 2
In worker #0, going to square 1
In worker #3, going to square 4
► (4) [1, 4, 9, 16]
Note that the Web Workers are running in parallel, so the order they finish in may differ between runs.
Therefore, your output will likely be in a different order than mine, and will be different each time you run it.
Now that we have multiple workers running our wasm code in parallel, we can talk about how to share memory between them!
Web Assembly's Linear Memory
WebAssembly code is backed by a chunk of memory represented by a linear range of addresses.
When you create a WebAssembly instance in Javascript, the browser reserves some chunk of your computer's memory for that instance. Any memory your wasm code reads or writes (variables, data passed in from Javascript, function stacks, etc) exists within that reserved chunk.
The wasm module itself defines the initial amount of memory that it needs.
In our wat representation of our square.wasm file, we can see this line:
(memory $memory (export "memory") 2)
This declares that our linear memory is two pages long. Each page is 64 KiB, as defined by the WebAssembly spec.
Where does that number, 2 pages, come from? After all, we don't define it anywhere in our C code. Turns out, it's the default that wasm-ld, our compiler linker, assigns.
wasm-ld ends up controlling a lot of the structure of the resulting WebAssembly code, from how much memory it uses, to what things get exported. You can change a lot of this using wasm-ld's flags; the full list can be found here.
In this case, we can set the size of the linear memory by using the --initial-memory flag, which takes the size in bytes.
If we wanted to change our program above to use 3 pages of memory, we could do so by adding this to our compiler command:
-Wl,--initial-memory=196608
Where 196608 bytes is 3 64KiB pages (64 * 1024 * 3 == 196608).
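Since wasm-ld wants bytes while the Javascript API and the wat output talk in pages, a pair of small conversion helpers can save some mental arithmetic. This is just a sketch; pagesToBytes and bytesToPages are names I made up.

```javascript
const PAGE_SIZE = 64 * 1024; // one WebAssembly page is 65536 bytes

// Hypothetical helpers for going between pages and bytes.
const pagesToBytes = (pages) => pages * PAGE_SIZE;
const bytesToPages = (bytes) => Math.ceil(bytes / PAGE_SIZE);

console.log(pagesToBytes(2));      // 131072, the wasm-ld default we saw
console.log(pagesToBytes(3));      // 196608, the value for --initial-memory
console.log(bytesToPages(196608)); // 3

// The Javascript API agrees: a 3-page Memory has a 196608-byte buffer.
const threePages = new WebAssembly.Memory({ initial: 3 });
console.log(threePages.buffer.byteLength); // 196608
```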
Exporting memory
Now, if you look at the .wat again, you'll see that the memory is marked as an export. This is the default for wasm-ld, and it means that if you instantiate this binary in Javascript, you can access the linear memory through the instance's exports object.
For example, here's a simple HTML file that console logs the exported memory:
<!DOCTYPE html>
<html>
<script type="module">
const { instance, module } = await WebAssembly.instantiateStreaming(fetch("square.wasm"));
console.log(instance.exports.memory);
</script>
</html>
The output I get is as follows:
▼ Memory(2)
► buffer: ArrayBuffer(131072)
This shows that the Memory object has two pages (131072 bytes). Looking at the object's properties, you see that it has one called buffer, which is an ArrayBuffer with, as expected, 131072 bytes.
You can then use the DataView interface to read and write to that buffer from your JS code; we'll be doing this later.
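As a small preview of the DataView approach, here is a sketch that writes a number into a Memory's buffer and reads it back. It creates its own one-page Memory rather than using the one exported by square.wasm, and the offset of 1024 is an arbitrary choice for the example.

```javascript
// A standalone Memory, standing in for instance.exports.memory.
const memory = new WebAssembly.Memory({ initial: 1 });
const view = new DataView(memory.buffer);

const offset = 1024; // arbitrary byte offset, chosen for illustration

// WebAssembly memory is little-endian, so pass true as the last argument.
view.setUint32(offset, 12345, true);
console.log(view.getUint32(offset, true)); // 12345

// A typed array over the same buffer sees the same bytes. Typed arrays use
// the platform's byte order, which is little-endian on typical hardware.
const words = new Uint32Array(memory.buffer, offset, 1);
words[0] = 678;
console.log(view.getUint32(offset, true));
```

DataView is the safer choice when byte order matters; typed arrays are more convenient when you want to treat a region as an array.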
So that's how exported memory works. But WebAssembly also allows you to import the module's linear memory from the host environment.
This will let us create a set of memory and import it into multiple instances of the same module.
Importing memory
To import memory, we need to pass a different set of flags to wasm-ld. Nothing in the higher level C code needs to change.
In addition to the --initial-memory flag above, we need to pass the --import-memory flag.
The full clang command will be:
clang \
--target=wasm32 \
-nostdlib \
-O3 \
-Wl,--no-entry \
-Wl,--initial-memory=131072 \
-Wl,--import-memory `# this flag is new` \
-o square.wasm \
square.c
If we then convert the new square.wasm into a .wat, we get:
(module $square.wasm
  (type $t0 (func (param i32) (result i32)))
  (import "env" "memory" (memory $env.memory 2))
  (func $square (type $t0) (param $p0 i32) (result i32)
    local.get $p0
    local.get $p0
    i32.mul)
  (table $T0 1 1 funcref)
  (global $__stack_pointer (mut i32) (i32.const 66560))
  (export "square" (func $square)))
You'll see that the export memory line is gone, replaced with a new import:
(import "env" "memory" (memory $env.memory 2))
This states that the program expects a memory object, 2 pages long, to be passed to it, under the namespace env.memory.
If you then reload the HTML from above, with the new square.wasm, you'll see that you get an error when calling instantiateStreaming, since we're not passing in that memory:
Uncaught TypeError: WebAssembly.instantiate(): Imports argument must be present and must be an object
So let's change our Javascript to pass in the correct import object:
<!DOCTYPE html>
<html>
<script type="module">
  const linearMemory = new WebAssembly.Memory({
    initial: 2,
  });
  const importObject = {
    env: {
      memory: linearMemory
    }
  };
  const { instance, module } = await WebAssembly.instantiateStreaming(
    fetch("square.wasm"),
    importObject
  );
  console.log(instance.exports);
</script>
</html>
▼ Object
► square: f $square()
You see that we create a Memory object with an initial 2 pages which we pass in as env.memory.
If you do less than 2 pages (e.g. only one), you'll get an error like this:
Uncaught LinkError: WebAssembly.instantiate(): Import #0 "env" "memory": memory import has 1 pages which is smaller than the declared initial of 2
If you pass more than 2 pages it works, since the wasm spec allows for growing your memory at runtime.
And then if you change the namespace from env.memory to foo.memory, you get the somewhat confusing:
Uncaught TypeError: WebAssembly.instantiate(): Import #0 "env": module is not an object or function
Finally, our console.log statement prints out the module's exports. You'll notice that memory isn't there anymore. It's no longer being exported, so it's not on the export object.
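When an import error is confusing, it can help to ask the module what it expects. The sketch below assumes a hand-assembled stand-in for the new square.wasm (only the env.memory import and the square export; the table and stack pointer global are left out) and uses the synchronous WebAssembly.Module and WebAssembly.Instance constructors so it runs without a server. wasmModule and tryInstantiate are names I made up for the example.

```javascript
// Hand-assembled stand-in (assumption: not the clang output).
const importingBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm", version 1
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f, // type: (i32) -> i32
  0x02, 0x0f, 0x01,                               // import section, one import
  0x03, 0x65, 0x6e, 0x76,                         // module name "env"
  0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79,       // field name "memory"
  0x02, 0x00, 0x02,                               // kind: memory, no max, initial 2 pages
  0x03, 0x02, 0x01, 0x00,                         // one function of type 0
  0x07, 0x0a, 0x01, 0x06, 0x73, 0x71, 0x75, 0x61, // export "square" ...
  0x72, 0x65, 0x00, 0x00,                         // ... as function 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                   // code: one body, no locals
  0x20, 0x00, 0x20, 0x00, 0x6c, 0x0b,             // local.get 0, local.get 0, i32.mul, end
]);

const wasmModule = new WebAssembly.Module(importingBytes);

// Lists every import the module declares: namespace, name, and kind.
const importList = WebAssembly.Module.imports(wasmModule);
console.log(importList[0].module, importList[0].name, importList[0].kind); // env memory memory

// Hypothetical helper: try a Memory of the given size, report what happened.
function tryInstantiate(pages) {
  try {
    const memory = new WebAssembly.Memory({ initial: pages });
    new WebAssembly.Instance(wasmModule, { env: { memory } });
    return "ok";
  } catch (err) {
    return err.name;
  }
}

console.log(tryInstantiate(1)); // LinkError (smaller than the declared initial of 2)
console.log(tryInstantiate(2)); // ok
console.log(tryInstantiate(3)); // ok
```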
Shared memory
Ok, so we're able to create a Memory object in the JS main thread and pass it to a WebAssembly module, but can we use that Memory object with our Web Worker example above?
Not yet! Let's try to pass our linear memory object into a Web Worker using postMessage:
<!DOCTYPE html>
<html>
<script type="module">
  const linearMemory = new WebAssembly.Memory({
    initial: 2,
  });
  // You can use any JS file to create the worker,
  // because this example is going to throw an error
  // when we call postMessage.
  const worker = new Worker('example_worker.js');
  worker.postMessage({linearMemory});
</script>
</html>
If you run this code, you should get the error:
Uncaught DataCloneError: Failed to execute 'postMessage' on 'Worker': #<Memory> could not be cloned.
Normal WebAssembly.Memory objects can't be passed into Web Workers. I assume this is because their underlying data store is an ArrayBuffer, which is an instance of the confusingly named Transferable object. This means that, when transferred to a web worker, it gets detached at its original reference and handed over to the web worker.
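To see what copying and transferring do to a plain ArrayBuffer, here is a sketch using structuredClone, which runs the same structured clone algorithm that postMessage uses. No worker is involved; this only shows what happens to the original reference.

```javascript
const original = new ArrayBuffer(8);
new Uint8Array(original)[0] = 42;

// Without a transfer list, the buffer is copied: two independent buffers.
const copy = structuredClone(original);
new Uint8Array(copy)[0] = 7;
console.log(new Uint8Array(original)[0]); // 42, the original is untouched

// With a transfer list, the contents move and the original is detached.
const moved = structuredClone(original, { transfer: [original] });
console.log(original.byteLength);      // 0, detached
console.log(new Uint8Array(moved)[0]); // 42
```

Either way, the two sides never end up looking at the same bytes at the same time, which is exactly what we need for sharing memory between workers.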

Fortunately, there's a concept of a SharedArrayBuffer which allows for multiple views to the same underlying buffer.
We can create a WebAssembly.Memory that's backed by a SharedArrayBuffer by setting the shared and maximum properties, like so:
const sharedMemory = new WebAssembly.Memory({
  shared: true,
  initial: 2,
  maximum: 2
});
console.log(sharedMemory);
Console log output:
▼ Memory(2)
► buffer: SharedArrayBuffer(131072)
We set the maximum to be the same as our initial because we don't need to grow our memory at runtime.
I'm not 100% sure why maximum is a required property. Based on this design rationale, I think it's so that the browser never has to move memory around that multiple threads might be touching (though given that realloc is thread-safe, I'm not quite sure why that's a problem). Regardless, it's part of the spec and you'll get an error if you don't set it.
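Here is a short sketch of both points: two views over one shared Memory see each other's writes, and leaving out maximum throws. It assumes an environment where the SharedArrayBuffer global is available (Node, or a cross-origin isolated page).

```javascript
const sharedMemory = new WebAssembly.Memory({
  shared: true,
  initial: 2,
  maximum: 2,
});
console.log(sharedMemory.buffer instanceof SharedArrayBuffer); // true

// Two views over the same shared buffer: a write through one
// is visible through the other.
const viewA = new Uint32Array(sharedMemory.buffer);
const viewB = new Uint32Array(sharedMemory.buffer);
viewA[0] = 99;
console.log(viewB[0]); // 99

// Leaving out `maximum` on a shared memory is an error.
let threwWithoutMaximum = false;
try {
  new WebAssembly.Memory({ shared: true, initial: 2 });
} catch (err) {
  threwWithoutMaximum = true;
}
console.log(threwWithoutMaximum); // true
```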
If you now try to pass your sharedMemory to a worker using postMessage, you'll see that it succeeds!
But if you then try to call instantiateStreaming in the Web Worker, with your sharedMemory, you'll get a new error.
// main thread
const sharedMemory = new WebAssembly.Memory({
  shared: true,
  initial: 2,
  maximum: 2
});
worker.postMessage({sharedMemory});

// inside worker
onmessage = async (e) => { // async, so that we can use await below
  const importObject = { env: { memory: e.data.sharedMemory } };
  // will currently throw an error
  const { instance, module } = await WebAssembly.instantiateStreaming(
    fetch('square.wasm'),
    importObject
  );
}
Error:
LinkError: WebAssembly.instantiate(): Import #0 "env" "memory": mismatch in shared state of memory, declared = 0, imported = 1
This is because, as well as us setting shared: true in the WebAssembly.Memory constructor, the wasm module itself needs to mark its memory import as shared.
We can do this by using different wasm-ld flags, which we'll go over in the next section!
Memory Summary
This section was a lot, so here's a quick summary:
- Memory is declared by the WebAssembly Module, which defines whether or not it imports or exports memory.
- Your module needs to declare the initial size of its memory, and optionally, the maximum.
- You use wasm-ld flags to declare which type and how much memory you want in your Module.
- WebAssembly Memories that are created in Javascript cannot be sent to a Web Worker unless that memory is marked as shared.
Compiling a WebAssembly binary that uses shared imported memory
Say that ten times fast!
Before we get into the actual compiling, we're going to use a new C program. After all, our old square.c doesn't really have any use for shared memory; it just squares the number passed to it.
Our new program, shared_square.c, will have an array of numbers that are read and then squared by multiple Web Workers at once.
// shared_square.c
#include <stdint.h>

__attribute__((visibility("default"))) uint32_t numbers[100];

__attribute__((visibility("default"))) void initNumbers() {
  for(uint32_t i = 0; i < 100; i++) {
    numbers[i] = i;
  }
}

__attribute__((visibility("default")))
void square(uint32_t start, uint32_t end) {
  for(uint32_t i = start; i < end; i++) {
    numbers[i] *= numbers[i];
  }
}
This program defines a fixed uint32_t array called numbers, an initNumbers function to populate that array, and a square function that squares all numbers in the array between start and end.
The new __attribute__((visibility("default"))) line is another Clang attribute that, when combined with the --export-dynamic linker flag, exports a symbol.
At the end of the .wat for this module, you'll see that numbers, initNumbers and square are all exported:
(func $initNumbers (type $t0)
  ...
)
(func $square (type $t1) (param $p0 i32) (param $p1 i32)
  ...
)
(global $numbers i32 (i32.const 1024))
(export "initNumbers" (func $initNumbers))
(export "numbers" (global $numbers))
(export "square" (func $square))
We use visibility instead of the earlier export_name attribute because the latter only works with functions. See the wasm-ld page for more information about how to export things.
Compiling with shared memory
The command to compile the above shared_square.c program is:
clang \
\
--target=wasm32 \
-nostdlib \
-O3 \
-Wl,--no-entry \
-Wl,--import-memory \
-Wl,--initial-memory=131072 `# START_NEW_FLAGS` \
\
-Wl,--export-dynamic \
-Wl,--shared-memory, \
-mbulk-memory \
-matomics `# END_NEW_FLAGS` \
\
-o shared_square.wasm shared_square.c
Breaking down the new flags, we have:
- -Wl,--export-dynamic: as talked about above, this exports any symbols marked with "default" visibility
- -Wl,--shared-memory: sets the shared attribute on the memory line in the wasm code
- -mbulk-memory: Required compiler feature flag to use shared memory
- -matomics: Required compiler feature flag to use shared memory
Finally, now that we're using shared-memory, if we want to generate our .wat file so we can read the compiled code, we need to pass in an extra --enable-threads flag to wabt:
/wabt/bin/wasm2wat --generate-names --enable-threads shared_square.wasm > shared_square.wat
Without that extra flag, you'll get an "error: memory may not be shared: threads not allowed" message.
With all these flags set, the wat for our memory will end up being:
(import "env" "memory" (memory $env.memory 2 2 shared))
As you can see, the shared attribute is set. This allows us to, in our Javascript, pass in a WebAssembly.Memory backed by a SharedArrayBuffer without getting an error!
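To check the "shared state" matching in isolation, here is a sketch with a hand-assembled module whose only import is the (memory $env.memory 2 2 shared) line from the wat. It is my stand-in, not the real shared_square.wasm: it exports the old single-number square function and nothing else. It assumes an engine with WebAssembly threads support, which current browsers and Node have by default, and tryInstantiate is a made-up helper.

```javascript
// Hand-assembled stand-in (assumption: not the clang output).
const sharedImportBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm", version 1
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f, // type: (i32) -> i32
  0x02, 0x10, 0x01,                               // import section, one import
  0x03, 0x65, 0x6e, 0x76,                         // module name "env"
  0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79,       // field name "memory"
  0x02, 0x03, 0x02, 0x02,                         // kind: memory, shared + max, initial 2, maximum 2
  0x03, 0x02, 0x01, 0x00,                         // one function of type 0
  0x07, 0x0a, 0x01, 0x06, 0x73, 0x71, 0x75, 0x61, // export "square" ...
  0x72, 0x65, 0x00, 0x00,                         // ... as function 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                   // code: one body, no locals
  0x20, 0x00, 0x20, 0x00, 0x6c, 0x0b,             // local.get 0, local.get 0, i32.mul, end
]);

const wasmModule = new WebAssembly.Module(sharedImportBytes);

// Hypothetical helper: try a Memory built from the given descriptor.
function tryInstantiate(descriptor) {
  try {
    const memory = new WebAssembly.Memory(descriptor);
    new WebAssembly.Instance(wasmModule, { env: { memory } });
    return "ok";
  } catch (err) {
    return err.name;
  }
}

// A non-shared memory no longer matches the module's declaration.
console.log(tryInstantiate({ initial: 2, maximum: 2 }));               // LinkError
// A shared memory with the same limits does.
console.log(tryInstantiate({ shared: true, initial: 2, maximum: 2 })); // ok
```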
Reading our exported numbers array
wasm-ld Memory Layout
We have our shared memory, and we have the numbers array somewhere inside it. But where exactly is it?
As far as WebAssembly is concerned, memory is just a big linear block, with nothing to differentiate one section of memory from another. But wasm-ld creates its own structured layout, to consistently map our variables and other C concepts (like the heap) onto the linear memory.
Before diving into the layout, Iâll say that Iâm not certain that Iâve got this section entirely correct. I canât find any official documentation for the memory layout â the best one I got was this page on DynamicLinking, which is a WIP and may be emscripten specific. What I say below comes from that page, Surmaâs blog from 2020, and my own reverse engineering. Youâve been warned!
With that disclaimer, here's the layout for our shared_square.wasm when it's been compiled with an initial memory of 2 pages (131072 bytes).
Note: you can see all of these wasm-ld defined variables for yourself by compiling the program with the --export-all flag instead of --export-dynamic.
We start at 0, and then wasm-ld reserves the first 1024 bytes for its own use.
At 1024 bytes we have the start of the globals section: the fixed memory that we assign at compile time.
For us, since we only have one global variable, that's where our numbers array gets stored, which you can see in the wat file with this line:
(global $numbers i32 (i32.const 1024))
The numbers array needs space for 100 4-byte int32_t elements, or a total of 400 (100 * 4) bytes.
So, numbers[0] starts at 1024 and the array ends 400 bytes later at 1424, which is marked by data_end.
This is where the stack begins (stack_low) and the stack can grow up to the heap_base section, at 66960 bytes. Then the heap gets the remaining bytes, up to the maximum of 131072 (heap_end).
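The arithmetic behind those offsets can be checked with a few lines of Javascript. This is only a sketch of the numbers quoted above: the 64 KiB stack size is my assumption, inferred from the gap between data_end and heap_base, and the constant names are made up for the example.

```javascript
// Values read off the wat file and the layout described above.
const GLOBALS_START = 1024;     // where wasm-ld placed `numbers`
const NUMBERS_BYTES = 100 * 4;  // 100 elements * 4 bytes each
const INITIAL_MEMORY = 131072;  // --initial-memory, i.e. 2 pages

// Assumption: a 64 KiB stack, inferred from heap_base - data_end.
const ASSUMED_STACK_BYTES = 65536;

const layout = {
  dataEnd: GLOBALS_START + NUMBERS_BYTES,
  heapBase: GLOBALS_START + NUMBERS_BYTES + ASSUMED_STACK_BYTES,
  heapEnd: INITIAL_MEMORY
};
// Whatever is left between heap_base and heap_end is available as heap.
layout.heapBytes = layout.heapEnd - layout.heapBase;

console.log(layout);
```

Under that assumption, the heap ends up with 64112 bytes, a little less than one full page.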
That's the memory layout and where the numbers array fits within it! For more information (and for a 5 line malloc implementation that lets you write data to the heap!) I recommend reading Surma's blog post.
Reading the numbers array from Javascript
Now that we know the offset in linear memory for our numbers array, we can read it with Javascript!
Of course, we don't want to hardcode that 1024 number if we can avoid it, which is where exporting the numbers array comes in.
When you export an array in WebAssembly, it's not the contents that get exported; it's the offset!
So, in our Javascript, the instance.exports.numbers property will be the offset of 1024 (strictly, a WebAssembly.Global object wrapping that number, which coerces to 1024 when used as an offset). Putting this together, we can write Javascript code to read the numbers array from the shared memory object like so:
const sharedMemory = new WebAssembly.Memory({
  shared: true,
  initial: 2,
  maximum: 2
});
const importObject = {
  env: {
    memory: sharedMemory
  }
};
const { instance, module } = await WebAssembly.instantiateStreaming(
  fetch('shared_square.wasm'),
  importObject
);
// Call initNumbers so that our Array has values,
// instead of being set to all zeros.
instance.exports.initNumbers();
// This should be 1024, the offset in linear memory
// where the numbers array starts.
const numbersOffset = instance.exports.numbers;
// Create a DataView from our memory buffer, starting at the
// offset, that's 400 bytes long.
// Best practice would be to export the array length from our C
// program as well -- an exercise for the reader!
const view = new DataView(sharedMemory.buffer, numbersOffset, 400);
// Should print out: 0 1 2
console.log(
  view.getUint32(0 * 4, true),
  view.getUint32(1 * 4, true),
  view.getUint32(2 * 4, true)
);
We use the DataView interface to read our imported memory. DataView takes a buffer, and then optional offset and length within that buffer, and then returns an object with methods you can call to read and write to the buffer.
We use the getUint32 method to read the memory, which, given an offset, will return the next four bytes as a Javascript number.
We pass true to the second parameter (the littleEndian parameter), because all memory in WebAssembly is stored as little-endian.
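To see why that littleEndian argument matters, here's a small standalone sketch that doesn't need any wasm at all. It writes the value 2 into a plain ArrayBuffer the way WebAssembly would store it, then reads it back both ways. The names demoBuffer and demoView are just for this example.

```javascript
// An 8-byte buffer standing in for a slice of wasm memory:
// room for two 4-byte unsigned integers.
const demoBuffer = new ArrayBuffer(8);
const demoView = new DataView(demoBuffer);

// Store the value 2 at element index 1 (byte offset 4), little-endian.
demoView.setUint32(1 * 4, 2, true);

// The least significant byte comes first in memory.
const rawBytes = Array.from(new Uint8Array(demoBuffer, 4, 4));

// Reading with littleEndian = true gives back the value we stored...
const asLittleEndian = demoView.getUint32(1 * 4, true);
// ...while the default (big-endian) read interprets the bytes backwards.
const asBigEndian = demoView.getUint32(1 * 4, false);

console.log(rawBytes, asLittleEndian, asBigEndian);
```

Forgetting the flag doesn't throw; it quietly hands you a wrong (usually huge) number, which is exactly the kind of mistake that's easy to miss.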
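Unfortunately, the nicer Uint32Array interface doesn't let you choose a byte order: typed arrays always use the byte order of the machine they're running on. That's little-endian on practically every device you'll meet, so a Uint32Array would work in practice, but DataView is the only way to spell out the little-endian read explicitly. :(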
I find the DataView interface fiddly and easy to get wrong, so I like to create a little helper class to call the methods for me, something like:
class ExportedArray {
  constructor(buffer, offset, { numOfElements, bytesPerElement }) {
    this.bytesPerElement = bytesPerElement;
    this.view = new DataView(buffer, offset, numOfElements * bytesPerElement);
  }
  get(index) {
    return this.view.getUint32(this.bytesPerElement * index, true);
  }
}
const numbers = new ExportedArray(sharedMemory.buffer, instance.exports.numbers, {
  numOfElements: 100,
  bytesPerElement: 4
});
console.log(numbers.get(0), numbers.get(1), numbers.get(2));
This is especially nice if you have lots of exported arrays. You can even make a C macro on the other end to export the array along with its size (bytesPerElement) and length (numOfElements).
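One detail about instance.exports.numbers that's worth seeing in isolation: an exported wasm global shows up in Javascript as a WebAssembly.Global object, and the offset lives in its value property. The sketch below uses a tiny hand-assembled module as a stand-in for shared_square.wasm (the bytes are mine, not compiler output), so it runs without any build step.

```javascript
// Hand-assembled stand-in module, equivalent to the wat:
//   (module (global (export "numbers") i32 (i32.const 1024)))
const moduleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // "\0asm", version 1
  0x06, 0x07, 0x01, 0x7f, 0x00, 0x41, 0x80, 0x08, 0x0b, // global: const i32 = 1024
  0x07, 0x0b, 0x01, 0x07,                               // export section, 1 entry, 7-byte name
  0x6e, 0x75, 0x6d, 0x62, 0x65, 0x72, 0x73,             // "numbers"
  0x03, 0x00                                            // kind: global, index 0
]);

const demoInstance = new WebAssembly.Instance(new WebAssembly.Module(moduleBytes));
const exportedNumbers = demoInstance.exports.numbers;

// The export is a wrapper object, not a bare number...
const isGlobalObject = exportedNumbers instanceof WebAssembly.Global;
// ...so .value is the explicit way to get the offset out,
const offsetFromValue = exportedNumbers.value;
// and numeric coercion (what DataView does with its offset argument) also works.
const offsetFromCoercion = Number(exportedNumbers);

console.log(isGlobalObject, offsetFromValue, offsetFromCoercion);
```

Passing the export straight to DataView relies on that coercion; writing instance.exports.numbers.value is a bit more explicit if you'd rather not depend on it.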
Putting it all together
So, we finally have all the pieces we need to write some Javascript that squares the numbers array in parallel! If you made it this far, congratulations!
Below is all the Javascript we need. To sum it up, we're going to instantiate the wasm module in the main thread, call initNumbers to get our starting data, create four Web Workers, and then have each of them instantiate the module and call square against a different range of numbers. Finally, we'll read out a few numbers to confirm that they've been squared.
<!-- shared_square.html -->
<!DOCTYPE html>
<html>
<script type="module">
/** ================ Helpers =================== */
// Call postMessage for each worker,
// and wait for them all to send a message back.
async function postToEachWorker(workers, getMessage) {
  const promises = [];
  for (let i = 0; i < workers.length; i++) {
    const worker = workers[i];
    promises.push(new Promise((resolve, reject) => {
      // { once: true } removes the listener after the first reply,
      // so listeners don't pile up across calls.
      worker.addEventListener('message', (e) => {
        resolve();
      }, { once: true });
      worker.postMessage(getMessage(worker, i));
    }));
  }
  return Promise.all(promises);
}
// Make it easier to read a Uint32 array in
// WebAssembly memory.
class ExportedArray {
  constructor(buffer, offset, { numOfElements, bytesPerElement }) {
    this.bytesPerElement = bytesPerElement;
    this.view = new DataView(buffer, offset, numOfElements * bytesPerElement);
  }
  get(index) {
    return this.view.getUint32(this.bytesPerElement * index, true);
  }
}
/** ================ Main ================== */
const sharedMemory = new WebAssembly.Memory({
  shared: true,
  initial: 2,
  maximum: 2
});
const { instance, module } = await WebAssembly.instantiateStreaming(fetch("shared_square.wasm"), {
  env: {
    memory: sharedMemory
  }
});
const lengthOfNumbers = 100;
// numWorkers must be a factor of lengthOfNumbers
const numWorkers = 4;
const rangeLength = lengthOfNumbers / numWorkers;
// Create the Web Workers
const workers = [];
for (let i = 0; i < numWorkers; i++) {
  workers.push(new Worker('shared_square_worker.js'));
}
// Tell each Web Worker to create its own
// shared_square.wasm module
await postToEachWorker(workers, (worker, i) => {
  return {
    type: 'instantiate',
    workerId: i, // used for console logging
    memory: sharedMemory,
  };
});
// Initialize the array with numbers 0 to 99.
instance.exports.initNumbers();
// Helper object to read the numbers from the shared memory.
const numbers = new ExportedArray(sharedMemory.buffer, instance.exports.numbers, {
  numOfElements: 100,
  bytesPerElement: 4
});
// Verify initial values
console.log("initial values");
console.log("numbers[0]: ", numbers.get(0)); // Should be 0
console.log("numbers[1]: ", numbers.get(1)); // Should be 1
console.log("numbers[2]: ", numbers.get(2)); // Should be 2
console.log("numbers[98]: ", numbers.get(98)); // Should be 98
console.log("numbers[99]: ", numbers.get(99)); // Should be 99
// Tell each worker to call the WebAssembly "square" function
// against a different range of numbers.
await postToEachWorker(workers, (worker, i) => {
  return {
    type: 'square',
    start: i * rangeLength,
    end: (i+1) * rangeLength
  };
});
// Print the new values.
console.log("values after squaring");
console.log("numbers[0]: ", numbers.get(0)); // Should be 0² = 0
console.log("numbers[1]: ", numbers.get(1)); // Should be 1² = 1
console.log("numbers[2]: ", numbers.get(2)); // Should be 2² = 4
console.log("numbers[98]: ", numbers.get(98)); // Should be 98² = 9604
console.log("numbers[99]: ", numbers.get(99)); // Should be 99² = 9801
</script>
</html>
// shared_square_worker.js
let wasmInstance, workerId;
onmessage = (e) => {
  const { type } = e.data;
  switch (type) {
    case 'instantiate':
      const memory = e.data.memory;
      workerId = e.data.workerId;
      WebAssembly.instantiateStreaming(fetch("shared_square.wasm"), {
        env: { memory: memory }
      }).then(
        (obj) => {
          wasmInstance = obj.instance;
          postMessage(null);
        }
      );
      break;
    case 'square':
      const start = e.data.start;
      const end = e.data.end;
      wasmInstance.exports.square(start, end);
      console.log(`worker #${workerId}: squared numbers between ${start} and ${end}`);
      postMessage(null);
      break;
    default:
      // Error() only uses its first argument as the message,
      // so the type has to be interpolated into the string.
      throw new Error(`unexpected type: ${type}`);
  }
}
The only new code we added was a helper, postToEachWorker, to abstract away the code that posts messages to each worker, and waits for them to complete.
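The main script splits the array with plain division, which is why numWorkers has to be a factor of lengthOfNumbers. If you want to lift that restriction, a range helper along these lines might do it. This is a hypothetical sketch, not part of the primer's code: splitIntoRanges is a name I made up, and it assumes end is exclusive, as the 0 to 25, 25 to 50 ranges above suggest.

```javascript
// Hypothetical helper: split `length` items into `numWorkers`
// contiguous [start, end) ranges, with the last range(s) shrinking
// when the division isn't exact.
function splitIntoRanges(length, numWorkers) {
  const chunk = Math.ceil(length / numWorkers);
  const ranges = [];
  for (let i = 0; i < numWorkers; i++) {
    const start = Math.min(i * chunk, length);
    const end = Math.min(start + chunk, length);
    ranges.push({ start, end });
  }
  return ranges;
}

// Same split as the primer: 100 numbers over 4 workers.
const evenSplit = splitIntoRanges(100, 4);
// A case plain division can't handle: 10 numbers over 4 workers.
const unevenSplit = splitIntoRanges(10, 4);

console.log(evenSplit, unevenSplit);
```

Each worker would then get ranges[i].start and ranges[i].end in its 'square' message instead of i * rangeLength and (i+1) * rangeLength.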
When I load the HTML, I get the following in my console:
numbers[0]: 0
numbers[1]: 1
numbers[2]: 2
numbers[98]: 98
numbers[99]: 99
squaring
worker #2: squared numbers between 50 and 75
worker #3: squared numbers between 75 and 100
worker #0: squared numbers between 0 and 25
worker #1: squared numbers between 25 and 50
values after squaring
numbers[0]: 0
numbers[1]: 1
numbers[2]: 4
numbers[98]: 9604
numbers[99]: 9801
Note that the order the Web Workers work in is non-deterministic; your squaring output might be different than mine.
We've successfully compiled our C code into WebAssembly and are running multiple instances of it, in parallel, with a shared set of memory!
And with that… we're done!
Conclusion
If you made it this far, thank you! :D
I will note that we're doing things the hard way here, with no C standard library, etc. If you're planning on using C/C++ and WebAssembly I do recommend checking out Emscripten, as it may well save you a lot of effort.
But I'm hoping that with this primer, you'll have a bit more of a foundational understanding of how WebAssembly works, which you can bring to everything else you learn in the future!
This was my first time writing such a long technical primer, so if you have any feedback, I'd love to hear it! Feel free to contact me with questions or feedback at:
Below I've got a few more mini-sections that didn't fit in elsewhere in the primer, and then a list of suggested readings if you want to learn more about this topic.
Addendum
Atomic operations
In this primer, we don't risk any of the normal parallel memory access problems (e.g. two threads trying to write the same number at the same time) because we make it so each Web Worker works against a different range of the shared memory.
WebAssembly does have support for atomic operations, but I haven't tried to get the C code compiled to use them.
I suspect the Clang atomic attributes would work to generate the appropriate WebAssembly atomic operations, but again, I haven't tried it.
It also looks like Emscripten has an implementation of pthreads (which isn't quite what we want), but I bet you could get Emscripten to generate atomic operations.
If anybody has an answer to this, I'd love to hear it!
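This doesn't answer the C question, but for what it's worth, Javascript has its own Atomics object that works on typed arrays backed by a SharedArrayBuffer, which is the same kind of buffer our shared WebAssembly.Memory exposes. A minimal sketch, using a standalone buffer rather than wasm memory (the names are just for illustration):

```javascript
// A small shared buffer with room for four 32-bit integers.
const sharedBuffer = new SharedArrayBuffer(4 * Int32Array.BYTES_PER_ELEMENT);
const counters = new Int32Array(sharedBuffer);

// Atomically add 5 to counters[0]; returns the value before the add.
const beforeAdd = Atomics.add(counters, 0, 5);

// Atomic store and load on counters[1].
Atomics.store(counters, 1, 42);
const loaded = Atomics.load(counters, 1);

// Compare-and-swap: only writes 7 if counters[0] still holds 5.
// Returns the value that was there before the attempt.
const beforeSwap = Atomics.compareExchange(counters, 0, 5, 7);
const afterSwap = Atomics.load(counters, 0);

console.log(beforeAdd, loaded, beforeSwap, afterSwap);
```

So if several workers ever did need to touch the same element, the Javascript side at least has a safe way to do it.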
Zig
When I was learning to compile C code into WebAssembly, I went down the path of trying to use the Zig compiler toolchain.
Zig is a programming language, but the community that maintains it also provides a compiler toolchain that is really nice for doing cross-platform Zig, C, and C++.
A lot of the internet resources recommend using it to compile to wasm, and for simple cases it does work out of the box.
Unfortunately, it didn't work for me for the more complicated cases where you need to pass in lots of wasm-ld flags (e.g. shared imported memory).
The Zig toolchain views the linker as a black-box detail.
Rather than set linker flags directly, you set Zig CLI flags which Zig will then internally translate into linker commands.
If they don't have a path to translate "zig flag" to "the linker flag you care about", you're out of luck (as is detailed in this Github issue).
Anyway, maybe there is a way to get it to work, but I found it easier to just use Clang directly. ¯\_(ツ)_/¯
Debugging
I find that wasm isn't the easiest thing to debug.
The best tooling I found is actually the Chrome Dev Console, which lets you set breakpoints in your executing wasm code and will give you a stack trace on errors.
While you're actively writing your C code, I recommend compiling your C to both wasm and a native executable, and then doing your debugging with normal C debuggers (e.g. gdb).
Then hopefully you won't have to do too much debugging on the wasm end.
As for tests… no idea! Best I found is integration tests where you stand up the JS environment and transitively test the wasm code.
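For what that kind of integration test might look like, here's a rough sketch that runs in Node with no test framework. To keep it self-contained, it uses a hand-assembled stand-in module whose square export takes one number and returns its square; that's not the same signature as the square in shared_square.c, and runSquareTests is a made-up name. In a real project you'd load your compiled .wasm file instead.

```javascript
// Hand-assembled stand-in for a compiled module, equivalent to the wat:
//   (module (func (export "square") (param i32) (result i32)
//     local.get 0  local.get 0  i32.mul))
const squareModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,                         // header
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f,                         // type: (i32) -> i32
  0x03, 0x02, 0x01, 0x00,                                                 // function 0 uses type 0
  0x07, 0x0a, 0x01, 0x06, 0x73, 0x71, 0x75, 0x61, 0x72, 0x65, 0x00, 0x00, // export "square"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x00, 0x6c, 0x0b        // function body
]);

// Run a table of [input, expected] cases against the wasm export
// and return the cases that failed.
function runSquareTests(wasmExports) {
  const cases = [[0, 0], [2, 4], [98, 9604], [99, 9801]];
  return cases.filter(([input, expected]) => wasmExports.square(input) !== expected);
}

const testInstance = new WebAssembly.Instance(new WebAssembly.Module(squareModuleBytes));
const failures = runSquareTests(testInstance.exports);

console.log(failures.length === 0 ? "all square tests passed" : failures);
```

The same shape works with any test runner: instantiate the module once, then assert on what its exports return or on what ends up in memory.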
Additional Reading
If you want to learn more about this topic, here's some interesting articles I read while trying to figure all this out.
- All of Surma's blog posts about WebAssembly: https://surma.dev/
  - Seriously, this primer wouldn't exist without his blog!
- This breakdown of the WASM memory management story by a Unity developer.
  - I think some of this has improved since 2021, but the author juj goes into a ton of great detail about how WebAssembly is being used in the gaming space.
- nullprogram's blog post about learning WebAssembly to compile his video game.
  - He goes over a lot of the same topics as this primer, and uses a much more realistic example than my shared_square.c!
  - I only found this post while I was writing this primer; wish I'd found it earlier!
The researchers discovered that, around 60% of the time, after 16,000 iterations of this, self-replicating programs would appear and spread through the ecosystem. Life! Seriously, this is really cool, go read the paper, or for lighter reading, Peter Watt's breakdown of it.
Though here's a Reddit comment from a year ago stating that emscripten is old, so who the hell knows. In WebAssembly land, a year in the past might as well be the dark ages.
At time of posting, this art was licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. See the sidebar of the page. Web Archive link.