
I'm very interested in WebAssembly, yet am dismayed that even a "Hello World" example, coded in C++ & compiled using Emscripten, produces a total of 396KB to load in the browser. What gives? How can this be made more size-efficient?



Summary

  • For larger projects like a game engine, the Emscripten generated code has proportionally less size overhead compared to a small Hello World example.
  • Emscripten has recently made large improvements in shrinking the code size. Make sure you use a recent Emscripten release.
  • Adding -Os --closure 1 may reduce the size of the generated code by 10x.

The rest of this answer explains why so much code is generated and how it can be made more size-efficient.


Why is so much code generated?

The amount of WebAssembly generated is proportional to the amount of C++ code written and to the dependencies of that code. A C++ program that depends on the standard library depends on more code than you might expect. A simple add() function like this...

int add(int x, int y) {
    return x + y;
}

...will generate a short WebAssembly function like this:

(func $add (param $x i32) (param $y i32) (result i32)
  local.get $x
  local.get $y
  i32.add)

But a call to printf needs definitions for functions such as strlen, flockfile, funlockfile, memcpy, fwrite, fputs and __stdio_write, i.e. all the standard library functions that the printf call depends on. A C++ program running natively would simply be linked against the platform's libc, which is usually a shared library already installed on the system, but WebAssembly needs to carry those library dependencies along inside the module.

In addition to these userspace library dependencies, the tool that generates WebAssembly also needs to provide a runtime environment that handles system calls. So a Hello World program needs definitions that override the system calls for allocating memory and for writing bytes.


How can the compiler shrink the code size?

Alon Zakai, the creator and maintainer of Emscripten, has written the Mozilla Hacks article "Shrinking WebAssembly and JavaScript code sizes in Emscripten". I'll summarize the main points from that article here:

Emscripten initially focused on making it easy to port existing C and C++ programs by providing a POSIX environment, implementing a libc and a runtime for system calls. In the name of convenience, more code was often included than was needed.

A lot of the runtime was implemented in JavaScript. Emscripten generates code that calls back and forth between the application/library WebAssembly code and the JavaScript runtime.

Code that is never called should be removed. In compilers this is handled by an optimization called Dead Code Elimination: Emscripten builds a graph of all functions and removes those that cannot be reached from main. This is a simplification, but it suffices for this explanation.

But previously the compiler could not build that graph for calls that crossed the boundary between WebAssembly and JavaScript. That changed with the inclusion of the wasm-dce tool: Emscripten can now create a single graph spanning both the WebAssembly and the JavaScript code.


What is the limit of "shrinkage" for a Hello World program?

printf is a general-purpose function that operates on file descriptors and is thread-safe, so pretty much all of the code generated for a printf call has to be there.

If you want to experiment further with what code is generated, I recommend the WebAssembly Studio online IDE. It provides an example Hello World project with a README that walks through the library code and runtime JavaScript code that get generated.

  • Nice detailed answer with lots of explanation! This should be the accepted answer, not mine.
    – Bee
    Commented Mar 2, 2018 at 12:35
  • I agree this is a great answer, & has given me much direction upon where to read further. I have made this the accepted answer, despite the waffle-penalty.
    – Jack
    Commented Mar 2, 2018 at 21:07

The higher optimization levels introduce progressively more aggressive optimization, resulting in improved performance and code size at the cost of increased compilation time. (Source: Emscripten documentation)

That's what the Emscripten docs say about shrinking file sizes. Be aware, though, that a higher optimization level does not necessarily mean a smaller file. Each optimization option behaves differently, which the docs describe pretty well.

The following example uses the -Os code optimization option, which makes the compiler behave like this:

Like -O3, but with extra optimizations that reduce code size at the expense of performance. This can affect both bitcode generation and JavaScript. (Source: Emscripten documentation)

emcc -Os file.cpp


You will get smaller sizes and better performance once WebAssembly can use DOM/Web APIs directly, so that you no longer need to call back into JavaScript.
