Bootstrapping Rust Considered Harmful

Have you ever tried to build Rust from source? Why, you might ask?

It shouldn't be that hard, right? Well... this article tries to answer that question, and why I think Rust needs to change its bootstrapping process significantly. Before we dive into the details, let's see how long it took me to build Rust 1.81 on DragonFly BSD in comparison to other languages (thanks to R and ggplot2 for the diagram):

Warning: This is a "rant" about Rust! Not the language - Rust is beautiful - but its current implementation. More precisely, it's a rant about the bootstrapping process of Rust's official implementation (not the gcc one) and the huge number of dependencies that are required to build Rust from source. If you write "minimalist" software - anything that runs on the terminal - please be aware of this "bloat" and consider the possibility of using a less dependency-rich and resource-hungry language - or wait for a more lightweight "gcc" version of Rust. Thank you!

OCaml serves as a reference point in this article, not because it is a better language or anything like that. OCaml is comparably complex, yet its bootstrap process is well implemented: a complete OCaml toolchain can be brought to life from source in just three minutes using ./configure && make && make install.

Installing Rust from Source

Quoting Rust's README.md about "Installing from Source":

If you really want to install from source (though this is not recommended), see INSTALL.md.

Despite this warning, I download the sources for Rust 1.84.1, rustc-1.84.1-src.tar.gz (note that the figure above shows the build time for Rust 1.81, as I was unable to build Rust 1.84.1). The compressed tarball is a negligible 699 MB in size. Hey, would that still fit on a single CD-ROM? No, it would not!

For comparison, OCaml's sources (version 5.4.0) are 6.0 MB compressed. Two orders of magnitude, I hear you cry! Two orders of magnitude...
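As a quick sanity check of the "two orders of magnitude" remark, here is a minimal Python sketch that uses only the two tarball sizes quoted above. The function name is made up for illustration.

```python
import math


def orders_of_magnitude(larger: float, smaller: float) -> float:
    """Return log10 of the ratio between two sizes given in the same unit."""
    return math.log10(larger / smaller)


# Compressed tarball sizes in MB, as quoted in the text.
rust_mb = 699.0
ocaml_mb = 6.0

ratio = rust_mb / ocaml_mb
print(f"Rust tarball is {ratio:.1f}x the size of OCaml's")
print(f"that is {orders_of_magnitude(rust_mb, ocaml_mb):.2f} orders of magnitude")
```

The ratio comes out at a bit over 116, so "two orders of magnitude" is, if anything, slightly generous to Rust.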

To be fair, these 699 MB worth of source code contain all the sources required to build Rust, including some 10 million lines of LLVM code and three different versions of OpenSSL.

Well, it's true. All the sources are included, but not the binary Rust compiler (and the cargo binary), both of which are required to build Rust from source!!! Wait, what? Isn't there a chicken and egg problem? We are actually trying to build Rust from source...!? Yes, there is. Let's talk about this later...

Let's continue and unpack this big beautiful tarball. 2 minutes and 14 seconds later, tar xvzf rustc-1.84.1-src.tar.gz finishes. Side note: That's roughly one minute quicker than it would take to build the whole OCaml toolchain on the same machine. The uncompressed source tree is now at 1.9 GB. That's already more bytes than living Indians on this planet.

Let's continue and open up README.md again, this time from the source tree that we just decompressed. It still states that "Building From Source [...] is not recommended [...] see INSTALL.md", but sadly, that INSTALL.md it refers to didn't make it into the tarball. Well, we've already got 1.9 GB worth of source code, so it seems there was not enough space left to include the install document - Prioritiiiiiies! We've told you that building from source is not recommended!

Counting the Lines of Code

Before we start building Rust (something I am trying to put off), let's "quickly" run cloc on the Rust source tree...

Quickly as in:

... it's still counting the files...

... still counting... aaaaand... still counting...

... over a minute has passed by and it says 351365 text files...

... now it seems to count unique files...

... another two minutes later it says 276326 unique files... wow...

... and it's still counting, whatever it is counting, maybe lines this time...

... holy cow, the CPU fan now goes full speed, 7 minutes have passed...

... to be fair, Perl (cloc) might not be the best choice here...

... we'd need tokei implemented in Rust for that...

... 8 minutes have now passed by...

... if this takes longer I might have to plug in the power supply...

... 9 minutes (or 3 times OCaml)... and 3-2-1...

... last offer... 10 minutes and we are done!

That was rather quick, wasn't it? Apart from getting a ton of "Line count, exceeded timeout:" lines and 81443 files ignored, the cloc output is truly impressive:

files    language    blank      comment     code
68005    Rust        1251355    2413958     20734039
51146    C++         1435178    2396405     7685767
71995    C           1083792    2936843     5724464
...      ...         ...        ...         ...
19       Pest        226        289         849
...      ...         ...        ...         ...
1        Brainfuck   3          4           10
1        sed         0          0           5
276326   SUM         6345959    10746776    50771605

A truly remarkable list of languages! I had to cut 101 lines from the output to make it fit. What the hell are "Pest" and "SnakeMake"??? Haha, at least there are 10 lines of Brainfuck!

Okay, that's a total of 20 million lines of Rust code excluding comments and blank lines, in a total of 68005 Rust files, right? And a grand total of 50 million lines of "code" in 276326 files. Impressive! And the list is even incomplete as it was running into several "timeouts".

Just for reference, let's quickly - quickly as in 3 seconds - count the lines of code of the OCaml 5.4.0 toolchain:

files   language       blank    comment   code
2825    OCaml          57092    106578    366954
307     C              8478     10140     47737
70      Bourne Shell   5718     6223      35216
12      m4             1481     108       12340
79      C/C++ Header   1747     3203      5084
12      Assembly       548      2169      4828
22      make           916      576       3463
25      Markdown       622      34        1849
12      AsciiDoc       518      0         1682
...     ...            ...      ...       ...
1       C#             2        0         9
3433    SUM            78065    130234    487018

(19 lines were removed from the output).

That's a total of roughly 500,000 lines of code, no "Pest", no SnakeMake and zero Brainfuck. Bad enough that 9 lines of C# were kept in, that must be an error :).

Half a million lines of code for an advanced language like OCaml which ships with a bytecode interpreter and native-code generators for 5 platforms (x86-64, arm64, RISC-V, s390x and powerpc) isn't all that bad. Furthermore, it is worth noting that OCaml builds out of the box with just a C compiler and the standard buildtools like gmake, m4 etc.

Building Rust

Now let's compile Rust! As Rust 1.84.1 did not build (it failed with some kind of error message about 2 hours into the build, IIRC), I tried to build Rust 1.81 instead.

Here is the build time for Rust 1.81. Sit down please:

12563 seconds, or 3 hours and 30 minutes.

OCaml builds in 197 seconds, or 3 minutes and 17 seconds. On the same machine, obviously.

That's 63 times slower than OCaml, 162 times slower than Python, and 4753 times slower than Lua.
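The arithmetic behind these numbers can be re-checked with a small Python sketch. Only the two measured times (12563 s and 197 s) and the quoted factors come from the text. The Python and Lua build times printed at the end are derived from those factors, so treat them as implied values, not measurements. The helper name `hms` is made up.

```python
def hms(seconds: int) -> str:
    """Format a duration in seconds as 'Hh MMm SSs'."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


RUST_SECONDS = 12563   # Rust 1.81 build time from the text
OCAML_SECONDS = 197    # OCaml build time from the text

# Slowdown factors quoted in the text; the build times derived from them
# below are implied values, not measurements.
QUOTED_FACTORS = {"Python": 162, "Lua": 4753}

print("Rust :", hms(RUST_SECONDS))
print("OCaml:", hms(OCAML_SECONDS))
print(f"Rust/OCaml: {RUST_SECONDS / OCAML_SECONDS:.1f}x")
for name, factor in QUOTED_FACTORS.items():
    print(f"{name}: implied build time ~{RUST_SECONDS / factor:.1f} s")
```

The exact ratio is a little under 64, which the text truncates to 63. The quoted factors imply a Python build of well under two minutes and a Lua build of under three seconds.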

To be fair, the build time of Rust includes building LLVM, cargo and some other tools, and it builds Rust at least twice: stage1 is the Rust 1.81 compiler built with the Rust 1.80 bootstrap, while stage2 is using stage1 (1.81) to build itself again.

For DragonFly BSD, we use our own bootstrapping repository on GitHub. If you have the appropriate versions of cargo, the bootstrap rustc and Python installed, then something like the following might or might not bootstrap Rust:

# Keep the *-sys crates from probing for system libraries via pkg-config.
export LIBSSH2_NO_PKG_CONFIG=1
export LIBGIT2_NO_PKG_CONFIG=1
export LIBCURL_NO_PKG_CONFIG=1
export LIBZ_NO_PKG_CONFIG=1
export LIBLZMA_NO_PKG_CONFIG=1
export PROFILE=release
# Ask libz-sys to link zlib statically.
export LIBZ_SYS_STATIC=1
export OPENSSL_NO_PKG_CONFIG=1

./configure \
        --release-channel=stable \
        --enable-cargo-native-static \
        --enable-extended \
        --enable-vendor \
        --enable-locked-deps \
        --local-rust-root=/path/to/bootstrap/compiler \
        --sysconfdir=/opt/rust/etc \
        --prefix=/opt/rust \
        --python=python \
        --disable-llvm-static-stdcpp \
        --disable-docs

python x.py build --config ./config.toml
python x.py dist --config ./config.toml
python x.py install --config ./config.toml

Who was first? The chicken or the egg?

Not only does the build take 3 hours and 30 minutes, there is another problem:

For Rust, there currently is no bootstrap compiler written in any other language than Rust, so you need to have a Rust compiler... in order to compile the Rust compiler... in order to compile the Rust compiler... in order to compile the Rust compiler... in order... at this point a stack overflow stops our beautiful infinite recursion.

To be fair, it's not a truly infinite recursion, because there used to be a Rust compiler written in OCaml, but that was over a decade ago, long before I started using Rust back in 2013.

Let's imagine that you'd really want to start from this early version of Rust, for which the compiler was written in OCaml, and compile each version of Rust, one after the other, until you reach the current version of Rust. Theoretically possible. My rough guess would be that you would have to compile a hundred intermediate Rust compilers, which would presumably keep a powerful build machine busy for 10 days or more, 24/7, not including fixing bugs and applying patches after a failed attempt. Impractical.
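A back-of-the-envelope version of that guess, as a Python sketch. It assumes 100 intermediate compilers (the rough guess above) and, as a deliberately pessimistic assumption, that every one of them takes as long as the 12563 seconds measured for Rust 1.81 on my machine. Older releases were probably faster to build, and a powerful build machine would be quicker too. The function name is made up.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def chain_days(compilers: int, seconds_per_build: float) -> float:
    """Days of non-stop building for compilers built strictly one after another."""
    return compilers * seconds_per_build / SECONDS_PER_DAY


# Assumption: 100 builds, each as slow as the Rust 1.81 build measured above.
print(f"{chain_days(100, 12563):.1f} days at 12563 s per compiler")

# For comparison: the average time per compiler that gives exactly 10 days.
print(f"{chain_days(100, 8640):.1f} days at 8640 s per compiler")
```

Under these assumptions the chain takes about two weeks. Ten days corresponds to an average of 8640 seconds, or 2 hours and 24 minutes, per compiler.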

The n+1 problem of Rust

Note that a significant part of the problem, and why bootstrapping Rust is so expensive, is that each version of Rust must be bootstrapped with exactly the previous version.

Maybe that isn't true anymore, but it used to be like this in the past. In order to build Rust version n+1, you need Rust version n.
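To make the n+1 rule concrete, here is a small Python sketch that lists the releases you would have to build in order. It models the rule exactly as stated above (version n+1 needs version n), ignores point releases, and may not reflect what current Rust actually requires. The function name is hypothetical.

```python
def bootstrap_chain(have: int, want: int) -> list[str]:
    """List the 1.x releases to build, in order, under a strict n -> n+1 rule.

    `have` and `want` are minor version numbers (81 means Rust 1.81).
    """
    if want < have:
        raise ValueError("cannot bootstrap backwards")
    return [f"1.{minor}" for minor in range(have + 1, want + 1)]


# Starting from a 1.80 bootstrap compiler, reaching 1.84 takes four builds.
print(bootstrap_chain(80, 84))
```

Every entry in the returned list is a full compiler build, which is why the strictness of the rule matters so much for the total bootstrap time.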

At this point, it might help to take a look at how other languages handle the bootstrapping process.

How other languages bootstrap themselves

The Zig language somehow manages to ship a 1.3-million-line ANSI C file, zig1.c, which contains the Zig compiler and LLVM. I think they nowadays compile the Zig compiler, which is written in Zig, to WASM and from there to C. Clever. Just don't attempt to open that file in vim. Once they get rid of LLVM, a lot of the bloat will be gone.

OCaml, on the other hand, comes with a bytecode interpreter implemented in C (ocamlrun). This allows them to ship a portable, 3.4 MB bootstrap compiler as a bytecode binary (see boot/ocamlc), which is then used to bootstrap the OCaml compiler. The process is well documented in OCaml's BOOTSTRAP.adoc.

The compiler for the Go language is also written in Go itself, but Go is less strict with the version of the Go compiler that you need in order to bootstrap another Go compiler. For example, Go 1.24 and 1.25 require a Go compiler of version 1.22. Furthermore, Go 1.4 was written in C, which you can then use to bootstrap newer Go compilers. Even more important, Go compiles in about 3 minutes. This makes it a lot easier in case things break and you need to fix them.

Languages like Python, Ruby or Lua are all implemented in C - they don't need to bootstrap themselves.

The Erlang/OTP runtime system erts is implemented in C/C++ and uses a bytecode interpreter (and JIT) to run BEAM bytecode. When you download the sources of Erlang/OTP, this includes precompiled bytecode for the Erlang compiler and everything needed to bootstrap Erlang.

Elixir, a language that runs on the Erlang/OTP platform, is a bit different. Its compiler is implemented in Erlang and not in Elixir. With just Erlang/OTP installed you are ready to bootstrap Elixir.

Bootstrapping Rust without a binary

But what if, for some reason, there is no binary Rust bootstrap compiler available for your platform?

Well, then you have to cross-compile the Rust compiler rustc using an existing binary bootstrap compiler running on another system. I've done it once, over 10 years ago, for DragonFly. It's not exactly fun.

It's important to note here that you'd still need a binary bootstrap compiler for the other platform in order to cross-compile Rust for your system. There is just no feasible way around it, atm.

No binary bootstrap, no Rust

To summarise, to the best of my knowledge it's almost impossible to compile the official Rust distribution without downloading an existing binary Rust bootstrap compiler. I am happy to hear otherwise.

Personally, I am not too concerned about that situation, but if I were a software developer from Russia, China, North Korea or Iran, I would be slightly more concerned about having to download a binary Rust compiler from a remote U.S. server. What could possibly go wrong?

Conclusion - Rust is bloated

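Size-wise, the "official" Rust implementation in version 1.84.1 is quite "bloated": a 699 MB compressed tarball that unpacks to 1.9 GB and contains roughly 50 million lines of code in 276,326 files.

Compare this to OCaml version 5.4.0: 6.0 MB compressed, and just under 500,000 lines of code in 3,433 files.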

Furthermore, you can't build Rust without a binary bootstrap compiler. I hope this will change once the GCC Front-End for Rust becomes more mature.

Both the binary bootstrap problem and the large number of dependencies Rust relies on make Rust, in my opinion, not the first choice for systems programming tasks where ideally you want as few dependencies as possible. If you're developing fundamental tools upon which other, more complex things will be built, the foundation shouldn't be a thousand times more complex than what's built upon it. Do you agree?

That's much less of an issue when building large-scale enterprise applications or proprietary, embedded and safety-critical products.

Personally, I am not exactly convinced to download a 699 MB compressed tarball, extract it to become 1.9 GB, to then compile Rust for the next three and a half hours just to be able to use:

My advice to you: Don't blindly use Rust for everything just because it's currently popular. Think carefully about whether Rust is really the right tool for the task at hand. There are many good alternatives:

Good night.