Bootstrapping Rust Considered Harmful
Michael Neumann
In this article, I want to "build" Rust from source. Why, you might ask? Well, maybe, because it's not available for my operating system, maybe because I don't trust binary downloads, maybe purely out of curiosity, or maybe for some other reason, who knows.
OCaml is used in this article here and there as reference, not because it is
"better", no, but rather because it is similar in its complexity and it is the
most streamlined and most beautiful bootstrapping experience I've ever had
with any language: it just cleanly builds using ./configure && make && make
install.
Warning: This is a "rant" about Rust! Not the language, but its current implementation. More precisely, it's a rant about the bootstrapping process of Rust's official implementation (not the gcc one) and the huge number of dependencies that are required to build Rust from source. If you write "minimalist" software - anything that runs on the terminal - please, and I repeat myself, please consider the possibility of using a less bloated language - or wait for a more lightweight "gcc" version of Rust. Thank you!
TL;DR
Bootstrapping Rust:
- Version 1.84.1
- 699 MB source code (compressed)
- 1.9 GB source code (uncompressed)
- 50 million lines of code (total)
- 20 million lines of Rust code
- 3 hours and 30 minutes build time
- Mandates binary bootstrap compiler!
Let the journey begin...
Quoting Rust's README.md about "Installing from Source":
If you really want to install from source (though this is not recommended), see INSTALL.md.
Despite this warning, I download the sources for Rust 1.84.1:
rustc-1.84.1-src.tar.gz. The compressed tarball is a negligible 699 MB in size.
Hey, would that still fit on a single CD-ROM? No, it would not! Just for
comparison, OCaml's 5.4.0 sources are 6.0 MB compressed. Two orders of
magnitude, I hear you cry! But keep in mind that the 699 MB contain all the
sources required to build Rust, right? Well, the sources, yes, except the
binary Rust compiler of course that is still required to build Rust from the
sources!!! Pardon, what? Isn't there a chicken and egg problem here? And,
btw, who was first, the chicken or the egg? Never mind, we'll talk about
that later on :)
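To put a number on "two orders of magnitude", here is a small sketch that compares the two compressed tarball sizes quoted above. The function name orders_of_magnitude is made up for this illustration, and the sizes are simply the rounded figures from the text.

```python
import math


def orders_of_magnitude(larger: float, smaller: float) -> float:
    """Return log10 of the ratio between two sizes given in the same unit."""
    return math.log10(larger / smaller)


# Compressed tarball sizes in MB, as quoted in the text (rounded values).
RUST_MB = 699.0
OCAML_MB = 6.0

ratio = RUST_MB / OCAML_MB
magnitude = orders_of_magnitude(RUST_MB, OCAML_MB)
print(f"Rust tarball is {ratio:.1f}x larger, i.e. {magnitude:.2f} orders of magnitude")
```

So the Rust tarball is a bit more than a hundred times the size of OCaml's, which is where the "two orders" come from.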
So let's continue and unpack this big beautiful tarball then. 2 minutes and 14
seconds later, tar xvzf rustc-1.84.1-src.tar.gz finishes. Side note: That's
roughly one minute quicker than it would take to build the whole OCaml 5.4.0 toolchain on the same machine. The uncompressed source tree is now at 1.9 GB. That's
already more bytes than living Indians on this planet.
Let's continue and open up README.md again, this time from the source tree
that we just now uncompressed. It still states that "Building From Source [...]
is not recommended [...] see INSTALL.md", but sadly, that INSTALL.md it
refers to didn't make it into the tarball. Well, we've already got 1.9 GB worth
of source code, so clearly there is no space left for the install document -
Prioritiiiiiies! We've told you that it's not recommended!
Counting lines of code...
Before starting the build, let's "quickly" run cloc on the Rust source
tree...
$ time cloc --csv rustc-1.84.1-src > rust.csv
github.com/AlDanial/cloc v 2.00 T=605.91 s (456.0 files/s 112003.3 lines/s)
394.925u 46.034s 10:34.53 69.4% 3+65k 2031302+0io 160pf+0w
Quickly as in: it's still counting the files... still counting... aaaaand... still
counting .... over a minute has passed by and it says 351365 text files... now
it seems to count unique files... another two minutes later it says 276326
unique files... wow... and it's still counting, whatever it is counting, maybe lines
this time... holy cow, the CPU fan now goes full speed, 7 minutes
have passed... to be fair, Perl might not be the best choice to count the number of
lines of the Rust toolchain, we'd need tokei implemented in Rust for that... 8 minutes have
passed... if this takes longer I might have to plug in the power supply...
9 minutes... and 3-2-1... last offer, 10 minutes and we are done! That was rather
quick, wasn't it? Apart from getting a ton of "Line count, exceeded
timeout:" lines and 81443 files ignored, the cloc output is truly impressive:
| files | language | blank | comment | code |
| 68005 | Rust | 1251355 | 2413958 | 20734039 |
| 51146 | C++ | 1435178 | 2396405 | 7685767 |
| 71995 | C | 1083792 | 2936843 | 5724464 |
| 3312 | Text | 274228 | 0 | 3353959 |
| 15054 | C/C++ Header | 476041 | 831091 | 2065631 |
| 2039 | JSON | 52 | 0 | 1406718 |
| 6554 | YAML | 76744 | 70844 | 1033839 |
| 5553 | Markdown | 223944 | 1400 | 995557 |
| 71 | PO File | 340972 | 454044 | 951438 |
| 6666 | Ada | 291806 | 398865 | 835581 |
| 5358 | Go | 103432 | 191999 | 779155 |
| 5181 | D | 119923 | 164987 | 695575 |
| 796 | Bourne Shell | 96620 | 84362 | 539011 |
| 694 | TableGen | 76878 | 63577 | 513135 |
| 302 | XML | 7787 | 1547 | 443425 |
| 443 | HTML | 5643 | 469 | 433596 |
| 3696 | Assembly | 51297 | 124350 | 338918 |
| 2281 | Python | 54211 | 63000 | 237444 |
| 419 | Perl | 31718 | 32773 | 236651 |
| 7638 | Fortran 90 | 42132 | 73776 | 226970 |
| 2017 | reStructuredText | 94072 | 107005 | 183347 |
| 372 | m4 | 18000 | 9380 | 162638 |
| 37 | CSV | 0 | 0 | 148576 |
| 810 | Windows Module Definition | 28089 | 655 | 134374 |
| 2934 | TOML | 21031 | 18864 | 133014 |
| 2462 | Objective-C | 26074 | 38254 | 87976 |
| 1650 | CMake | 12849 | 13076 | 81877 |
| 1556 | LLVM IR | 20101 | 48644 | 71235 |
| 373 | SVG | 805 | 394 | 55486 |
| 685 | diff | 4787 | 59573 | 48275 |
| 443 | Ruby | 8167 | 7820 | 44200 |
| 547 | Expect | 10126 | 19413 | 40151 |
| 807 | Objective-C++ | 9658 | 8347 | 35985 |
| 642 | make | 6043 | 4828 | 31990 |
| 659 | Fortran 77 | 2134 | 6400 | 27925 |
| 267 | JavaScript | 2590 | 3463 | 27093 |
| 159 | PHP | 2280 | 4584 | 23979 |
| 210 | Logos | 2693 | 993 | 16549 |
| 7 | TeX | 1734 | 7187 | 15174 |
| 287 | WebAssembly | 114 | 132 | 14230 |
| 61 | Pascal | 5089 | 31985 | 13658 |
| 2 | yacc | 1283 | 308 | 13625 |
| 329 | OpenCL | 3102 | 7517 | 9681 |
| 59 | CSS | 1410 | 690 | 9518 |
| 76 | TypeScript | 1192 | 539 | 8865 |
| 30 | awk | 787 | 935 | 8557 |
| 49 | Visual Studio Solution | 12 | 42 | 7979 |
| 258 | CUDA | 2720 | 8882 | 6645 |
| 48 | Freemarker Template | 2269 | 0 | 6123 |
| 60 | DOS Batch | 1058 | 987 | 5782 |
| 39 | OCaml | 1488 | 2419 | 5343 |
| 139 | SWIG | 1120 | 29 | 5151 |
| 49 | Bourne Again Shell | 821 | 1272 | 5032 |
| 8 | MSBuild script | 1 | 7 | 4929 |
| 43 | GLSL | 1042 | 1146 | 4775 |
| 39 | Vuejs Component | 420 | 156 | 4582 |
| 173 | Dockerfile | 838 | 495 | 3973 |
| 28 | Lisp | 445 | 368 | 3960 |
| 193 | HLSL | 1636 | 5212 | 3826 |
| 124 | Fortran 95 | 1244 | 3557 | 3387 |
| 134 | Windows Resource File | 213 | 192 | 1838 |
| 3 | AsciiDoc | 314 | 102 | 1826 |
| 8 | Clean | 26 | 0 | 1548 |
| 16 | C# | 311 | 590 | 1417 |
| 2 | Cython | 443 | 277 | 1276 |
| 21 | Handlebars | 139 | 68 | 1105 |
| 3 | zsh | 25 | 23 | 1042 |
| 6 | PowerShell | 24 | 18 | 992 |
| 19 | Pest | 226 | 289 | 849 |
| 2 | Fish Shell | 8 | 8 | 743 |
| 5 | Bazel | 64 | 7 | 729 |
| 1 | XHTML | 122 | 34 | 701 |
| 4 | Jupyter Notebook | 0 | 14354 | 687 |
| 1 | IDL | 0 | 0 | 643 |
| 17 | vim script | 111 | 140 | 597 |
| 8 | XSLT | 93 | 31 | 559 |
| 9 | Protocol Buffers | 94 | 89 | 519 |
| 2 | Snakemake | 91 | 168 | 487 |
| 3 | WiX source | 42 | 32 | 351 |
| 6 | Lua | 46 | 21 | 342 |
| 16 | Puppet | 58 | 159 | 341 |
| 1 | Visual Basic Script | 30 | 60 | 341 |
| 2 | Gencat NLS | 4 | 0 | 262 |
| 4 | Linker Script | 54 | 43 | 225 |
| 1 | Standard ML | 34 | 28 | 215 |
| 2 | SQL | 61 | 0 | 193 |
| 1 | lex | 34 | 30 | 160 |
| 14 | INI | 30 | 0 | 135 |
| 40 | Haskell | 18 | 0 | 134 |
| 6 | Nix | 12 | 5 | 116 |
| 3 | NAnt script | 17 | 0 | 113 |
| 1 | TNSDL | 3 | 0 | 113 |
| 14 | MATLAB | 20 | 0 | 101 |
| 2 | Mathematica | 23 | 0 | 100 |
| 1 | JSON5 | 0 | 4 | 94 |
| 2 | DTD | 18 | 8 | 81 |
| 1 | Julia | 21 | 89 | 62 |
| 1 | Mako | 0 | 0 | 40 |
| 1 | C# Generated | 8 | 23 | 32 |
| 1 | SAS | 14 | 22 | 32 |
| 1 | SCSS | 7 | 0 | 32 |
| 1 | Meson | 9 | 2 | 26 |
| 1 | R | 3 | 0 | 20 |
| 2 | Swift | 6 | 0 | 17 |
| 1 | AppleScript | 3 | 8 | 16 |
| 1 | Brainfuck | 3 | 4 | 10 |
| 1 | sed | 0 | 0 | 5 |
| 276326 | SUM | 6345959 | 10746776 | 50771605 |
This truly is a remarkable list of languages! But what the hell are "Pest" and "Snakemake"??? Haha, at least there are 10 lines of Brainfuck!
Okay, that's a total of 20 million lines of Rust code excluding comments and blank lines, in a total of 68005 Rust files, right? And a grand total of 50 million lines of "code" in 276326 files. Impressive! And the list is even incomplete as it was running into several "timeouts".
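If you don't feel like reading the table by eye, the CSV can be summarized with a few lines of Python. This is only a sketch: it assumes the file starts with a files,language,blank,comment,code header row and ends with a SUM row, and the function name summarize_cloc is made up. The sample input is a three-row excerpt of the table above, not the full rust.csv.

```python
import csv
import io


def summarize_cloc(csv_text: str) -> dict:
    """Sum the 'code' column per language from cloc CSV text.

    Assumption: the header row names the columns files, language, blank,
    comment and code. A trailing SUM row, if present, is skipped.
    """
    totals = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        language = row["language"]
        if language == "SUM":
            continue
        totals[language] = totals.get(language, 0) + int(row["code"])
    return totals


# Excerpt of the table above, used as sample input.
SAMPLE = """files,language,blank,comment,code
68005,Rust,1251355,2413958,20734039
51146,C++,1435178,2396405,7685767
71995,C,1083792,2936843,5724464
276326,SUM,6345959,10746776,50771605
"""

totals = summarize_cloc(SAMPLE)
GRAND_TOTAL = 50771605  # "code" value of the SUM row in the full table
rust_share = totals["Rust"] / GRAND_TOTAL
print(f"Rust share of all counted code lines: {rust_share:.1%}")
```

With the numbers from the table, Rust itself accounts for roughly 41% of all counted lines; the rest is mostly C and C++.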
Counting lines of code... OCaml
Just for reference, let's quickly - quickly as in 3 seconds - count the lines of code of the OCaml 5.4.0 toolchain:
$ time cloc --csv ocaml-5.4.0 > ocaml.csv
github.com/AlDanial/cloc v 2.00 T=3.94 s (871.3 files/s 176467.5 lines/s)
2.753u 0.359s 0:03.38 91.7% 4+68k 9322+0io 0pf+0w
| files | language | blank | comment | code |
| 2825 | OCaml | 57092 | 106578 | 366954 |
| 307 | C | 8478 | 10140 | 47737 |
| 70 | Bourne Shell | 5718 | 6223 | 35216 |
| 12 | m4 | 1481 | 108 | 12340 |
| 79 | C/C++ Header | 1747 | 3203 | 5084 |
| 12 | Assembly | 548 | 2169 | 4828 |
| 22 | make | 916 | 576 | 3463 |
| 25 | Markdown | 622 | 34 | 1849 |
| 12 | AsciiDoc | 518 | 0 | 1682 |
| 1 | CSV | 0 | 0 | 1512 |
| 3 | SCSS | 212 | 106 | 1415 |
| 6 | Python | 232 | 263 | 986 |
| 15 | TeX | 153 | 290 | 983 |
| 9 | YAML | 75 | 98 | 975 |
| 7 | Bourne Again Shell | 60 | 133 | 338 |
| 1 | SVG | 0 | 0 | 330 |
| 5 | awk | 42 | 111 | 278 |
| 3 | JavaScript | 48 | 137 | 273 |
| 9 | Text | 15 | 0 | 220 |
| 1 | DOS Batch | 21 | 12 | 128 |
| 1 | TNSDL | 39 | 0 | 127 |
| 1 | CSS | 5 | 1 | 74 |
| 1 | Perl | 4 | 15 | 60 |
| 1 | NAnt script | 24 | 0 | 51 |
| 1 | HTML | 0 | 0 | 40 |
| 1 | diff | 5 | 37 | 35 |
| 1 | Fortran 77 | 5 | 0 | 21 |
| 1 | INI | 3 | 0 | 10 |
| 1 | C# | 2 | 0 | 9 |
| 3433 | SUM | 78065 | 130234 | 487018 |
That's a total of roughly 500,000 lines of code, no "Pest", zero Snakemake and zero
Brainfuck. Bad enough, 9 lines of C#, but that must be an error :). Half a
million lines of code for an advanced language like OCaml which ships with a
bytecode interpreter and native-code generators for 5 platforms (x86-64, arm64,
RISC-V, s390x and powerpc) isn't too bad. Furthermore, it builds out of the
box with just a C compiler and the standard build tools like gmake.
Building Rust...
Now let's compile Rust. I tried to postpone that for as long as possible. As Rust 1.84.1 did not build, I tried 1.81 instead. Here is the build time for Rust 1.81. Sit down please: 12563 seconds. That's just under 3 hours and 30 minutes. OCaml builds in 197 seconds, or 3 minutes and 17 seconds, on the same machine. That makes the OCaml build roughly 64 times faster. To be fair, the build time of Rust includes building LLVM, cargo and some other tools, and it builds Rust at least twice: stage1 is Rust 1.81 built with the Rust 1.80 bootstrap, and stage2 is then using stage1 (1.81) to build itself again.
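For anyone who wants to check the arithmetic, here is a minimal sketch that converts the two measured build times and computes their ratio. The helper name format_duration is made up; the two inputs are the second counts quoted above.

```python
def format_duration(seconds: int) -> str:
    """Format a number of seconds as hours, minutes and seconds."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# Wall-clock build times in seconds, as measured in the text.
RUST_BUILD_S = 12563
OCAML_BUILD_S = 197

speed_ratio = RUST_BUILD_S / OCAML_BUILD_S

print("Rust 1.81:  ", format_duration(RUST_BUILD_S))   # 3h 29m 23s
print("OCaml 5.4.0:", format_duration(OCAML_BUILD_S))  # 0h 03m 17s
print(f"Ratio: {speed_ratio:.1f}x")                    # 63.8x
```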
The problem with the chicken and the egg...
Not only does the build take 3 hours and 30 minutes, but there is another problem: Rust doesn't have a bootstrap compiler written in any other language than Rust, so you need to have a Rust compiler in order to compile the Rust compiler in order to compile the Rust compiler in order to compile the Rust compiler in order... at this point a stack overflow stops our beautiful infinite recursion.
To be fair, it's not a truly infinite recursion because there used to be a Rust compiler written in OCaml, but that was over a decade ago. If you really wanted to start from this early version of Rust and compile each version of Rust, one after the other, until you hit the current version, you'd likely have to compile a hundred intermediate Rust compilers, which would presumably keep a powerful build machine busy for 10 days or more, 24/7, not including fixing bugs and applying patches after a failed attempt.
The n+1 problem...
Note that the real problem here is that each version of Rust needs to be bootstrapped with the previous version. Maybe that isn't true anymore, but it used to be like this in the past. In order to build Rust version n+1, you'd need Rust version n.
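The strict n to n+1 rule can be sketched in a few lines. This is an illustration under loud assumptions, not a description of Rust's actual build system: it only looks at 1.x minor versions, ignores patch releases, and the function names bootstrap_chain and chain_build_days are made up. The 3.5 hours per build is simply my measurement for 1.81 applied to every version, which is certainly wrong for older, smaller releases.

```python
def bootstrap_chain(available_minor: int, target_minor: int) -> list[str]:
    """List the 1.x releases to build, in order, under a strict n -> n+1 rule.

    Assumption: each release can only be built by its direct predecessor.
    """
    if target_minor < available_minor:
        raise ValueError("target is older than the compiler you already have")
    return [f"1.{minor}" for minor in range(available_minor + 1, target_minor + 1)]


def chain_build_days(chain: list[str], hours_per_build: float) -> float:
    """Estimate the total build time in days for a whole chain (hypothetical cost model)."""
    return len(chain) * hours_per_build / 24.0


# With a 1.80 binary at hand, reaching 1.84 means four full compiler builds.
chain = bootstrap_chain(available_minor=80, target_minor=84)
print(chain)  # ['1.81', '1.82', '1.83', '1.84']

# Hypothetical: starting from a 1.0 compiler, at an assumed 3.5 hours per build.
long_chain = bootstrap_chain(available_minor=0, target_minor=84)
print(len(long_chain), "builds,", chain_build_days(long_chain, 3.5), "days")
```

The point is that the number of builds grows with every release, while a "any recent enough version will do" rule would keep the chain at a fixed, small length.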
It helps to take a look at how other languages do the bootstrapping process.
The compiler for the Go language, for example, is also written in Go itself.
But, there is a slight difference here. It would just build fine (I guess) with
any Go version recent enough. So, in order to do the bootstrapping, you'd pick
an earlier version of the Go compiler written in C (gccgo for example) and
use it to compile the latest version of Go. Done! This might still be a lot of
work, but at least, you can compile Go from source, given that you have a
binary C compiler.
As for OCaml: You only need a C compiler, not even C++ is required, in order to compile the full OCaml toolchain. Other languages, like Python, are similar in that regard. Most commonly, they are implemented in C or C++.
Bootstrapping without a binary...
But what if, for some reason, you don't have this binary Rust bootstrap compiler for your platform?
Well, then you have to cross-compile it using an existing binary bootstrap compiler on another system. I've done it once, over 10 years ago, for DragonFly. It's not exactly fun. But still, you'd need a binary bootstrap compiler for the other system, unless you'd do the impossible and start bootstrapping Rust with this very ancient version of Rust written in OCaml.
No binary, no Rust...
In summary, to the best of my knowledge, it's nearly impossible to compile the official Rust distribution without downloading an existing binary Rust bootstrap compiler. I am happy to hear otherwise.
Personally, I am not too concerned about that situation, but if I were a software developer from Russia, China, North Korea or Iran, I would possibly be slightly more concerned about having to download Rust binaries from a remote U.S. server - what could possibly go wrong? Not that I have any sympathy for any autocratic regime... that's not the point.
Does size still matter after all?
Rust ("official" implementation 1.84.1):
- Rust compiler ships with 20 million lines of Rust source code (50 million total)
- Build takes 3 hours and 30 minutes
- 1.9 GB sources (uncompressed), 4 GB installed binaries
- Impossible (almost) to build without binary bootstrap compiler (of particular version)
- "binary" mentality (for obvious reasons)
OCaml (5.4.0):
- 500,000 lines of code (total)
- Build takes 3 minutes and 17 seconds
- 31 MB sources (uncompressed), 300 MB installed binaries
- Only C compiler required to bootstrap
- "from source" mentality
While disk space is plentiful these days, and Rust really is a nice language to work with, everyone using Rust should be more conscious of this massive bloat and dependency chain, and the potential risks that come with all of that. At this point, the daily TV advertisement spot drops in and suggests anti-bloat pills. Good timing!
Bloated... or just enterprisey?
Or maybe Rust has just started to become enterprisey? Building large-scale enterprise applications using a dependency-"rich" language is fine, in my opinion. But I am not sure if I really want to install 1.9 GB of sources and compile Rust for 3 hours and 30 minutes, just to be able to build:
- the latest and greatest console text editor
- an extremely fast Python package manager
- a memory-safe cat or other "core-utils"
- Tailwind CSS
- the next-level Ruby JIT
Good night.