Bootstrapping Rust Considered Harmful

In this article, I want to "build" Rust from source. Why, you might ask? Well, maybe because it's not available for my operating system, maybe because I don't trust binary downloads, maybe purely out of curiosity, or maybe for some other reason, who knows.

OCaml serves as a reference here and there in this article, not because it is "better", but because it is similar in complexity and offers the most streamlined and most beautiful bootstrapping experience I've ever had with any language: it just cleanly builds using ./configure && make && make install.

Warning: This is a "rant" about Rust! Not the language, but its current implementation. More precisely, it's a rant about the bootstrapping process of Rust's official implementation (not the gcc one) and the huge number of dependencies that are required to build Rust from source. If you write "minimalist" software - anything that runs on the terminal - please, and I repeat myself, please consider the possibility of using a less bloated language - or wait for a more lightweight "gcc" version of Rust. Thank you!

TL;DR

Bootstrapping Rust:

Let the journey begin...

Quoting Rust's README.md about "Installing from Source":

If you really want to install from source (though this is not recommended), see INSTALL.md.

Despite this warning, I download the sources for Rust 1.84.1 (rustc-1.84.1-src.tar.gz). The compressed tarball is a negligible 699 MB in size. Hey, would that still fit on a single CD-ROM? No, it would not! Just for comparison, the OCaml 5.4.0 sources are 6.0 MB compressed. Two orders of magnitude, I hear you cry! But keep in mind that the 699 MB contain all the sources required to build Rust, right? Well, the sources, yes, except of course the binary Rust compiler that is still required to build Rust from the sources!!! Pardon, what? Isn't there a chicken and egg problem here? And, btw, who was first, the chicken or the egg? Never mind, we'll talk about that later on :)
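A quick sanity check of the "two orders of magnitude" claim, as a minimal Python sketch. The two sizes are the ones quoted above; the constant and function names are made up for illustration.

```python
import math

# Compressed tarball sizes quoted above, in megabytes.
RUST_TARBALL_MB = 699.0   # rustc-1.84.1-src.tar.gz
OCAML_TARBALL_MB = 6.0    # OCaml 5.4.0 sources


def size_ratio(big_mb: float, small_mb: float) -> float:
    """Return how many times larger the first tarball is than the second."""
    return big_mb / small_mb


ratio = size_ratio(RUST_TARBALL_MB, OCAML_TARBALL_MB)
# log10 of the ratio is the number of orders of magnitude between the two.
print(f"ratio: {ratio:.1f}x, orders of magnitude: {math.log10(ratio):.2f}")
```

So the Rust tarball is a bit more than 100 times the size of OCaml's, which is where the "two orders of magnitude" comes from.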

So let's continue and unpack this big beautiful tarball then. 2 minutes and 14 seconds later, tar xvzf rustc-1.84.1-src.tar.gz finishes. Side note: That's roughly one minute quicker than it would take to build the whole OCaml 5.4.0 toolchain on the same machine. The uncompressed source tree is now at 1.9 GB. That's already more bytes than living Indians on this planet.

Let's continue and open up README.md again, this time from the source tree that we just unpacked. It still states that "Building From Source [...] is not recommended [...] see INSTALL.md", but sadly, the INSTALL.md it refers to didn't make it into the tarball. Well, we've already got 1.9 GB worth of source code, so clearly there is no space left for the install document - Prioritiiiiiies! We've told you that it's not recommended!

Counting lines of code...

Before starting the build, let's "quickly" run cloc on the Rust source tree...

$ time cloc --csv rustc-1.84.1-src > rust.csv
github.com/AlDanial/cloc v 2.00  T=605.91 s (456.0 files/s 112003.3 lines/s)
394.925u 46.034s 10:34.53 69.4% 3+65k 2031302+0io 160pf+0w

Quickly as in: it's still counting the files... still counting... aaaaand... still counting... over a minute has passed and it says 351365 text files... now it seems to count unique files... another two minutes later it says 276326 unique files... wow... and it's still counting, whatever it is counting, maybe lines this time... holy cow, the CPU fan now goes full speed, 7 minutes have passed... to be fair, Perl might not be the best choice to count the number of lines of the Rust toolchain, we'd need tokei implemented in Rust for that... 8 minutes have passed... if this takes longer I might have to plug in the power supply... 9 minutes... and 3-2-1... last offer, 10 minutes and we are done! That was rather quick, wasn't it? Apart from getting a ton of "Line count, exceeded timeout:" lines and 81443 files ignored, the cloc output is truly impressive:

files language blank comment code
68005 Rust 1251355 2413958 20734039
51146 C++ 1435178 2396405 7685767
71995 C 1083792 2936843 5724464
3312 Text 274228 0 3353959
15054 C/C++ Header 476041 831091 2065631
2039 JSON 52 0 1406718
6554 YAML 76744 70844 1033839
5553 Markdown 223944 1400 995557
71 PO File 340972 454044 951438
6666 Ada 291806 398865 835581
5358 Go 103432 191999 779155
5181 D 119923 164987 695575
796 Bourne Shell 96620 84362 539011
694 TableGen 76878 63577 513135
302 XML 7787 1547 443425
443 HTML 5643 469 433596
3696 Assembly 51297 124350 338918
2281 Python 54211 63000 237444
419 Perl 31718 32773 236651
7638 Fortran 90 42132 73776 226970
2017 reStructuredText 94072 107005 183347
372 m4 18000 9380 162638
37 CSV 0 0 148576
810 Windows Module Definition 28089 655 134374
2934 TOML 21031 18864 133014
2462 Objective-C 26074 38254 87976
1650 CMake 12849 13076 81877
1556 LLVM IR 20101 48644 71235
373 SVG 805 394 55486
685 diff 4787 59573 48275
443 Ruby 8167 7820 44200
547 Expect 10126 19413 40151
807 Objective-C++ 9658 8347 35985
642 make 6043 4828 31990
659 Fortran 77 2134 6400 27925
267 JavaScript 2590 3463 27093
159 PHP 2280 4584 23979
210 Logos 2693 993 16549
7 TeX 1734 7187 15174
287 WebAssembly 114 132 14230
61 Pascal 5089 31985 13658
2 yacc 1283 308 13625
329 OpenCL 3102 7517 9681
59 CSS 1410 690 9518
76 TypeScript 1192 539 8865
30 awk 787 935 8557
49 Visual Studio Solution 12 42 7979
258 CUDA 2720 8882 6645
48 Freemarker Template 2269 0 6123
60 DOS Batch 1058 987 5782
39 OCaml 1488 2419 5343
139 SWIG 1120 29 5151
49 Bourne Again Shell 821 1272 5032
8 MSBuild script 1 7 4929
43 GLSL 1042 1146 4775
39 Vuejs Component 420 156 4582
173 Dockerfile 838 495 3973
28 Lisp 445 368 3960
193 HLSL 1636 5212 3826
124 Fortran 95 1244 3557 3387
134 Windows Resource File 213 192 1838
3 AsciiDoc 314 102 1826
8 Clean 26 0 1548
16 C# 311 590 1417
2 Cython 443 277 1276
21 Handlebars 139 68 1105
3 zsh 25 23 1042
6 PowerShell 24 18 992
19 Pest 226 289 849
2 Fish Shell 8 8 743
5 Bazel 64 7 729
1 XHTML 122 34 701
4 Jupyter Notebook 0 14354 687
1 IDL 0 0 643
17 vim script 111 140 597
8 XSLT 93 31 559
9 Protocol Buffers 94 89 519
2 Snakemake 91 168 487
3 WiX source 42 32 351
6 Lua 46 21 342
16 Puppet 58 159 341
1 Visual Basic Script 30 60 341
2 Gencat NLS 4 0 262
4 Linker Script 54 43 225
1 Standard ML 34 28 215
2 SQL 61 0 193
1 lex 34 30 160
14 INI 30 0 135
40 Haskell 18 0 134
6 Nix 12 5 116
3 NAnt script 17 0 113
1 TNSDL 3 0 113
14 MATLAB 20 0 101
2 Mathematica 23 0 100
1 JSON5 0 4 94
2 DTD 18 8 81
1 Julia 21 89 62
1 Mako 0 0 40
1 C# Generated 8 23 32
1 SAS 14 22 32
1 SCSS 7 0 32
1 Meson 9 2 26
1 R 3 0 20
2 Swift 6 0 17
1 AppleScript 3 8 16
1 Brainfuck 3 4 10
1 sed 0 0 5
276326 SUM 6345959 10746776 50771605

This truly is a remarkable list of languages! But what the hell are "Pest" and "Snakemake"??? Haha, at least there are 10 lines of Brainfuck!

Okay, that's a total of 20 million lines of Rust code excluding comments and blank lines, in a total of 68005 Rust files, right? And a grand total of 50 million lines of "code" in 276326 files. Impressive! And the list is even incomplete as it was running into several "timeouts".
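To put those totals into proportion, here is a small Python sketch that computes each language's share of the code lines. It embeds a few rows copied from the table above rather than reading rust.csv, and it assumes the column order shown there (the real file written by cloc --csv may carry an extra header column); the function name is hypothetical.

```python
import csv
import io

# A few rows copied from the cloc table above, using its column order
# (files, language, blank, comment, code). Assumption: the real rust.csv
# has the same five leading columns.
SAMPLE_CSV = """files,language,blank,comment,code
68005,Rust,1251355,2413958,20734039
51146,C++,1435178,2396405,7685767
71995,C,1083792,2936843,5724464
276326,SUM,6345959,10746776,50771605
"""


def code_share(csv_text: str) -> dict[str, float]:
    """Map each language to its share of the SUM row's code lines."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    total = next(int(r["code"]) for r in rows if r["language"] == "SUM")
    return {
        r["language"]: int(r["code"]) / total
        for r in rows
        if r["language"] != "SUM"
    }


for language, share in code_share(SAMPLE_CSV).items():
    print(f"{language:>5}: {share:6.1%}")
```

By this count, Rust itself accounts for roughly 41% of the code lines in the tarball, with C++ and C together adding about another quarter.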

Counting lines of code... OCaml

Just for reference, let's quickly - quickly as in 3 seconds - count the lines of code of the OCaml 5.4.0 toolchain:

$ time cloc --csv ocaml-5.4.0 > ocaml.csv
github.com/AlDanial/cloc v 2.00  T=3.94 s (871.3 files/s 176467.5 lines/s)
2.753u 0.359s 0:03.38 91.7% 4+68k 9322+0io 0pf+0w
files language blank comment code
2825 OCaml 57092 106578 366954
307 C 8478 10140 47737
70 Bourne Shell 5718 6223 35216
12 m4 1481 108 12340
79 C/C++ Header 1747 3203 5084
12 Assembly 548 2169 4828
22 make 916 576 3463
25 Markdown 622 34 1849
12 AsciiDoc 518 0 1682
1 CSV 0 0 1512
3 SCSS 212 106 1415
6 Python 232 263 986
15 TeX 153 290 983
9 YAML 75 98 975
7 Bourne Again Shell 60 133 338
1 SVG 0 0 330
5 awk 42 111 278
3 JavaScript 48 137 273
9 Text 15 0 220
1 DOS Batch 21 12 128
1 TNSDL 39 0 127
1 CSS 5 1 74
1 Perl 4 15 60
1 NAnt script 24 0 51
1 HTML 0 0 40
1 diff 5 37 35
1 Fortran 77 5 0 21
1 INI 3 0 10
1 C# 2 0 9
3433 SUM 78065 130234 487018

That's a total of roughly 500,000 lines of code, no "Pest", zero Snakemake and zero Brainfuck. Bad enough, 9 lines of C#, but that must be an error :). Half a million lines of code for an advanced language like OCaml, which ships with a bytecode interpreter and native-code generators for 5 platforms (x86-64, arm64, RISC-V, s390x and powerpc), isn't too bad. Furthermore, it builds out of the box with just a C compiler and standard build tools like gmake.

Building Rust...

Now let's compile Rust. I tried to postpone that for as long as possible. As Rust 1.84.1 did not build, I tried 1.81 instead. Here is the build time for Rust 1.81. Sit down please: 12563 seconds. That's roughly 3 hours and 30 minutes. OCaml builds in 197 seconds, or 3 minutes and 17 seconds, on the same machine. That's roughly 64 times faster. To be fair, the build time of Rust includes building LLVM, cargo and some other tools, and it builds Rust at least twice: stage1 is Rust 1.81 built with the Rust 1.80 bootstrap compiler, and stage2 then uses stage1 (1.81) to build itself again.
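For the record, the arithmetic behind those numbers as a minimal Python sketch. The two build times are the ones measured above; the helper name is made up.

```python
RUST_BUILD_SECONDS = 12563   # Rust 1.81, as measured above
OCAML_BUILD_SECONDS = 197    # OCaml, same machine


def as_hms(seconds: int) -> str:
    """Format a duration in seconds as h:mm:ss."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


print("Rust: ", as_hms(RUST_BUILD_SECONDS))
print("OCaml:", as_hms(OCAML_BUILD_SECONDS))
print(f"ratio: {RUST_BUILD_SECONDS / OCAML_BUILD_SECONDS:.1f}")
```

That gives 3:29:23 for Rust, 0:03:17 for OCaml, and a ratio of just under 64.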

The problem with the chicken and the egg...

Not only does the build take 3 hours and 30 minutes, but there is another problem: Rust doesn't have a bootstrap compiler written in any language other than Rust, so you need to have a Rust compiler in order to compile the Rust compiler in order to compile the Rust compiler in order to compile the Rust compiler in order... at this point a stack overflow stops our beautiful infinite recursion.

To be fair, it's not a truly infinite recursion because there used to be a Rust compiler written in OCaml, but that was over a decade ago. If you really wanted to start from this early version of Rust and compile each version of Rust, one after the other, until you hit the current version, you'd likely have to compile a hundred intermediate Rust compilers, which would presumably keep a powerful build machine busy for 10 days or more, 24/7, not including fixing bugs and applying patches after a failed attempt.
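A back-of-the-envelope version of that estimate in Python. Both inputs are assumptions: 100 intermediate compilers is the guess from the paragraph above, and 12563 seconds per build is simply the Rust 1.81 time on my machine; older releases would likely build faster, a more powerful machine too.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def chain_days(compilers: int, seconds_per_build: float) -> float:
    """Days of nonstop building for a chain of compilers, one after another."""
    return compilers * seconds_per_build / SECONDS_PER_DAY


# Assumptions: 100 intermediate compilers, a flat 12563 s per build.
print(f"{chain_days(100, 12563):.1f} days")
```

With these inputs the chain comes out at about 14.5 days of uninterrupted building, so "10 days or more" is in the right ballpark.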

The n+1 problem...

Note that the real problem here is that each version of Rust needs to be bootstrapped with the previous version. Maybe that isn't true anymore, but it used to be like this in the past. In order to build Rust version n+1, you'd need Rust version n.
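The n+1 rule can be sketched in a few lines of Python. This is only an illustration of the strict rule as described above (1.(n+1) is built with 1.n); the function name and the starting version are hypothetical, and the rule itself may no longer hold exactly.

```python
def bootstrap_chain(have_minor: int, want_minor: int) -> list[str]:
    """List the 1.x releases to build, in order, to get from one to the other.

    Assumes the strict rule described above: Rust 1.(n+1) is built with 1.n.
    """
    if want_minor < have_minor:
        raise ValueError("cannot bootstrap backwards")
    return [f"1.{minor}" for minor in range(have_minor + 1, want_minor + 1)]


# Hypothetical starting point: a trusted 1.80 binary, target 1.84.
print(bootstrap_chain(80, 84))
```

Starting from a 1.80 binary, getting to 1.84 means four full compiler builds; starting from anything much older, the list gets long very quickly.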

It helps to take a look at how other languages handle bootstrapping. The compiler for the Go language, for example, is also written in Go itself. But there is a slight difference here: it should build fine (I guess) with any Go version recent enough. So, in order to bootstrap, you'd start from a Go compiler that doesn't need Go to build, such as Go 1.4 (the last release whose toolchain was written in C) or gccgo, and use it to work your way up to the latest version of Go. Done! This might still be a lot of work, but at least you can compile Go from source, given that you have a binary C compiler.

As for OCaml: You only need a C compiler, not even C++ is required, in order to compile the full OCaml toolchain. Other languages, like Python, are similar in that regard. Most commonly, they are implemented in C or C++.

Bootstrapping without a binary...

But what if, for some reason, you don't have this binary Rust bootstrap compiler for your platform?

Well, then you have to cross-compile it using an existing binary bootstrap compiler on another system. I've done it once, over 10 years ago, for DragonFly. It's not exactly fun. But still, you'd need a binary bootstrap compiler for the other system, unless you did the impossible and started bootstrapping Rust from that very ancient version of Rust written in OCaml.

No binary, no Rust...

In summary, to the best of my knowledge, it's nearly impossible to compile the official Rust distribution without downloading an existing binary Rust bootstrap compiler. I am happy to hear otherwise.

Personally, I am not too concerned about that situation, but if I were a software developer from Russia, China, North Korea or Iran, I would possibly be slightly more concerned about having to download Rust binaries from a remote U.S. server - what could possibly go wrong? Not that I have any sympathy for any autocratic regime... that's not the point.

Does size still matter after all?

Rust ("official" implementation 1.84.1):

OCaml (5.4.0):

While disk space is plentiful these days, and Rust really is a nice language to work with, everyone using Rust should be more conscious of this massive bloat and dependency chain, and the potential risks that come with all of that. At this point, the daily TV advertisement spot drops in and suggests anti-bloat pills. Good timing!

Bloated... or just enterprisey?

Or maybe Rust just started to become enterprisey? Building large-scale enterprise applications using a dependency-"rich" language is fine, in my opinion. But I am not sure I really want to install 1.9 GB of sources and compile Rust for 3 hours and 30 minutes, just to be able to build:

Good night.