diff --git a/README.md b/README.md deleted file mode 100644 index 331c55c..0000000 --- a/README.md +++ /dev/null @@ -1,402 +0,0 @@ - - -# live-bootstrap - -An attempt to provide a reproducible, automatic, complete end-to-end bootstrap -from a minimal number of binary seeds to a supported fully functioning operating -system. - -## Get me started! - -1. `git clone https://github.com/fosslinux/live-bootstrap` -2. `git submodule update --init --recursive` -3. Provide a kernel (vmlinuz file) as the name kernel in the root of the repository. -4. `./rootfs.sh` - ensure your account has kvm privileges and qemu installed. - a. Alternatively, run `./rootfs.sh chroot` to run it in a chroot. - b. Alternatively, run `./rootfs.sh` but don't run the actual virtualization - and instead copy sysa/tmp/initramfs.igz to a USB or some other device and - boot from bare metal. -6. Wait. -7. Currently, live-bootstrap doesn't provide anything to you, as it is incomplete. - -## Background - -This project is a part of the bootstrappable project, a project that aims to be -able to build complete computing platforms through the use of source code. When -you build a compiler like GCC, you need another C compiler to compile the -compiler - turtles all the way down. Even the first GCC compiler was written in -C. There has to be a way to break the chain... - -There has been significant work on this over the last 5 years, from Jeremiah -Orians' stage0, hex2 and M2-Planet to janneke's Mes. We have a currently, -fully-functioning chain of bootstrapping from the 357-byte hex0 seed to a -complete GCC compiler and hence a full Linux operating system. From there, it is -trivial to move to other UNIXes. However, there is only currently one vector -through which this can be automatically done, GNU Guix. 
- -While the primary author of this project does not believe Guix is a bad project, -the great reliance on Guile, the complexity of many of the scripts and the -rather steep learning curve to install and run Guix make it a very non -plug-and-play solution. Furthermore, there is currently (Jan 2021) no possible -way to run the bootstrap from outside of a pre-existing Linux environment. -Additionally, Guix uses many scripts and distributed files that cannot be -considered source code. - -(NOTE: Guix is working on a Full Source Bootstrap, but I'm not completely sure -what that entails). - -Furthermore, having an alternative bootstrap automation tool allows people to -have greater trust in the bootstrap procedure. - -## Comparison between GNU Guix and live-bootstrap - -| Item | Guix | live-bootstrap | -| -- | -- | -- | -| Total size of seeds [1] | ~30MB (Reduced Source Bootstrap) [2] | ~1KB | -| Use of kernel | Linux-Libre Kernel | Any Linux Kernel (2.6+) [3] | -| Implementation complete | Yes | No (in development) | -| Automation | Almost fully automatic | Optional user customization | - -[1]: Excluding kernel. -[2]: Reiterating that Guix is working on a full source bootstrap, although that still uses guile (~12 MB). -[3]: Work is ongoing to use other, smaller POSIX kernels. - -## Why would I want bootstrapping? - -That is outside of the scope of this README. Here's a few things you can look -at: - -- https://bootstrappable.org -- Trusting Trust Attack (as described by Ken Thompson) -- https://guix.gnu.org/manual/en/html_node/Bootstrapping.html -- Collapse of the Internet (eg CollapseOS) - -## Specific things to be bootstrapped - -GNU Guix is currently the furthest along project to automate bootstrapping. -However, there are a number of non-auditable files used in many of their -packages. Here is a list of file types that we deem unsuitable for -bootstrapping. - -1. Binaries (apart from seed hex0, kaem, kernel). -2. 
Any pre-generated configure scripts, or Makefile.in's from autotools. -3. Pre-generated bison/flex parsers (identifiable through a `.y` file). -4. Any source code/binaries downloaded within a software's build system that is - outside of our control to verify before use in the build system. -5. Any non-free software. [1] - -[1]: We only use software licensed under a FSF-approved free software license. - -## How does this work? - -### sysa - -sysa is the first 'system' used in live-bootstrap. We move to a new system after -a reboot, which often occurs after the movement to a new kernel. It is run by -the seed Linux kernel provided by the user, and has 16 parts. - -#### Part 1: mescc-tools-seed - -This is where all the magic begins. We start with our hex0 and kaem seeds and -bootstrap our way up to M2-Planet, a subset of C, and mes-m2, an independent -port of GNU Mes to M2-Planet. The following steps are taken here: - -- hex0 (seed) -- hex0 compiles hex1 -- hex0 compiles catm -- hex1 compiles hex2 (v1) -- hex2 (v1) compiles M0 -- M0 compiles cc_x86 -- cc_x86 compiles M2-Planet (v1) -- M2-Planet (v1) compiles blood-elf (v1) -- M2-Planet (v1) compiles hex2 (final) -- M2-Planet (v1) compiles M1 -- M2-Planet (v1) compiles kaem -- M2-Planet (v1) compiles blood-elf (final) -- M2-Planet (v1) compiles get_machine -- M2-Planet (v1) compiles M2-Planet (final) - -This seems very intimidating, but becomes clearer when reading the source: -https://github.com/oriansj/mescc-tools-seed/blob/master/x86/ (start at -mescc-tools-seed-kaem.kaem). - -From here, we can move on from the lowest level stuff. - -#### Part 2: mescc-tools-extra - -mescc-tools and mes-m2 are the projects bootstrapped by mescc-tools-seed. -However, we have some currently unmerged additions to mescc-tools that we -require for this project, namely filesystem utilities `cp` and `chown`. This -allows us to have one unified directory for our binaries. 
Futhermore, we also -build `fletcher16`, a preliminary checksumming tool, that we use to ensure -reproducibility and authenticity of generated binaries. - -#### Part 3: `/after` - -We now move into the `/after` directory. As mescc-tools-seed has no concept of -`chdir()` (not added until very late in mescc-tools-seed), we have to copy a lot -of files into the root of the initramfs, making it very messy. We get into the -move ordered directory `/after` here, copying over all of the required binaries -from `/`. - -#### Part 4: blynn-compiler - -`blynn-compiler` is a project on top of mescc-tools-seed to bootstrap a minimal -haskell compiler from M2-Planet. While we don't currently use this for anything, -it is planned to be eventually used to bootstrap the next part. - -#### Part 5: mes - -`mes` is a scheme interpreter. It runs the sister project `mescc`, which is a C -compiler written in scheme, which links against the Mes C Library. All 3 are -included in this same repository. Note that we are using the experimental -`wip-m2` branch to jump over the gap between `M2-Planet` and `mes`. There are -two stages to this part: - -1. Compiling an initial mes using `M2-Planet`. Note that this is *only* the Mes - interpreter, not the libc or anything else. -2. We then use this to recompile the Mes interpreter as well as building the - libc. This second interpreter is faster and less buggy. We need the libc to - compile all the programs until we get glibc. - -#### Part 6, 7: tinycc - -`tinycc` is a minimal C compiler that aims to be small and fast. It complies -with all C89 and most of C99 standards. This is also a two-tiered process: - -1. First, we compile janneke's fork of tcc 0.9.26 using `mescc`, containing 27 - patches to make it operate well in the bootstrap environment and make it - compilable using `mescc`. This is a non-trivial process and as seen within - tcc.kaem has many different parts within it: a. tcc 0.9.26 is first compiled - using `mescc`. b. 
The mes libc is recompiled using tcc (`mescc` has a - non-standard `.a` format), including some additions for later programs. c. - tcc 0.9.26 is recompiled 5(!) times to add new features that are required for - other features, namely `long long` and `float`. Each time, the libc is also - recompiled. -2. Then we compile upstream tcc 0.9.27, the latest release of tinycc, using the - final version of tcc 0.9.26. We then recompile the libc once more. - -From this point onwards, until further notice, all programs are compiled using -tinycc 0.9.27. Note that now we begin to delve into the realm of old GNU -software, using older versions compilable by tinycc. Prior to this point, all -tools have been adapted significantly for the bootstrap; now, we will be using -old tooling instead. - -#### Part 8: sed 4.0.7 - -You are most likely aware of GNU `sed`, a line editor. - -#### Part 9: tar 1.12 - -GNU `tar` is the most common archive format used by software source code, often -compressed also. To avoid continuing using submodules, we build GNU tar 1.12, -the last version compilable by tinycc without significant patching. - -#### Part 10: gzip 1.2.4 - -`gzip` is the most common compression format used for software source code. It -is luckily distributed as a barebones uncompressed `.tar`, which we extract and -then build. We do require deletion of a few lines unsupported by mes libc. - -Going forward, we can now use `.tar.gz` for source code. - -#### Part 11: patch 2.5.9 - -`patch` is a very useful tool at this stage, allowing us to make significantly -more complex edits, including just changes to lines. Luckily, we are able to -patch patch using sed only. - -#### Part 12: sha-2 - -`sha-2` is a standalone external `sha256sum` implementation, originally as a -library, but patched to have a command line interface. It is mostly -output-compatible with `sha256sum` from coreutils. We use this in replacement of -`fletcher16`. 
- -#### Part 12a: Redo checksums using `sha256sum` - -We have now just built `sha256sum`, which has a 16x lower collision rate than -`fletcher16`, so we recheck all of the existing binaries using `sha256sum`. - -#### Part 13: patched mes-libc - -Since patch is available at this point, we can apply additional fixes to -mes-libc that are not included in the wip-m2 branch and recompile libc. - -#### Part 14: patched tinycc - -In Guix, tinycc is patched to force static linking. Prior to this step, we have -been forced to manually specify static linking for each tool. Now that we have -patch, we can patch tinycc to force static linking and then recompile it. - -Note that we have to do this using tinycc 0.9.26, as tinycc 0.9.27 cannot -recompile itself for unknown reasons. - -#### Part 15: make 3.80 - -GNU `make` is now built so we have a more robust building system. `make` allows -us to do things like define rules for files rather than writing complex kaem -scripts. - -#### Part 16: bzip2 1.0.8 - -`bzip2` is a compression format that compresses more than `gzip`. It is -preferred where we can use it, and makes source code sizes smaller. - -#### Part 17: coreutils 5.0.0 - -GNU Coreutils is a collection of widely used utilities such as `cat`, `chmod`, -`chown`, `cp`, `install`, `ln`, `ls`, `mkdir`, `mknod`, `mv`, `rm`, `rmdir`, -`tee`, `test`, `true`, and many others. - -A few of the utilities cannot be easily compiled with Mes C library, so we skip -them. - -The `cp` in this stage replaces the `mescc-tools-extra` `cp`. - -#### Part 18: heirloom devtools - -`lex` and `yacc` from the Heirloom project. The Heirloom project is a collection -of standard UNIX utilities derived from code by Caldera and Sun. Differently -from the analogous utilities from the GNU project, they can be compiled with a -simple `Makefile`. - -#### Part 19: bash 2.05b - -GNU `bash` is the most well known shell and the most complex piece of software -so far. 
However, it comes with a number of great benefits over kaem, including -proper POSIX sh support, globbing, etc. - -Bash ships with a bison pre-generated file here which we delete. Unfortunately, -we have not bootstrapped bison but fortunately for us, heirloom yacc is able to -cope here. - -#### Part 20: flex 2.5.11 - -`flex` is a tool for generating lexers or scanners: programs that recognize -lexical patters. - -Unfortunately `flex` also depends on itself for compiling its own scanner, so -first flex 2.5.11 is compiled, with its scanner definition manually modified so -that it can be processed by lex for the Heirloom project (the required -modifications are mostly syntactical, plus a few workarounds to avoid some flex -advanced features). - -#### Part 21: musl 1.1.24 - -`musl` is a C standard library that is lightweight, fast, simple, free, and -strives to be correct in the sense of standards-conformance and safety. `musl` -is used by some distributions of GNU/Linux as their C library. Our previous Mes -C library was incomplete which prevented us from building many newer or more -complex programs. - -`tcc` has slight problems when building and linking `musl`, so we apply a few -patches. In particular, we replace all weak symbols with strong symbols and will -patch `tcc` in the next step to ignore duplicate symbols. - -#### Part 22: tcc 0.9.27 (musl) - -We recompile `tcc` against musl. This is a two stage process. First we build -tcc-0.9.27 that itself links to Mes C library but produces binaries linked to -musl. Then we recompile newly produced tcc with itself. Interestingly, -tcc-0.9.27 linked against musl is self hosting. - -#### Part 23: musl 1.1.24 (tcc-musl) - -We now rebuild `musl` with `tcc-musl` of Part 22, which fixes a number of bugs, -particularly regarding floats, in the first `musl`. - -#### Part 24: tcc 0.9.27 (musl v2) - -Now that we have a 'fixed' `musl`, we now recompile `tcc` as `tcc` uses floats -extensively. 
- -#### Part 25: bzip2 1.0.8 - -`bzip2` is rebuilt unpatched with the new tcc and musl fixing issues with reading -files from stdin that existed in the previous build. - -#### Part 26: m4 1.4.7 - -`m4` is the first piece of software we need in the autotools suite, flex 2.6.4 -and bison. It allows macros to be defined and files to be generated from those -macros. - -#### Part 27: flex 2.6.14 - -We recompile unpatched GNU `flex` using older flex 2.5.11. This is again a two -stage process, first compiling flex using `scan.c` (from `scan.l`) created by -old flex, then recompile `scan.c` using the new version of flex to remove any -buggy artifacts from the old flex. - -#### Part 28: bison 3.4.1 - -GNU `bison` is a parser generator. With `m4` and `flex` we can now bootstrap it -following https://gitlab.com/giomasce/bison-bootstrap. It's a 3 stage process: - -1. Build bison using a handwritten grammar parser in C. -2. Use bison from previous stage on a simplified bison grammar file. -3. Build bison using original grammar file. - -Finally we have a fully functional `bison` executable. - -#### Part 29: grep 2.4 - -GNU `grep` is a pattern matching utility. Is is not immediately needed but will -be useful later for autotools. - -#### Part 30: diffutils 2.7 - -`diffutils` is useful for comparing two files. It is not immediately needed but -is required later for autotools. - -#### Part 31: coreutils 5.0 - -`coreutils` is rebuilt against musl. Additional utilities are built including -`comm`, `expr`, `date`, `dd`, `sort`, `uname` and `uniq`. This fixes a variety -of issues with existing `coreutils`. - -#### Part 32: gawk 3.0.4 - -`gawk` is the GNU implementation of `awk`, yet another pattern matching and data -extraction utility. It is also required for autotools. - -#### Part 33: perl 5.000 - -Perl is a general purpose programming language that is especially suitable for -text processing. 
It is essential for autotools build system because automake and -some other tools are written in Perl. - -Perl itself is written in C but ships with some pre-generated files that need -perl for processing, namely `embed.h` and `keywords.h`. To bootstrap Perl we -will start with the oldest Perl 5 version which has the fewest number of -pregenerated files. We reimplement two remaining perl scripts in awk and use our -custom makefile instead of Perl's pre-generated Configure script. - -At this first step we build `miniperl` which is `perl` without support for -loading modules. - -#### Part 34: perl 5.003 - -We now use `perl` from the previous stage to recreate pre-generated files that -are shipped in perl 5.003. But for now we still need to use handwritten makefile -instead of `./Configure` script. - -#### Part 35: perl 5.004_05 - -Yet another version of perl; the last version buildable with 5.003. - -#### Part 36: perl 5.005_03 - -More perl! This is the last version buildable with 5.004. It also introduces the -new pregenerated files `regnodes.h` and `byterun.{h,c}`. - -#### Part 37: perl 5.6.2 - -Even more perl. 5.6.2 is the last version buildable with 5.004. diff --git a/README.rst b/README.rst new file mode 100644 index 0000000..8eb327c --- /dev/null +++ b/README.rst @@ -0,0 +1,131 @@ +.. SPDX-FileCopyrightText: 2021 Andrius Štikonas +.. SPDX-FileCopyrightText: 2021 Paul Dersey +.. SPDX-FileCopyrightText: 2021 fosslinux + +.. SPDX-License-Identifier: CC-BY-SA-4.0 + + +live-bootstrap +============== + +An attempt to provide a reproducible, automatic, complete end-to-end +bootstrap from a minimal number of binary seeds to a supported fully +functioning operating system. + +Get me started! +--------------- + +1. ``git clone https://github.com/fosslinux/live-bootstrap`` +2. ``git submodule update --init --recursive`` +3. Provide a kernel (vmlinuz file) as the name kernel in the root of the + repository. +4. 
``./rootfs.sh`` - ensure your account has kvm privileges and qemu + installed. + + a. Alternatively, run ``./rootfs.sh chroot`` to run it in a chroot. + b. Alternatively, run ``./rootfs.sh`` but don’t run the actual + virtualization and instead copy sysa/tmp/initramfs.igz to a USB or + some other device and boot from bare metal. + +5. Wait. +6. If you can, observe the many binaries in ``/after/bin``! (Soon, you will + also have an interactive bash shell to play around in). + +Background +---------- + +This project is a part of the bootstrappable project, a project that +aims to be able to build complete computing platforms through the use of +source code. When you build a compiler like GCC, you need another C +compiler to compile the compiler - turtles all the way down. Even the +first GCC compiler was written in C. There has to be a way to break the +chain… + +There has been significant work on this over the last 5 years, from +Jeremiah Orians’ stage0, hex2 and M2-Planet to janneke’s Mes. We have a +currently, fully-functioning chain of bootstrapping from the 357-byte +hex0 seed to a complete GCC compiler and hence a full Linux operating +system. From there, it is trivial to move to other UNIXes. However, +there is only currently one vector through which this can be +automatically done, GNU Guix. + +While the primary author of this project does not believe Guix is a bad +project, the great reliance on Guile, the complexity of many of the +scripts and the rather steep learning curve to install and run Guix make +it a very non plug-and-play solution. Furthermore, there is currently +(Jan 2021) no possible way to run the bootstrap from outside of a +pre-existing Linux environment. Additionally, Guix uses many scripts and +distributed files that cannot be considered source code. + +(NOTE: Guix is working on a Full Source Bootstrap, but I’m not +completely sure what that entails). 
+ +Furthermore, having an alternative bootstrap automation tool allows +people to have greater trust in the bootstrap procedure. + +Comparison between GNU Guix and live-bootstrap +---------------------------------------------- + ++----------------------+----------------------+----------------------+ +| Item | Guix | live-bootstrap | ++======================+======================+======================+ +| Total size of seeds | ~30MB (Reduced | ~1KB | +| [1] | Source Bootstrap) | | +| | [2] | | ++----------------------+----------------------+----------------------+ +| Use of kernel | Linux-Libre Kernel | Any Linux Kernel | +| | | (2.6+) [3] | ++----------------------+----------------------+----------------------+ +| Implementation | Yes | No (in development) | +| complete | | | ++----------------------+----------------------+----------------------+ +| Automation | Almost fully | Optional user | +| | automatic | customization | ++----------------------+----------------------+----------------------+ + +[1]: Both projects only use software licensed under a FSF-approved +free software license. +[2]: Reiterating that Guix is working on a full source bootstrap, +although that still uses guile (~12 MB). [3]: Work is ongoing to use +other, smaller POSIX kernels. + +Why would I want bootstrapping? +------------------------------- + +That is outside of the scope of this README. Here’s a few things you can +look at: + +- https://bootstrappable.org +- Trusting Trust Attack (as described by Ken Thompson) +- https://guix.gnu.org/manual/en/html_node/Bootstrapping.html +- Collapse of the Internet (eg CollapseOS) + +Specific things to be bootstrapped +---------------------------------- + +GNU Guix is currently the furthest along project to automate +bootstrapping. However, there are a number of non-auditable files used +in many of their packages. Here is a list of file types that we deem +unsuitable for bootstrapping. + +1. Binaries (apart from seed hex0, kaem, kernel). +2. 
Any pre-generated configure scripts, or Makefile.in’s from autotools. +3. Pre-generated bison/flex parsers (identifiable through a ``.y`` + file). +4. Any source code/binaries downloaded within a software’s build system + that is outside of our control to verify before use in the build + system. +5. Any non-free software. (Must be FSF-approved license). + +How does this work? +------------------- + +**For a more in-depth discussion, see parts.rst.** + +sysa +~~~~ + +sysa is the first ‘system’ used in live-bootstrap. We move to a new +system after a reboot, which often occurs after the movement to a new +kernel. It is run by the seed Linux kernel provided by the user. It +currently compiles everything. diff --git a/parts.rst b/parts.rst new file mode 100644 index 0000000..6fcd015 --- /dev/null +++ b/parts.rst @@ -0,0 +1,354 @@ +.. sectnum:: +.. SPDX-FileCopyrightText: 2021 Andrius Štikonas +.. SPDX-FileCopyrightText: 2021 Paul Dersey +.. SPDX-FileCopyrightText: 2021 fosslinux + +.. SPDX-License-Identifier: CC-BY-SA-4.0 + +mescc-tools-seed +================ + +This is where all the magic begins. We start with our hex0 and kaem +seeds and bootstrap our way up to M2-Planet, a subset of C, and mes-m2, +an independent port of GNU Mes to M2-Planet. The following steps are +taken here: + +- hex0 (seed) +- hex0 compiles hex1 +- hex0 compiles catm +- hex1 compiles hex2 (v1) +- hex2 (v1) compiles M0 +- M0 compiles cc_x86 +- cc_x86 compiles M2-Planet (v1) +- M2-Planet (v1) compiles blood-elf (v1) +- M2-Planet (v1) compiles hex2 (final) +- M2-Planet (v1) compiles M1 +- M2-Planet (v1) compiles kaem +- M2-Planet (v1) compiles blood-elf (final) +- M2-Planet (v1) compiles get_machine +- M2-Planet (v1) compiles M2-Planet (final) + +This seems very intimidating, but becomes clearer when reading the +source: https://github.com/oriansj/mescc-tools-seed/blob/master/x86/ +(start at mescc-tools-seed-kaem.kaem). + +From here, we can move on from the lowest level stuff. 
+ +mescc-tools-extra +================= + +mescc-tools and mes-m2 are the projects bootstrapped by +mescc-tools-seed. However, we have some currently unmerged additions to +mescc-tools that we require for this project, namely filesystem +utilities ``cp`` and ``chown``. This allows us to have one unified +directory for our binaries. Furthermore, we also build ``fletcher16``, a +preliminary checksumming tool, that we use to ensure reproducibility and +authenticity of generated binaries. + +``/after`` +========== + +We now move into the ``/after`` directory. As mescc-tools-seed has no +concept of ``chdir()`` (not added until very late in mescc-tools-seed), +we have to copy a lot of files into the root of the initramfs, making it +very messy. We get into the more ordered directory ``/after`` here, +copying over all of the required binaries from ``/``. + +mes +=== + +``mes`` is a scheme interpreter. It runs the sister project ``mescc``, +which is a C compiler written in scheme, which links against the Mes C +Library. All 3 are included in this same repository. Note that we are +using the experimental ``wip-m2`` branch to jump over the gap between +``M2-Planet`` and ``mes``. There are two stages to this part: + +1. Compiling an initial mes using ``M2-Planet``. Note that this is + *only* the Mes interpreter, not the libc or anything else. +2. We then use this to recompile the Mes interpreter as well as building + the libc. This second interpreter is faster and less buggy. We need + the libc to compile all the programs until we get glibc. + +tinycc 0.9.26 +============= + +``tinycc`` is a minimal C compiler that aims to be small and fast. It +complies with all C89 and most of C99 standards. + +First, we compile janneke’s fork of tcc 0.9.26 using ``mescc``, +containing 27 patches to make it operate well in the bootstrap +environment and make it compilable using ``mescc``. This is a +non-trivial process and as seen within tcc.kaem has many different parts +within it: a. 
tcc 0.9.26 is first compiled using ``mescc``. b. The mes +libc is recompiled using tcc (``mescc`` has a non-standard ``.a`` +format), including some additions for later programs. c. tcc 0.9.26 is +recompiled 5(!) times to add new features that are required for other +features, namely ``long long`` and ``float``. Each time, the libc is +also recompiled. + +tinycc 0.9.27 +============= + +Now, we compile upstream tcc 0.9.27, the latest release of tinycc, using +the final version of tcc 0.9.26. We then recompile the libc once more. + +From this point onwards, until further notice, all programs are compiled +using tinycc 0.9.27. Note that now we begin to delve into the realm of +old GNU software, using older versions compilable by tinycc. Prior to +this point, all tools have been adapted significantly for the bootstrap; +now, we will be using old tooling instead. + +sed 4.0.7 +========= + +You are most likely aware of GNU ``sed``, a line editor. + +tar 1.12 +======== + +GNU ``tar`` is the most common archive format used by software source +code, often compressed also. To avoid continuing using submodules, we +build GNU tar 1.12, the last version compilable by tinycc without +significant patching. + +gzip 1.2.4 +========== + +``gzip`` is the most common compression format used for software source +code. It is luckily distributed as a barebones uncompressed ``.tar``, +which we extract and then build. We do require deletion of a few lines +unsupported by mes libc. + +Going forward, we can now use ``.tar.gz`` for source code. + +patch 2.5.9 +=========== + +``patch`` is a very useful tool at this stage, allowing us to make +significantly more complex edits, including just changes to lines. +Luckily, we are able to patch patch using sed only. + +sha-2 +===== + +``sha-2`` is a standalone external ``sha256sum`` implementation, +originally as a library, but patched to have a command line interface. +It is mostly output-compatible with ``sha256sum`` from coreutils. 
We use +this in replacement of ``fletcher16``. + +Redo checksums using ``sha256sum`` +================================== + +We have now just built ``sha256sum``, which has a significantly (many orders +of magnitude) lower collision rate than ``fletcher16``, so we recheck all of +the existing binaries using ``sha256sum``. + +patched mes-libc +================ + +Since patch is available at this point, we can apply additional fixes to +mes-libc that are not included in the wip-m2 branch and recompile libc. + +patched tinycc +============== + +In Guix, tinycc is patched to force static linking. Prior to this step, +we have been forced to manually specify static linking for each tool. +Now that we have patch, we can patch tinycc to force static linking and +then recompile it. + +Note that we have to do this using tinycc 0.9.26, as tinycc 0.9.27 +cannot recompile itself for unknown reasons. + +make 3.80 +========= + +GNU ``make`` is now built so we have a more robust building system. +``make`` allows us to do things like define rules for files rather than +writing complex kaem scripts. + +bzip2 1.0.8 +=========== + +``bzip2`` is a compression format that compresses more than ``gzip``. It +is preferred where we can use it, and makes source code sizes smaller. + +coreutils 5.0.0 +=============== + +GNU Coreutils is a collection of widely used utilities such as ``cat``, +``chmod``, ``chown``, ``cp``, ``install``, ``ln``, ``ls``, ``mkdir``, +``mknod``, ``mv``, ``rm``, ``rmdir``, ``tee``, ``test``, ``true``, and +many others. + +A few of the utilities cannot be easily compiled with Mes C library, so +we skip them. + +The ``cp`` in this stage replaces the ``mescc-tools-extra`` ``cp``. + +heirloom devtools +================= + +``lex`` and ``yacc`` from the Heirloom project. The Heirloom project is +a collection of standard UNIX utilities derived from code by Caldera and +Sun. 
Differently from the analogous utilities from the GNU project, they +can be compiled with a simple ``Makefile``. + +bash 2.05b +========== + +GNU ``bash`` is the most well known shell and the most complex piece of +software so far. However, it comes with a number of great benefits over +kaem, including proper POSIX sh support, globbing, etc. + +Bash ships with a bison pre-generated file here which we delete. +Unfortunately, we have not bootstrapped bison but fortunately for us, +heirloom yacc is able to cope here. + +flex 2.5.11 +=========== + +``flex`` is a tool for generating lexers or scanners: programs that +recognize lexical patterns. + +Unfortunately ``flex`` also depends on itself for compiling its own +scanner, so first flex 2.5.11 is compiled, with its scanner definition +manually modified so that it can be processed by lex from the Heirloom +project (the required modifications are mostly syntactical, plus a few +workarounds to avoid some flex advanced features). + +musl 1.1.24 +=========== + +``musl`` is a C standard library that is lightweight, fast, simple, +free, and strives to be correct in the sense of standards-conformance +and safety. ``musl`` is used by some distributions of GNU/Linux as their +C library. Our previous Mes C library was incomplete which prevented us +from building many newer or more complex programs. + +``tcc`` has slight problems when building and linking ``musl``, so we +apply a few patches. In particular, we replace all weak symbols with +strong symbols and will patch ``tcc`` in the next step to ignore +duplicate symbols. + +tcc 0.9.27 (musl) +================= + +We recompile ``tcc`` against musl. This is a two stage process. First we +build tcc-0.9.27 that itself links to Mes C library but produces +binaries linked to musl. Then we recompile newly produced tcc with +itself. Interestingly, tcc-0.9.27 linked against musl is self hosting. 
+ +musl 1.1.24 (tcc-musl) +====================== + +We now rebuild ``musl`` with the ``tcc-musl`` built in the previous step, which fixes a +number of bugs, particularly regarding floats, in the first ``musl``. + +tcc 0.9.27 (musl v2) +==================== + +Now that we have a ‘fixed’ ``musl``, we recompile ``tcc`` as ``tcc`` +uses floats extensively. + +.. _bzip2-1.0.8-1: + +bzip2 1.0.8 +=========== + +``bzip2`` is rebuilt unpatched with the new tcc and musl fixing issues +with reading files from stdin that existed in the previous build. + +m4 1.4.7 +======== + +``m4`` is the first piece of software we need in the autotools suite, +flex 2.6.4 and bison. It allows macros to be defined and files to be +generated from those macros. + +flex 2.6.4 +========== + +We recompile unpatched GNU ``flex`` using older flex 2.5.11. This is +again a two stage process, first compiling flex using ``scan.c`` (from +``scan.l``) created by old flex, then recompiling ``scan.c`` using the new +version of flex to remove any buggy artifacts from the old flex. + +bison 3.4.1 +=========== + +GNU ``bison`` is a parser generator. With ``m4`` and ``flex`` we can now +bootstrap it following https://gitlab.com/giomasce/bison-bootstrap. It’s +a 3 stage process: + +1. Build bison using a handwritten grammar parser in C. +2. Use bison from previous stage on a simplified bison grammar file. +3. Build bison using original grammar file. + +Finally we have a fully functional ``bison`` executable. + +grep 2.4 +======== + +GNU ``grep`` is a pattern matching utility. It is not immediately needed +but will be useful later for autotools. + +diffutils 2.7 +============= + +``diffutils`` is useful for comparing two files. It is not immediately +needed but is required later for autotools. + +coreutils 5.0 +============= + +``coreutils`` is rebuilt against musl. Additional utilities are built +including ``comm``, ``expr``, ``date``, ``dd``, ``sort``, ``uname`` and +``uniq``. 
This fixes a variety of issues with existing ``coreutils``. + +gawk 3.0.4 +========== + +``gawk`` is the GNU implementation of ``awk``, yet another pattern +matching and data extraction utility. It is also required for autotools. + +perl 5.000 +========== + +Perl is a general purpose programming language that is especially +suitable for text processing. It is essential for the autotools build system +because automake and some other tools are written in Perl. + +Perl itself is written in C but ships with some pre-generated files that +need perl for processing, namely ``embed.h`` and ``keywords.h``. To +bootstrap Perl we will start with the oldest Perl 5 version which has +the fewest pregenerated files. We reimplement two remaining +perl scripts in awk and use our custom makefile instead of Perl’s +pre-generated Configure script. + +At this first step we build ``miniperl`` which is ``perl`` without +support for loading modules. + +perl 5.003 +========== + +We now use ``perl`` from the previous stage to recreate pre-generated +files that are shipped in perl 5.003. But for now we still need to use +a handwritten makefile instead of the ``./Configure`` script. + +perl 5.004_05 +============= + +Yet another version of perl; the last version buildable with 5.003. + +perl 5.005_03 +============= + +More perl! This is the last version buildable with 5.004. It also +introduces the new pregenerated files ``regnodes.h`` and +``byterun.{h,c}``. + +perl 5.6.2 +========== + +Even more perl. 5.6.2 is the last version buildable with 5.005. 
diff --git a/sysa/after.kaem.run b/sysa/after.kaem.run index 97f6d5b..f845027 100755 --- a/sysa/after.kaem.run +++ b/sysa/after.kaem.run @@ -18,13 +18,13 @@ incdir=${prefix}/include MES_PREFIX=${prefix}/mes/src/mes GUILE_LOAD_PATH=${prefix}/mes/src/nyacc/module:${prefix}/mes/src/mes/mes/module:${prefix}/mes/src/mes/module -# Part 2: cp and chown (mescc-tools-extra) +# cp and chown (mescc-tools-extra) pkg="mescc-tools-extra" cd ${pkg} kaem --file ${pkg}.kaem cd .. -# Part 3: Remove remaining dependencies on / (root of /after) +# Remove remaining dependencies on / (root of /after) cp ../bin/hex2 bin/hex2 cp ../bin/M1 bin/M1 cp ../bin/M2-Planet bin/M2-Planet @@ -38,92 +38,92 @@ chmod 755 bin/hex2 bin/M1 bin/M2-Planet bin/blood-elf \ fletcher16 mescc-tools-seed-checksums PATH=/after/bin -# Part 5: mes +# mes pkg="mes" cd ${pkg} kaem --file ${pkg}.kaem cd .. -# Part 6: tcc 0.9.26 +# tcc 0.9.26 pkg="tcc-0.9.26" cd ${pkg} kaem --file ${pkg}.kaem cd .. -# Part 7: tcc 0.9.27 +# tcc 0.9.27 pkg="tcc-0.9.27" cd ${pkg} kaem --file ${pkg}.kaem cd .. -# Part 8: sed +# sed pkg="sed-4.0.7" cd ${pkg} kaem --file ${pkg}.kaem cd .. -# Part 9: tar +# tar pkg="tar-1.12" cd ${pkg} kaem --file ${pkg}.kaem cd .. -# Part 10: gzip +# gzip pkg="gzip-1.2.4" cd ${pkg} kaem --file ${pkg}.kaem cd .. -# Part 11: patch +# patch pkg="patch-2.5.9" cd ${pkg} kaem --file ${pkg}.kaem cd .. -# Part 12: sha-2 +# sha-2 pkg="sha-2-61555d" cd ${pkg} kaem --file ${pkg}.kaem cd .. -# Part 12a: Check all up to this part as sha256sum +# Check all up to this part as sha256sum sha256sum -c pre-sha.sha256sums -# Part 13: mes-libc-patched +# mes-libc-patched cd tcc-0.9.27 kaem --file mes-libc-patched.kaem cd .. -# Part 14: tcc-patched +# tcc-patched cd tcc-0.9.27 kaem --file tcc-patched.kaem cd .. -# Part 15: make +# make pkg="make-3.80" cd ${pkg} kaem --file ${pkg}.kaem cd .. -# Part 16: bzip2 +# bzip2 pkg="bzip2-1.0.8" cd ${pkg} kaem --file ${pkg}.kaem cd .. 
-# Part 17: coreutils +# coreutils pkg="coreutils-5.0" cd ${pkg} kaem --file ${pkg}.kaem cd .. -# Part 18: heirloom-devtools +# heirloom-devtools pkg="heirloom-devtools-070527" cd ${pkg} kaem --file ${pkg}.kaem cd .. -# Part 19: bash +# bash pkg="bash-2.05b" cd ${pkg} kaem --file ${pkg}.kaem diff --git a/sysa/run.sh b/sysa/run.sh index bbb4d0b..8863952 100755 --- a/sysa/run.sh +++ b/sysa/run.sh @@ -12,60 +12,47 @@ set -e export PREFIX=/after -# Part 20 build flex-2.5.11 -# Part 21 build musl-1.1.24 musl-1.1.24.sh checksums/pass1 -# Part 22 +# Rebuild tcc using musl build tcc-0.9.27 tcc-musl-pass1.sh checksums/tcc-musl-pass1 -# Part 23 +# Rebuild musl using tcc-musl build musl-1.1.24 musl-1.1.24.sh checksums/pass2 -# Part 24 +# Rebuild tcc-musl using new musl build tcc-0.9.27 tcc-musl-pass2.sh checksums/tcc-musl-pass2 -# Part 25 +# Rebuild bzip2 using musl build bzip2-1.0.8 bzip2-1.0.8.sh checksums/bzip2-pass2 -# Part 26 build m4-1.4.7 -# Part 27 build flex-2.6.4 -# Part 28 build bison-3.4.1 stage1.sh checksums/stage1 build bison-3.4.1 stage2.sh checksums/stage2 build bison-3.4.1 stage3.sh checksums/stage3 -# Part 29 build grep-2.4 -# Part 30 build diffutils-2.7 -# Part 31 +# Rebuild coreutils using musl build coreutils-5.0 coreutils-5.0.sh checksums/pass2 -# Part 32 build gawk-3.0.4 -# Part 33 build perl-5.000 -# Part 34 build perl-5.003 -# Part 35 build perl5.004_05 -# Part 36 build perl5.005_03 -# Part 37 build perl-5.6.2 echo "Bootstrapping completed."