Change the part numbering system + Move to .rst

Parts have been split out into a separate file from the README. Convert
README to .rst; remove part numbers from scripts.
fosslinux 2021-02-25 16:48:21 +11:00
parent 40bdcee0ac
commit 8eec63e1b7
5 changed files with 508 additions and 438 deletions

402
README.md

@@ -1,402 +0,0 @@
<!--
SPDX-FileCopyrightText: 2021 Andrius Štikonas <andrius@stikonas.eu>
SPDX-FileCopyrightText: 2021 Paul Dersey <pdersey@gmail.com>
SPDX-FileCopyrightText: 2021 fosslinux <fosslinux@aussies.space>
SPDX-License-Identifier: CC-BY-SA-4.0
-->
# live-bootstrap
An attempt to provide a reproducible, automatic, complete end-to-end bootstrap
from a minimal number of binary seeds to a supported fully functioning operating
system.
## Get me started!
1. `git clone https://github.com/fosslinux/live-bootstrap`
2. `git submodule update --init --recursive`
3. Provide a kernel (vmlinuz file) as the name kernel in the root of the repository.
4. `./rootfs.sh` - ensure your account has kvm privileges and qemu installed.
a. Alternatively, run `./rootfs.sh chroot` to run it in a chroot.
b. Alternatively, run `./rootfs.sh` but don't run the actual virtualization
and instead copy sysa/tmp/initramfs.igz to a USB or some other device and
boot from bare metal.
6. Wait.
7. Currently, live-bootstrap doesn't provide anything to you, as it is incomplete.
## Background
This project is a part of the bootstrappable project, a project that aims to be
able to build complete computing platforms through the use of source code. When
you build a compiler like GCC, you need another C compiler to compile the
compiler - turtles all the way down. Even the first GCC compiler was written in
C. There has to be a way to break the chain...
There has been significant work on this over the last 5 years, from Jeremiah
Orians' stage0, hex2 and M2-Planet to janneke's Mes. We have a currently,
fully-functioning chain of bootstrapping from the 357-byte hex0 seed to a
complete GCC compiler and hence a full Linux operating system. From there, it is
trivial to move to other UNIXes. However, there is only currently one vector
through which this can be automatically done, GNU Guix.
While the primary author of this project does not believe Guix is a bad project,
the great reliance on Guile, the complexity of many of the scripts and the
rather steep learning curve to install and run Guix make it a very non
plug-and-play solution. Furthermore, there is currently (Jan 2021) no possible
way to run the bootstrap from outside of a pre-existing Linux environment.
Additionally, Guix uses many scripts and distributed files that cannot be
considered source code.
(NOTE: Guix is working on a Full Source Bootstrap, but I'm not completely sure
what that entails).
Furthermore, having an alternative bootstrap automation tool allows people to
have greater trust in the bootstrap procedure.
## Comparison between GNU Guix and live-bootstrap
| Item | Guix | live-bootstrap |
| -- | -- | -- |
| Total size of seeds [1] | ~30MB (Reduced Source Bootstrap) [2] | ~1KB |
| Use of kernel | Linux-Libre Kernel | Any Linux Kernel (2.6+) [3] |
| Implementation complete | Yes | No (in development) |
| Automation | Almost fully automatic | Optional user customization |
[1]: Excluding kernel.
[2]: Reiterating that Guix is working on a full source bootstrap, although that still uses guile (~12 MB).
[3]: Work is ongoing to use other, smaller POSIX kernels.
## Why would I want bootstrapping?
That is outside of the scope of this README. Here's a few things you can look
at:
- https://bootstrappable.org
- Trusting Trust Attack (as described by Ken Thompson)
- https://guix.gnu.org/manual/en/html_node/Bootstrapping.html
- Collapse of the Internet (eg CollapseOS)
## Specific things to be bootstrapped
GNU Guix is currently the furthest along project to automate bootstrapping.
However, there are a number of non-auditable files used in many of their
packages. Here is a list of file types that we deem unsuitable for
bootstrapping.
1. Binaries (apart from seed hex0, kaem, kernel).
2. Any pre-generated configure scripts, or Makefile.in's from autotools.
3. Pre-generated bison/flex parsers (identifiable through a `.y` file).
4. Any source code/binaries downloaded within a software's build system that is
outside of our control to verify before use in the build system.
5. Any non-free software. [1]
[1]: We only use software licensed under a FSF-approved free software license.
## How does this work?
### sysa
sysa is the first 'system' used in live-bootstrap. We move to a new system after
a reboot, which often occurs after the movement to a new kernel. It is run by
the seed Linux kernel provided by the user, and has 16 parts.
#### Part 1: mescc-tools-seed
This is where all the magic begins. We start with our hex0 and kaem seeds and
bootstrap our way up to M2-Planet, a subset of C, and mes-m2, an independent
port of GNU Mes to M2-Planet. The following steps are taken here:
- hex0 (seed)
- hex0 compiles hex1
- hex0 compiles catm
- hex1 compiles hex2 (v1)
- hex2 (v1) compiles M0
- M0 compiles cc_x86
- cc_x86 compiles M2-Planet (v1)
- M2-Planet (v1) compiles blood-elf (v1)
- M2-Planet (v1) compiles hex2 (final)
- M2-Planet (v1) compiles M1
- M2-Planet (v1) compiles kaem
- M2-Planet (v1) compiles blood-elf (final)
- M2-Planet (v1) compiles get_machine
- M2-Planet (v1) compiles M2-Planet (final)
This seems very intimidating, but becomes clearer when reading the source:
https://github.com/oriansj/mescc-tools-seed/blob/master/x86/ (start at
mescc-tools-seed-kaem.kaem).
From here, we can move on from the lowest level stuff.
#### Part 2: mescc-tools-extra
mescc-tools and mes-m2 are the projects bootstrapped by mescc-tools-seed.
However, we have some currently unmerged additions to mescc-tools that we
require for this project, namely filesystem utilities `cp` and `chown`. This
allows us to have one unified directory for our binaries. Furthermore, we also
build `fletcher16`, a preliminary checksumming tool, that we use to ensure
reproducibility and authenticity of generated binaries.
#### Part 3: `/after`
We now move into the `/after` directory. As mescc-tools-seed has no concept of
`chdir()` (not added until very late in mescc-tools-seed), we have to copy a lot
of files into the root of the initramfs, making it very messy. We get into the
more ordered directory `/after` here, copying over all of the required binaries
from `/`.
#### Part 4: blynn-compiler
`blynn-compiler` is a project on top of mescc-tools-seed to bootstrap a minimal
haskell compiler from M2-Planet. While we don't currently use this for anything,
it is planned to be eventually used to bootstrap the next part.
#### Part 5: mes
`mes` is a scheme interpreter. It runs the sister project `mescc`, which is a C
compiler written in scheme, which links against the Mes C Library. All 3 are
included in this same repository. Note that we are using the experimental
`wip-m2` branch to jump over the gap between `M2-Planet` and `mes`. There are
two stages to this part:
1. Compiling an initial mes using `M2-Planet`. Note that this is *only* the Mes
interpreter, not the libc or anything else.
2. We then use this to recompile the Mes interpreter as well as building the
libc. This second interpreter is faster and less buggy. We need the libc to
compile all the programs until we get glibc.
#### Part 6, 7: tinycc
`tinycc` is a minimal C compiler that aims to be small and fast. It complies
with all C89 and most of C99 standards. This is also a two-tiered process:
1. First, we compile janneke's fork of tcc 0.9.26 using `mescc`, containing 27
patches to make it operate well in the bootstrap environment and make it
compilable using `mescc`. This is a non-trivial process and as seen within
tcc.kaem has many different parts within it: a. tcc 0.9.26 is first compiled
using `mescc`. b. The mes libc is recompiled using tcc (`mescc` has a
non-standard `.a` format), including some additions for later programs. c.
tcc 0.9.26 is recompiled 5(!) times to add new features that are required for
other features, namely `long long` and `float`. Each time, the libc is also
recompiled.
2. Then we compile upstream tcc 0.9.27, the latest release of tinycc, using the
final version of tcc 0.9.26. We then recompile the libc once more.
From this point onwards, until further notice, all programs are compiled using
tinycc 0.9.27. Note that now we begin to delve into the realm of old GNU
software, using older versions compilable by tinycc. Prior to this point, all
tools have been adapted significantly for the bootstrap; now, we will be using
old tooling instead.
#### Part 8: sed 4.0.7
You are most likely aware of GNU `sed`, a line editor.
#### Part 9: tar 1.12
GNU `tar` is the most common archive format used by software source code, often
compressed also. To avoid continuing using submodules, we build GNU tar 1.12,
the last version compilable by tinycc without significant patching.
#### Part 10: gzip 1.2.4
`gzip` is the most common compression format used for software source code. It
is luckily distributed as a barebones uncompressed `.tar`, which we extract and
then build. We do require deletion of a few lines unsupported by mes libc.
Going forward, we can now use `.tar.gz` for source code.
#### Part 11: patch 2.5.9
`patch` is a very useful tool at this stage, allowing us to make significantly
more complex edits, including just changes to lines. Luckily, we are able to
patch patch using sed only.
#### Part 12: sha-2
`sha-2` is a standalone external `sha256sum` implementation, originally as a
library, but patched to have a command line interface. It is mostly
output-compatible with `sha256sum` from coreutils. We use this in replacement of
`fletcher16`.
#### Part 12a: Redo checksums using `sha256sum`
We have now just built `sha256sum`, which has a 16x lower collision rate than
`fletcher16`, so we recheck all of the existing binaries using `sha256sum`.
#### Part 13: patched mes-libc
Since patch is available at this point, we can apply additional fixes to
mes-libc that are not included in the wip-m2 branch and recompile libc.
#### Part 14: patched tinycc
In Guix, tinycc is patched to force static linking. Prior to this step, we have
been forced to manually specify static linking for each tool. Now that we have
patch, we can patch tinycc to force static linking and then recompile it.
Note that we have to do this using tinycc 0.9.26, as tinycc 0.9.27 cannot
recompile itself for unknown reasons.
#### Part 15: make 3.80
GNU `make` is now built so we have a more robust building system. `make` allows
us to do things like define rules for files rather than writing complex kaem
scripts.
#### Part 16: bzip2 1.0.8
`bzip2` is a compression format that compresses more than `gzip`. It is
preferred where we can use it, and makes source code sizes smaller.
#### Part 17: coreutils 5.0.0
GNU Coreutils is a collection of widely used utilities such as `cat`, `chmod`,
`chown`, `cp`, `install`, `ln`, `ls`, `mkdir`, `mknod`, `mv`, `rm`, `rmdir`,
`tee`, `test`, `true`, and many others.
A few of the utilities cannot be easily compiled with Mes C library, so we skip
them.
The `cp` in this stage replaces the `mescc-tools-extra` `cp`.
#### Part 18: heirloom devtools
`lex` and `yacc` from the Heirloom project. The Heirloom project is a collection
of standard UNIX utilities derived from code by Caldera and Sun. Differently
from the analogous utilities from the GNU project, they can be compiled with a
simple `Makefile`.
#### Part 19: bash 2.05b
GNU `bash` is the most well known shell and the most complex piece of software
so far. However, it comes with a number of great benefits over kaem, including
proper POSIX sh support, globbing, etc.
Bash ships with a bison pre-generated file here which we delete. Unfortunately,
we have not bootstrapped bison but fortunately for us, heirloom yacc is able to
cope here.
#### Part 20: flex 2.5.11
`flex` is a tool for generating lexers or scanners: programs that recognize
lexical patterns.
Unfortunately `flex` also depends on itself for compiling its own scanner, so
first flex 2.5.11 is compiled, with its scanner definition manually modified so
that it can be processed by lex for the Heirloom project (the required
modifications are mostly syntactical, plus a few workarounds to avoid some flex
advanced features).
#### Part 21: musl 1.1.24
`musl` is a C standard library that is lightweight, fast, simple, free, and
strives to be correct in the sense of standards-conformance and safety. `musl`
is used by some distributions of GNU/Linux as their C library. Our previous Mes
C library was incomplete which prevented us from building many newer or more
complex programs.
`tcc` has slight problems when building and linking `musl`, so we apply a few
patches. In particular, we replace all weak symbols with strong symbols and will
patch `tcc` in the next step to ignore duplicate symbols.
#### Part 22: tcc 0.9.27 (musl)
We recompile `tcc` against musl. This is a two stage process. First we build
tcc-0.9.27 that itself links to Mes C library but produces binaries linked to
musl. Then we recompile newly produced tcc with itself. Interestingly,
tcc-0.9.27 linked against musl is self hosting.
#### Part 23: musl 1.1.24 (tcc-musl)
We now rebuild `musl` with `tcc-musl` of Part 22, which fixes a number of bugs,
particularly regarding floats, in the first `musl`.
#### Part 24: tcc 0.9.27 (musl v2)
Now that we have a 'fixed' `musl`, we now recompile `tcc` as `tcc` uses floats
extensively.
#### Part 25: bzip2 1.0.8
`bzip2` is rebuilt unpatched with the new tcc and musl fixing issues with reading
files from stdin that existed in the previous build.
#### Part 26: m4 1.4.7
`m4` is the first piece of software we need in the autotools suite, flex 2.6.4
and bison. It allows macros to be defined and files to be generated from those
macros.
#### Part 27: flex 2.6.4
We recompile unpatched GNU `flex` using older flex 2.5.11. This is again a two
stage process, first compiling flex using `scan.c` (from `scan.l`) created by
old flex, then recompile `scan.c` using the new version of flex to remove any
buggy artifacts from the old flex.
#### Part 28: bison 3.4.1
GNU `bison` is a parser generator. With `m4` and `flex` we can now bootstrap it
following https://gitlab.com/giomasce/bison-bootstrap. It's a 3 stage process:
1. Build bison using a handwritten grammar parser in C.
2. Use bison from previous stage on a simplified bison grammar file.
3. Build bison using original grammar file.
Finally we have a fully functional `bison` executable.
#### Part 29: grep 2.4
GNU `grep` is a pattern matching utility. It is not immediately needed but will
be useful later for autotools.
#### Part 30: diffutils 2.7
`diffutils` is useful for comparing two files. It is not immediately needed but
is required later for autotools.
#### Part 31: coreutils 5.0
`coreutils` is rebuilt against musl. Additional utilities are built including
`comm`, `expr`, `date`, `dd`, `sort`, `uname` and `uniq`. This fixes a variety
of issues with existing `coreutils`.
#### Part 32: gawk 3.0.4
`gawk` is the GNU implementation of `awk`, yet another pattern matching and data
extraction utility. It is also required for autotools.
#### Part 33: perl 5.000
Perl is a general purpose programming language that is especially suitable for
text processing. It is essential for autotools build system because automake and
some other tools are written in Perl.
Perl itself is written in C but ships with some pre-generated files that need
perl for processing, namely `embed.h` and `keywords.h`. To bootstrap Perl we
will start with the oldest Perl 5 version which has the fewest number of
pregenerated files. We reimplement two remaining perl scripts in awk and use our
custom makefile instead of Perl's pre-generated Configure script.
At this first step we build `miniperl` which is `perl` without support for
loading modules.
#### Part 34: perl 5.003
We now use `perl` from the previous stage to recreate pre-generated files that
are shipped in perl 5.003. But for now we still need to use handwritten makefile
instead of `./Configure` script.
#### Part 35: perl 5.004_05
Yet another version of perl; the last version buildable with 5.003.
#### Part 36: perl 5.005_03
More perl! This is the last version buildable with 5.004. It also introduces the
new pregenerated files `regnodes.h` and `byterun.{h,c}`.
#### Part 37: perl 5.6.2
Even more perl. 5.6.2 is the last version buildable with 5.004.

131
README.rst Normal file

@@ -0,0 +1,131 @@
.. SPDX-FileCopyrightText: 2021 Andrius Štikonas <andrius@stikonas.eu>
.. SPDX-FileCopyrightText: 2021 Paul Dersey <pdersey@gmail.com>
.. SPDX-FileCopyrightText: 2021 fosslinux <fosslinux@aussies.space>
.. SPDX-License-Identifier: CC-BY-SA-4.0
live-bootstrap
==============
An attempt to provide a reproducible, automatic, complete end-to-end
bootstrap from a minimal number of binary seeds to a supported fully
functioning operating system.
Get me started!
---------------
1. ``git clone https://github.com/fosslinux/live-bootstrap``
2. ``git submodule update --init --recursive``
3. Provide a kernel (vmlinuz file) named ``kernel`` in the root of the
repository.
4. ``./rootfs.sh`` - ensure your account has kvm privileges and qemu
installed.
a. Alternatively, run ``./rootfs.sh chroot`` to run it in a chroot.
b. Alternatively, run ``./rootfs.sh`` but don't run the actual
virtualization and instead copy sysa/tmp/initramfs.igz to a USB or
some other device and boot from bare metal.
5. Wait.
6. If you can, observe the many binaries in ``/after/bin``! (Soon, you will
also have an interactive bash shell to play around in).
Background
----------
This project is a part of the bootstrappable project, a project that
aims to be able to build complete computing platforms through the use of
source code. When you build a compiler like GCC, you need another C
compiler to compile the compiler - turtles all the way down. Even the
first GCC compiler was written in C. There has to be a way to break the
chain…
There has been significant work on this over the last 5 years, from
Jeremiah Orians' stage0, hex2 and M2-Planet to janneke's Mes. We currently
have a fully functioning chain of bootstrapping from the 357-byte
hex0 seed to a complete GCC compiler and hence a full Linux operating
system. From there, it is trivial to move to other UNIXes. However,
there is currently only one vector through which this can be
automatically done: GNU Guix.
While the primary author of this project does not believe Guix is a bad
project, the great reliance on Guile, the complexity of many of the
scripts and the rather steep learning curve to install and run Guix make
it a very non plug-and-play solution. Furthermore, there is currently
(Jan 2021) no possible way to run the bootstrap from outside of a
pre-existing Linux environment. Additionally, Guix uses many scripts and
distributed files that cannot be considered source code.
(NOTE: Guix is working on a Full Source Bootstrap, but I'm not
completely sure what that entails).
Furthermore, having an alternative bootstrap automation tool allows
people to have greater trust in the bootstrap procedure.
Comparison between GNU Guix and live-bootstrap
----------------------------------------------
+----------------------+----------------------+----------------------+
| Item                 | Guix                 | live-bootstrap       |
+======================+======================+======================+
| Total size of seeds  | ~30MB (Reduced       | ~1KB                 |
| [1]                  | Source Bootstrap)    |                      |
|                      | [2]                  |                      |
+----------------------+----------------------+----------------------+
| Use of kernel        | Linux-Libre Kernel   | Any Linux Kernel     |
|                      |                      | (2.6+) [3]           |
+----------------------+----------------------+----------------------+
| Implementation       | Yes                  | No (in development)  |
| complete             |                      |                      |
+----------------------+----------------------+----------------------+
| Automation           | Almost fully         | Optional user        |
|                      | automatic            | customization        |
+----------------------+----------------------+----------------------+
[1]: Both projects only use software licensed under a FSF-approved
free software license.
[2]: Reiterating that Guix is working on a full source bootstrap,
although that still uses guile (~12 MB).
[3]: Work is ongoing to use other, smaller POSIX kernels.
Why would I want bootstrapping?
-------------------------------
That is outside of the scope of this README. Here are a few things you
can look at:
- https://bootstrappable.org
- Trusting Trust Attack (as described by Ken Thompson)
- https://guix.gnu.org/manual/en/html_node/Bootstrapping.html
- Collapse of the Internet (e.g. CollapseOS)
Specific things to be bootstrapped
----------------------------------
GNU Guix is currently the project furthest along in automating
bootstrapping. However, a number of non-auditable files are used
in many of its packages. Here is a list of file types that we deem
unsuitable for bootstrapping.
1. Binaries (apart from seed hex0, kaem, kernel).
2. Any pre-generated configure scripts, or ``Makefile.in`` files from autotools.
3. Pre-generated bison/flex parsers (identifiable through a ``.y``
file).
4. Any source code/binaries downloaded within a software's build system
that is outside of our control to verify before use in the build
system.
5. Any non-free software. (Must be FSF-approved license).
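As a rough illustration of how items 2 and 3 could be checked mechanically, here is a minimal Python sketch that walks a source tree and flags files that look pre-generated. It is not part of live-bootstrap: the function names, the file-name markers and the "a ``.c`` file next to a same-named ``.y`` or ``.l`` file is generated" rule are all assumptions made for this example, and a real audit still needs a human to read the results.

```python
import os

# Assumed markers: file names that usually indicate pre-generated autotools output.
GENERATED_NAMES = {"configure", "Makefile.in"}
# Grammar/scanner sources whose generated C output is often shipped alongside them.
GRAMMAR_SUFFIXES = (".y", ".l")


def looks_generated(name, siblings):
    """Heuristically decide whether ``name`` is generator output.

    ``siblings`` is the set of file names in the same directory.
    """
    if name in GENERATED_NAMES:
        return True
    stem, ext = os.path.splitext(name)
    if ext != ".c":
        return False
    if stem.endswith(".tab"):
        # yacc/bison style output such as y.tab.c: flag it if any grammar is nearby.
        return any(sibling.endswith(".y") for sibling in siblings)
    # e.g. parse.c sitting next to parse.y, or scan.c next to scan.l.
    return any(stem + suffix in siblings for suffix in GRAMMAR_SUFFIXES)


def find_suspect_files(root):
    """Return the sorted relative paths under ``root`` that look pre-generated."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        siblings = set(filenames)
        for name in filenames:
            if looks_generated(name, siblings):
                full_path = os.path.join(dirpath, name)
                suspects.append(os.path.relpath(full_path, root))
    return sorted(suspects)
```

Calling ``find_suspect_files("path/to/unpacked/source")`` returns candidates for deletion and regeneration; the heuristic will miss generated files with unusual names, so treat an empty result as a hint rather than proof.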
How does this work?
-------------------
**For a more in-depth discussion, see parts.rst.**
sysa
~~~~
sysa is the first system used in live-bootstrap. We move to a new
system after a reboot, which often occurs after the movement to a new
kernel. It is run by the seed Linux kernel provided by the user. It
currently compiles everything.

354
parts.rst Normal file

@@ -0,0 +1,354 @@
.. sectnum::
.. SPDX-FileCopyrightText: 2021 Andrius Štikonas <andrius@stikonas.eu>
.. SPDX-FileCopyrightText: 2021 Paul Dersey <pdersey@gmail.com>
.. SPDX-FileCopyrightText: 2021 fosslinux <fosslinux@aussies.space>
.. SPDX-License-Identifier: CC-BY-SA-4.0
mescc-tools-seed
================
This is where all the magic begins. We start with our hex0 and kaem
seeds and bootstrap our way up to M2-Planet, a compiler for a subset of
C, and mes-m2, an independent port of GNU Mes to M2-Planet. The
following steps are taken here:
- hex0 (seed)
- hex0 compiles hex1
- hex0 compiles catm
- hex1 compiles hex2 (v1)
- hex2 (v1) compiles M0
- M0 compiles cc_x86
- cc_x86 compiles M2-Planet (v1)
- M2-Planet (v1) compiles blood-elf (v1)
- M2-Planet (v1) compiles hex2 (final)
- M2-Planet (v1) compiles M1
- M2-Planet (v1) compiles kaem
- M2-Planet (v1) compiles blood-elf (final)
- M2-Planet (v1) compiles get_machine
- M2-Planet (v1) compiles M2-Planet (final)
This seems very intimidating, but becomes clearer when reading the
source: https://github.com/oriansj/mescc-tools-seed/blob/master/x86/
(start at mescc-tools-seed-kaem.kaem).
From here, we can move on from the lowest level stuff.
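To make the ordering of the list above easier to reason about, the chain can be written down as data and checked. The Python sketch below is only a model of the list in this section, not code from mescc-tools-seed: the names ``CHAIN``, ``check_chain`` and ``lineage`` are invented for the example, "compiles" is simplified to a single builder per tool, and the kaem seed that drives the scripts is not modelled.

```python
# Each entry is (tool, built_by), in the order given in the list above.
CHAIN = [
    ("hex0", None),  # the binary seed: nothing builds it
    ("hex1", "hex0"),
    ("catm", "hex0"),
    ("hex2 (v1)", "hex1"),
    ("M0", "hex2 (v1)"),
    ("cc_x86", "M0"),
    ("M2-Planet (v1)", "cc_x86"),
    ("blood-elf (v1)", "M2-Planet (v1)"),
    ("hex2 (final)", "M2-Planet (v1)"),
    ("M1", "M2-Planet (v1)"),
    ("kaem", "M2-Planet (v1)"),
    ("blood-elf (final)", "M2-Planet (v1)"),
    ("get_machine", "M2-Planet (v1)"),
    ("M2-Planet (final)", "M2-Planet (v1)"),
]


def check_chain(chain):
    """Return the list of seeds, or raise if a tool is used before it is built."""
    available = set()
    seeds = []
    for tool, builder in chain:
        if builder is None:
            seeds.append(tool)
        elif builder not in available:
            raise ValueError(f"{tool} needs {builder}, which is not built yet")
        available.add(tool)
    return seeds


def lineage(chain, tool):
    """Trace ``tool`` back to the seed, one builder at a time."""
    built_by = dict(chain)
    path = [tool]
    while built_by[path[-1]] is not None:
        path.append(built_by[path[-1]])
    return path
```

``check_chain(CHAIN)`` confirms that hex0 is the only entry with no builder, and ``lineage(CHAIN, "M2-Planet (final)")`` walks back through M2-Planet (v1), cc_x86, M0, hex2 (v1) and hex1 to hex0.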
mescc-tools-extra
=================
mescc-tools and mes-m2 are the projects bootstrapped by
mescc-tools-seed. However, we have some currently unmerged additions to
mescc-tools that we require for this project, namely filesystem
utilities ``cp`` and ``chown``. This allows us to have one unified
directory for our binaries. Furthermore, we also build ``fletcher16``, a
preliminary checksumming tool that we use to check the reproducibility
and authenticity of generated binaries.
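For readers unfamiliar with the algorithm, here is a minimal Python sketch of the textbook Fletcher-16 checksum. It illustrates the algorithm only; the ``fletcher16`` tool built here is a separate C program, and its exact output format is not described in this document.

```python
def fletcher16(data: bytes) -> int:
    """Compute the textbook Fletcher-16 checksum of ``data``.

    Two running sums are kept modulo 255: ``sum1`` adds up the bytes and
    ``sum2`` adds up the successive values of ``sum1``, which makes the
    result sensitive to byte order as well as byte values.
    """
    sum1 = 0
    sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    # Pack the two 8-bit sums into one 16-bit value.
    return (sum2 << 8) | sum1
```

With this definition ``fletcher16(b"abcde")`` is ``0xC8F0``. Because the sums are taken modulo 255, the single bytes ``b"\x00"`` and ``b"\xff"`` produce the same checksum, which is one reason a 16-bit checksum is only a preliminary measure.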
``/after``
==========
We now move into the ``/after`` directory. As mescc-tools-seed has no
concept of ``chdir()`` (not added until very late in mescc-tools-seed),
we have to copy a lot of files into the root of the initramfs, making it
very messy. We get into the more ordered directory ``/after`` here,
copying over all of the required binaries from ``/``.
mes
===
``mes`` is a Scheme interpreter. It runs the sister project ``mescc``,
which is a C compiler written in Scheme, which links against the Mes C
Library. All three are included in this same repository. Note that we are
using the experimental ``wip-m2`` branch to jump over the gap between
``M2-Planet`` and ``mes``. There are two stages to this part:
1. Compiling an initial mes using ``M2-Planet``. Note that this is
*only* the Mes interpreter, not the libc or anything else.
2. We then use this to recompile the Mes interpreter as well as building
the libc. This second interpreter is faster and less buggy. We need
the libc to compile all the programs until we get glibc.
tinycc 0.9.26
=============
``tinycc`` is a minimal C compiler that aims to be small and fast. It
implements all of C89 and most of C99.
First, we compile janneke's fork of tcc 0.9.26 using ``mescc``. The fork
contains 27 patches that make it operate well in the bootstrap
environment and make it compilable using ``mescc``. This is a
non-trivial process and, as seen in tcc.kaem, has several parts:
a. tcc 0.9.26 is first compiled using ``mescc``.
b. The mes libc is recompiled using tcc (``mescc`` has a non-standard
``.a`` format), including some additions for later programs.
c. tcc 0.9.26 is recompiled 5(!) times to add new features that are
required for other features, namely ``long long`` and ``float``. Each
time, the libc is also recompiled.
tinycc 0.9.27
=============
Now, we compile upstream tcc 0.9.27, the latest release of tinycc, using
the final version of tcc 0.9.26. We then recompile the libc once more.
From this point onwards, until further notice, all programs are compiled
using tinycc 0.9.27. Note that now we begin to delve into the realm of
old GNU software, using older versions compilable by tinycc. Prior to
this point, all tools have been adapted significantly for the bootstrap;
now, we will be using old tooling instead.
sed 4.0.7
=========
You are most likely aware of GNU ``sed``, a stream editor.
tar 1.12
========
``tar`` is the most common archive format for software source code,
often combined with compression. To avoid having to keep using
submodules, we build GNU tar 1.12, the last version compilable by tinycc
without significant patching.
gzip 1.2.4
==========
``gzip`` is the most common compression format used for software source
code. It is luckily distributed as a barebones uncompressed ``.tar``,
which we extract and then build. We do require deletion of a few lines
unsupported by mes libc.
Going forward, we can now use ``.tar.gz`` for source code.
patch 2.5.9
===========
``patch`` is a very useful tool at this stage, allowing us to make
significantly more complex edits, including changes to individual lines.
Luckily, we are able to patch ``patch`` itself using sed only.
sha-2
=====
``sha-2`` is a standalone external ``sha256sum`` implementation,
originally a library, but patched to have a command line interface.
It is mostly output-compatible with ``sha256sum`` from coreutils. We use
this as a replacement for ``fletcher16``.
Redo checksums using ``sha256sum``
==================================
We have now just built ``sha256sum``, which has a significantly (many orders
of magnitude) lower collision rate than ``fletcher16``, so we recheck all of
the existing binaries using ``sha256sum``.
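The recheck amounts to recomputing a SHA-256 digest for each binary and comparing it with a recorded value. Here is a minimal Python sketch of that idea using the standard ``hashlib`` module. The helper names and the dictionary-based checksum list are assumptions for illustration; the bootstrap itself uses the ``sha256sum`` binary built in the previous part.

```python
import hashlib

# A 16-bit checksum has at most 2**16 distinct values and SHA-256 has 2**256,
# so the output space grows by a factor of 2**240.
OUTPUT_SPACE_RATIO = 2 ** (256 - 16)


def sha256_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at ``path``."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large binaries do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(expected):
    """Return the paths whose digest differs from the recorded one.

    ``expected`` maps a file path to its expected hex digest.
    """
    return [
        path
        for path, wanted in sorted(expected.items())
        if sha256_file(path) != wanted
    ]
```

An empty list from ``verify`` means every listed binary matched its recorded digest.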
patched mes-libc
================
Since patch is available at this point, we can apply additional fixes to
mes-libc that are not included in the wip-m2 branch and recompile libc.
patched tinycc
==============
In Guix, tinycc is patched to force static linking. Prior to this step,
we have been forced to manually specify static linking for each tool.
Now that we have patch, we can patch tinycc to force static linking and
then recompile it.
Note that we have to do this using tinycc 0.9.26, as tinycc 0.9.27
cannot recompile itself for unknown reasons.
make 3.80
=========
GNU ``make`` is now built so we have a more robust build system.
``make`` allows us to do things like define rules for files rather than
writing complex kaem scripts.
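The core of a file rule is a freshness test: a target is rebuilt when it is missing or older than one of its prerequisites. The Python sketch below is a toy model of that decision only, with invented function names; it is not how GNU ``make`` is implemented and it ignores features such as pattern rules and phony targets.

```python
import os


def needs_rebuild(target, prerequisites):
    """Return True if ``target`` is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


def run_rule(target, prerequisites, recipe):
    """Run ``recipe()`` only when the target is out of date.

    Returns True if the recipe was run and False if the target was fresh.
    """
    if needs_rebuild(target, prerequisites):
        recipe()
        return True
    return False
```

A linear kaem script has no such test and simply runs every command each time, which is part of why rules are the more robust option.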
bzip2 1.0.8
===========
``bzip2`` is a compression format that compresses more than ``gzip``. It
is preferred where we can use it, and makes source code sizes smaller.
coreutils 5.0.0
===============
GNU Coreutils is a collection of widely used utilities such as ``cat``,
``chmod``, ``chown``, ``cp``, ``install``, ``ln``, ``ls``, ``mkdir``,
``mknod``, ``mv``, ``rm``, ``rmdir``, ``tee``, ``test``, ``true``, and
many others.
A few of the utilities cannot be easily compiled with Mes C library, so
we skip them.
The ``cp`` in this stage replaces the ``mescc-tools-extra`` ``cp``.
heirloom devtools
=================
``lex`` and ``yacc`` from the Heirloom project. The Heirloom project is
a collection of standard UNIX utilities derived from code by Caldera and
Sun. Unlike the analogous utilities from the GNU project, they
can be compiled with a simple ``Makefile``.
bash 2.05b
==========
GNU ``bash`` is the most well known shell and the most complex piece of
software so far. However, it comes with a number of great benefits over
kaem, including proper POSIX sh support, globbing, etc.
Bash ships with a pre-generated bison file, which we delete. We have not
bootstrapped bison yet, but fortunately heirloom yacc is able to cope
here.
flex 2.5.11
===========
``flex`` is a tool for generating lexers or scanners: programs that
recognize lexical patterns.
Unfortunately ``flex`` also depends on itself for compiling its own
scanner, so first flex 2.5.11 is compiled, with its scanner definition
manually modified so that it can be processed by lex from the Heirloom
project (the required modifications are mostly syntactical, plus a few
workarounds to avoid some advanced flex features).
musl 1.1.24
===========
``musl`` is a C standard library that is lightweight, fast, simple,
free, and strives to be correct in the sense of standards-conformance
and safety. ``musl`` is used by some distributions of GNU/Linux as their
C library. Our previous Mes C library was incomplete which prevented us
from building many newer or more complex programs.
``tcc`` has slight problems when building and linking ``musl``, so we
apply a few patches. In particular, we replace all weak symbols with
strong symbols and will patch ``tcc`` in the next step to ignore
duplicate symbols.
tcc 0.9.27 (musl)
=================
We recompile ``tcc`` against musl. This is a two stage process. First we
build tcc-0.9.27 that itself links to Mes C library but produces
binaries linked to musl. Then we recompile newly produced tcc with
itself. Interestingly, tcc-0.9.27 linked against musl is self hosting.
musl 1.1.24 (tcc-musl)
======================
We now rebuild ``musl`` with ``tcc-musl`` from the previous part, which
fixes a number of bugs, particularly regarding floats, in the first ``musl``.
tcc 0.9.27 (musl v2)
====================
Now that we have a fixed ``musl``, we recompile ``tcc``, as ``tcc``
uses floats extensively.
.. _bzip2-1.0.8-1:
bzip2 1.0.8
===========
``bzip2`` is rebuilt unpatched with the new tcc and musl, fixing issues
with reading files from stdin that existed in the previous build.
m4 1.4.7
========
``m4`` is the first piece of software we need for the autotools suite,
flex 2.6.4 and bison. It allows macros to be defined and files to be
generated from those macros.
flex 2.6.4
==========
We recompile unpatched GNU ``flex`` using older flex 2.5.11. This is
again a two stage process, first compiling flex using ``scan.c`` (from
``scan.l``) created by old flex, then recompiling ``scan.c`` using the new
version of flex to remove any buggy artifacts from the old flex.
bison 3.4.1
===========
GNU ``bison`` is a parser generator. With ``m4`` and ``flex`` we can now
bootstrap it following https://gitlab.com/giomasce/bison-bootstrap. It's
a 3 stage process:
1. Build bison using a handwritten grammar parser in C.
2. Use bison from previous stage on a simplified bison grammar file.
3. Build bison using original grammar file.
Finally we have a fully functional ``bison`` executable.
grep 2.4
========
GNU ``grep`` is a pattern matching utility. It is not immediately needed
but will be useful later for autotools.
diffutils 2.7
=============
``diffutils`` is useful for comparing two files. It is not immediately
needed but is required later for autotools.
coreutils 5.0
=============
``coreutils`` is rebuilt against musl. Additional utilities are built,
including ``comm``, ``expr``, ``date``, ``dd``, ``sort``, ``uname`` and
``uniq``. This fixes a variety of issues with the existing ``coreutils``.
gawk 3.0.4
==========
``gawk`` is the GNU implementation of ``awk``, yet another pattern
matching and data extraction utility. It is also required for autotools.
perl 5.000
==========
Perl is a general purpose programming language that is especially
suitable for text processing. It is essential for the autotools build
system because automake and some other tools are written in Perl.
Perl itself is written in C but ships with some pre-generated files that
need perl for processing, namely ``embed.h`` and ``keywords.h``. To
bootstrap Perl we start with the oldest Perl 5 version, which has the
fewest pregenerated files. We reimplement the two remaining perl scripts
in awk and use our custom makefile instead of Perl's pre-generated
Configure script.
In this first step we build ``miniperl``, which is ``perl`` without
support for loading modules.
perl 5.003
==========
We now use ``perl`` from the previous stage to recreate pre-generated
files that are shipped in perl 5.003. But for now we still need to use a
handwritten makefile instead of the ``./Configure`` script.
perl 5.004_05
=============
Yet another version of perl; the last version buildable with 5.003.
perl 5.005_03
=============
More perl! This is the last version buildable with 5.004. It also
introduces the new pregenerated files ``regnodes.h`` and
``byterun.{h,c}``.
perl 5.6.2
==========
Even more perl. 5.6.2 is the last version buildable with 5.004.


@ -18,13 +18,13 @@ incdir=${prefix}/include
MES_PREFIX=${prefix}/mes/src/mes
GUILE_LOAD_PATH=${prefix}/mes/src/nyacc/module:${prefix}/mes/src/mes/mes/module:${prefix}/mes/src/mes/module
# Part 2: cp and chown (mescc-tools-extra)
# cp and chown (mescc-tools-extra)
pkg="mescc-tools-extra"
cd ${pkg}
kaem --file ${pkg}.kaem
cd ..
# Part 3: Remove remaining dependencies on / (root of /after)
# Remove remaining dependencies on / (root of /after)
cp ../bin/hex2 bin/hex2
cp ../bin/M1 bin/M1
cp ../bin/M2-Planet bin/M2-Planet
@ -38,92 +38,92 @@ chmod 755 bin/hex2 bin/M1 bin/M2-Planet bin/blood-elf \
fletcher16 mescc-tools-seed-checksums
PATH=/after/bin
# Part 5: mes
# mes
pkg="mes"
cd ${pkg}
kaem --file ${pkg}.kaem
cd ..
# Part 6: tcc 0.9.26
# tcc 0.9.26
pkg="tcc-0.9.26"
cd ${pkg}
kaem --file ${pkg}.kaem
cd ..
# Part 7: tcc 0.9.27
# tcc 0.9.27
pkg="tcc-0.9.27"
cd ${pkg}
kaem --file ${pkg}.kaem
cd ..
# Part 8: sed
# sed
pkg="sed-4.0.7"
cd ${pkg}
kaem --file ${pkg}.kaem
cd ..
# Part 9: tar
# tar
pkg="tar-1.12"
cd ${pkg}
kaem --file ${pkg}.kaem
cd ..
# Part 10: gzip
# gzip
pkg="gzip-1.2.4"
cd ${pkg}
kaem --file ${pkg}.kaem
cd ..
# Part 11: patch
# patch
pkg="patch-2.5.9"
cd ${pkg}
kaem --file ${pkg}.kaem
cd ..
# Part 12: sha-2
# sha-2
pkg="sha-2-61555d"
cd ${pkg}
kaem --file ${pkg}.kaem
cd ..
# Part 12a: Check all up to this part as sha256sum
# Check all up to this part as sha256sum
sha256sum -c pre-sha.sha256sums
# Part 13: mes-libc-patched
# mes-libc-patched
cd tcc-0.9.27
kaem --file mes-libc-patched.kaem
cd ..
# Part 14: tcc-patched
# tcc-patched
cd tcc-0.9.27
kaem --file tcc-patched.kaem
cd ..
# Part 15: make
# make
pkg="make-3.80"
cd ${pkg}
kaem --file ${pkg}.kaem
cd ..
# Part 16: bzip2
# bzip2
pkg="bzip2-1.0.8"
cd ${pkg}
kaem --file ${pkg}.kaem
cd ..
# Part 17: coreutils
# coreutils
pkg="coreutils-5.0"
cd ${pkg}
kaem --file ${pkg}.kaem
cd ..
# Part 18: heirloom-devtools
# heirloom-devtools
pkg="heirloom-devtools-070527"
cd ${pkg}
kaem --file ${pkg}.kaem
cd ..
# Part 19: bash
# bash
pkg="bash-2.05b"
cd ${pkg}
kaem --file ${pkg}.kaem


@ -12,60 +12,47 @@ set -e
export PREFIX=/after
# Part 20
build flex-2.5.11
# Part 21
build musl-1.1.24 musl-1.1.24.sh checksums/pass1
# Part 22
# Rebuild tcc using musl
build tcc-0.9.27 tcc-musl-pass1.sh checksums/tcc-musl-pass1
# Part 23
# Rebuild musl using tcc-musl
build musl-1.1.24 musl-1.1.24.sh checksums/pass2
# Part 24
# Rebuild tcc-musl using new musl
build tcc-0.9.27 tcc-musl-pass2.sh checksums/tcc-musl-pass2
# Part 25
# Rebuild bzip2 using musl
build bzip2-1.0.8 bzip2-1.0.8.sh checksums/bzip2-pass2
# Part 26
build m4-1.4.7
# Part 27
build flex-2.6.4
# Part 28
build bison-3.4.1 stage1.sh checksums/stage1
build bison-3.4.1 stage2.sh checksums/stage2
build bison-3.4.1 stage3.sh checksums/stage3
# Part 29
build grep-2.4
# Part 30
build diffutils-2.7
# Part 31
# Rebuild coreutils using musl
build coreutils-5.0 coreutils-5.0.sh checksums/pass2
# Part 32
build gawk-3.0.4
# Part 33
build perl-5.000
# Part 34
build perl-5.003
# Part 35
build perl5.004_05
# Part 36
build perl5.005_03
# Part 37
build perl-5.6.2
echo "Bootstrapping completed."
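The first script above verifies everything built so far with
``sha256sum -c pre-sha.sha256sums``. As a minimal sketch of what that
kind of check does, the example below records a digest for a throwaway
file and verifies it before and after the file changes. The file names
``artifact`` and ``demo.sha256sums`` are made up for illustration and do
not exist in the repository.

```shell
#!/bin/sh
# Demonstrate `sha256sum -c` on a throwaway file (all names are hypothetical).
verify_demo() {
    dir="$(mktemp -d)" || return 1
    printf 'hello\n' > "${dir}/artifact"

    # Record the expected digest, as a file like pre-sha.sha256sums would.
    (cd "${dir}" && sha256sum artifact > demo.sha256sums)

    # Verification succeeds while the file is unchanged.
    (cd "${dir}" && sha256sum -c demo.sha256sums >/dev/null 2>&1) \
        && echo "unchanged: OK"

    # Verification fails once the contents differ from the recorded digest.
    printf 'tampered\n' > "${dir}/artifact"
    (cd "${dir}" && sha256sum -c demo.sha256sums >/dev/null 2>&1) \
        || echo "modified: FAILED"

    rm -rf "${dir}"
}

verify_demo
```

Because ``sha256sum -c`` exits with a non-zero status on any mismatch, a
script that aborts on errors stops as soon as a build output differs from
its recorded checksum.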