Update docs with new changes

2023-11-24 14:59:30 +11:00 · 2023-11-24 14:59:30 +11:00 · ee77ef837d
parent e06a19f9e2
commit ee77ef837d
3 changed files with 203 additions and 210 deletions
--- a/DEVEL.md
+++ b/DEVEL.md
@@ -14,31 +14,40 @@ and that a full build completes.

 ## Structure

-Each system corresponds to a reboot of the live environment. There is only one
-appropriate structure as shown below (eg for sysa):
-
 ```
-sysa
-├── any-global-files.sh
+seed
+├── seed.kaem
+├── script-generator.c
+├── ...
+└── stage0-posix
+
+steps
+├── manifest
+├── any-global-files
+├── jump
+│   └── linux.sh
+├── improve
+│   └── x.sh
 ├── somepackage-version
-│   ├── somepackage-version.kaem (or .sh)
+│   ├── pass1.kaem
+│   ├── pass2.sh
 │   ├── files
 │   ├── simple-patches
 │   ├── mk
 │   └── patches
-└── tmp
 ```

-Global scripts that drive the entire system go directly under `sysx`. `tmp`
-contains the temporary system used for QEMU or a chroot.
+The `seed` directory contains everything required for `script-generator` to be
+run.

-Then, each package is in its own specific directory, named `package-version`.
-It then diverges based upon which driver is being used:
-
- `kaem`: A file named `package-version.kaem` is called by the master script.
- `bash`: The `build` function from helper.sh is called from the master script.
-  There are default functions run which can be overridden by an optional script
-  `package-version.sh` within the package-specific directory.
+In the `steps` directory, the bootstrap process is defined in `manifest`.
+Each package to be built lives in a directory named `package-version`.
+Each successive build of a package is its nth pass, and scripts are named
+accordingly: the first build is `pass1.sh`, the second is `pass2.sh`, and
+so on.
+Scripts run in the kaem era should say so in their filename, for example
+`pass1.kaem`. Pass numbers do not reset after kaem, i.e., you cannot have
+both `pass1.kaem` and `pass1.sh`.

 In this folder, there are other folders/files. `*.checksums` are
 required for early packages that are build with kaem, others are optional.
@@ -51,21 +60,16 @@ Permissible folders/files:
 - `simple-patches`: patches for the source that use the before/after convention of simple-patch.c
 - `*.checksums`: files containing the checksums for the resulting binaries and
 libraries that are compiled and installed.
-  - Up to and including `coreutils-6.10`, `sha256sum` from `stage0-posix`
-    is used for the checksumming. Later we switch to GNU version.
-  - To extract checksums of the binaries, use of qemu mode is recommended
-    (i.e. `./rootfs.py -q -qk $kernel --update-checksums`).
- compilation script
-
-The directory m2-functions is used for M2-Planet functions (currently).
+  - Otherwise, the package's checksum is in `SHA256SUMS.pkgs`.
+- compilation script(s)

 ## Conventions

 - **Patches:**
  - all patches are `-p0`
  - all patches begin with a patch header
- **README:**
-  - all stages are explained in README
+- **parts.rst:**
+  - all packages are explained in `parts.rst`
 - **General:**
  - Where possible, all blocks of text should be limited to a length of 80
    characters.
@@ -79,9 +83,3 @@ The directory m2-functions is used for M2-Planet functions (currently).
  - Patches are licensed under the license of the project which they are
    patching.
  - All files (excluding files within submodules) must comply with REUSE v3.0.
-
-## git
-
-All changes must be submitted as PRs. Pushing to master is disallowed, even if
-push access is granted to a user. Only pushes to master should be merging of
-patches into master.
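
Note on the pass-numbering rule in the DEVEL.md hunk above: the convention that `.kaem` and `.sh` scripts share one sequence of pass numbers can be checked mechanically. The following is a minimal sketch and not part of this commit. The function name `find_pass_conflicts` and the filename pattern are assumptions made for illustration; the repository may check this differently, or not at all.

```python
import re
from collections import defaultdict

# Hypothetical pattern for "pass<N>.kaem" / "pass<N>.sh" as described in DEVEL.md.
PASS_RE = re.compile(r"^pass(\d+)\.(kaem|sh)$")


def find_pass_conflicts(filenames):
    """Return {pass_number: [filenames]} for pass numbers used more than once."""
    by_number = defaultdict(list)
    for name in filenames:
        match = PASS_RE.match(name)
        if match:  # skip everything that is not a pass script (files, mk, patches, ...)
            by_number[int(match.group(1))].append(name)
    return {n: sorted(names) for n, names in by_number.items() if len(names) > 1}


if __name__ == "__main__":
    # pass1 is claimed by both a kaem script and a shell script,
    # which the convention forbids.
    listing = ["pass1.kaem", "pass1.sh", "pass2.sh", "files", "patches"]
    print(find_pass_conflicts(listing))
```

Passing the result of `os.listdir()` on a package directory under `steps/` would report any pass number that is used by more than one script.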
--- a/README.rst
+++ b/README.rst
@@ -12,94 +12,90 @@ An attempt to provide a reproducible, automatic, complete end-to-end
 bootstrap from a minimal number of binary seeds to a supported fully
 functioning operating system.

-Get me started!
---------------
+How do I use this?
+------------------
+
+Quick start:
+
+See ``./rootfs.py --help`` and follow the instructions given there.
+This uses a variety of userland tools to prepare the bootstrap.
+
+(*Currently, there is no way to perform the bootstrap without external
+preparations! This is an unsolved problem.*)
+
+Without using Python:

 1. ``git clone https://github.com/fosslinux/live-bootstrap``
 2. ``git submodule update --init --recursive``
-3. Provide a kernel (vmlinuz file) as the name ``kernel`` in the root of the
-   repository. **This must be a 32-bit kernel.**
-4. ``./rootfs.py --qemu`` - ensure your account has kvm privileges and qemu
-   installed.
-
-   a. Alternatively, run ``./rootfs.py --chroot`` to run it in a chroot.
-   b. Alternatively, run ``./rootfs.py --bwrap`` to run it in a bubblewrap
-      sandbox. When user namespaces are supported, this mode is rootless.
-   c. Alternatively, run ``./rootfs.py`` but don’t run the actual
-      virtualization and instead copy sysa/tmp/initramfs to a USB or
-      some other device and boot from bare metal. NOTE: we now require
-      a hard drive. This is currently hardcoded as sda. You also need
-      to put ``sysc/tmp/disk.img`` onto your sda on the bootstrapping
-      machine.
-   d. Alternatively, do not use python at all, see "Python-less build"
-      below.
-
-5. Wait.
-6. If you can, observe the many binaries in ``/usr/bin``! When the
-   bootstrap is completed ``bash`` is launched providing a shell to
-   explore the system.
-
+3. Consider whether you are going to run this in a chroot, in QEMU, or on bare
+   metal. (All of this *can* be automated, but not in a trustable way. See
+   further below.)
+   a. **chroot:** Create a directory where the chroot will reside, run
+   ``./download-distfiles.sh``, and copy:
+      * The entire contents of ``seed/stage0-posix`` into that directory.
+      * All other files in ``seed`` into that directory.
+      * ``steps/`` and ``distfiles/`` into that directory.
+        * At least all files listed in ``steps/pre-network-sources`` must be
+          copied in. All other files will be obtained from the network.
+      * Run ``/bootstrap-seeds/POSIX/x86/kaem-optional-seed`` in the chroot.
+        (E.g., ``chroot rootfs /bootstrap-seeds/POSIX/x86/kaem-optional-seed``).
+   b. **QEMU:** Create two blank disk images.
+      * On the first image, write
+        ``seed/stage0-posix/bootstrap-seeds/NATIVE/x86/builder-hex0-x86-stage1.img``,
+        followed by ``kernel-bootstrap/builder-hex0-x86-stage2.hex0``,
+        followed by zeros padding the disk to the next sector boundary.
+      * distfiles can be obtained using ``./download-distfiles.sh``.
+      * See the list in part a. For every file within that list, write the line
+        ``src <size-of-file> <path-to-file>`` to the disk, followed by the
+        contents of the file.
+        * *Only* copy distfiles listed in ``steps/pre-network-sources`` into
+          this disk.
+      * Optionally (if you skip this, distfiles are downloaded from the network):
+        * On the second image, create an MSDOS partition table and one ext3
+          partition.
+        * Copy ``distfiles/`` into this disk.
+      * Run QEMU with at least 4G of RAM, optionally SMP (multicore), both
+        drives (in the order introduced above), a NIC with model E1000
+        (``-nic user,model=e1000``), and ``-machine kernel-irqchip=split``.
+   c. **Bare metal:** Follow the same steps as for QEMU, but use two different
+   *physical* disks, and boot from the first disk.

 Background
 ----------

-This project is a part of the bootstrappable project, a project that
-aims to be able to build complete computing platforms through the use of
-source code. When you build a compiler like GCC, you need another C
-compiler to compile the compiler - turtles all the way down. Even the
-first GCC compiler was written in C. There has to be a way to break the
-chain…
+Problem statement
+=================

-There has been significant work on this over the last 5 years, from
-Jeremiah Orians’ stage0, hex2 and M2-Planet to janneke’s Mes. We have a
-currently, fully-functioning chain of bootstrapping from the 357-byte
-hex0 seed to a complete GCC compiler and hence a full Linux operating
-system. From there, it is trivial to move to other UNIXes. However,
-there is only currently one vector through which this can be
-automatically done, GNU Guix.
+live-bootstrap's overarching problem statement is:

-While the primary author of this project does not believe Guix is a bad
-project, the great reliance on Guile, the complexity of many of the
-scripts and the rather steep learning curve to install and run Guix make
-it a very non plug-and-play solution. Furthermore, there is currently
-(Jan 2021) no possible way to run the bootstrap from outside of a
-pre-existing Linux environment. Additionally, Guix uses many scripts and
-distributed files that cannot be considered source code.
+    How can a usable Linux system be created with only human-auditable and,
+    wherever possible, human-written source code?

-(NOTE: Guix is working on a Full Source Bootstrap, but I’m not
-completely sure what that entails).
+Clarifications:

-Furthermore, having an alternative bootstrap automation tool allows
-people to have greater trust in the bootstrap procedure.
+* "usable" means a modern toolchain, with appropriate utilities, that can be
+  used to expand the amount of software on the system, interactively or
+  non-interactively.
+* "human-auditable" is discretionary, but is usually fairly strict. See
+  "Specific things to be bootstrapped" below.

-Comparison between GNU Guix and live-bootstrap
----------------------------------------------
+Why is this difficult?
+======================

-+----------------------+----------------------+----------------------+
-| Item                 | Guix                 | live-bootstrap       |
-+======================+======================+======================+
-| Total size of seeds  | ~30MB (Reduced       | ~1KB                 |
-| [1]                  | Source Bootstrap)    |                      |
-|                      | [2]                  |                      |
-+----------------------+----------------------+----------------------+
-| Use of kernel        | Linux-Libre Kernel   | Any Linux Kernel     |
-|                      |                      | (2.6+) [3]           |
-+----------------------+----------------------+----------------------+
-| Implementation       | Yes                  | No (in development)  |
-| complete             |                      |                      |
-+----------------------+----------------------+----------------------+
-| Automation           | Almost fully         | Optional user        |
-|                      | automatic            | customization        |
-+----------------------+----------------------+----------------------+
+The core of a modern Linux system is primarily written in C and C++. C and C++
+are **self-hosting**, i.e., nearly every single C compiler is written in C.

-[1]: Both projects only use software licensed under a FSF-approved
-free software license. Kernel is excluded from seed.
-[2]: Reiterating that Guix is working on a full source bootstrap,
-although that still uses guile (~12 MB). [3]: Work is ongoing to use
-other, smaller POSIX kernels.
+Every version of GCC is written in C (or, in newer releases, C++). To avoid
+using an existing toolchain, we need a way to compile some version of GCC
+without GCC itself. A less featureful C compiler, TCC, can do this. And so
+forth, down to a fairly primitive C compiler written in assembly, ``cc_x86``.

-Why would I want bootstrapping?
-------------------------------
+Going up through this process requires a number of other utilities as well:
+the autotools suite, guile and autogen, etc. These also have to be matched
+appropriately to the toolchain available.
+
+Why should I care?
+------------------

 That is outside of the scope of this README. Here’s a few things you can
 look at:
@@ -117,7 +113,7 @@ bootstrapping. However, there are a number of non-auditable files used
 in many of their packages. Here is a list of file types that we deem
 unsuitable for bootstrapping.

-1. Binaries (apart from seed hex0, kaem, kernel).
+1. Binaries (apart from seed hex0, kaem, builder-hex0).
 2. Any pre-generated configure scripts, or Makefile.in’s from autotools.
 3. Pre-generated bison/flex parsers (identifiable through a ``.y``
   file).
@@ -131,56 +127,18 @@ How does this work?

 **For a more in-depth discussion, see parts.rst.**

-sysa
-~~~~
+First, ``builder-hex0`` is launched. ``builder-hex0`` is a minimal kernel
+written in ``hex0`` that exists in three self-bootstrapping stages.

-sysa is the first ‘system’ used in live-bootstrap. We move to a new
-system after a reboot, which often occurs after the movement to a new
-kernel. It is run by the seed Linux kernel provided by the user. It
-compiles everything we need to be able to compile our own Linux kernel.
-It runs fully in an initramfs and does not rely on disk support in the
-seed Linux kernel.
+This is capable of executing the entirety of ``stage0-posix`` (see
+``seed/stage0-posix``), which produces a variety of useful utilities and a
+compiler for a basic subset of C, ``M2-Planet``.

-sysb
-~~~~
+``stage0-posix`` runs a file called ``after.kaem``. This is a shell script that
+builds and runs a small program called ``script-generator``. This program reads
+``steps/manifest`` and converts it into a series of shell scripts that can be
+executed in sequence to complete the bootstrap.

-sysb is the second 'system' of live-bootstrap. This uses the Linux 4.9.10
-kernel compiled within sysa. As we do not rely on disk support in sysa, we
-need this intermediate system to be able to add the missing binaries to sysc
-before moving into it. This is executed through kexec from sysa. At this point,
-a SATA disk IS required.
-
-sysc
-~~~~
-
-sysc is the (current) last 'system' of live-bootstrap. This is a continuation
-from sysb, executed through util-linux's ``switch_root`` command which moves
-the entire rootfs without a reboot. Every package from here on out is compiled
-under this system, taking binaries from sysa. Chroot and bubblewrap modes skip
-sysb, as it is obviously irrelevant to them.
-
-Python-less build
-----------------
-
-Python is no longer a requirement to set up the build system. The
-repository is almost completely in a form where it can be used as the
-source of a build.
-
-1. Download required tarballs into ``sysa/distfiles`` and ``sysc/distfiles``.
-   You can use the ``download-distfiles.sh`` script.
-2. Copy sysa/stage0-posix/src/* to the root of the repository.
-3. Copy sysa/stage0-posix/src/bootstrap-seeds/POSIX/x86/kaem-optional-seed
-   to init in the root of the repository.
-4. Copy sysa/after.kaem to after.kaem
-5. Create a CPIO archive (eg, ``cpio --format newc --create --directory . > ../initramfs``).
-6. Boot your initramfs and kernel.
-
-chroot builds
-~~~~~~~~~~~~~
-
-For chroot  based bootstraps you can skip creation of initramfs and instead start bootstrap with
-
-``sudo chroot . bootstrap-seeds/POSIX/x86/kaem-optional-seed``
-
-It is also recommended to copy everything to a new directory as bootstrapping messes up with files
-in git repository and cannot be re-run again.
+From this point forward, ``steps/manifest`` is effectively self-documenting.
+Each package built exists in ``steps/<pkg>``, and the build scripts can be seen
+there.
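
Note on the QEMU steps in the README.rst hunk above: the first disk image is described as boot code padded to a sector boundary, followed by one ``src <size-of-file> <path-to-file>`` line per file with the file contents after it. The following is a minimal sketch of those two building blocks and is not part of this commit. It assumes, without confirmation from the README, that the ``src`` line ends with a single newline, that the size is a decimal byte count, and that a sector is 512 bytes. The names `src_record` and `pad_to_sector` are hypothetical.

```python
SECTOR_SIZE = 512  # assumption: the README does not state the sector size


def src_record(path: str, data: bytes) -> bytes:
    """Build one 'src <size-of-file> <path-to-file>' line plus the file contents."""
    # Assumed framing: decimal size, single spaces, one trailing newline.
    header = f"src {len(data)} {path}\n".encode("ascii")
    return header + data


def pad_to_sector(image: bytes, sector_size: int = SECTOR_SIZE) -> bytes:
    """Append zero bytes until the length is a multiple of sector_size."""
    remainder = len(image) % sector_size
    if remainder == 0:
        return image
    return image + b"\x00" * (sector_size - remainder)


if __name__ == "__main__":
    # Stand-in bytes for the boot code; the real inputs are the stage1 image
    # and the stage2 hex0 file named in the README.
    boot = pad_to_sector(b"\x01" * 700)
    disk = boot + src_record("/steps/manifest", b"example contents\n")
    print(len(boot), len(disk))
```

Keeping the padding and the record framing as separate functions makes it easy to adjust either assumption once the exact on-disk format has been checked against ``builder-hex0``.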
--- a/parts.rst
+++ b/parts.rst
@@ -155,14 +155,46 @@ checksumming tool, that we use to ensure reproducibility and authenticity
 of generated binaries. We also build initial ``untar``, ``ungz`` and ``unbz2``
 utilities to deal with compressed archives.

-``/sysa``
-=========
+live-bootstrap seed
+===================

-We now move into the ``/sysa`` directory. As stage0-posix has no
-concept of ``chdir()`` (not added until very late in stage0-posix),
-we have to copy a lot of files into the root of the initramfs, making it
-very messy. We get into the move ordered directory ``/sysa`` here,
-copying over all of the required binaries from ``/``.
+``stage0-posix`` executes a file ``after.kaem``, which creates a kaem script to
+continue the bootstrap. This is responsible for cleaning up the mess in
+``/x86/bin`` and moving it to the permanent ``/usr/bin``, and setting a few
+environment variables.
+
+script-generator
+================
+
+``script-generator`` is a program that translates live-bootstrap's
+domain-specific manifest language into shell scripts that can be run to complete
+the bootstrap. The translator is implemented in ``M2-Planet``.
+
+The language is fairly simple; each line has the format
+``<directive>: <arguments> <predicate>``. The line is only run if the condition
+given by its predicate is true.
+
+The following directives are supported:
+
+* ``build``: builds a particular package defined in ``steps/``.
+* ``improve``: runs a script making a distinct and logical improvement to the
+  live-bootstrap system.
+* ``define``: defines a variable evaluated from other constants/variables.
+* ``jump``: moves into a new rootfs/kernel using a custom script.
+
+checksum-transcriber 1.0
+========================
+
+``checksum-transcriber`` is a small program that converts live-bootstrap's
+source specification for packages into a SHA256SUM file that can be used to
+checksum source tarballs.
+
+simple-patch 1.0
+================
+
+``simple-patch`` is a rudimentary patching program. It works by matching a
+text block given to it and replacing it with another text block. This is
+sufficient for the early patching required before we have a full GNU patch.

 mes 0.25
 ========
@@ -177,6 +209,10 @@ to this part:
 2. We then use this to recompile the Mes interpreter as well as building
   the libc. This second interpreter is faster and less buggy.

+From this point until musl, we are capable of making non-standard and strange
+libraries. All libraries are in ``/usr/lib/mes``, and includes are in
+``/usr/include/mes``, as they are incompatible with musl.
+
 tinycc 0.9.26
 =============

@@ -215,8 +251,8 @@ This is a Linux 2.0 clone which is much simpler to understand and build than
 Linux.  This version of Fiwix is a fork of 1.4.0 that contains many
 modifications and enhancements to support live-boostrap.

-lwext4 1.0.0
-============
+lwext4 1.0.0-lb1
+================

 If the kernel bootstrap option is enabled then `lwext4 <https://github.com/gkostka/lwext4>`
 is built next. This is a library for creating ext2/3/4 file systems from user land.
@@ -230,11 +266,19 @@ kexec-fiwix
 If the kernel bootstrap option is enabled then a C program `kexec-fiwix` is compiled
 and run which places the Fiwix ram drive in memory and launches the Fiwix kernel.

-kexec-linux
-===========
+esfu 1.0
+========

-If the kernel bootstrap option is enabled then a C program `kexec-linux` is compiled.
-This is used as part of the go_sysb step later to launch the Linux kernel.
+This is an extremely crippled, basic implementation of ``mount`` and ``mknod``,
+sufficient only for the next step.
+
+early_mount_disk
+================
+
+When using kernel bootstrap, distfiles from this point exist on an external
+disk. Using ``esfu``'s ``mount`` and ``mknod``, we are able to mount this disk.
+This is unnecessary when not using kernel bootstrap as everything is done on the
+disk.

 make 3.82
 =========
@@ -304,6 +348,12 @@ Bash ships with a bison pre-generated file here which we delete.
 Unfortunately, we have not bootstrapped bison but fortunately for us,
 heirloom yacc is able to cope here.

+update_env
+==========
+
+This is a simple script that makes some small updates to the env file that were
+not possible when using kaem.
+
 flex 2.5.11
 ===========

@@ -321,8 +371,8 @@ tcc 0.9.27 (patched)

 We recompile ``tcc`` with some patches needed to build musl.

-musl 1.1.24
-===========
+musl 1.1.24 and musl_libdir
+===========================

 ``musl`` is a C standard library that is lightweight, fast, simple,
 free, and strives to be correct in the sense of standards-conformance
@@ -335,6 +385,9 @@ apply a few patches. In particular, we replace all weak symbols with
 strong symbols and will patch ``tcc`` in the next step to ignore
 duplicate symbols.

+We no longer use anything from ``/usr/lib/mes`` or ``/usr/include/mes``; instead
+we use ``/usr/lib`` and ``/usr/include`` as normal.
+
 tcc 0.9.27 (musl)
 =================

@@ -586,12 +639,6 @@ libtool 2.2.4
 GNU Libtool is the final part of GNU Autotools. It is a script used to hide away differences
 when compiling shared libraries on different platforms.

-bash 2.05b
-==========
-
-Up to this point, our build of ``bash`` could run scripts but could not be used
-interactively. Rebuilding bash makes this functionality work.
-
 automake 1.15.1
 ===============

@@ -646,6 +693,12 @@ GCC can build the latest as of the time of writing musl version.
 We also don't need any of the TCC patches that we used before.
 To accomodate Fiwix, there are patches to avoid syscalls set_thread_area and clone.

+Linux headers 5.10.41
+=====================
+
+This extracts the headers from the Linux kernel that are required to use the
+kernel ABI, which ``util-linux`` needs.
+
 gcc 4.0.4
 =========

@@ -655,10 +708,15 @@ util-linux 2.19.1
 =================

 ``util-linux`` contains a number of general system administration utilities.
-Most pressingly, we need these for being able to mount disks (for non-chroot
-mode, but it is built it in chroot mode anyway because it will likely be useful
-later). The latest version is not used because of autotools/GCC
-incompatibilities.
+This gives us access to a much less crippled version of ``mount`` and ``mknod``.
+The latest version is not used because of autotools/GCC incompatibilities.
+
+move_disk
+=========
+
+In ``kernel-bootstrap`` mode, we have been working off an initramfs for some
+things up until now. At this point we are capable of moving entirely to the
+disk, so we do so.

 kbd-1.15
 ========
@@ -685,6 +743,12 @@ bc 1.07.1
 ``bc`` is a console based calculator that is sometime used in scripts. We need ``bc``
 to rebuild some Linux kernel headers.

+kexec-linux
+===========
+
+If the kernel bootstrap option is enabled then a C program ``kexec-linux`` is compiled.
+This can be used to launch a Linux kernel from Fiwix.
+
 kexec-tools 2.0.22
 ==================

@@ -693,13 +757,6 @@ Linux kernel without a manual restart from within a running system. It is a
 kind of soft-restart. It is only built for non-chroot mode, as we only use it
 in non-chroot mode. It is used to go into sysb/sysc.

-create_sysb
-===========
-
-The next step is not a package, but the creation of the sysb rootfs, containing
-all of the scripts for sysb (which merely move to sysc). Again, this is only
-done in non-chroot mode, because sysb does not exist in chroot mode.
-
 Linux kernel 4.9.10
 ===================

@@ -716,30 +773,10 @@ so we use a ``find`` command to remove those, which are automatically regenerate
 The kernel config was originally taken from Void Linux, and was then modified
 for the requirements of live-bootstrap, including compiler features, drivers,
 and removing modules. Modules are unused. They are difficult to transfer to
-subsequent systems, and we do not have ``modprobe``. Lastly,
-the initramfs of sysb is generated in this stage, using ``gen_init_cpio`` within
-the Linux kernel tree. This avoids the compilation of ``cpio`` as well.
+subsequent systems, and we do not have ``modprobe``.

-musl 1.2.4
-==========
-Prior to booting Linux, musl is rebuilt yet again with syscalls
-``clone`` and ``set_thread_area`` enabled for Linux thread support.
-
-go_sysb
-=======
-
-This is the last step of sysa, run for non-chroot mode. It uses kexec to load
-the new Linux kernel into RAM and execute it, moving into sysb.
-
-In chroot, sysb is skipped, and data is transferred directly to sysc and
-chrooted into.
-
-sysb
-====
-
-sysb is purely a transition to sysc, allowing binaries from sysa to get onto a
-disk (as sysa does not necessarily have hard disk support in the kernel).
-It populates device nodes, mounts sysc, copies over data, and executes sysc.
+We then kexec into the new Linux kernel, using ``kexec-tools`` when running
+under Linux and ``kexec-linux`` when running under Fiwix.

 curl 7.88.1
 ===========