Update docs with new changes

fosslinux 2023-11-24 14:59:30 +11:00
parent e06a19f9e2
commit ee77ef837d
3 changed files with 203 additions and 210 deletions


@ -14,31 +14,40 @@ and that a full build completes.
## Structure
Each system corresponds to a reboot of the live environment. There is only one
appropriate structure as shown below (eg for sysa):
```
sysa
├── any-global-files.sh
seed
├── seed.kaem
├── script-generator.c
├── ...
└── stage0-posix
steps
├── manifest
├── any-global-files
├── jump
│   └── linux.sh
├── improve
│   └── x.sh
├── somepackage-version
│   ├── somepackage-version.kaem (or .sh)
│   ├── pass1.kaem
│   ├── pass2.sh
│   ├── files
│   ├── simple-patches
│   ├── mk
│   └── patches
└── tmp
```
Global scripts that drive the entire system go directly under `sysx`. `tmp`
contains the temporary system used for QEMU or a chroot.
The `seed` directory contains everything required for `script-generator` to be
run.
Then, each package is in its own specific directory, named `package-version`.
It then diverges based upon which driver is being used:
- `kaem`: A file named `package-version.kaem` is called by the master script.
- `bash`: The `build` function from helper.sh is called from the master script.
There are default functions run which can be overridden by an optional script
`package-version.sh` within the package-specific directory.
In the `steps` directory, the bootstrap process is defined in `manifest`.
Each package to be built is named `package-version`.
Each subsequent build of a package is the nth pass. Scripts are named
accordingly; eg, the first build would be called `pass1.sh`, the second would be
`pass2.sh`, etc.
Scripts run in kaem era should be denoted as such in their filename;
`pass1.kaem`, for example. Pass numbers do not reset after kaem, ie, you cannot
have both `pass1.kaem` and `pass1.sh`.
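The naming rule above can be checked mechanically. The following is a minimal sketch, not part of live-bootstrap: the helper name `check_pass_scripts` and the regular expression are hypothetical, and it only checks that no pass number is claimed by two scripts.

```python
import re

# Hypothetical pattern for pass scripts: pass<N>.kaem or pass<N>.sh.
PASS_RE = re.compile(r"^pass([1-9][0-9]*)\.(kaem|sh)$")


def check_pass_scripts(filenames):
    """Return a list of problems found in a package's pass script names."""
    seen = {}  # pass number -> filename that first claimed it
    problems = []
    for name in sorted(filenames):
        match = PASS_RE.match(name)
        if match is None:
            continue  # not a pass script (e.g. files, patches)
        number = int(match.group(1))
        if number in seen:
            problems.append(
                f"pass {number} used by both {seen[number]} and {name}"
            )
        else:
            seen[number] = name
    return problems
```

For example, `check_pass_scripts(["pass1.kaem", "pass2.sh"])` reports no problems, while a directory holding both `pass1.kaem` and `pass1.sh` is flagged.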
In this folder, there are other folders/files. `*.checksums` are
required for early packages that are built with kaem; others are optional.
@ -51,21 +60,16 @@ Permissible folders/files:
- `simple-patches`: patches for the source that use the before/after convention of simple-patch.c
- `*.checksums`: files containing the checksums for the resulting binaries and
libraries that are compiled and installed.
- Up to and including `coreutils-6.10`, `sha256sum` from `stage0-posix`
is used for checksumming. Later we switch to the GNU version.
- To extract checksums of the binaries, use of qemu mode is recommended
(i.e. `./rootfs.py -q -qk $kernel --update-checksums`).
- compilation script
The directory m2-functions is used for M2-Planet functions (currently).
- Otherwise, the package's checksum is in SHA256SUMS.pkgs.
- compilation script(s)
## Conventions
- **Patches:**
- all patches are `-p0`
- all patches begin with a patch header
- **README:**
- all stages are explained in README
- **parts.rst:**
- all packages are explained in `parts.rst`
- **General:**
- Where possible, all blocks of text should be limited to a length of 80
characters.
@ -79,9 +83,3 @@ The directory m2-functions is used for M2-Planet functions (currently).
- Patches are licensed under the license of the project which they are
patching.
- All files (excluding files within submodules) must comply with REUSE v3.0.
## git
All changes must be submitted as PRs. Pushing to master is disallowed, even if
push access is granted to a user. The only pushes to master should be merges of
patches into master.


@ -12,94 +12,90 @@ An attempt to provide a reproducible, automatic, complete end-to-end
bootstrap from a minimal number of binary seeds to a supported fully
functioning operating system.
Get me started!
---------------
How do I use this?
------------------
Quick start:
See ``./rootfs.py --help`` and follow the instructions given there.
This uses a variety of userland tools to prepare the bootstrap.
(*Currently, there is no way to perform the bootstrap without external
preparations! This is a currently unsolved problem.*)
Without using Python:
1. ``git clone https://github.com/fosslinux/live-bootstrap``
2. ``git submodule update --init --recursive``
3. Provide a kernel (vmlinuz file) as the name ``kernel`` in the root of the
repository. **This must be a 32-bit kernel.**
4. ``./rootfs.py --qemu`` - ensure your account has kvm privileges and that
qemu is installed.
a. Alternatively, run ``./rootfs.py --chroot`` to run it in a chroot.
b. Alternatively, run ``./rootfs.py --bwrap`` to run it in a bubblewrap
sandbox. When user namespaces are supported, this mode is rootless.
c. Alternatively, run ``./rootfs.py`` but don't run the actual
virtualization and instead copy ``sysa/tmp/initramfs`` to a USB or
some other device and boot from bare metal. NOTE: we now require
a hard drive. This is currently hardcoded as sda. You also need
to put ``sysc/tmp/disk.img`` onto your sda on the bootstrapping
machine.
d. Alternatively, do not use python at all, see "Python-less build"
below.
5. Wait.
6. If you can, observe the many binaries in ``/usr/bin``! When the
bootstrap is completed ``bash`` is launched providing a shell to
explore the system.
3. Consider whether you are going to run this in a chroot, in QEMU, or on bare
metal. (All of this *can* be automated, but not in a trustable way. See
further below.)
a. **chroot:** Create a directory where the chroot will reside, run
``./download-distfiles.sh``, and copy:
* The entire contents of ``seed/stage0-posix`` into that directory.
* All other files in ``seed`` into that directory.
* ``steps/`` and ``distfiles/`` into that directory.
* At least all files listed in ``steps/pre-network-sources`` must be
copied in. All other files will be obtained from the network.
* Run ``/bootstrap-seeds/POSIX/x86/kaem-optional-seed`` in the chroot.
(Eg, ``chroot rootfs /bootstrap-seeds/POSIX/x86/kaem-optional-seed``).
b. **QEMU:** Create two blank disk images.
* On the first image, write
``seed/stage0-posix/bootstrap-seeds/NATIVE/x86/builder-hex0-x86-stage1.img``
to it, followed by ``kernel-bootstrap/builder-hex0-x86-stage2.hex0``,
followed by zeros padding the disk to the next sector.
* distfiles can be obtained using ``./download-distfiles.sh``.
* See the list in part a. For every file within that list, write a line to
the disk ``src <size-of-file> <path-to-file>``, followed by the contents
of the file.
* *Only* copy distfiles listed in ``steps/pre-network-sources`` into
this disk.
* Optionally (if you don't do this, distfiles will be network downloaded):
* On the second image, create an MSDOS partition table and one ext3
partition.
* Copy ``distfiles/`` into this disk.
* Run QEMU, with 4+G RAM, optionally SMP (multicore), both drives (in the
order introduced above), a NIC with model E1000 (``-nic
user,model=e1000``), and ``-machine kernel-irqchip=split``.
c. **Bare metal:** Follow the same steps as QEMU, but the disks need to be
two different *physical* disks, and boot from the first disk.
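The ``src <size-of-file> <path-to-file>`` records described for the first disk image can be sketched as below. This is an illustration only: the function names are hypothetical, and the newline after the header and the exact form of the path are assumptions not confirmed by the text above.

```python
def src_record(path_in_image, data):
    """Build one record: a header line, then the raw file contents.

    Assumption: the header line ends with a single newline.
    """
    header = f"src {len(data)} {path_in_image}\n".encode("ascii")
    return header + data


def build_payload(files):
    """Concatenate records for (path, bytes) pairs, in the order given."""
    return b"".join(src_record(path, data) for path, data in files)
```

The resulting bytes would be appended to the image after the padded builder-hex0 stages; writing them to the disk itself is left out of this sketch.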
Background
----------
This project is a part of the bootstrappable project, a project that
aims to be able to build complete computing platforms through the use of
source code. When you build a compiler like GCC, you need another C
compiler to compile the compiler - turtles all the way down. Even the
first GCC compiler was written in C. There has to be a way to break the
chain…
Problem statement
=================
There has been significant work on this over the last 5 years, from
Jeremiah Orians' stage0, hex2 and M2-Planet to janneke's Mes. We currently
have a fully functioning chain of bootstrapping from the 357-byte
hex0 seed to a complete GCC compiler and hence a full Linux operating
system. From there, it is trivial to move to other UNIXes. However,
there is currently only one vector through which this can be
automatically done, GNU Guix.
live-bootstrap's overarching problem statement is:
While the primary author of this project does not believe Guix is a bad
project, the great reliance on Guile, the complexity of many of the
scripts and the rather steep learning curve to install and run Guix make
it a very non plug-and-play solution. Furthermore, there is currently
(Jan 2021) no possible way to run the bootstrap from outside of a
pre-existing Linux environment. Additionally, Guix uses many scripts and
distributed files that cannot be considered source code.
> How can a usable Linux system be created with only human-auditable, and
wherever possible, human-written, source code?
(NOTE: Guix is working on a Full Source Bootstrap, but I'm not
completely sure what that entails).
Clarifications:
Furthermore, having an alternative bootstrap automation tool allows
people to have greater trust in the bootstrap procedure.
* "usable" means a modern toolchain, with appropriate utilities, that can be
used to expand the amount of software on the system, interactively, or
non-interactively.
* "human-auditable" is discretionary, but is usually fairly strict. See
"Specific things to be bootstrapped" below.
Comparison between GNU Guix and live-bootstrap
----------------------------------------------
Why is this difficult?
======================
+----------------------+----------------------+----------------------+
| Item                 | Guix                 | live-bootstrap       |
+======================+======================+======================+
| Total size of seeds  | ~30MB (Reduced       | ~1KB                 |
| [1]                  | Source Bootstrap)    |                      |
|                      | [2]                  |                      |
+----------------------+----------------------+----------------------+
| Use of kernel        | Linux-Libre Kernel   | Any Linux Kernel     |
|                      |                      | (2.6+) [3]           |
+----------------------+----------------------+----------------------+
| Implementation       | Yes                  | No (in development)  |
| complete             |                      |                      |
+----------------------+----------------------+----------------------+
| Automation           | Almost fully         | Optional user        |
|                      | automatic            | customization        |
+----------------------+----------------------+----------------------+
The core of a modern Linux system is primarily written in C and C++. C and C++
are **self-hosting**, ie, nearly every single C compiler is written in C.
[1]: Both projects only use software licensed under a FSF-approved
free software license. Kernel is excluded from seed.
[2]: Reiterating that Guix is working on a full source bootstrap,
although that still uses guile (~12 MB).
[3]: Work is ongoing to use other, smaller POSIX kernels.
Every single version of GCC was written in C. To avoid using an existing
toolchain, we need some way to be able to compile a GCC version without C. We
can use a less well-featured compiler, TCC, to do this. And so forth, until we
get to a fairly primitive C compiler written in assembly, ``cc_x86``.
Why would I want bootstrapping?
-------------------------------
Going up through this process requires a bunch of other utilities as well; the
autotools suite, guile and autogen, etc. These also have to be matched
appropriately to the toolchain available.
Why should I care?
------------------
That is outside of the scope of this README. Here are a few things you can
look at:
@ -117,7 +113,7 @@ bootstrapping. However, there are a number of non-auditable files used
in many of their packages. Here is a list of file types that we deem
unsuitable for bootstrapping.
1. Binaries (apart from seed hex0, kaem, kernel).
1. Binaries (apart from seed hex0, kaem, builder-hex0).
2. Any pre-generated configure scripts, or Makefile.ins from autotools.
3. Pre-generated bison/flex parsers (identifiable through a ``.y``
file).
@ -131,56 +127,18 @@ How does this work?
**For a more in-depth discussion, see parts.rst.**
sysa
~~~~
Firstly, ``builder-hex0`` is launched. ``builder-hex0`` is a minimal kernel that is
written in ``hex0``, existing in 3 self-bootstrapping stages.
sysa is the first system used in live-bootstrap. We move to a new
system after a reboot, which often occurs after the movement to a new
kernel. It is run by the seed Linux kernel provided by the user. It
compiles everything we need to be able to compile our own Linux kernel.
It runs fully in an initramfs and does not rely on disk support in the
seed Linux kernel.
This is capable of executing the entirety of ``stage0-posix`` (see
``seed/stage0-posix``), which produces a variety of useful utilities and a basic
C compiler, ``M2-Planet``.
sysb
~~~~
``stage0-posix`` runs a file called ``after.kaem``. This is a shell script that
builds and runs a small program called ``script-generator``. This program reads
``steps/manifest`` and converts it into a series of shell scripts that can be
executed in sequence to complete the bootstrap.
sysb is the second 'system' of live-bootstrap. This uses the Linux 4.9.10
kernel compiled within sysa. As we do not rely on disk support in sysa, we
need this intermediate system to be able to add the missing binaries to sysc
before moving into it. This is executed through kexec from sysa. At this point,
a SATA disk IS required.
sysc
~~~~
sysc is the (current) last 'system' of live-bootstrap. This is a continuation
from sysb, executed through util-linux's ``switch_root`` command which moves
the entire rootfs without a reboot. Every package from here on out is compiled
under this system, taking binaries from sysa. Chroot and bubblewrap modes skip
sysb, as it is obviously irrelevant to them.
Python-less build
-----------------
Python is no longer a requirement to set up the build system. The
repository is almost completely in a form where it can be used as the
source of a build.
1. Download required tarballs into ``sysa/distfiles`` and ``sysc/distfiles``.
You can use the ``download-distfiles.sh`` script.
2. Copy sysa/stage0-posix/src/* to the root of the repository.
3. Copy sysa/stage0-posix/src/bootstrap-seeds/POSIX/x86/kaem-optional-seed
to init in the root of the repository.
4. Copy sysa/after.kaem to after.kaem
5. Create a CPIO archive (eg, ``cpio --format newc --create --directory . > ../initramfs``).
6. Boot your initramfs and kernel.
chroot builds
~~~~~~~~~~~~~
For chroot-based bootstraps, you can skip creation of the initramfs and instead start the bootstrap with
``sudo chroot . bootstrap-seeds/POSIX/x86/kaem-optional-seed``.
It is also recommended to copy everything to a new directory, as bootstrapping modifies files
in the git repository and cannot be re-run afterwards.
From this point forward, ``steps/manifest`` is effectively self documenting.
Each package built exists in ``steps/<pkg>``, and the build scripts can be seen
there.

parts.rst

@ -155,14 +155,46 @@ checksumming tool, that we use to ensure reproducibility and authenticity
of generated binaries. We also build initial ``untar``, ``ungz`` and ``unbz2``
utilities to deal with compressed archives.
``/sysa``
=========
live-bootstrap seed
===================
We now move into the ``/sysa`` directory. As stage0-posix has no
concept of ``chdir()`` (not added until very late in stage0-posix),
we have to copy a lot of files into the root of the initramfs, making it
very messy. We get into the more ordered directory ``/sysa`` here,
copying over all of the required binaries from ``/``.
``stage0-posix`` executes a file ``after.kaem``, which creates a kaem script to
continue the bootstrap. This is responsible for cleaning up the mess in
``/x86/bin`` and moving it to the permanent ``/usr/bin``, and setting a few
environment variables.
script-generator
================
``script-generator`` is a program that translates live-bootstrap's
domain-specific manifest language into shell scripts that can be run to complete
the bootstrap. The translator is implemented in ``M2-Planet``.
The language is fairly simple; each line has the format
``<directive>: <arguments> <predicate>``. A predicate only runs the line if a
particular condition is true.
The following directives are supported:
* ``build``, builds a particular package defined in ``steps/``.
* ``improve``, runs a script making a distinct and logical improvement to the
live bootstrap system.
* ``define``, define a variable evaluated from other constants/variables.
* ``jump``, moves into a new rootfs/kernel using a custom script.
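To make the line format concrete, here is a rough sketch of a parser for one manifest line. It is written in Python for readability and is not the actual ``script-generator``, which is implemented for ``M2-Planet``. The text above does not say how a predicate is written, so this sketch assumes a trailing parenthesised expression and treats ``#`` lines as comments; both are assumptions, as is the variable name used in the example.

```python
# Directive names taken from the list above.
DIRECTIVES = {"build", "improve", "define", "jump"}


def parse_manifest_line(line):
    """Split a line into (directive, arguments, predicate).

    Returns None for blank lines and for '#' comments (an assumption).
    The predicate is None when the line has no trailing "( ... )" part.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    directive, sep, rest = line.partition(":")
    directive = directive.strip()
    if not sep or directive not in DIRECTIVES:
        raise ValueError(f"unrecognised manifest line: {line!r}")
    rest = rest.strip()
    predicate = None
    # Assumed predicate syntax: a parenthesised expression ending the line.
    if rest.endswith(")") and "(" in rest:
        arguments, _, tail = rest.rpartition("(")
        predicate = tail[:-1].strip()
        rest = arguments.strip()
    return directive, rest, predicate
```

Under these assumptions, ``parse_manifest_line("jump: linux ( KERNEL_BOOTSTRAP == True )")`` yields the directive ``jump``, the argument ``linux`` and the predicate text; evaluating the predicate is left out.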
checksum-transcriber 1.0
========================
``checksum-transcriber`` is a small program that converts live-bootstrap's
source specification for packages into a SHA256SUM file that can be used to
checksum source tarballs.
simple-patch 1.0
================
``simple-patch`` is a rudimentary patching program. It works by matching for a
text block given to it, and replacing it with another text block. This is
sufficient for the early patching required before we have full proper GNU patch.
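The match-and-replace idea can be sketched in a few lines. This is an illustration in Python, not the real ``simple-patch`` source; in particular, requiring exactly one match is an assumption, since the behaviour on zero or multiple matches is not described here.

```python
def simple_patch(text, before, after):
    """Replace the single occurrence of the block `before` with `after`.

    Assumption: anything other than exactly one match is an error.
    """
    count = text.count(before)
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    return text.replace(before, after, 1)
```

Unlike a unified diff, this approach needs no line numbers or context handling, which is why it is feasible so early in the bootstrap.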
mes 0.25
========
@ -177,6 +209,10 @@ to this part:
2. We then use this to recompile the Mes interpreter as well as building
the libc. This second interpreter is faster and less buggy.
From this point until musl, we are capable of making non-standard and strange
libraries. All libraries are in ``/usr/lib/mes``, and includes are in
``/usr/include/mes``, as they are incompatible with musl.
tinycc 0.9.26
=============
@ -215,8 +251,8 @@ This is a Linux 2.0 clone which is much simpler to understand and build than
Linux. This version of Fiwix is a fork of 1.4.0 that contains many
modifications and enhancements to support live-bootstrap.
lwext4 1.0.0
============
lwext4 1.0.0-lb1
================
If the kernel bootstrap option is enabled then `lwext4 <https://github.com/gkostka/lwext4>`_
is built next. This is a library for creating ext2/3/4 file systems from user land.
@ -230,11 +266,19 @@ kexec-fiwix
If the kernel bootstrap option is enabled then a C program ``kexec-fiwix`` is compiled
and run which places the Fiwix ram drive in memory and launches the Fiwix kernel.
kexec-linux
===========
esfu 1.0
========
If the kernel bootstrap option is enabled then a C program ``kexec-linux`` is compiled.
This is used as part of the go_sysb step later to launch the Linux kernel.
This is an extremely crippled basic implementation of ``mount`` and ``mknod``.
It is sufficient only for the next step.
early_mount_disk
================
When using kernel bootstrap, distfiles from this point exist on an external
disk. Using ``esfu``'s ``mount`` and ``mknod``, we are able to mount this disk.
This is unnecessary when not using kernel bootstrap as everything is done on the
disk.
make 3.82
=========
@ -304,6 +348,12 @@ Bash ships with a bison pre-generated file here which we delete.
Unfortunately, we have not bootstrapped bison but fortunately for us,
heirloom yacc is able to cope here.
update_env
==========
This is a simple script that makes some small updates to the env file that were
not possible when using kaem.
flex 2.5.11
===========
@ -321,8 +371,8 @@ tcc 0.9.27 (patched)
We recompile ``tcc`` with some patches needed to build musl.
musl 1.1.24
===========
musl 1.1.24 and musl_libdir
===========================
``musl`` is a C standard library that is lightweight, fast, simple,
free, and strives to be correct in the sense of standards-conformance
@ -335,6 +385,9 @@ apply a few patches. In particular, we replace all weak symbols with
strong symbols and will patch ``tcc`` in the next step to ignore
duplicate symbols.
We do not use any of ``/usr/lib/mes`` or ``/usr/include/mes`` any longer, rather
using ``/usr/lib`` and ``/usr/include`` like normal.
tcc 0.9.27 (musl)
=================
@ -586,12 +639,6 @@ libtool 2.2.4
GNU Libtool is the final part of GNU Autotools. It is a script used to hide away differences
when compiling shared libraries on different platforms.
bash 2.05b
==========
Up to this point, our build of ``bash`` could run scripts but could not be used
interactively. Rebuilding bash makes this functionality work.
automake 1.15.1
===============
@ -646,6 +693,12 @@ GCC can build the latest as of the time of writing musl version.
We also don't need any of the TCC patches that we used before.
To accommodate Fiwix, there are patches to avoid the syscalls ``set_thread_area`` and ``clone``.
Linux headers 5.10.41
=====================
This gets some headers out of the Linux kernel that are required to use the
kernel ABI, needed for ``util-linux``.
gcc 4.0.4
=========
@ -655,10 +708,15 @@ util-linux 2.19.1
=================
``util-linux`` contains a number of general system administration utilities.
Most pressingly, we need these for being able to mount disks (for non-chroot
mode, but it is built in chroot mode anyway because it will likely be useful
later). The latest version is not used because of autotools/GCC
incompatibilities.
This gives us access to a much less crippled version of ``mount`` and ``mknod``.
The latest version is not used because of autotools/GCC incompatibilities.
move_disk
=========
In ``kernel-bootstrap`` mode, we have been working off an initramfs for some
things up until now. At this point we are capable of moving to the disk
entirely, so we do so.
kbd-1.15
========
@ -685,6 +743,12 @@ bc 1.07.1
``bc`` is a console-based calculator that is sometimes used in scripts. We need ``bc``
to rebuild some Linux kernel headers.
kexec-linux
===========
If the kernel bootstrap option is enabled then a C program ``kexec-linux`` is compiled.
This can be used to launch a Linux kernel from Fiwix.
kexec-tools 2.0.22
==================
@ -693,13 +757,6 @@ Linux kernel without a manual restart from within a running system. It is a
kind of soft-restart. It is only built for non-chroot mode, as we only use it
in non-chroot mode. It is used to go into sysb/sysc.
create_sysb
===========
The next step is not a package, but the creation of the sysb rootfs, containing
all of the scripts for sysb (which merely move to sysc). Again, this is only
done in non-chroot mode, because sysb does not exist in chroot mode.
Linux kernel 4.9.10
===================
@ -716,30 +773,10 @@ so we use a ``find`` command to remove those, which are automatically regenerate
The kernel config was originally taken from Void Linux, and was then modified
for the requirements of live-bootstrap, including compiler features, drivers,
and removing modules. Modules are unused. They are difficult to transfer to
subsequent systems, and we do not have ``modprobe``. Lastly,
the initramfs of sysb is generated in this stage, using ``gen_init_cpio`` within
the Linux kernel tree. This avoids the compilation of ``cpio`` as well.
subsequent systems, and we do not have ``modprobe``.
musl 1.2.4
==========
Prior to booting Linux, musl is rebuilt yet again with syscalls
``clone`` and ``set_thread_area`` enabled for Linux thread support.
go_sysb
=======
This is the last step of sysa, run for non-chroot mode. It uses kexec to load
the new Linux kernel into RAM and execute it, moving into sysb.
In chroot, sysb is skipped, and data is transferred directly to sysc and
chrooted into.
sysb
====
sysb is purely a transition to sysc, allowing binaries from sysa to get onto a
disk (as sysa does not necessarily have hard disk support in the kernel).
It populates device nodes, mounts sysc, copies over data, and executes sysc.
We then kexec to use the new Linux kernel, using ``kexec-tools`` for a Linux
kernel and ``kexec-linux`` for Fiwix.
curl 7.88.1
===========