From 3a4433371ffb2797e20701716c4591c67336ba0b Mon Sep 17 00:00:00 2001 From: Timothy Sample Date: Fri, 31 May 2019 22:11:14 -0400 Subject: [PATCH] Fill out the manual. * doc/gash.texi: Add an introduction, add a discussion of included and missing features, document the parser interface, and remove the indexes. * doc/syntax.txt: Update to match the manual. --- doc/gash.texi | 724 ++++++++++++++++++++++++++++++++++++++++++++++--- doc/syntax.txt | 13 +- 2 files changed, 699 insertions(+), 38 deletions(-) diff --git a/doc/gash.texi b/doc/gash.texi index 8b21b22..3ff62db 100644 --- a/doc/gash.texi +++ b/doc/gash.texi @@ -12,6 +12,7 @@ @copying Copyright @copyright{} 2018 Rutger EW van Beusekom@* Copyright @copyright{} 2018 Jan (janneke) Nieuwenhuizen@* +Copyright @copyright{} 2019 Timothy Sample@* Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or @@ -24,12 +25,12 @@ Documentation License''. @dircategory Basics @direntry * Gash: (gash). Guile As SHell. -* gash: (gash)Invoking gash. Running Gash, a minimalist Bash lookalike. +* gash: (gash)Invoking gash. Running Gash, a shell written in Scheme. @end direntry @titlepage @title Gash Reference Manual -@subtitle A POSIX-compliant sh replacement in Guile Scheme. +@subtitle A POSIX-compatible shell written in Guile Scheme. @author The Gash developers @page @@ -47,36 +48,174 @@ Edition @value{EDITION} @* @top Gash This document describes Gash version @value{VERSION}, a -POSIX-compliant sh replacement in Guile Scheme. +POSIX-compatible shell written in Guile Scheme. @menu * Introduction:: What is Gash about? +* Using Gash:: How to use Gash as a shell. +* Hacking with Gash:: How to use Gash as a Scheme library. * GNU Free Documentation License:: The license of this manual. -* Concept Index:: Concepts. -* Programming Index:: Data types, functions, and variables. 
- -@detailmenu - --- The Detailed Node Listing --- - -Introduction - -* Invoking Gash:: - -@end detailmenu @end menu @c ********************************************************************* @node Introduction @chapter Introduction -@menu -* Invoking Gash:: -@end menu +Gash is a POSIX-compatible shell written in Guile Scheme. It has two +goals: bootstrapping GNU Bash and exposing the shell to Scheme. In +the following sections we will unpack that definition and explain all +of its constituent parts. + + +@node What is a POSIX-compatible shell? +@section What is a POSIX-compatible shell? + +When talking about operating systems, a @dfn{shell} refers to the +day-to-day user interface that a person would use to interact with the +system. That's what you might learn in computer science school, but +in popular usage ``shell'' refers to a very particular type of +interface---the one that developed along with the Unix operating +system. The Unix shell is a command-line interface designed mostly +for invoking programs and connecting programs together. Although we +provide a whirlwind tour of shell features in @ref{Using Gash}, this +manual is not a good general introduction to using a Unix-like shell. +For that, try +@url{http://write.flossmanuals.net/command-line/introduction/, +Introduction to the Command Line}. + +So now we know what a @emph{Unix} shell is, but what is a +@emph{POSIX-compatible} shell? Over time, Unix inspired many similar +systems, all of which were mostly the same, but not quite compatible +with one another. What worked on one Unix-like operating system might +not work on another. To make it easier to write portable software, +everyone got together and described a minimal set of features that all +Unix-like systems should provide. They called it the @dfn{Portable +Operating System Interface}, or @dfn{POSIX} for short. 
Being a +@emph{POSIX-compatible} shell means that Gash should have at least all +of the features required by POSIX.@footnote{Note that Gash is merely +POSIX-compatible and @emph{not} POSIX-certified. This means we try +our best, but we have not gotten an official stamp of approval from +The Open Group, who owns the POSIX trademark and administers a +certification program.} + + +@node What is Guile Scheme? +@section What is Guile Scheme? + +@dfn{Scheme} is a cool programming language from the 70's. It is a +very conceptually simple dialect of @dfn{Lisp}---a cool programming +language from the 50's. Why are we interested in using such an old +programming language? There are two reasons. The first is that even +by modern standards, Scheme is a very expressive, high-level language. +Writing in such a language is a pleasure. The second reason is that +Scheme's simplicity makes it well-suited to the bootstrapping task +(@pxref{What of Bash and its bootstraps?}). + +@dfn{Guile} is a particular implementation of Scheme. It is the +official extension language of the GNU project. There are many +implementations of Scheme, but Guile is our choice mainly out of a +desire to integrate nicely with the GNU Guix project, which is also +written in Guile. + + +@node What of Bash and its bootstraps? +@section What of Bash and its bootstraps? + +One of the most important functions of Gash is to bootstrap GNU Bash +(@pxref{What is Bash?,,, bash, Bash Reference Manual}). +@dfn{Bootstrapping} is, briefly, going from having no software to +having software. What makes it hard is that you usually use software +to build software, and that snake has a habit of eating its own tail. +For example, Bash uses shell scripts to describe how it should be +built, so you need a shell to build a shell. What if this other shell +needs shell scripts too? Then you have a problem. 
Normally this +problem goes unnoticed, because you already have a shell that you got +from somebody else, and that somebody got their shell from somebody, +and this process works its way back to the dawn of time---er, the dawn +of Unix time, which is the early 70's---when the first shell emerged +from the primordial soup. + +Where you really run head-long into this problem is when you have a +complete, fine-grained description of a system's software dependency +graph, like they have over at the GNU Guix project. Then, if you +follow the arrows back as far as you can, you can't miss the fact that +the first step in building the system is ``download many megabytes of +inscrutable binary programs from the Internet.'' There are a handful +of people working to change this, and Gash is a small part of this +larger project. For all the details, see the +@url{https://bootstrappable.org/, Bootstrappable Builds website}. + + +@node What does Scheme want from the shell? +@section What does Scheme want from the shell? + +There is a long history of mixing Scheme and the shell. Most +famously, there is @url{https://scsh.net/, Scsh}, which is a hybrid +language that tries to combine the brevity of the shell with the +flexibility of Scheme. There are a few other projects that mix Scheme +or Lisp with shell syntax, and many projects mixing the shell with +other popular programming languages. A common goal of many of these +projects is to provide a way forward for shell scripts that have +become so large they are difficult to maintain. Having a language +that has a similar syntax to shell scripts but provides more ways to +structure code would allow these scripts to continue to grow cleanly. + +On the other hand, it can be useful to provide a shell-like interface +for Scheme environments. For instance, the Guix System boots into +Guile and if anything goes wrong, it presents the user with a Scheme +REPL. 
It would be nice to be able to drive Guile using shell syntax, +since most of what you need to do in that context is manipulate files +and execute external utilities. Another Guix example is adding +build-system phases that are just shell scripts.@footnote{This one is +a little heretical, but I imagine it being attractive as Guix becomes +more popular.} + +Ultimately, the future of Gash is yet to be explored, but there are +plenty of spaces it could gracefully move into. Patches are welcome! + +@c ********************************************************************* +@node Using Gash +@chapter Using Gash + +Gash is a mostly complete POSIX-compatible shell, so it should mostly +work as you would expect. However, it is missing a few POSIX features +(@pxref{Missing features}), and does not really have any features +beyond POSIX. Specifically, many of the ``Bashisms'' that you may be +familiar with are not available.@footnote{@xref{Major Differences From +The Bourne Shell,,, bash, Bash Reference Manual}.} + +In the following we cover the major features that work very well in +Gash. You should feel confident using these features as you would in +any other POSIX-compatible shell. + +All kinds of command invocation should just work. This includes +setting temporary variables, redirects, and here-documents. All of +the POSIX-specified ``special'' built-ins are available except for +@code{times}. Only a handful of other built-ins are provided (like +@code{cd} and @code{pwd}). + +All of the control structures are available and full-featured. This +means @code{for}, @code{while}, @code{until}, @code{if}, and +@code{case}. Functions and subshells also work fine. + +Field splitting and quoting is well-tested and should work even in +exceptional cases, like @code{"$@@"} and when @code{$IFS} is +manipulated. + +You can set variables and mark them read-only or exported. 
Many +special variables are available, and about half of the variable +operators (like @code{$@{VARIABLE+alternate@}}) work. + +Both types of command substitution work (that is, @code{$(...)} and +@code{`...`}), and can even be nested. + +In general, Gash should not surprise you too much, and should run most +POSIX-compatible shell scripts. + @node Invoking Gash @section Invoking Gash -@cindex repl The @command{gash} command is the sh interpreter. @example @@ -87,34 +226,555 @@ The @var{option}s can be among the following: @table @code -@item -c @var{string} -By default, Gash will read a file named on the command line as a script. +@item -c@r{, }--command=@var{string} +Evaluate @var{string}, and then exit. + +@item -e@r{, }--errexit +Exit immediately on any error. @item -h@r{, }--help Display help on invoking Gash, and then exit. +@item -p@r{, }--parse +Rather than evaluate a script, parse it and output its syntax tree. + @item -v@r{, }--version Display the current version of Gash, and then exit. +@item -x@r{, }--xtrace +Print each command that is executed. + @end table + +@node Missing features +@section Missing features + +In this section, we cover some of the features that are currently +missing in Gash. Particularly, we look at the features specified by +POSIX that have not yet been implemented. This list is not +exhaustive, but covers the most glaring omissions. + +@itemize @bullet + +@item +Arithmetic substitution. + +@item +Asynchronous commands and job control. + +@item +Alias creation and substitution. + +@item +Certain special variables. This includes @code{$ENV}; all the +localization variables like @code{$LANG}, @code{$LC_*}, and +@code{$NLSPATH}; the parent process ID variable (@code{$PPID}); and +the prompt variables (@code{$PS*}). + +@item +Tilde expansion. + +@item +Variable pattern operators and assertion operators. This means that +@code{$@{FOO%pattern@}} and the like do not work, and neither does +@code{$@{FOO?@}}. 
+ +@item +Multi-line commands from the readline interface. If you press +@key{Enter} in the middle of a command (e.g., from within an +unfinished quotation) Gash will not ask for more input but rather +treat it as a syntax error and exit with an error message. + +@end itemize + +These features will be added to a future version of Gash. If you +would like to contribute to Gash by implementing one of these +features, please get in touch by writing to +@url{mailto:gash-devel@@nongnu.org, gash-devel@@nongnu.org}. + +@c ********************************************************************* +@node Hacking with Gash +@chapter Hacking with Gash + +Have you ever wanted to write Scheme code that can understand and +manipulate shell scripts on the fly? Have you ever wanted your Scheme +scripts to have the breezy convenience of a shell script? You've come +to the right place! In this chapter we will look at all the tools +that Gash offers for exploring and illuminating the dark canyon that +separates Scheme and the shell. + + +@node Parsing shell scripts +@section Parsing shell scripts + +If you're reading this, you probably have some familiarity with +Scheme, and you have probably heard of Scheme's @code{read} procedure. +Gash exports a very similar function from the @code{(gash parser)} +module. It's called @code{read-sh}, and as you may have guessed, it +reads shell code instead of Scheme code. + +With Scheme code, the internal and external representations of the +code are basically the same, but this is not true with shell code. +You will have to learn and understand Gash's particular internal +representation of shell code to use it effectively. Fortunately, it +is designed to be familiar to a Scheme programmer, and we provide many +examples here to get you up to speed quickly. At the end of this +section, a formal definition of the internal representation is given +for reference. 
+ +@defun{read-sh [port]} +Read a complete shell command from the input port @var{port} if it is +specified, or from the current input port if @var{port} is not +specified. A @dfn{complete command} is essentially something an +interactive shell would be able to execute immediately without +prompting for more input. Technically, it corresponds to the +@code{complete_command} production rule defined in the POSIX +specification. +@end defun + +For convenience, we also export the following procedure. + +@defun{read-sh-all [port]} +Read all complete shell commands from the input port @var{port} if it +is specified, or from the current input port if @var{port} is not +specified. This works much like writing your own loop using +@code{read-sh}, but it should be more efficient. As a bonus, it +ensures that the result is always a list of commands. +@end defun + + +@node Internal representation examples +@subsection Internal representation examples + +The easiest way to understand Gash's internal representation of shell +is through examples, so let's take a look at a few. To simplify the +examples, we are going to abuse notation a little bit. Instead of +writing + +@example +(call-with-input-string "@var{SHELL-EXAMPLE}" read-sh) +@result{} +@var{RESULT} +@end example + +we will simply write + +@example +@var{SHELL-EXAMPLE} +@result{} +@var{RESULT} +@end example + +The major benefit of doing this is that we do not have to escape the +shell examples to be proper Scheme strings. 
+ + +@subsubheading Simple commands + +The parser can read simple commands and split them into their +constituent parts: + +@example +echo foo +@result{} +( "echo" "foo") + +echo $FOO +@result{} +( "echo" ( "FOO")) + +echo foo$BAR baz +@result{} +( "echo" ("foo" ( "BAR")) "baz") +@end example + +It preserves quotes, since they are significant for field splitting +and escaping special characters: + +@example +echo "$FOO" +@result{} +( "echo" ( ( "FOO"))) + +echo * +@result{} +( "echo" "*") + +echo '*' +@result{} +( "echo" ( "*")) +@end example + +It represents redirects and temporary variable assignments with a nod +to Scheme: + +@example +CC=gcc CFLAGS='-O2 -g' make +@result{} +( (("CC" "gcc") + ("CFLAGS" ( "-O2 -g"))) + "make") + +wc < gash.texi +@result{} +( ((< 0 "gash.texi")) + ( "wc")) + +POSIXLY_CORRECT=y sed N < file +@result{} +( ((< 0 "file")) + ( (("POSIXLY_CORRECT" "y")) + "sed" "N")) + +;; Watch out for this one, though! +echo foo >| file +@result{} +( ((>! 1 "file")) + ( "echo" "foo")) +@end example + +Here documents get processed by the parser, so they only have one +form: + +@example +cat < hello.scm +(display "Hello world!\n") +EOF +@result{} +( ((<< 0 ( "(display \"Hello world!\\n\")\n")) + (> 1 "hello.scm")) + ( "cat")) + +cat < hello.sh +hi=Howdy; echo $hi world! +EOF +@result{} +( ((<< 0 ( + ("hi=Howdy; echo " ( "hi") " world!\n"))) + (> 1 "hello.sh")) + ( "cat")) + +;; That's not what we meant! Let's try again. +cat <<\EOF > hello.sh +hi=Howdy; echo $hi world! 
+EOF +@result{} +( ((<< 0 ( "hi=Howdy; echo $hi world!\n")) + (> 1 "hello.sh")) + ( "cat")) +@end example + + +@subsubheading Compound commands + +Loops look like they do on the shell: + +@example +while test $GASH = confusing +do + read-the-manual +done +@result{} +( ( "test" ( "GASH") "=" "confusing") + ( "read-the-manual")) + +until cows come home; do wait; done +@result{} +( ( "cows" "come" "home") + ( "wait")) + +for cat in Murray Flash Boots +do + pet $cat +done +@result{} +( ("cat" ("Murray" "Flash" "Boots")) + ( "pet" ( "cat"))) +@end example + +If-statements and case-statements are a bit more like Scheme: + +@example +if test $FLAVOR = chocolate +then + echo hooray! +elif test $FLAVOR = vanilla +then + echo yum! +else + echo too fancy! +fi +@result{} +( + (( "test" ( "FLAVOR") "=" "chocolate") + ( "echo" "hooray!")) + (( "test" ( "FLAVOR") "=" "vanilla") + ( "echo" "yum!")) + ( + ( "echo" "too" "fancy!"))) + +case $FLAVOR +in + *mint*) echo blech! ;; + *peanut*) echo so good! ;; + *maple*) echo maybe! ;; + *) echo sure why not! ;; +esac +@result{} +( ( "FLAVOR") + (("*mint*") ( "echo" "blech!")) + (("*peanut*") ( "echo" "so" "good!")) + (("*maple*") ( "echo" "maybe!")) + (("*") ( "echo" "sure" "why" "not!"))) +@end example + +Brace groups and subshells are pretty straight-forward: + +@example +@{ echo jump to the left; + echo step to the right; @} +@result{} +( + ( "echo" "jump" "to" "the" "left") + ( "echo" "step" "to" "the" "right")) + +(cd tmp; cat ephemera.txt) +@result{} +( + ( "cd" "tmp") + ( "cat" "ephemera.txt")) +@end example + +@subsubheading Function definitions + +Function definitions look more like Lisp: + +@example +dance() @{ + echo swing your hips now! +@} +@result{} +( "dance" + ( "echo" "swing" "your" "hips" "now!")) + +;; Even this weird thing works! 
+dance_to_the_beat() @{ + sed 's/beat/swing hips/g' +@} < beats.txt +@result{} +( "dance_to_the_beat" + ( ((< 0 "beats.txt")) + ( "sed" ( "s/beat/swing hips/g")))) +@end example + + +@subsubheading Pipelines and asynchronous commands + +Pipelines collect commands into a list: + +@example +source | sink +@result{} +( ( "source") ( "sink")) + +cut -d ' ' -f 4 < ice-cream.txt \ + | grep chocolate \ + | sed 's/mint/peanut butter/g' +@result{} +( + ( ((< 0 "ice-cream.txt")) + ( "cut" "-d" ( " ") "-f" "4")) + ( "grep" "chocolate") + ( "sed" ( "s/mint/peanut butter/g"))) +@end example + +Asynchronous commands are simply regular commands wrapped with +@code{}: + +@example +helpful-daemon & +@result{} +( ( "helpful-daemon")) + +;; So-so dancing. +swing-hips ; flail-arms +@result{} +( + ( "swing-hips") + ( "flail-arms")) + +;; Great dancing. +swing-hips & flail-arms +@result{} +( + ( ( "swing-hips")) + ( "flail-arms")) +@end example + + +@subsubheading Boolean expressions + +The logical ``and'' and ``or'' operators are binary, left-associative +and have equal precedence: + +@example +foo && bar || baz +@result{} +( ( ( "foo") + ( "bar")) + ( "baz")) + +foo || bar && baz +@result{} +( ( ( "foo") + ( "bar")) + ( "baz")) +@end example + +The logical ``not'' operator wraps a command or pipeline: + +@example +! false +@result{} +( ( "false")) + +! cat flavors.txt | grep mint +@result{} +( ( + ( "cat" "flavors.txt") + ( "grep" "mint"))) +@end example + + +@subsubheading Variables and command substitution + +Variable operators have names inspired by Scheme: + +@example +echo $@{FOO+bar@} +@result{} +( "echo" ( "FOO" "bar")) + +echo $@{FOO:+bar@} +@result{} +( "echo" ( "FOO" "bar")) + +echo $@{FOO-bar@} +@result{} +( "echo" ( "FOO" "bar")) + +echo $@{FOO=bar@} +@result{} +( "echo" ( "FOO" "bar")) +@end example + +Command substitution is a simple wrapper: + +@example +echo $(cat message.txt) +@result{} +( "echo" ( ( "cat" "message.txt"))) +@end example + +That's it for examples. 
By this point, you should have a pretty good +sense for what Gash is going to give you. If you need to know more, +you can always do your own experiments or read the formal definition +in the next section. + + +@node Internal representation definition +@subsection Internal representation definition + +The internal representation of shell code in Gash has the following +form. (If Gash ever returns something that does not have this form, +please file a bug!) + +@example +sync ::= exp + | (' exp) + +exp ::= pipe + | (' exp1 exp2) + | (' exp1 exp2) + | (' pipe) + +pipe ::= cmd* + | (' cmd* ...) + +cmd* ::= cmd + | (' name sync ...) + +cmd ::= (' word ...) + | (' ((var var-word) ...) word ...) + | (' (redir ...) [cmd]) + | (' (var word) ...) + | (' sync ...) + | (' (name (word ...)) sync ...) + | (' word ((pattern-word ...) sync ...) ...) + | (' (list sync ...) ... [(' sync ...)]) + | (' list sync ...) + | (' list sync ...) + | (' sync ...) + +redir ::= ('> fdes word) + | ('< fdes word) + | ('>& fdes word) + | ('<& fdes word) + | ('>> fdes word) + | ('<> fdes word) + | ('>! fdes word) + | ('<< fdes word) + +word ::= string + | (word ...) + | (' word) + | (' sync ...) + | (' var) + | (' var [word]) + | (' var [word]) + | (' var [word]) + | (' var [word]) + | (' var [word]) + | (' var [word]) + | (' var [word]) + | (' var [word]) + | (' var [word]) + | (' var [word]) + | (' var [word]) + | (' var [word]) + | (' var) + +var ::= string + | ("LINENO" integer) +@end example + + +@node Using shell-like constructs from Scheme +@section Using shell-like constructs from Scheme + +Gash also has some features for writing Scheme code with shell-like +semantics. Indeed, this is how Gash works internally. It interprets +shell code into calls into its shell interface. The two modules that +provide this interface are @code{(gash shell)} and @code{(gash +environment)}. The shell module provides control structures with +shell semantics. 
For instance, you can use @code{sh:for} to update a +given shell variable and run a thunk for each element of a list of +strings. The environment module allows you to manipulate shell +variables and functions, as well as a number of other things. + +Truthfully, these interfaces are not ready for public consumption yet. +They are in need of some careful redesign to be useful outside of Gash +itself. If this sounds interesting to you, please get in touch by +writing to @url{mailto:gash-devel@@nongnu.org, +gash-devel@@nongnu.org}. + + @c ********************************************************************* @node GNU Free Documentation License @appendix GNU Free Documentation License -@cindex license, GNU Free Documentation License @include fdl-1.3.texi -@c ********************************************************************* -@node Concept Index -@unnumbered Concept Index -@printindex cp - -@node Programming Index -@unnumbered Programming Index -@syncodeindex tp fn -@syncodeindex vr fn -@printindex fn - @bye @c Local Variables: diff --git a/doc/syntax.txt b/doc/syntax.txt index def6f61..289815c 100644 --- a/doc/syntax.txt +++ b/doc/syntax.txt @@ -4,15 +4,12 @@ Gash Abstract Syntax Tree Specification Gash parses the shell language into an abstract syntax tree (AST) that has the following form. -list ::= sync - | (' sync ...) - sync ::= exp | (' exp) exp ::= pipe - | (' exp_1 exp_2) - | (' exp_1 exp_2) + | (' exp1 exp2) + | (' exp1 exp2) | (' pipe) pipe ::= cmd* @@ -31,6 +28,7 @@ cmd ::= (' word ...) | (' (list sync ...) ... [(' sync ...)]) | (' list sync ...) | (' list sync ...) + | (' sync ...) redir ::= ('> fdes word) | ('< fdes word) @@ -49,7 +47,7 @@ Internally, the parser also uses these two forms: word ::= string | (word ...) | (' word) - | (' [list] ...) + | (' sync ...) 
| (' var) | (' var [word]) | (' var [word]) @@ -65,6 +63,9 @@ word ::= string | (' var [word]) | (' var) +var ::= string + | ("LINENO" integer) + The parser never returns a qword, but it is a useful notion. It gets used internally by the parser and by the `word' module.