lexer: Do not overflow the Mes character buffer.

* gash/lexer.scm (get-token->get-lexical-token): Allow arbitrary
procedure arguments that start with a 'port' argument.
(get-word-lexical-token): Accept a 'start' argument that and push it
onto the soft port character buffer.
(get-token): Use it to avoid ungetting more than one character on a
raw file port.
This commit is contained in:
Timothy Sample 2022-12-22 14:36:10 -06:00
parent ddb5c7ef36
commit cfb4da73d0
1 changed files with 13 additions and 5 deletions

View File

@ -553,10 +553,11 @@ characters read."
"Convert @var{proc} from a procedure that returns a token-value pair
to a procedure that returns a lexical token to be consumed by the LALR
module."
(lambda (port)
(lambda (port . rest)
(let ((port-location (port->port-location port)))
(receive (length token)
(call-with-metered-input-port port proc)
(call-with-metered-input-port port (lambda (mp)
(apply proc mp rest)))
(match token
((category . value)
(make-lexical-token
@ -570,7 +571,15 @@ module."
(get-token->get-lexical-token get-operator))
(define get-word-lexical-token
(get-token->get-lexical-token get-word))
;; The is a roundabout way to avoid putting more than one character
;; on the 'unget-char' buffer for regular ports. Mes only allows
;; one. Here, we delay adding a character until after we have
;; initialized a soft port, which has an unbounded buffer.
(let ((get-word* (lambda* (port #:key start)
(when start
(unget-char port start))
(get-word port))))
(get-token->get-lexical-token get-word*)))
(define (get-newline-lexical-token port)
"Get a newline as a lexical token to be consumed by the LALR module."
@ -602,8 +611,7 @@ is a newline (or EOF)."
(#\newline (get-newline-lexical-token port))
(#\\ (match (next-char port)
(#\newline (loop (next-char port)))
(_ (unget-char port #\\)
(get-word-lexical-token port))))
(_ (get-word-lexical-token port #:start #\\))))
(_ (get-word-lexical-token port)))))