Copyright © 1989 by Matthew Belmonte.
GRAMMAR FOR ARITHMETIC EXPRESSIONS:
Expr -> ArithExpr end-token
ArithExpr -> ArithExpr TermOp Term | Term
TermOp -> plus-token | minus-token
Term -> Term FactorOp Factor | Factor
FactorOp -> mult-token | div-token
Factor -> number-token | left-p-token ArithExpr right-p-token
ADDITIONS FOR BASIC PURPLE:
(remove Expr -> ArithExpr end-token)
Program -> StatementList end-token
StatementList -> StatementList semicolon-token Statement | Statement
Statement -> in-token identifier-token | out-token ArithExpr |
identifier-token assign-token ArithExpr
Factor -> identifier-token
ADDITIONS FOR EXTENDED PURPLE:
Statement -> do-token BoolExpr then-token StatementList od-token
| if-token BoolExpr then-token StatementList fi-token
| if-token BoolExpr then-token StatementList else-token StatementList
fi-token
BoolExpr -> BoolExpr ClauseOp Clause | Clause
ClauseOp -> and-token | or-token
Clause -> not-token PositiveClause | PositiveClause
PositiveClause -> ArithExpr RelOp ArithExpr
RelOp -> less-token | lesseq-token | greater-token | greatereq-token |
eq-token | noteq-token
LEXICAL DEFINITIONS FOR ARITHMETIC EXPRESSIONS:
number-token: (0+1+2+3+4+5+6+7+8+9)+
plus-token: + (that is, the symbol `+', not the regular expression operator
`+')
minus-token: -
mult-token: *
div-token: /
left-p-token: (
right-p-token: )
end-token: .
ADDITIONS FOR BASIC PURPLE:
semicolon-token: ;
in-token: IN
out-token: OU
identifier-token: A + B + C + ... + Z
assign-token: <-
ADDITIONS FOR EXTENDED PURPLE:
do-token: DO
then-token: ->
od-token: OD
if-token: IF
fi-token: FI
else-token: ||
and-token: &
or-token: |
not-token: ~
less-token: <
lesseq-token: <=
greater-token: >
greatereq-token: >=
eq-token: =
noteq-token: <>
ARITHMETIC SEMANTICS:
The arithmetic expressions part of the language is straightforward: to
calculate the value of a tree, apply the operator represented at the root of
the tree to the values of the left and right subtrees. The value of a leaf is
simply the number stored in that leaf. Your evaluator should catch any attempt
to divide by zero and report an error.
BASIC PURPLE SEMANTICS:
To evaluate a sequence of two statements joined by a semicolon, first evaluate
the statement on the left, and then evaluate the statement on the right. To
evaluate an IN statement, read a value from the keyboard and assign it to the
identifier specified in the IN statement (modify the global variable
bindings). To evaluate an OU statement, evaluate the associated
arithmetic expression, and display its value on the screen. To evaluate an
identifier, look up the value that has been associated with it (use the global
variable bindings). To evaluate an assignment statement, find the value
of the arithmetic expression on the right-hand side of the assignment operator
and assign it to the identifier on the left-hand side (modify the global
variable bindings).
EXTENDED PURPLE SEMANTICS:
To evaluate a Boolean expression, apply the operator represented at the root of
the tree to the values of the left and right subtrees. To evaluate a DO
statement, evaluate the Boolean condition. If the result is true, then
evaluate the statement list. After you've evaluated the statement list,
re-evaluate the Boolean condition and keep going around this loop until the
condition becomes false. To evaluate an IF statement, evaluate the Boolean
condition. If the result is true, then evaluate the first statement list and
ignore the second (if it exists). If, on the other hand, the condition is
false, then evaluate the second statement list (if it exists).
HOW TO DO IT:
You will be writing a scanner, a parser, and an evaluator for arithmetic
expressions. Your scanner should be based on the lexical definitions given
above, and your parser should take its structure from the structure of the
grammar given above. Use as guides the sample
scanner, parser, and
evaluator for the WFF language, which were
discussed in class. Remember, though, that you are writing an interpreter for
PURPLE, not for WFFs---the basic principles will be the same but the details of
the program will be completely different.
DO NOT DEVELOP YOUR PROGRAMS ON THE COMPUTERS. Write them out on paper first, and submit them for checking before typing them in. This will catch bugs and errors in style before they become buried beneath other segments of your program. Use EdScheme's comment feature where you think the program deserves some explanation; any line beginning with a semicolon is treated as a comment. In particular, you should annotate your more complex recursive functions with the invariants that you used to develop them.
After typing and debugging each segment of the program, have us check it on the computer to make sure that it runs correctly. When you've finished all three segments (scanner, parser, and evaluator) for arithmetic expressions, begin work on extending your scanner, parser, and evaluator to handle Basic PURPLE. Again, work on each of the three phases separately, and submit all changes and additions to us on paper before typing them into the computer. (You may want to print out a copy of your arithmetic expression evaluator and illustrate on it where the changes will be made.)
If you have time after you get Basic PURPLE working, you can start on Extended PURPLE. Same routine: no typing before it's checked on paper, and work on each phase separately.
IMPORTANT:
Before you start on Basic PURPLE, please submit diagrams of both the full parse trees and the abbreviated parse trees corresponding to the following PURPLE programs, so that we know that you know what you're doing (we'll endeavour to get them back to you as quickly as possible):
IN X; Y <- 1; DO X > 0 -> Y <- Y*X; X <- X-1 OD; OU Y.
IN X; IN Y; IF X = 6 & Y = 9 -> OU 42 || OU X*Y FI.
You may use the following pre-defined functions. (These are available for EdScheme for the Macintosh (unpack with UnStuffit), for MIT Scheme for Windows (unpack with pkunzip), for MIT Scheme for UNIX (unpack with gzcat and tar -xf -), and for Doctor Scheme for Windows.
PRE-DEFINED SCANNER FUNCTIONS:
init-scanner should be called before you use any of the other pre-defined
functions. Call it only once, before you first run the scanner. If you
make the mistake of calling init-scanner every time you run the
scanner, then you will lose any saved characters.
next-char takes zero parameters and returns the next character typed from
the keyboard.
space? takes a character as its parameter, and is true (#t) if that
character is a space or a carriage return, false (#f) otherwise.
letter? takes a character as its parameter, and is true (#t) if that
character is a capital letter of the alphabet A-Z, false (#f) otherwise. (The
letters A-Z are the only legal variable names, or `identifiers', in
PURPLE.)
digit? takes a character as its parameter, and is true (#t) if that
character is a digit 0-9, false (#f) otherwise.
symbol? takes a character as its parameter, and is true (#t) if that
character is a legal PURPLE symbol (i.e., something that is not a digit and not
a letter but still a legal character, such as `+', `&', or `>'). You
can use the predicates letter?, digit?, and symbol? to select among several
helping functions that handle the scanning for these various types of
characters.
char->num takes a digit and returns the number between 0 and 9 that that
digit represents. Note that there's a difference between the digit `1' and the
number one that is represented by that digit. In order to arithmetic on
numbers that are typed in from the keyboard, you must first translate them from
strings of digits into the computer's internal representation for numbers. If
a number consists of n digits, then in order to translate it you must
use char->num n times.
save-char takes as its parameter the character that has most recently been
returned by next-char, and pushes it back out onto the input stream, so that
it'll be returned by the next call to next-char. This is useful, for example,
in a situation in which you've reached the end of a number and have read the
character following the number. You don't want to consume this character
inside your number-scanning function, because it needs to be available for the
function that'll be scanning the token that follows the number. So you need a
way to make the machine act as if this character had not been read. Having a
lot of calls to save-char in your program would make it confusing, so use this
function sparingly! Also, if you've just saved a character, you can't save
another one on top of it; that is, at any time there can be only one character
that has been saved and is waiting to be returned again from next-char. If you
write your scanner clearly and elegantly, a single character should be all you
ever need to save.
error takes any number of parameters, displays these parameters on the
screen, and then halts execution of the program. error is useful for reporting
error conditions that prevent further execution of your program, such as input
that is not well-formed.
begin sequences the evaluation of Scheme expressions. Although in a purely
functional language any pair of functional expressions f and g
have the same values regardless of whether f is evaluated before
g or vice versa, Scheme is not exclusively a functional language,
and it is possible for the evaluation of functional expressions in Scheme to
have side effects. For example, what value is returned by a call to
next-token depends on when the call occurs during the course of
tokenising the input stream. begin can be abused, especially by those of you
who have programming experience in some imperative language such as Pascal.
Check the sample programs for good examples of when--and when not--to use
begin. Note that begin is not essential to Scheme; it's possible to define a
version of begin that works for any pair of functional expressions f and
g:
(define begin ;evaluate f, but pass g to begin-help as is. (lambda (f g) (begin-help (f) g))) (define begin-help ;throw away f's result, and return g's result. (lambda value-of-f g) (g)))
In reality, the Scheme interpreter defines begin as a special form that works for any number of functional expressions, not just two.
let defines local variables. It takes two parameters, the first of which is a list of pairs of variables and bindings, and the second of which is a functional expression within which these bindings will be effective. So, for example, (let ((right-answer 42) (wrong-answer (* 6 9))) (- wrong-answer right-answer)) evaluates to 12. Like begin, let is useful for sequencing the evaluation of functional expressions that have side effects, since every expression in the bindings list is evaluated before the final expression. But the main use of let is to avoid re-computation of complex expressions by allowing these expressions to be evaluated only once, and then referred to several times. Take a moment to think for yourself about how let can be implemented using helping functions that bind their parameters to new names. Depending on how you write your scanner, you may never need to use let in your scanner. If you do use it in your scanner, you certainly won't need to use it a lot. It will come in handy, however, in many functions in your parser. Check the sample programs for examples of situations in which let is useful.
PRE-DEFINED PARSER FUNCTIONS:
init-parser performs a similar function for the predefined parsing-related
functions. It in turn calls init-scanner, so when you write your parser you
won't need to include a separate call to init-scanner.
save-token acts analogously to save-char, but for tokens instead of
characters. Again, you can't have more than one token saved at any time.
get-next-token first checks to see whether a token has been saved and not
yet returned. If so, it returns the saved token. If not, it calls the
top-level scanner function next-token, which you define. Thus, since you'll
find it convenient to use save-token occasionally in your parser, your parser
should use get-next-token as an interface to the scanner, not plain old
next-token. If you were to use next-token, you would be ignoring any saved
token.
PRE-DEFINED EVALUATOR FUNCTIONS:
init-evaluator sets up bindings for the evaluator and calls init-parser
(which in turn calls init-scanner). It defines a data structure called an
assoc list, which stores all the variable bindings of the PURPLE program
that's being evaluated. Look in The Schemer's Guide to learn about
assoc lists, and look at the sample evaluator for WFFs to see how they can be
used.
set! is a special form that re-sets the value of a Scheme variable (don't
confuse this with a PURPLE variable!). So, for example, after evaluating the
sequence `(define foo (* 6 9)) (set! foo 42)' the value of the Scheme variable
foo would be 42 instead of 54. You should need to use set! only in one place,
to modify the bindings. DON'T use it anywhere else. Imperative functions such
as set! are dirty pool in a functional language such as Scheme and usually make
for confusing programs, so it's best to stay away from them.
display takes any Scheme expression as a parameter and displays it on the
screen. It's useful for issuing prompts and for printing values.
THINGS TO REMEMBER:
--The top-level function of your scanner should be called next-token (so that
the predefined function get-next-token will be able to refer to it) and should
take zero arguments and return the next token in the input stream.
--Don't use ultra-generic variable names such as a, b, x, y, l, &c. Use
names that tell what kind of data your variables hold, and what the
significance of that data is. Look at the sample programs for examples.
--Annotate all complex recursive functions with invariants, and think about
these invariants before you write the functions.