Copyright © 1989, 1998 by Matthew Belmonte.
GRAMMAR FOR ARITHMETIC EXPRESSIONS:
Expr -> ArithExpr end-token
ArithExpr -> ArithExpr TermOp Term | Term
TermOp -> plus-token | minus-token
Term -> Term FactorOp Factor | Factor
FactorOp -> mult-token | div-token
Factor -> number-token | left-p-token ArithExpr right-p-token
ADDITIONS FOR BASIC PURPLE:
(remove Expr -> ArithExpr end-token)
Program -> StatementList end-token
StatementList -> StatementList semicolon-token Statement | Statement
Statement -> in-token identifier-token | out-token ArithExpr |
identifier-token assign-token ArithExpr
Factor -> identifier-token
ADDITIONS FOR EXTENDED PURPLE:
Statement -> do-token BoolExpr then-token StatementList od-token
| if-token BoolExpr then-token StatementList fi-token
| if-token BoolExpr then-token StatementList else-token StatementList
fi-token
BoolExpr -> BoolExpr ClauseOp Clause | Clause
ClauseOp -> and-token | or-token
Clause -> not-token PositiveClause | PositiveClause
PositiveClause -> ArithExpr RelOp ArithExpr
RelOp -> less-token | lesseq-token | greater-token | greatereq-token |
eq-token | noteq-token
LEXICAL DEFINITIONS FOR ARITHMETIC EXPRESSIONS:
number-token: (0+1+2+3+4+5+6+7+8+9)+
plus-token: + (that is, the symbol `+', not the regular expression operator
`+')
minus-token: -
mult-token: *
div-token: /
left-p-token: (
right-p-token: )
end-token: .
ADDITIONS FOR BASIC PURPLE:
semicolon-token: ;
in-token: IN
out-token: OU
identifier-token: A + B + C + ... + Z
assign-token: <-
ADDITIONS FOR EXTENDED PURPLE:
do-token: DO
then-token: ->
od-token: OD
if-token: IF
fi-token: FI
else-token: ||
and-token: &
or-token: |
not-token: ~
less-token: <
lesseq-token: <=
greater-token: >
greatereq-token: >=
eq-token: =
noteq-token: <>
ARITHMETIC SEMANTICS:
The arithmetic expressions part of the language is straightforward: to
calculate the value of a tree, apply the operator represented at the root of
the tree to the values of the left and right subtrees. The value of a leaf is
simply the number stored in that leaf. Your evaluator should catch any attempt
to divide by zero and report an error.
BASIC PURPLE SEMANTICS:
To evaluate a sequence of two statements joined by a semicolon, first evaluate
the statement on the left, and then evaluate the statement on the right. To
evaluate an IN statement, read a value from the keyboard and assign it to the
identifier specified in the IN statement (modify the global variable
bindings). To evaluate an OU statement, evaluate the associated
arithmetic expression, and display its value on the screen. To evaluate an
identifier, look up the value that has been associated with it (use the global
variable bindings). To evaluate an assignment statement, find the value
of the arithmetic expression on the right-hand side of the assignment operator
and assign it to the identifier on the left-hand side (modify the global
variable bindings).
EXTENDED PURPLE SEMANTICS:
To evaluate a Boolean expression, apply the operator represented at the root of
the tree to the values of the left and right subtrees. To evaluate a DO
statement, evaluate the Boolean condition. If the result is true, then
evaluate the statement list. After you've evaluated the statement list,
re-evaluate the Boolean condition and keep going around this loop until the
condition becomes false. To evaluate an IF statement, evaluate the Boolean
condition. If the result is true, then evaluate the first statement list and
ignore the second (if it exists). If, on the other hand, the condition is
false, then evaluate the second statement list (if it exists).
HOW TO DO IT:
You will be writing a scanner, a parser, and an evaluator for arithmetic
expressions. Your scanner should be based on the lexical definitions given
above, and your parser should take its structure from the structure of the
grammar given above. Use as guides the sample
scanner, parser, and
evaluator for the WFF language (and the
corresponding header files scanner.h,
parser.h, and evaluator.h),
which were discussed in class. Remember, though, that you are writing an
interpreter for PURPLE, not for WFFs---the basic principles will be the same
but the details of the program will be completely different.
DO NOT DEVELOP YOUR PROGRAMS ON THE COMPUTERS. Write them out on paper first, and submit them for checking before typing them in. This will catch bugs and errors in style before they become buried beneath other segments of your program. Include comments where you think the program deserves some explanation. In particular, you should annotate any nontrivial loops and recursive functions with the invariants that you used to develop them.
After typing and debugging each segment of the program, have us check it on the computer to make sure that it runs correctly. When you've finished all three segments (scanner, parser, and evaluator) for arithmetic expressions, begin work on extending your scanner, parser, and evaluator to handle Basic PURPLE. Again, work on each of the three phases separately, and submit all changes and additions to us on paper before typing them into the computer. (You may want to print out a copy of your arithmetic expression evaluator and illustrate on it where the changes will be made.)
If you have time after you get Basic PURPLE working, you can start on Extended PURPLE. Same routine: no typing before it's checked on paper, and work on each phase separately.
IMPORTANT:
Before you start on Basic PURPLE, please submit diagrams of both the full parse trees and the abbreviated parse trees corresponding to the following PURPLE programs, so that we know that you know what you're doing (we'll endeavour to get them back to you as quickly as possible):
IN X; Y <- 1; DO X > 0 -> Y <- Y*X; X <- X-1 OD; OU Y.
IN X; IN Y; IF X = 6 & Y = 9 -> OU 42 || OU X*Y FI.
You may want to define the following functions:
init_scanner is a function to be called before you first run the
scanner. It makes sure that the scanner's input variable is loaded.
next_char takes zero parameters and places the next character typed from
the keyboard into a global variable, from which it can be read by the scanning
functions. (It's also possible to do this, and more, using the cin
input stream. For the purposes of this project, though, we don't want to take
for granted the conversion of single characters to strings in input streams.
So don't use cin. You may still use cout if you need it.)
isspace (defined for you in <ctype.h>) takes a character as its
parameter, and is true if that character is a space or a carriage return, false
otherwise.
isupper (defined for you in <ctype.h>) takes a character as its
parameter, and is true if that character is a capital letter of the alphabet
A-Z, false otherwise. (The letters A-Z are the only legal variable names, or
`identifiers', in PURPLE.)
isdigit (defined for you in <ctype.h>) takes a character as its
parameter, and is true if that character is a digit 0-9, false otherwise.
issymbol (which we will provide for you) takes a character as its
parameter, and is true if that character is a legal PURPLE symbol (i.e.,
something that is not a digit and not a letter but still a legal character,
such as `+', `&', or `>'). You can use the predicates isupper, isdigit,
and issymbol to select among several helping functions that handle the scanning
for these various types of characters.
THINGS TO REMEMBER:
--When processing digits, you'll want to convert them to integers so that you
can do arithmetic on them. A single digit can be converted by subtracting from
it the character code for the digit '0'. For example, the ASCII character code
for the digit '5' happens to be 53, and the code for '0' is 48:
'5'-'0' = 53-48 = 5. In general, for any digit d, d-'0' is the number
represented by d. It is a mistake to use d as a number without subtracting
'0' from it. If you do this, you get 48 where you should get 0, 49 where
you should have 1, and so on. Now that you know how to convert a single digit,
you need to figure out how to convert a string of digits. The solution
involves a loop, and you should write its invariant before doing anything else
with it.
--The top-level function of your scanner should be a method associated
with a class of tokens. It should take no arguments and should place
the next token in the input stream into the associated instance of the token
class.
--The top-level function of your parser should take no arguments and should
call the scanner to get tokens.
--The top-level function of your evaluator should take a pointer to a parse
tree as an argument.
--Don't use ultra-generic variable names such as a, b, x, y, l, &c. Use
names that tell what kind of data your variables hold, and what the
significance of that data is. Look at the sample programs for examples.
--Annotate all complex recursive functions with invariants, and think about
these invariants before you write the functions.