The Syntax and Semantics of PURPLE

Copyright © 1989, 1998 by Matthew Belmonte.

GRAMMAR FOR ARITHMETIC EXPRESSIONS:
Expr -> ArithExpr end-token
ArithExpr -> ArithExpr TermOp Term | Term
TermOp -> plus-token | minus-token
Term -> Term FactorOp Factor | Factor
FactorOp -> mult-token | div-token
Factor -> number-token | left-p-token ArithExpr right-p-token

ADDITIONS FOR BASIC PURPLE:
(remove Expr -> ArithExpr end-token)
Program -> StatementList end-token
StatementList -> StatementList semicolon-token Statement | Statement
Statement -> in-token identifier-token | out-token ArithExpr | identifier-token assign-token ArithExpr
Factor -> identifier-token

ADDITIONS FOR EXTENDED PURPLE:
Statement -> do-token BoolExpr then-token StatementList od-token
| if-token BoolExpr then-token StatementList fi-token
| if-token BoolExpr then-token StatementList else-token StatementList fi-token
BoolExpr -> BoolExpr ClauseOp Clause | Clause
ClauseOp -> and-token | or-token
Clause -> not-token PositiveClause | PositiveClause
PositiveClause -> ArithExpr RelOp ArithExpr
RelOp -> less-token | lesseq-token | greater-token | greatereq-token | eq-token | noteq-token

LEXICAL DEFINITIONS FOR ARITHMETIC EXPRESSIONS:
number-token: (0+1+2+3+4+5+6+7+8+9)+
plus-token: + (that is, the symbol `+', not the regular expression operator `+')
minus-token: -
mult-token: *
div-token: /
left-p-token: (
right-p-token: )
end-token: .

ADDITIONS FOR BASIC PURPLE:
semicolon-token: ;
in-token: IN
out-token: OU
identifier-token: A + B + C + ... + Z
assign-token: <-

ADDITIONS FOR EXTENDED PURPLE:
do-token: DO
then-token: ->
od-token: OD
if-token: IF
fi-token: FI
else-token: ||
and-token: &
or-token: |
not-token: ~
less-token: <
lesseq-token: <=
greater-token: >
greatereq-token: >=
eq-token: =
noteq-token: <>

ARITHMETIC SEMANTICS:
The arithmetic expressions part of the language is straightforward: to calculate the value of a tree, apply the operator represented at the root of the tree to the values of the left and right subtrees. The value of a leaf is simply the number stored in that leaf. Your evaluator should catch any attempt to divide by zero and report an error.

BASIC PURPLE SEMANTICS:
To evaluate a sequence of two statements joined by a semicolon, first evaluate the statement on the left, and then evaluate the statement on the right. To evaluate an IN statement, read a value from the keyboard and assign it to the identifier specified in the IN statement (modify the global variable bindings). To evaluate an OU statement, evaluate the associated arithmetic expression, and display its value on the screen. To evaluate an identifier, look up the value that has been associated with it (use the global variable bindings). To evaluate an assignment statement, find the value of the arithmetic expression on the right-hand side of the assignment operator and assign it to the identifier on the left-hand side (modify the global variable bindings).

EXTENDED PURPLE SEMANTICS:
To evaluate a Boolean expression, apply the operator represented at the root of the tree to the values of the left and right subtrees. To evaluate a DO statement, evaluate the Boolean condition. If the result is true, then evaluate the statement list. After you've evaluated the statement list, re-evaluate the Boolean condition and keep going around this loop until the condition becomes false. To evaluate an IF statement, evaluate the Boolean condition. If the result is true, then evaluate the first statement list and ignore the second (if it exists). If, on the other hand, the condition is false, then evaluate the second statement list (if it exists).

HOW TO DO IT:
You will be writing a scanner, a parser, and an evaluator for arithmetic expressions. Your scanner should be based on the lexical definitions given above, and your parser should take its structure from the structure of the grammar given above. Use as guides the sample scanner, parser, and evaluator for the WFF language (and the corresponding header files scanner.h, parser.h, and evaluator.h), which were discussed in class. Remember, though, that you are writing an interpreter for PURPLE, not for WFFs---the basic principles will be the same but the details of the program will be completely different.

DO NOT DEVELOP YOUR PROGRAMS ON THE COMPUTERS. Write them out on paper first, and submit them for checking before typing them in. This will catch bugs and errors in style before they become buried beneath other segments of your program. Include comments where you think the program deserves some explanation. In particular, you should annotate any nontrivial loops and recursive functions with the invariants that you used to develop them.

After typing and debugging each segment of the program, have us check it on the computer to make sure that it runs correctly. When you've finished all three segments (scanner, parser, and evaluator) for arithmetic expressions, begin work on extending your scanner, parser, and evaluator to handle Basic PURPLE. Again, work on each of the three phases separately, and submit all changes and additions to us on paper before typing them into the computer. (You may want to print out a copy of your arithmetic expression evaluator and illustrate on it where the changes will be made.)

If you have time after you get Basic PURPLE working, you can start on Extended PURPLE. Same routine: no typing before it's checked on paper, and work on each phase separately.

IMPORTANT:

Before you start on Basic PURPLE, please submit diagrams of both the full parse trees and the abbreviated parse trees corresponding to the following PURPLE programs, so that we know that you know what you're doing (we'll endeavour to get them back to you as quickly as possible):

	IN X;
	Y <- 1;
	DO X > 0 ->
		Y <- Y*X;
		X <- X-1
	OD;
	OU Y.

	IN X;
	IN Y;
	IF X = 6 & Y = 9 ->
		OU 42
	||
		OU X*Y
	FI.

You may want to define the following functions:
init_scanner is a function to be called before you first run the scanner. It makes sure that the scanner's input variable is loaded.
next_char takes zero parameters and places the next character typed from the keyboard into a global variable, from which it can be read by the scanning functions. (It's also possible to do this, and more, using the cin input stream. For the purposes of this project, though, we don't want to take for granted the conversion of single characters to strings in input streams. So don't use cin. You may still use cout if you need it.)
isspace (defined for you in <ctype.h>) takes a character as its parameter, and is true if that character is a space or a carriage return, false otherwise.
isupper (defined for you in <ctype.h>) takes a character as its parameter, and is true if that character is a capital letter of the alphabet A-Z, false otherwise. (The letters A-Z are the only legal variable names, or `identifiers', in PURPLE.)
isdigit (defined for you in <ctype.h>) takes a character as its parameter, and is true if that character is a digit 0-9, false otherwise.
issymbol (which we will provide for you) takes a character as its parameter, and is true if that character is a legal PURPLE symbol (i.e., something that is not a digit and not a letter but still a legal character, such as `+', `&', or `>'). You can use the predicates isupper, isdigit, and issymbol to select among several helping functions that handle the scanning for these various types of characters.

THINGS TO REMEMBER:
--When processing digits, you'll want to convert them to integers so that you can do arithmetic on them. A single digit can be converted by subtracting from it the character code for the digit '0'. For example, the ASCII character code for the digit '5' happens to be 53, and the code for '0' is 48: '5'-'0' = 53-48 = 5. In general, for any digit d, d-'0' is the number represented by d. It is a mistake to use d as a number without subtracting '0' from it. If you do this, you get 48 where you should get 0, 49 where you should have 1, and so on. Now that you know how to convert a single digit, you need to figure out how to convert a string of digits. The solution involves a loop, and you should write its invariant before doing anything else with it.
--The top-level function of your scanner should be a method associated with a class of tokens. It should take no arguments and should place the next token in the input stream into the associated instance of the token class.
--The top-level function of your parser should take no arguments and should call the scanner to get tokens.
--The top-level function of your evaluator should take a pointer to a parse tree as an argument.
--Don't use ultra-generic variable names such as a, b, x, y, l, &c. Use names that tell what kind of data your variables hold, and what the significance of that data is. Look at the sample programs for examples.
--Annotate all complex recursive functions with invariants, and think about these invariants before you write the functions.