The Team
Annie Ku, Margo Crawford, Nathan Yee, Serena Chen
Goal and Background
Our goal is to write an interpreter in C. Our personal learning goals included gaining an understanding of how programming languages are implemented at a low level, and understanding how to parse not only programming languages, but a wide variety of files (such as markup language files). We chose to interpret Lisp, since its syntax, with its heavy reliance on parentheses, facilitated a more straightforward implementation of the parser. By the end of the project, we had hoped to implement basic numeric operations, functions, and variables, in order to understand parsing and the how the basic building blocks of programming languages worked.
Since we wanted gain a deeper understanding of parsing and interpreting programming languages, we split up and implemented two different versions of parsing and interpreting. The first version we have is included within the lexyacc
directory, and includes heavy use of Yacc and Lex to assist with parsing. This version was significantly assisted by Dr. Pucella, visiting lecturer at Olin. Our conversations with him guided much of the structure we used in the parsing and the interpreter. The other version is included in the directory parsing_focused
. This version was less guided by the help of Dr. Pucella, but used Peter Norvig’s Lispy interpreter, a Lisp interpreter written in Python, as a reference. This version served more as an exploration into how to build an abstract syntax tree and interpret it.
Lisp Interpreter - Lex and Yacc based
We made an interpreter that evaluates the contents of a lisp file with an open-source yacc parser generator called Bison. This interpreter supports function definitions through defun, variable definitions through let, and mathematical binary expressions. This portion of the project explores the use of parser generators and lexical tools to construct and evaluate abstract syntax trees. Dr. Pucella guided us throughout this portion with explaining how lisp files are deconstructed and visualized.
Sample Input and Output:
(defun square (x) (* x x))
(defun times3 (x) (* 3 x))
(times3 (/ 1 (-
(+ 2
(let ((x 3)) (* x 3)))
(square 4))))
- output: -0.600000
(exp 3 2)
- output: 9.000000
(exp 3 2)
- output: 9.000000
(defun square (x) (* x x))
(defun times3 (x) (* 3 x))
(square (/ 1 (- (+ 2 (let ((x 3)) (* x 3)) ) (times3 4))))
- Output: 1
(+ (exp 3 2) (let ((x 3)) (* x 3)))
- Output: 18
(defun minus1 (x) (- x 1))
(defun times3 (x) (* 3 x))
(minus1 (/ 1 (- (+ 2 (let ((x 3)) (* x 3))) (times3 4) ) ) )
- Output: -2.000000
(+ (exp 3 2) (let ((x 2.5)) (* x 3)))
- Output: 16.500000
Known limitations:
-
Functions are restricted to one parameter and only one parameter (no parameter-less functions)
-
No documentation in functions
-
The operators +, -, *, and / are limited to 2 operands
-
Float is the only data type available
Implementation
Above is a high level block diagram of our project. We split the lexing, parsing, and evaluating into separate files. The lexer, at the very bottom, defined float and identifier tokens, the let and defun keywords, and ignored all whitespace. The parser defined the grammar, started the process of tokenizing the input file, and created a node in the abstract syntax tree when an expression was matched. That abstract syntax tree (along with a list of any functions generated) was then passed to the main function, which then passed the tree to the recursive eval
function to be evaluated.
The abstract syntax tree is represented by a series of AstNode
structs. Each AstNode
included the type of node (the operator, data type, function), the value if it had one (for data types), the left and right nodes (for operators), the name (for variables and functions), and the next node (for the let statement, which is unary). In a future iteration, we hope to structure the node such that it is a little more generalized, instead of fields that are only sometimes relevant.
The parser builds the abstract syntax tree. It has multiple constructors depending on the type of node, such as a function call, and operator, or simply a numeric literal. When a function is defined in the parser, it adds a node to the linked list of functions. Each node in the linked list has a name, the parameter name, and the abstract syntax tree that serves as the function body. When the parser is done parsing, it sends the final abstract syntax tree and the list of functions to the caller.
The eval
function in the interpreter utilizes recursive descent. It starts at the head AstNode
and evaluates it based on its type. When a variable gets introduced, it moves to a separate function called eval_param
where the AstNode
gets evaluated and the resulting variable is added to the list of variables. Then after recursively evaluating the rest of the branch, the variable is popped off the list. Functions are given their own environments, which is done by passing in an empty list of variables to eval_param
instead of the existing list of variables.
Variables and functions are stored as linked lists for ease of implementation. Unfortunately, that comes at a cost of runtime; however, the language is so primitive, we are not anticipating runtime to be too much of a problem. If we had more time, we would try to implement the variable and function lists as hash tables.
Lisp Interpreter - Parsing focused
We made a lisp interpreter using only C, with no external libraries such as lex or yacc. It recursively builds an abstract syntax tree, then evaluates it. It supports function definitions through defun and variable definitions through defvar, and unary and binary expressions. It reads from a lisp file, then builds a tree and evaluates it. Since our focus for this part was not on the syntax tree itself, but on creating it, much of the code for the tree was borrowed from Leonidas Ferigas. However, the code that builds the tree from an input file was purely written by us.
Sample input and output:
( defun func2 ( a b c ) ( + a b ) )
( func2 5 7 9 )
- output: 12
( * 5 7 )
- output: 35
( defvar ( x ( + 7 10 ) ) )
- output: 17
( * 2 x )
- output: 34
( * ( + 7 ( - 9 3 ) ) 15 )
- output: 195
( defun func3 ( ) ( 18 ) )
( func3 )
- output: 18
( if t 4 5 )
- output: 4
( if nil 4 5 )
- output: 5
Known limitations:
-
Inputs to functions cannot be non-integers
-
No support for local variables (including in function arguments)
-
Finding a user defined function takes linear time with the number of functions that exist
-
Every token must have white spaces between it, including nested parentheses
Implementation:
Interpreter takes a lisp file as an input, and reads the values of a line, splitting on whitespaces. It then passes a list of values to the make_tree function, which checks the first token to determine whether it is a variable definition, function definition, integer or variable value, unary or binary predefined operation, if statement, or a user defined function. From there, it looks at the array and finds any nested values, and evaluates those. This is a top-down recursive structure. Once the whole tree is built, it returns, and is evaluated. It returns an integer or null, depending on what the expression was.
Getting Started
git clone https://github.com/kuannie1/SoftSysEccentricEucalyptus.git
sudo apt-get install flex bison
Usage
Using the Lex and Yacc based interpreter
cd SoftSysEccentricEucalyptus/lexyacc
make interpreter
./interpreter <lisp file>
There is a test lisp file included in the directory, called test.lisp
.
Using the parsing focused interpreter
cd SoftSysEccentricEucalyptus/parsing_focused
make interpreter
./interpreter <lisp file>
There is a test lisp file included in the directory, called sample.lisp
.