Lisp Interpreter in Rust (a.k.a Juoni)

I wrote an interpreter for a Scheme/LISP-like language in Rust. It was originally an assignment for Compiler Technology course at University of Jyväskylä. Originally meant to be rather smaller, I added one feature after another and.. result is something I feel very proud of and want to showcase it.

It uses Pest.rs for parsing - a very powerful library, which I’d recommend people to try if interested.

It can be downloaded from GitLab

Live terminal

Best demonstration for the capabilities of the language is offering users a chance to try it for themselves. To make it easier, I also developed a WebAssembly adaptation, which can be run directly in the browser. It supports most things that the native version does, with some limitations (#readln is a prompt, due to the difficulty of asynchronously receiving input with minimal changes)

The terminal will be dynamically loaded to the frame below. Please wait a few moments for it to initialize

Language specification

This is abbreviated from the full specficiation found from GitLab, omitting less commonly used functions and technical minutiae

Datatypes and values

A value is understood as any possible data expressible with a Juoni datatype. All Juoni values are of exactly one datatype.

A given datatype may contain nested values. Currently, only lists and functions may contain other values - other types do not have interior contents.

Immutability

All Juoni values are immutable. Whilst some types are internally implemented with references, language is designed not to allow mutable access to such references.

This should not be mixed with variable bindings - apparent mutability can be achieved by changing what a given binding points to - but this does not alter the original content in any way.

Expressions, scopes and bindings

Juoni programs are constructed of expressions. Expressions may be wholly literal expressions, or they may contain references to variable bindings, which are expressed as literal symbols. A binding points to a value in a scope, and therefore, a given name may point to a different value depending on the context.

Types

Categories

Juoni recognizes following distinct datatypes. The following table names them, presents an example as they would be seen in the source code of a Juoni program, and defines their characteristics.

Type Example Definition
String "Lorem ipsum" Strings in Juoni are UTF-8 sequences of characters, of zero or more length. A single character is treated as a string of length 1.
Integer 1337 Integers in Juoni are 64bit, signed integers. Arithmetic operations will over- or underflow if the result of some operation would exceed datatype limits.
Symbol #cons, x Symbols are singleton objects, as identified by a name. Internally, symbols are 32bit unsigned integers, with a symbol table being used to match a name to a number. This comparability is global, independent of current scope. Symbols starting with # have a special meaning, and are intended for special cases (e.g native functions, constants, et al). Symbols are also used to express names for variable bindings, and are used to key a map as such.
List '("List" 1 2 3) Lists in Juoni are singly-linked lists, each node containing a reference to a location containing any value, and optionally reference to some other location containing a list node (tail). This is the most common type of a expression in Juoni.
Function (#func '(x) '(#mul x 2)) A function is an opaque object, internally containing scoping type, appropriate variable bindings and associated expressions that can be called and evaluated.
Special form - Special form is a function-like opaque object that can be called like a function, but may in addition evaluate its parameters in a different way (or not evaluate some at all!)

Any Juoni value will be of one and only one of the types specified above. The type of a given value can be tested with appropriate functions, as specified in the language reference.

Evaluation rules and scope

Modifiers

Any Juoni expression may be prefixed with a modifier, indicating that this expression should be interpreted differently than usual. Currently, only verbatim modifiers are implemented.

Verbatim

The verbatim modifier ' signifies that the following expression is to be interpreted literally, even if it would otherwise lead to a different result, e.g function call or a variable reference. Any modifiers inside a verbatim expression are left as is, and are generally ignored by most functions and special forms.

This is an important property to be aware of, as this could cause unexpected results if not understood. For instance, '('"a") is not equal to '("a") when tested with #eqv?, even if both are for many cases treated equally.

Literals

Most external representations can be interpreted as literal expressions, but for certain types, you must explicitly signify that they are to be interpreted as literals. For certain types, it is not possible to build an equivalent value solely with an external representation, e.g. special forms.

Reference

A reference is expressed by a literal symbol, and evaluates to the value defined by the named variable binding. An undefined binding will cause the expression to evaluate to a #undef symbol

Lists and calls

A list interpreted as a call must have at least one item. The interpretation will be chosen upon evaluating the first item; if it is a function or a special form, it will be interpreted as a call. Other lists will be interpreted as literal lists, with each item evaluated as a normal expression.

There is an important exception though: head of a list evaluating to #undef will trigger an error, unless you intentionally prefix your list with #list to explicitly generate a list.

For an ordinary function call, all given parameters are evaluated before calling the code associated with the call, and all parameters are evaluated by ordinary expression rules. A new scope is created, and values of parameters are stored there. Then, the expression given as the definition of the called function is evaluated, and its value is returned as the result. Special forms do not necessarily follow these criteria, and may behave differently.

Truthiness

In Juoni, a value being truthy is defined as having any of the following properties:

Dynamic, stdlib, and self-recursive scope

Juoni in general has dynamic scope. This means that functions have access to the environment of their caller at the time of the call. This is a crucial difference from many programming languages, where functions may capture their environment at the time of their declaration.

In general, function calls nest in a new scope descendant to the scope where the function was called. However, it is possible to declare stdlib functions, whose scopes nest from stdlib, without access to the surrounding environment. Stdlib is, in general, the bottommost scope, and contains crucial special forms and core functions. Crucially, this also one of the conditions for tail recursion, which will be detailed further in the language reference.

In addition, each function scope will contain a variable called #self. By using this variable, it is possible to self-recurse without growing stack even with otherwise normally nesting functions, as using #self will cause the scope to nest from the environment of the first non-#self call.

Word of warning though on #self: **#self does not create a strong binding to a scope. Unlike with other function objects, it may not be always possible to use #self regardless of the location where you call it. It is always guaranteed to be safe to use within the same function it was defined for.

Considerations on recursion

Juoni has relatively strict limitations on recursion. In general, recursive calls are not guaranteed to not grow stack.

In that light, following operations are however assured safe to recurse within a tail position:

Other known limitations

Standard library and constants

Constants with special meanings

Constant Meaning
#t True, as a boolean value. There is no explicit boolean type in Juoni, and booleans are expressed as symbols. Evaluates to the literal symbol #t
#f False, as a boolean value. Evaluates to its literal symbol
#undef Undefined. Accessing an undefined binding, or calling a function with invalid parameters results in #undefbeing returned.

Special forms

#define
1
2
3
4
5
6
7
8
> abc
=> (symbol) #undef

> (#define 'abc 123)
=> (number) 123

> abc
=> (number) 123
1
2
3
4
5
6
7
8
9
10
11
> (#typeof #eqv?)
=> (symbol) special_form

> (#define '== #eqv?)
=> (special_form) <special form:eqv?>

> (== 3 2)
=> (symbol) #f

> (== 1 1)
=> (symbol) #t

First parameter is evaluated normally, and should be a symbol. Then, the evaluated result of the second parameter will be bound to that symbol in the current scope. #define returns the value assigned.

If called outside stdlib scope and with a symbol starting with #, #stdlib_define_error will be returned instead. This is a protection measure against accidentally overwriting or overriding built-in functions. In addition, attempts to redefine #undef trigger an exception, as such attempts are typically a sign of an error.

#typeof
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
> (#typeof #and)
=> (symbol) special_form

> (#typeof #list_reverse)
=> (symbol) function

> (#typeof "a")
=> (symbol) string

> (#typeof '#and)
=> (symbol) symbol

> (#typeof 3)
=> (symbol) number

> (#typeof (1 2 3))
=> (symbol) list

#typeof allows inquiry to the type of any given value, which is returned in form of a symbol.

#func, #func!, #func*, #func*!
1
2
3
4
5
6
> (#define 'triple (#func! '(x) '(#mul x 3)))
=> <...>

> (triple 3)
=> (number) 9

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
> (#define 'x 3)
=> (number) 3

> (#define 'env_mul (#func '(y) '(#mul x y)))
=> <...>

> (env_mul 3)
=> (number) 9

> (#define 'x 1)
=> (number) 1

> (env_mul 4)
=> (number) 4

1
2
3
4
5
6
7
8
> (#list_reverse 1 2 3)
[ Interpretation/runtime error: more parameters given than required, got more than 1 ]

> (#define 'list_reverse* (#func* '(x) '(#list_reverse x)))
=> <...>

> (list_reverse* 1 2 3 4)
=> (list) list of type: nonempty list node, containing head: (value at <repl:21>, no modifiers, number '4'), tail: (nonempty list node, containing head: (value at <repl:19>, no modifiers, number '3'), tail: (nonempty list node, containing head: (value at <repl:17>, no modifiers, number '2'), tail: (nonempty list node, containing head: (value at <repl:15>, no modifiers, number '1'), tail: (empty list node))))

These functions create function values; functions are first class citizens in Juoni, and effectively are no different than any other value externally.

First argument is evaluated as a list of formal parameters, all of which must be symbols. This argument list may also be empty, if function has no parameters. The second argument is the definition of the function, which will be evaluated with appropriate formal parameters when invoked. Both of these must be quoted/marked as verbatim, unless you wish to create dynamically defined functions.

Forms with * create variadic functions. These functions must have at least one defined formal parameter, and the last formal parameter will receive all arguments not already assigned to prior formal parameters.

Forms with ! create stdlib functions, which do not inherit their enclosing environment, and instead only have access to the standard library and their formal parameters.

#quote
1
2
3
4
5
6
7
8
9
> (#str_debug 'a)
=> (string) JValue { modifier: None, location: JLocationData { file: "repl", pos: 12 }, value: Symbol(87) }

> (#str_debug (#quote a))
=> (string) JValue { modifier: None, location: JLocationData { file: "repl", pos: 20 }, value: Symbol(87) }

> (#str_debug (#quote 'a))
=> (string) JValue { modifier: Verbatim, location: JLocationData { file: "repl", pos: 20 }, value: Symbol(87) }

#quote effectively is an alternative syntax for preventing evaluation, quite akin to verbatim modifiers. It similarly also allows creating values with verbatim modifiers, potentially causing altered behavior in some functionality.

#eval, #eval_stdlib!, #eval_str
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
> x
=> (symbol) #undef

> (#define 'expr '(#define 'x 3))
=> (list) list of type: nonempty list node, containing head: (value at <repl:17>, no modifiers, symbol ID 5), tail: (nonempty list node, containing head: (value at <repl:25>, verbatim, symbol ID 31), tail: (nonempty list node, containing head: (value at <repl:28>, no modifiers, number '3'), tail: (empty list node)))

> (#eval expr)
=> (number) 3

> x
=> (number) 3

> (#eval_str "(#str_debug #and)")
=> (string) JValue { modifier: None, location: JLocationData { file: "native", pos: 0 }, value: SpecialForm(<special form, name 'and'>) }

> (#eval_str "393invalid21")
[ Interpretation/runtime error: Evaluation error:  --> 1:4
  |
1 | 393invalid21
  |    ^---
  |
  = expected EOI ]

Evaluate the provided expression, and then re-evaluate the resolved expression. This allows dynamically creating Juoni expressions, and evaluating them.

#eval evaluates in the current scope from native structures, whilst #eval_str parses and evaluates a string. #eval_stdlib! evaluates in standard library’s scope. **Use #eval_stdlib! with great care! Special forms are treated as normal objects, with almost all benefits and disabilities of such. You can potentially damage built-in functions depended on by the interpreter and cause a crash!

#catch! and #stop!
1
2
3
4
5
> (#catch! (#list_map))
=> (list) list of type: nonempty list node, containing head: (value at <unknown:0>, no modifiers, symbol ID 73), tail: (nonempty list node, containing head: (value at <unknown:0>, no modifiers, string 'insufficient arguments for function call: got 0, needs 2'), tail: (empty list node))

> (#catch! (#str_debug "OK"))
=> (list) list of type: nonempty list node, containing head: (value at <unknown:0>, no modifiers, symbol ID 72), tail: (nonempty list node, containing head: (value at <unknown:0>, no modifiers, string 'JValue { modifier: None, location: JLocationData { file: "repl", pos: 21 }, value: String("OK") }'), tail: (empty list node))

#catch! and #stop! relate to the error catching mechanic of the interpreter.

#stop! allows code to throw arbitrary errors, stopping interpretation in its tracks. #catch!, on the other hand, allows catching errors - it always returns a list either in form ('ok <result of the expression>) or ('error <error message as a string>).

List functions

#cons, #list, #head and #tail
1
2
3
4
5
6
7
8
9
10
11
12
13
14
> (#cons 1 (#cons 2 (#cons 3 (4 5 6))))
=> (list) list of type: nonempty list node, containing head: (value at <repl:7>, no modifiers, number '1'), tail: (nonempty list node, containing head: (value at <repl:16>, no modifiers, number '2'), tail: (nonempty list node, containing head: (value at <repl:25>, no modifiers, number '3'), tail: (nonempty list node, containing head: (value at <repl:28>, no modifiers, number '4'), tail: (nonempty list node, containing head: (value at <repl:30>, no modifiers, number '5'), tail: (nonempty list node, containing head: (value at <repl:32>, no modifiers, number '6'), tail: (empty list node))))))

> (#define 'x (1 2 3))
=> (list) list of type: nonempty list node, containing head: (value at <repl:13>, no modifiers, number '1'), tail: (nonempty list node, containing head: (value at <repl:15>, no modifiers, number '2'), tail: (nonempty list node, containing head: (value at <repl:17>, no modifiers, number '3'), tail: (empty list node)))

> (#head x)
=> (number) 1

> (#tail x)
=> (list) list of type: nonempty list node, containing head: (value at <repl:15>, no modifiers, number '2'), tail: (nonempty list node, containing head: (value at <repl:17>, no modifiers, number '3'), tail: (empty list node))

> (#list "a" "b" "c")
=> (list) list of type: nonempty list node, containing head: (value at <repl:7>, no modifiers, string 'a'), tail: (nonempty list node, containing head: (value at <repl:11>, no modifiers, string 'b'), tail: (nonempty list node, containing head: (value at <repl:15>, no modifiers, string 'c'), tail: (empty list node)))

List functions shown above behave as follows:

Conditionals, equality and flow control

#cond
1
2
3
4
5
6
7
8
9
10
11
; List reverse
    (#define '#list_reverse (#func! '(lst)
        '(#last
            (#define 'lr_inner (#func! '(lst rvlst)
                '(#cond ((#eqv? lst '()) rvlst)
                       ('#t (#self (#tail lst) (#cons (#head lst) rvlst)))
                )
            ))
            (lr_inner lst '())
        )
    ))

#cond is the standard conditional construct in Juoni. #cond expressions are in form (#cond (condition expression) (condition expression) ...). Sequentially, each condition is evaluated. If it is truthy, the respective expression is evaluated, and #cond returns the result of that expression. Otherwise, if nothing matches, #undef is returned.

#eqv?
1
2
3
4
5
(#eqv? (1 2 3 "d") (1 2 3 "d"))
=> (symbol) #t

> (#eqv? (2 2 3 "d") (1 2 3 "d"))
=> (symbol) #f

#eqv? checks the value equality of two or more values. Effectively, values must be of same type and same content to be equal, with following considerations:

#last

#last evaluates all of its arguments, but only returns the result from the last expression. It is useful for inducing side effects of some nature and sequencing operations, as shown in a example few sections above

#and, #or, #not
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
> (#and #t #t #f)
=> (symbol) #f

> (#and #t #t 1)
=> (symbol) #t

> (#or #f #f #t)
=> (symbol) #t

> (#or #f #f #f)
=> (symbol) #f

> (#and 1 2 3 4 5 6 7 8 9)
=> (symbol) #t

> (#and 1 2 3 4 5 6 7 8 9 0)
=> (symbol) #f
1
2
3
4
5
6
7
8
9
> (#not '#t)
ok : => (symbol) #f
> (#not '#f)
ok : => (symbol) #t
> (#not "a")
ok : => (symbol) #f
> (#not 0)
ok : => (symbol) #t

#and and #or are short-circuiting variadic boolean special forms. Expressions are evaluated by their truthiness from left to right, and once there is certainty about the result, it is returned immediately without evaluating possibly remaining expression. Their meaning is fairly self-explanatory.

#not is the Boolean inverse, returning #f for anything truthy, and #t for anything that is not truthy.

Numerical functions

#sum, #sub, #mul, #div
1
2
3
4
5
> (#sum (#sub 9 0 1) (#div (#mul -2 4 -1) 2))
=> (number) 12

>  (#div 3 0 2)
[ Interpretation/runtime error: div: divide by zero ]

Basic arithmetic functions for whole numbers. Above example is (9 - 0 - 1) + ((-2 * 4 * -1) / 2). These forms are variadic (expecting at least two arguments), and work from left to right. All of their arguments are expected to be numbers - anything else will result in an exception. Their behavior is otherwise rather self-explanatory, assuming principles of common maths are known.

#<, #>, #>=, #<=
1
2
3
4
5
6
7
8
9
> (#< 1 2 3)
=> (symbol) #t

> (#> 1 2 3)
=> (symbol) #f

> (#> 1 2 2)
=> (symbol) #f

Comparison operators. They are only defined for numbers, and will cause an exception if a non-numeric type is given. They are variadic, and require at least two arguments. #< and #> require strict increasing or decreasing, whilst #<= and #>= allow equality.

String functions

#str, #str_annotated, #str_debug
1
2
3
4
5
6
7
8
> (#str "abc")
=> (string) abc

> (#str_annotated "abc")
=> (string) string 'abc'

> (#str_debug "abc")
=> (string) JValue { modifier: None, location: JLocationData { file: "repl", pos: 12 }, value: String("abc") }

These functions convert any value to a string representation. It comes in 3 different levels

#readln, #println, #print

Standard string input and output. #readln takes no arguments and reads a single trimmed line from the console. #println and #print take one argument, and print a normally converted string

#<>
1
2
> (#<> "Hello World, I can do " 1234 " strings, if not even MORE!")
=> (string) Hello World, I can do 1234 strings, if not even MORE!

String concatenation. Must provide at least 2 values, all of which are converted normally to a string.

#str_len, #str_substr
1
2
3
4
5
6
7
8
9
10
11
> (#define 'x "kävelyllä")
=> (string) kävelyllä

> (#str_len x)
=> (number) 9

> (#str_substr x 1)
=> (string) ävelyllä

> (#str_substr x 1 2)
=> (string) äv

#str_len returns the count of characters in a string, whilst #str_substr takes a substring of a string by its starting index, optionally also limiting the amount of characters. If starting index is out of bounds or count is zero, an empty string is returned.

#str_to_symbol
1
2
3
4
5
> (#str_to_symbol "abcdef")
ok : => (symbol) abcdef

> (#str_to_symbol "¤!not_valid")
error : => (string) str_to_symbol: not a grammatically valid name for a symbol

#str_to_symbol converts a string to a symbol. Useful for #define and #func, considering it is allowed and easily possible to dynamically build code with the help of a dynamic string-to-symbol conversion.

Assorted functions

#symbol?, #number?, #list?, #string?, #function?, #special_form?
1
2
3
4
5
> (#string? "abc")
=> (symbol) #t

> (#number? "abc")
=> (symbol) #f

Single-argument convenience functions for type-checking. Return #t or #f if the argument is of given type

#callable?
1
2
3
4
5
6
7
8
> (#callable? #eqv?)
=> (symbol) #t

> (#callable? #list_reverse)
=> (symbol) #t

> (#callable? '#undef)
=> (symbol) #f

A slightly more complex check, checks if the given argument is either a function or a special form - a.k.a something you can call.

#list_reverse
1
2
> (#list_reverse ("Hello" "World" "!"))
=> (list) list of type: nonempty list node, containing head: (value at <repl:32>, no modifiers, string '!'), tail: (nonempty list node, containing head: (value at <repl:24>, no modifiers, string 'World'), tail: (nonempty list node, containing head: (value at <repl:16>, no modifiers, string 'Hello'), tail: (empty list node)))

Reverses a list. Takes one argument, the list, and returns a reversed list.

#list_map
1
2
> (#list_map #str_debug (1 "Hi!" #eqv?))
=> (list) list of type: nonempty list node, containing head: (value at <unknown:0>, no modifiers, string 'JValue { modifier: None, location: JLocationData { file: "repl", pos: 23 }, value: Number(1) }'), tail: (nonempty list node, containing head: (value at <unknown:0>, no modifiers, string 'JValue { modifier: None, location: JLocationData { file: "repl", pos: 25 }, value: String("Hi!") }'), tail: (nonempty list node, containing head: (value at <unknown:0>, no modifiers, string 'JValue { modifier: None, location: JLocationData { file: "native", pos: 0 }, value: SpecialForm(<special form, name 'eqv?'>) }'), tail: (empty list node)))

Maps a list using a given function. Takes two arguments, a single-parameter function and a list. Returns a list with each parameter mapped using the given function.

#list_at, #list_set_at
1
2
3
4
5
> (#list_at (1 2 3) 2)
=> (number) 3

> (#list_set_at (1 2 3 4) 2 0)
=> (list) list of type: nonempty list node, containing head: (value at <stdlib:2900>, no modifiers, number '1'), tail: (nonempty list node, containing head: (value at <stdlib:2900>, no modifiers, number '2'), tail: (nonempty list node, containing head: (value at <stdlib:2900>, no modifiers, number '0'), tail: (nonempty list node, containing head: (value at <stdlib:2900>, no modifiers, number '4'), tail: (empty list node))))

These functions enable array-like usage of lists, allowing indexing and setting values at index. Do note that these operations are not O(1), as these functions manipulate singly-linked lists and not an actual array

#str_charlist
1
2
3
> (#str_charlist "ABCDEFX")
=> (list) list of type: nonempty list node, containing head: (value at <stdlib:4240>, no modifiers, string 'A'), tail: (nonempty list node, containing head: (value at <stdlib:4240>, no modifiers, string 'B'), tail: (nonempty list node, containing head: (value at <stdlib:4240>, no modifiers, string 'C'), tail: (nonempty list node, containing head: (value at <stdlib:4240>, no modifiers, string 'D'), tail: (nonempty list node, containing head: (value at <stdlib:4240>, no modifiers, string 'E'), tail: (nonempty list node, containing head: (value at <stdlib:4240>, no modifiers, string 'F'), tail: (nonempty list node, containing head: (value at <stdlib:4240>, no modifiers, string 'X'), tail: (empty list node)))))))

Converts a string to a list of characters.

#formatf, #printf, #printlnf, #formatf_lst
1
2
3
4
5
> (#formatf "Hello!")
=> (string) Hello!

> (#formatf "You know, my complexity has increased over %\% lately!" 150)
=> (string) You know, my complexity has increased over 150% lately!

Simple string interpolation, with following properties:

#formatf, #printf, #printlnf are all variadic functions, whilst #formatf_lst takes a list of arguments as its last parameter. #formatf returns a string, whilst #printf and #printlnf are convenience wrappers for immediately calling #print and #println with the result

#str_to_num
1
2
3
4
5
6
7
8
9
> (#str_to_num "abc")
=> (symbol) #undef

> (#str_to_num "-353")
=> (number) -353

> (#str_to_num "52487")
=> (number) 52487

Converts a string to a number, or returns #undef if this was not possible.