Part 11: Lexical Scope

Published on Jun 29th, 2021, in build a programming language, rust. 4 minute read.

This post is part of a series about learning Rust and building a small programming language.

After adding variables, I added boolean values and comparison operators, because why not. With that in place, I figured it would be a good time to add if statements. Parsing them is straightforward—you just look for the if keyword, followed by a bunch of stuff—so I won’t go into the details. But actually evaluating them was a bit more complicated.

The main issue is that of lexical scope, which means that the context (i.e., which variables are accessible) at each point in the program is determined by where that point sits in the original source code.

Let’s say you have some code:

let a = 1
if (condition) {
	print(a)
	let b = 2
}
print(b)

Entering the body of the if statement starts a new scope in which variables defined in any enclosing scope can be accessed, but not vice versa. a, defined in the outer scope, can be read from the inner scope, but b, defined in the inner scope, cannot be accessed from the outer scope.

What this means for me is that, in the evaluator, it’s not enough to have access to just the current scope. All the parent scopes are also needed.

There are a couple of ways I could approach this. One way would be to have something like a vector of contexts, where the last element is the current context. Accessing a parent context would just mean walking backwards through the vector. And to enter a new scope, you’d construct a new context and push it onto the vector, evaluate whatever you wanted in the new scope, and then remove it from the vector afterwards. This would work, but every push risks the vector outgrowing its capacity and having to reallocate its backing storage. Worrying about that is probably a premature optimization, but I decided to take a different approach to avoid the issue.
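For comparison, the vector-based approach might look roughly like the sketch below. The ScopeStack type and its method names are hypothetical, and values are plain i64s rather than the interpreter's Value type so that the example stands on its own.

```rust
use std::collections::HashMap;

// Hypothetical scope stack: the last element is the innermost scope.
struct ScopeStack {
	scopes: Vec<HashMap<String, i64>>,
}

impl ScopeStack {
	fn new() -> Self {
		// Start with a single root scope.
		Self { scopes: vec![HashMap::new()] }
	}

	fn enter(&mut self) {
		self.scopes.push(HashMap::new());
	}

	fn exit(&mut self) {
		// Never pop the root scope.
		if self.scopes.len() > 1 {
			self.scopes.pop();
		}
	}

	fn define(&mut self, name: &str, value: i64) {
		// `scopes` is never empty, so `last_mut` always succeeds.
		self.scopes.last_mut().unwrap().insert(name.to_string(), value);
	}

	fn lookup(&self, name: &str) -> Option<i64> {
		// Walk backwards from the innermost scope to the root.
		self.scopes.iter().rev().find_map(|scope| scope.get(name).copied())
	}
}

fn main() {
	let mut stack = ScopeStack::new();
	stack.define("a", 1);
	stack.enter();
	stack.define("b", 2);
	println!("{:?}", stack.lookup("a")); // `a` is visible from the inner scope
	stack.exit();
	println!("{:?}", stack.lookup("b")); // `b` went away with its scope
}
```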

Another way of doing it is effectively building a singly-linked list, where each Context stores an optional reference to its parent. But a simple reference isn’t enough. The Context struct would need a generic lifetime parameter in order to know how long the reference to its parent lives for. And in order to specify what type the reference refers to, we would need to be able to know how long the parent context’s parent lives for. And in order to spell that type out we’d have to know how long the parent’s parent’s parent—you get the idea.

The solution I came up with was to wrap the context struct in an Rc, a reference-counted pointer. So, instead of each context having a direct reference to its parent, it owns an Rc that’s backed by the same storage. Though, that’s not quite enough, because the context needs to be mutable so code can do things like set variables. For that reason, it’s actually an Rc<RefCell<Context>>. I understand this pattern of interior mutability is common practice in Rust, but coming from languages where this sort of thing is handled transparently, it’s one of the weirder things I’ve encountered.
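To see what the Rc and RefCell combination buys you, here is a tiny standalone sketch. It shares a plain HashMap rather than a Context, and the function name shared_write_demo is made up for illustration.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Returns the value read through the first handle and the number of handles.
fn shared_write_demo() -> (Option<i64>, usize) {
	// One map, reachable through two reference-counted handles.
	let shared: Rc<RefCell<HashMap<String, i64>>> = Rc::new(RefCell::new(HashMap::new()));
	let other_handle = Rc::clone(&shared);

	// `borrow_mut` checks at runtime that no other borrow is active.
	other_handle.borrow_mut().insert("a".to_string(), 1);

	// The write made through `other_handle` is visible through `shared`.
	let seen = shared.borrow().get("a").copied();
	(seen, Rc::strong_count(&shared))
}

fn main() {
	println!("{:?}", shared_write_demo());
}
```

Rc::clone only bumps a reference count, so both handles point at the same map. The RefCell moves the usual borrow rules from compile time to run time, which means a conflicting borrow panics instead of failing to compile.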

Now, on to how this is actually implemented. It’s pretty simple. The Context struct stores the Rc I described above, but inside an Option so that the root context can have None as its parent.

struct Context {
	parent: Option<Rc<RefCell<Context>>>,
	variables: HashMap<String, Value>,
}

Then, instead of the various eval functions taking a reference directly to the context, they get a reference to the Rc-wrapped context instead.

fn eval_binary_op(left: &Node, op: &BinaryOp, right: &Node, context: &Rc<RefCell<Context>>) -> Value {
	let left_value = eval_expr(left, context);
	// ...
}

The constructors for Context also change a bit. There’s one that doesn’t have a parent, called root, in addition to new which does.

impl Context {
	fn root() -> Self {
		Self {
			parent: None,
			variables: HashMap::new(),
		}
	}

	fn new(parent: Rc<RefCell<Context>>) -> Self {
		Self {
			parent: Some(parent),
			variables: HashMap::new(),
		}
	}
}

Unlike the evaluation functions, Context::new takes an owned Rc, avoiding the infinite-lifetimes problem from earlier. This means that, when constructing a new context, we just need to clone the existing Rc.

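fn eval_if(condition: &Node, body: &[Statement], context: &Rc<RefCell<Context>>) {
	// The body gets its own scope whose parent is the current context.
	let body_context = Context::new(Rc::clone(context));
	let body_context_ref = Rc::new(RefCell::new(body_context));
	// ... evaluate the condition and body against &body_context_ref
}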

After the new context is constructed, it too is wrapped in a RefCell and Rc for when the condition and body are evaluated. This is a little bit unwieldy, but hey, it works.
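With a chain of contexts built this way, resolving a variable means checking the current scope and then each parent in turn. The post doesn't show that code, so the lookup method below is a hypothetical sketch of one way to do it, and the Context here stores i64s instead of Value to stay self-contained.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Simplified stand-in for the post's Context: values are i64 here.
struct Context {
	parent: Option<Rc<RefCell<Context>>>,
	variables: HashMap<String, i64>,
}

impl Context {
	fn root() -> Self {
		Self { parent: None, variables: HashMap::new() }
	}

	fn new(parent: Rc<RefCell<Context>>) -> Self {
		Self { parent: Some(parent), variables: HashMap::new() }
	}

	// Hypothetical helper: check this scope, then each ancestor in turn.
	fn lookup(&self, name: &str) -> Option<i64> {
		if let Some(value) = self.variables.get(name) {
			return Some(*value);
		}
		match &self.parent {
			Some(parent) => parent.borrow().lookup(name),
			None => None,
		}
	}
}

fn main() {
	let outer = Rc::new(RefCell::new(Context::root()));
	outer.borrow_mut().variables.insert("a".to_string(), 1);

	let inner = Rc::new(RefCell::new(Context::new(Rc::clone(&outer))));
	inner.borrow_mut().variables.insert("b".to_string(), 2);

	println!("{:?}", inner.borrow().lookup("a")); // found via the parent
	println!("{:?}", outer.borrow().lookup("b")); // not visible from outside
}
```

This mirrors the a and b example from the start of the post: the inner scope can reach outwards, but the outer scope never looks inwards.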

Actually evaluating the if is simple enough that I won’t bother going through it in detail. It just evaluates the condition expression (confirming that it’s a boolean; there shall be no implicit conversions!) and, if it’s true, evaluates each of the statements in the body.
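For anyone who does want to see the shape of it, here is a heavily reduced sketch. Everything in it is an assumption made for illustration: the two-variant Node, the Let-only Statement, the new_context and lookup helpers, and especially the choice to report a non-boolean condition through a Result. The real interpreter may well do all of this differently.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

#[derive(Clone, Debug, PartialEq)]
enum Value {
	Integer(i64),
	Boolean(bool),
}

// Hypothetical, heavily reduced AST: just enough to exercise eval_if.
enum Node {
	Literal(Value),
	Variable(String),
}

enum Statement {
	Let(String, Node),
}

struct Context {
	parent: Option<Rc<RefCell<Context>>>,
	variables: HashMap<String, Value>,
}

type ContextRef = Rc<RefCell<Context>>;

// Hypothetical helper that builds a context and wraps it in one step.
fn new_context(parent: Option<ContextRef>) -> ContextRef {
	Rc::new(RefCell::new(Context { parent, variables: HashMap::new() }))
}

// Search the current scope first, then each enclosing scope.
fn lookup(context: &ContextRef, name: &str) -> Result<Value, String> {
	let ctx = context.borrow();
	if let Some(value) = ctx.variables.get(name) {
		return Ok(value.clone());
	}
	match &ctx.parent {
		Some(parent) => lookup(parent, name),
		None => Err(format!("undefined variable: {}", name)),
	}
}

fn eval_expr(node: &Node, context: &ContextRef) -> Result<Value, String> {
	match node {
		Node::Literal(value) => Ok(value.clone()),
		Node::Variable(name) => lookup(context, name),
	}
}

fn eval_statement(statement: &Statement, context: &ContextRef) -> Result<(), String> {
	match statement {
		Statement::Let(name, expr) => {
			// Evaluate first, so no borrow is held when we mutate below.
			let value = eval_expr(expr, context)?;
			context.borrow_mut().variables.insert(name.clone(), value);
			Ok(())
		}
	}
}

// Returns Ok(true) if the body ran, Ok(false) if it was skipped.
fn eval_if(condition: &Node, body: &[Statement], context: &ContextRef) -> Result<bool, String> {
	let body_context = new_context(Some(Rc::clone(context)));
	match eval_expr(condition, &body_context)? {
		Value::Boolean(true) => {
			for statement in body {
				eval_statement(statement, &body_context)?;
			}
			Ok(true)
		}
		Value::Boolean(false) => Ok(false),
		// No implicit conversions: anything else is an error.
		other => Err(format!("expected a boolean condition, got {:?}", other)),
	}
}

fn main() {
	let root = new_context(None);
	root.borrow_mut().variables.insert("a".to_string(), Value::Integer(1));

	// Roughly: if (true) { let b = a }
	let body = vec![Statement::Let("b".to_string(), Node::Variable("a".to_string()))];
	let ran = eval_if(&Node::Literal(Value::Boolean(true)), &body, &root);
	println!("{:?}", ran);

	// `b` lived only in the body's scope, so the root never sees it.
	println!("{:?}", root.borrow().variables.get("b"));
}
```

Because the body's context is dropped when eval_if returns, anything defined inside the if disappears with it, which is exactly the scoping behaviour described at the top of the post.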

fn main() {
	let code = r#"let a = 1; if (a == 1) { dbg(a); }"#;
	let tokens = tokenize(&code);
	let statements = parse(&tokens);
	eval(&statements);
}

$ cargo run
Integer(1)