Tuesday, September 2, 2008

S-code

MIT Scheme has an internal format for code called ‘S-Code’. This is a set of data structures that represent the abstract syntax tree of the language with all the surface syntax expanded away. There are nodes for conditionals, nodes for literal values, nodes for variables, nodes for lambda expressions, etc. Each node has a particular evaluation rule. The interpreter traverses the syntax tree and evaluates each node according to the appropriate evaluation rule.

MIT Scheme uses a rather old high-tagged pointer format. This is a legacy from the Scheme 81 chip. A machine word in MIT Scheme is 32-bits, but the top six bits encode the type of word. The remaining twenty-six bits may encode an immediate value, like a small integer (fixnum), or it may encode the address at which a larger data structure may be found. Twenty-six bits of address was more than ample in 1981, but it is quite small by today's standards.

The MIT Scheme interpreter is essentially an emulator for a Scheme 81-like chip (although it has evolved substantially). The core of the interpreter is a loop that reads the expression ‘register’, extracts the tag bits and dispatches (with a switch statement) to the code that implements the evaluation rule for the expression.

My implementation of MIT Scheme uses a different strategy. Each S-Code object is an instance of a C# class. The details of how the instance is represented in memory are unimportant in this implementation because we aren't relying on the ability to inspect the raw bits. The interpreter traverses the S-Code by invoking a virtual function EvalStep defined on the abstract base class. Every concrete S-Code class must provide a concrete method that implements the appropriate evaluation rule. Rather than a big switch statement, the interpreter simply calls EvalStep to dispatch to the appropriate routine.

This requires a few slight changes to the interpreter. In MIT Scheme, literal objects can be embedded directly within the S-Code. If you refer to a number in your program, that number appears directly in the abstract syntax tree. Since every object in MIT Scheme has a tag, the interpreter will dispatch to the `self-evaluating' code when it encounters one of these embedded objects. That code simply returns the object itself.

In my version, it must be the case that every object that the interpreter acts upon is derived from the abstract class SCode and has an EvalStep virtual method. The implementors of the .NET runtime did not have the foresight to add an EvalStep method to the object, so the interpreter cannot dispatch on those. There are several solutions, but the easiest is this: self-evaluating objects are never allowed to appear directly in S-Code, they must be wrapped within a Quotation node. The implementation ensures this automatically. When an S-Code node is initialized, any subfield that is not itself an S-Code object is wrapped within a Quotation.

MIT Scheme is naturally tail recursive. The underlying dispatch loop is a while statement. The stack provided by the physical machine is not used, so the code only has to manage a virtual stack.

While the .NET CLR has the ability to perform tail-recursive calls, the C# compiler makes no use of it. This means we have to go out of our way to implement tail recursion. There are several techniques for this, but the only realistic one that allows us to use the normal function calling mechanism is a trampoline. (Baker's Cheney on the M.T.A technique won't work because the CLR prohibits the creation of linked objects on the stack.) In my implementation, the trampoline looks like this:
    public sealed class Interpreter
    {
        SCode expression;
        Environment environment;
        Continuation continuation;

        void Run ()
        {
            while (true)
                this.expression.EvalStep (this);
        }
    }
Each ‘bounce’ through the trampoline fetches the contents of the expression register and invokes the EvalStep method. The interpreter object itself is passed to the method. The method performs it's actions and modifies the interpreter's expression register before returning to the trampoline. The various EvalStep methods can ‘return’ a value by passing it on to the continuation object.

There's not much really interesting here although there is one amusing observation. To preserve tail recursion, all calls have to go through the trampoline to empty the C# stack. However, when invoking a Scheme continuation, we do not have to use the trampoline because there are no infinite chains of continuations (so control eventually returns with finite resource usage). Therefore, the routine that implements the continuation is called like a normal function. We have this curious inversion: when calling a Scheme function, we perform a return in C# in order to pop the C# stack and transfer control via the trampoline, but when performing a return in Scheme (invoking a continuation), we use a C# call and push the C# stack.

Next, a heap is fast, but the stack is faster...

No comments: