Building a Simple Functional Language Compiler in Rust
Introduction
For fun and learning, I built a compiler for a small toy functional (Turing complete 😉) language in Rust. The language is minimal: it supports defining functions, conditionals, and basic arithmetic. To keep things simple, all values are 32-bit floating point numbers, and every expression ends with a semicolon.
The compiler takes source code written in this language and converts it into WebAssembly, producing both WebAssembly Text (.wat) and binary (.wasm) files. Everything is hand-written—from the lexer and parser to the abstract syntax tree and code generator.
Language Overview
- Functions: define reusable blocks of code
- Conditionals:
if-then-elseexpressions - Arithmetic : addition, subtraction, multiplication, division
- Comments: lines starting with symbol "#"
- Types: all 32-bit floats
Example 1: computing the sum of the first n integers.
Example 2: Solving a quadratic equation
# Solution of a second
# order equation with coeffients a,b,c
def root1(a b c)
if discr(a, b, c) < 0
then 0
else (-b + sqrt(discr(a, b, c)))/(2*a);
def root2(a b c)
if discr(a, b, c) < 0
then 0
else (-b - sqrt(discr(a, b, c)))/(2*a);
Formal Definition
- Keywords:
def,if,then,else - Identifiers:
[a-zA-Z][a-zA-Z0-9]* - Numbers: decimal floating-point values
[0-9]?(.?[0-9]) - Expressions: arithmetic operations, comparisons, function calls, and conditional expressions
The Context Free Grammar (CFG) for the language is defined as follows:
Program ::= def Prototype Expression ; | def Prototype Expression ; Program
Prototype ::= Identifier(Params) | Identifier()
Params ::= Identifier Params | Identifier
Expression ::= Exp | IfExp
Exp ::= SubExp | Exp < SubExp | Exp > SubExp | Exp <> SubExp | Exp == SubExp
SubExp ::= Term | SubExp + Term | SubExp - Term | SubExp | Term
Term ::= Factor | Term * Factor | Term / Factor | Term & Factor
Factor ::= -Exp | ( Exp ) | Identifier | Number | FuncionCall
FuncionCall ::= Identifier(Args) | Identifier()
Args ::= Exp | Comma Args
IfExp ::= if Exp then Exp else Exp
Implementation
Lexer (a.k.a. Scanner)
The lexer is the first stage of the compiler, responsible for converting raw source code
into a sequence of tokens that the parser can understand.
In this project, the lexer is implemented as a generic Lexer<T>
struct that reads from any input implementing
std::io::Read, allowing it to handle files, strings, or other streams.
Internally, the lexer keeps track of the current character, the current lexeme (the string being built), and the line number for error reporting. It reads characters using a UTF-8 reader to handle all valid characters safely.
The lexer handles several types of tokens:
- Whitespace: skipped automatically, with line numbers incremented whenever a newline is encountered.
- Comments: lines starting with # are skipped until the end of the line.
- Identifiers: sequences of letters and digits, starting with a letter. These typically correspond to function names or variable names in the language.
- Numbers: sequences of digits, optionally containing a decimal point. Only 32-bit floating-point numbers are supported.
- Operators and punctuation: such as
+,-,*,... (seesrc/lexer.rsfor more)
The lexer uses helper functions to organize the logic:
get_char()reads the next character from the input.skip_whitespace()ignores spaces, tabs, and newlines.skip_comment()ignores all characters until the end of a line.get_identifier()andget_number()build identifiers and numeric literals, respectively, adding characters to the lexeme string.other()handles operators and punctuation, including multi-character tokens like<>(not equal) and==(equal).
By the end of this stage, the source code has been transformed into a stream of well-defined tokens that the parser can use to construct the abstract syntax tree. The lexer is entirely hand-written, giving full control over how tokens are recognized and providing a clear foundation for learning compiler design in Rust. You can find the source code here.
Parser
The parser transforms the stream of tokens from the lexer into an Abstract Syntax Tree (AST), representing the structure of the program. It is implemented as a hand-written recursive-descent parser (for LL(1) grammars), following a standard operator-precedence hierarchy:
factors > terms > subexpressions > expressions
This ensures that multiplication and division bind tighter than addition/subtraction, and comparison operators are evaluated at the top level.
The parser handles:
- Function definitions, including name, parameters, and body expressions
- Expressions, including arithmetic, variable references, function calls, and conditional
if-then-elseexpressions - Error logging, using an
ErrorLoggerthat records parsing errors with line numbers - Error recovery, via a synchronization mechanism that skips tokens until a logical resumption point (
;orend-of-file)
A key component is the main_loop(), which drives parsing by repeatedly reading function definitions until the end of the source.
The parser also distinguishes between variable references and function calls in parse_identifier_expr(), correctly collecting arguments when present.
Overall, the parser converts the source code into a structured, navigable AST, providing a solid foundation for generating WebAssembly code. Most other parsing functions are straightforward and follow standard recursive-descent patterns, so only the main loop, expression hierarchy, and argument handling require special attention. You can find the source code here.
AST and Code Generation
The parser produces an Abstract Syntax Tree (AST) that represents the structure of the program independently of the source syntax. In this compiler, the AST is intentionally minimal and expression-oriented, closely mirroring the language design where everything evaluates to a value.
AST Design
Expressions are represented by the ExprNode enum.
Each variant corresponds to a core language construct:
numeric literals, variables, unary and binary expressions, function calls,
and conditional expressions.
Recursive structures are modeled using Box<ExprNode>,
allowing complex expressions such as nested arithmetic
or conditionals to be represented naturally.
Rather than separating analysis and code generation into multiple passes,
each AST node knows how to emit its own WebAssembly Text (WAT)
representation via the to_wat() method.
This keeps the implementation compact and makes the mapping between
language constructs and WebAssembly instructions explicit.
Some notable design choices:
- Expression-oriented code generation: each expression emits instructions that leave its result on the WebAssembly stack.
- Operator-driven emission: binary and unary expressions delegate instruction selection to the Operator type, keeping operator semantics centralized.
- Conditionals map directly to Webassembly’s structured control
flow using
if (result f32),else, andend.
Function calls are handled uniformly, with a small exception for
built-in functions such as sqrt or abs.
These are mapped directly to WebAssembly instructions instead of emitted as
function calls, avoiding unnecessary indirection.
Functions and Prototypes
Functions are represented by a Prototype (name and parameters) and a body expression.
The prototype is responsible for emitting the function signature,
while the body emits the instructions computing the return value.
All functions return a single f32, matching the language’s simplified type system.
This design closely follows the idea that a function is simply a named expression with parameters.
Code Generator
The CodeGenerator is responsible for assembling the final WebAssembly module.
It iterates over the AST, emits function definitions, and exports all defined functions
so they can be called from JavaScript or other WebAssembly hosts.
The generation process is straightforward:
- Open a WebAssembly module.
- Emit each function definition by delegating to the AST.
- Export all function symbols.
- Close the module.
By keeping code generation mostly inside the AST nodes, the CodeGenerator remains small and declarative, acting as a coordinator rather than a complex transformation stage.
Wrapping Up
The modules described so far—lexer, parser, AST, and code generation—cover the core of the compiler pipeline. There are a few additional supporting modules in the repository (such as error handling and operator definitions), but they follow naturally from the design already discussed and do not introduce new concepts.
Rather than walking through every remaining file, I prefer to leave those details to the reader. The full source code is available on GitHub and is intentionally kept small and readable, making it easy to explore individual modules independently.
At this point, the interesting question becomes how all of these pieces
are wired together and how the compiler is actually executed.
That logic lives in main.rs.
Putting Everything Together (main.rs)
use minilang::code_generator::CodeGenerator;
use minilang::parser::Parser;
use std::process::Command;
use std::env;
use std::fs::File;
fn main() -> std::io::Result<()> {
let args: Vec<String> = env::args().collect();
if args.len() < 3 {
println!("Not enough arguments. Please specify input and output files names.");
return Ok(());
}
let src = File::open(args[1].as_str())?;
let target = File::create(args[2].as_str())?;
let mut parser = Parser::new(src);
parser.main_loop();
let err_logger = parser.get_error_logger();
if err_logger.has_errors() {
for error in err_logger.iter() {
println!("SYNTAX ERROR: {}", error);
}
} else {
// Generate WebAssembly text
CodeGenerator::new(parser.get_asts(), target).run()?;
// Generate binary WebAssembly
Command::new("sh")
.arg("-c")
.arg(format!("wat2wasm {}", args[2].as_str()))
.output()
.expect("failed to execute 'wat2wasm'");
}
Ok(())
}
If wat2wasm is not available, you can install it using brew install wabt.
Create a file named fib.txt with the following content:
and the run
It'll generate two files,target.wat and target.wasm.
Next create a javascript file fib.js
const { readFileSync } = require("fs");
const run = async () => {
const buffer = readFileSync("./target.wasm");
const module = await WebAssembly.compile(buffer);
const instance = await WebAssembly.instantiate(module);
console.log(instance.exports.fib(5)); // Call fibbonaci function
};
run();
and run it
The following example computes roots of a quadratic equation (assuming they are different from 0).
# Computes factorial sequence
# Solution of a second
# order equation with coeffients a,b,c
def discr(a b c)
b*b - 4*a*c;
def root1(a b c)
if discr(a, b, c) < 0
then 0
else (-b + sqrt(discr(a, b, c)))/(2*a);
def root2(a b c)
if discr(a, b, c) < 0
then 0
else (-b - sqrt(discr(a, b, c)))/(2*a);
Conclusion
This project was an exercise in building a compiler from first principles, deliberately avoiding frameworks or generator tools in favor of handwritten components. By keeping the language small and the type system uniform, the focus stays on the essential pipeline: lexing, parsing, AST construction, and code generation. Targeting WebAssembly made the output both concrete and inspectable, allowing each design decision to be traced directly to emitted WAT and wasm instructions. While the compiler is intentionally minimal, it provides a complete, end-to-end implementation that can serve as a foundation for experimentation, extension, or simply a deeper understanding of how compilers work under the hood. The full source code is available on GitHub