Parser Combinators: Building Parsers from Functions
Key Insights
- Parser combinators let you build complex parsers by composing small, single-purpose functions—each combinator does one thing and combines cleanly with others.
- Unlike parser generators that require separate grammar files and build steps, combinators live in your regular code, giving you full IDE support, type checking, and debugging.
- The approach shines for DSLs, configuration files, and protocols where grammar complexity is moderate and maintainability matters more than raw performance.
What Are Parser Combinators?
Parser combinators are small functions that parse specific patterns and combine to form larger parsers. Instead of writing a monolithic parsing function or defining a grammar in a separate DSL, you build parsers the same way you build any other program: by composing functions.
Consider your options when you need to parse structured text. Parser generators like yacc or ANTLR require learning a grammar syntax, running a code generation step, and debugging generated code you didn’t write. Hand-written recursive descent parsers give you control but quickly become tangled spaghetti as grammar complexity grows.
Parser combinators sit in the sweet spot. They’re code, not configuration. They compose naturally. They’re easy to test in isolation. When you need to parse a configuration file, a simple query language, or a custom protocol, combinators get you there faster than the alternatives.
The Parser as a Function
At its core, a parser is a function that takes input and returns either a successful parse result with remaining input, or a failure. Here’s the fundamental type:
type ParseResult<T> =
  | { success: true; value: T; remaining: string; position: number }
  | { success: false; expected: string; position: number };

type Parser<T> = (input: string, position: number) => ParseResult<T>;
Every parser follows this contract. It receives the input string and current position, then returns what it parsed (or why it failed) and where to continue parsing. This uniformity is what makes composition possible.
Let’s build the simplest useful parser—one that matches a specific character:
function char(expected: string): Parser<string> {
  return (input, position) => {
    if (position < input.length && input[position] === expected) {
      return { success: true, value: expected, remaining: input, position: position + 1 };
    }
    return { success: false, expected: `'${expected}'`, position };
  };
}

// Usage
const parseA = char('a');
parseA('abc', 0); // { success: true, value: 'a', position: 1, ... }
parseA('xyz', 0); // { success: false, expected: "'a'", position: 0 }
Notice how the parser doesn’t modify any state. It takes input, returns output. This functional purity makes parsers predictable and testable.
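That purity pays off directly in tests: a parser is just a function call, so a plain script can assert on its results with no harness, fixtures, or mocking. A self-contained sketch (repeating the types and char from above, with a simplified result type that tracks only position):

```typescript
// Simplified copies of the definitions above, so this sketch runs standalone.
type ParseResult<T> =
  | { success: true; value: T; position: number }
  | { success: false; expected: string; position: number };
type Parser<T> = (input: string, position: number) => ParseResult<T>;

function char(expected: string): Parser<string> {
  return (input, position) => {
    if (position < input.length && input[position] === expected) {
      return { success: true, value: expected, position: position + 1 };
    }
    return { success: false, expected: `'${expected}'`, position };
  };
}

// Testing is just calling the function and checking the result
const parseA = char('a');
const hit = parseA('abc', 0);
if (!(hit.success && hit.value === 'a' && hit.position === 1)) {
  throw new Error('expected a successful parse of "a"');
}
const miss = parseA('xyz', 0);
if (miss.success) {
  throw new Error('expected a failure on "xyz"');
}
```

The same call shape works at any offset, so edge cases (end of input, mid-string matches) are equally cheap to cover.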
Primitive Parsers
Before combining parsers, you need building blocks. The satisfy parser is the foundation—it succeeds when a character passes a predicate:
function satisfy(predicate: (c: string) => boolean, description: string): Parser<string> {
  return (input, position) => {
    if (position < input.length && predicate(input[position])) {
      return { success: true, value: input[position], remaining: input, position: position + 1 };
    }
    return { success: false, expected: description, position };
  };
}

// Derive specific parsers from satisfy
const digit = satisfy(c => c >= '0' && c <= '9', 'digit');
const letter = satisfy(c => /[a-zA-Z]/.test(c), 'letter');
const whitespace = satisfy(c => /\s/.test(c), 'whitespace');
// Match a specific string
function literal(text: string): Parser<string> {
  return (input, position) => {
    if (input.startsWith(text, position)) {
      return { success: true, value: text, remaining: input, position: position + text.length };
    }
    return { success: false, expected: `"${text}"`, position };
  };
}

// Match end of input
const eof: Parser<null> = (input, position) => {
  if (position >= input.length) {
    return { success: true, value: null, remaining: input, position };
  }
  return { success: false, expected: 'end of input', position };
};
With satisfy, you can define any character-level parser. The pattern of building specific parsers from general ones repeats throughout combinator design.
Combinators: Sequencing and Choice
Now for the actual combining. The two fundamental operations are sequencing (parse A then B) and choice (try A, if it fails try B).
function sequence<T extends unknown[]>(
  ...parsers: { [K in keyof T]: Parser<T[K]> }
): Parser<T> {
  return (input, position) => {
    const results: unknown[] = [];
    let currentPos = position;
    for (const parser of parsers) {
      const result = parser(input, currentPos);
      if (!result.success) return result;
      results.push(result.value);
      currentPos = result.position;
    }
    return { success: true, value: results as T, remaining: input, position: currentPos };
  };
}

function choice<T>(...parsers: Parser<T>[]): Parser<T> {
  return (input, position) => {
    for (const parser of parsers) {
      const result = parser(input, position);
      if (result.success) return result;
    }
    return { success: false, expected: 'one of multiple options', position };
  };
}

function map<T, U>(parser: Parser<T>, fn: (value: T) => U): Parser<U> {
  return (input, position) => {
    const result = parser(input, position);
    if (!result.success) return result;
    return { ...result, value: fn(result.value) };
  };
}
Let’s use these to parse a key-value pair like name=value (this also uses many, defined in the next section, which parses zero or more occurrences):
const identifier = map(
  sequence(letter, many(choice(letter, digit, char('_')))),
  ([first, rest]) => first + rest.join('')
);

const keyValue = map(
  sequence(identifier, char('='), identifier),
  ([key, _, value]) => ({ key, value })
);

keyValue('host=localhost', 0);
// { success: true, value: { key: 'host', value: 'localhost' }, position: 14 }
The map combinator transforms results. Without it, you’d get raw arrays and characters. With it, you build meaningful data structures as you parse.
Repetition and Optional Parsing
Real grammars need repetition. The many combinator parses zero or more occurrences:
function many<T>(parser: Parser<T>): Parser<T[]> {
  return (input, position) => {
    const results: T[] = [];
    let currentPos = position;
    while (true) {
      const result = parser(input, currentPos);
      if (!result.success) break;
      results.push(result.value);
      currentPos = result.position;
    }
    return { success: true, value: results, remaining: input, position: currentPos };
  };
}

function many1<T>(parser: Parser<T>): Parser<T[]> {
  return (input, position) => {
    const result = many(parser)(input, position);
    if (!result.success || result.value.length === 0) {
      return { success: false, expected: 'at least one match', position };
    }
    return result;
  };
}

function optional<T>(parser: Parser<T>): Parser<T | null> {
  return (input, position) => {
    const result = parser(input, position);
    if (result.success) return result;
    return { success: true, value: null, remaining: input, position };
  };
}

function sepBy<T, S>(parser: Parser<T>, separator: Parser<S>): Parser<T[]> {
  return (input, position) => {
    const first = parser(input, position);
    if (!first.success) {
      return { success: true, value: [], remaining: input, position };
    }
    const rest = many(map(sequence(separator, parser), ([_, val]) => val))(input, first.position);
    if (!rest.success) return rest; // many never fails, but this narrows the type
    return { ...rest, value: [first.value, ...rest.value] };
  };
}
Parsing a comma-separated list of integers becomes straightforward:
const integer = map(many1(digit), digits => parseInt(digits.join(''), 10));
const ws = many(whitespace);
const comma = map(sequence(ws, char(','), ws), () => ',');
const intList = sepBy(integer, comma);

intList('1, 2, 3, 42', 0);
// { success: true, value: [1, 2, 3, 42], position: 11 }
Note that many is greedy—it consumes as much as possible. This works well for most grammars but can cause issues with ambiguous patterns. If you need backtracking, you’ll need to implement it explicitly or use a library that handles it.
Putting It Together: A JSON Parser
Here’s where combinators prove their worth. A minimal JSON parser in about 60 lines:
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };

// Forward declaration for recursion
let jsonValue: Parser<JsonValue>;

const jsonNull = map(literal('null'), () => null);

const jsonBool = choice(
  map(literal('true'), () => true),
  map(literal('false'), () => false)
);

const jsonNumber = map(
  sequence(
    optional(char('-')),
    many1(digit),
    optional(sequence(char('.'), many1(digit)))
  ),
  ([sign, int, frac]) => {
    const numStr = (sign || '') + int.join('') + (frac ? '.' + frac[1].join('') : '');
    return parseFloat(numStr);
  }
);
const escapedChar = choice(
  map(literal('\\"'), () => '"'),
  map(literal('\\\\'), () => '\\'),
  map(literal('\\n'), () => '\n'),
  map(literal('\\t'), () => '\t'),
  satisfy(c => c !== '"' && c !== '\\', 'string character')
);

const jsonString = map(
  sequence(char('"'), many(escapedChar), char('"')),
  ([_, chars, __]) => chars.join('')
);
const jsonArray: Parser<JsonValue[]> = (input, pos) => {
  const parser = map(
    sequence(char('['), ws, sepBy(jsonValue, sequence(ws, char(','), ws)), ws, char(']')),
    ([_, __, values]) => values
  );
  return parser(input, pos);
};

const jsonObject: Parser<{ [key: string]: JsonValue }> = (input, pos) => {
  const pair = map(
    sequence(jsonString, ws, char(':'), ws, jsonValue),
    ([key, _, __, ___, value]) => [key, value] as [string, JsonValue]
  );
  const parser = map(
    sequence(char('{'), ws, sepBy(pair, sequence(ws, char(','), ws)), ws, char('}')),
    ([_, __, pairs]) => Object.fromEntries(pairs)
  );
  return parser(input, pos);
};
jsonValue = (input, pos) =>
  choice(jsonNull, jsonBool, jsonNumber, jsonString, jsonArray, jsonObject)(input, pos);

// Parse JSON with surrounding whitespace
const json = map(sequence(ws, jsonValue, ws, eof), ([_, value]) => value);

json('{"name": "test", "values": [1, 2, 3]}', 0);
// { success: true, value: { name: 'test', values: [1, 2, 3] }, ... }
The recursive structure handles nested objects naturally. Each parser is small, testable, and reusable. Adding a new JSON feature means adding one more combinator.
Tradeoffs and Next Steps
Parser combinators aren’t free. Performance lags behind generated parsers—all those function calls add overhead. For parsing gigabytes of data, consider alternatives.
Error reporting requires effort. The basic implementation above gives you failure position but not context. Production libraries implement strategies like tracking the “furthest failure” or collecting all expected tokens at a position.
When you’re ready to go deeper, explore established libraries:
- Parsec (Haskell): The original that popularized the approach
- nom (Rust): Zero-copy parsing with excellent performance
- FParsec (.NET): Full-featured with great error messages
- Parsimmon (JavaScript): Approachable for TypeScript projects
Parser combinators reward investment. Once you internalize the patterns, you’ll reach for them whenever structured text appears. They turn parsing from a dreaded task into straightforward function composition.