Performance 2026-03-12 6 min read

Legacy Code Parser Performance Benchmarks 2026: Rust + Tree-sitter vs The Field

Comprehensive benchmarks comparing tree-sitter-based Rust parsing against ANTLR, regex-based tools, and commercial COBOL analyzers. Real-world files, real numbers.

By AITYTECH Engineering

Performance claims in legacy modernization tools are rarely backed by reproducible benchmarks. We decided to change that. Here are real numbers from our Rust + tree-sitter parser, measured against real COBOL files from open-source projects.

Test Environment

Hardware: Apple M-series, 16 GB RAM
Corpus: 276 COBOL files from 10 open-source projects, totaling 1.8 MB
File sizes: 1 byte to 194 KB (4,427 lines)
Methodology: 10 runs per measurement; each reported figure is the mean of those runs
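
The methodology above can be sketched as a small timing harness. This is a minimal illustration under stated assumptions, not the harness used for the numbers in this post: `mean_duration` is a hypothetical helper, and the `work` closure stands in for one parse of one file.

```rust
use std::time::{Duration, Instant};

/// Runs `work` exactly `runs` times and returns the mean wall-clock duration.
/// `work` is a hypothetical placeholder for a single parse call.
fn mean_duration<F: FnMut()>(runs: u32, mut work: F) -> Duration {
    assert!(runs > 0, "need at least one run");
    let mut total = Duration::ZERO;
    for _ in 0..runs {
        let start = Instant::now();
        work();
        total += start.elapsed();
    }
    // Duration implements division by u32, giving the arithmetic mean.
    total / runs
}
```

Calling `mean_duration(10, || { /* parse one file */ })` mirrors the 10-run setup. A real harness would likely also pass results through `std::hint::black_box` so the compiler cannot optimize the measured work away.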

Single-File Parse Latency

File Size	Lines	AST Nodes	Parse Time	Throughput
1 byte	1	1	6 us	-
3.9 KB	~100	344	445 us	8.4 MB/s
48 KB	1,198	4,914	6.1 ms	7.6 MB/s
51 KB	1,244	5,088	6.3 ms	7.8 MB/s
194 KB	4,427	14,734	16.4 ms	11.3 MB/s

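Key insight: parse time scales roughly linearly with file size. Throughput stays between 7.6 and 8.4 MB/s for files up to 51 KB and rises to 11.3 MB/s on the 194 KB file, which we attribute to parser warmup cost being amortized over more input.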
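
The throughput column can be recomputed from the size and parse-time columns. The sketch below assumes binary units (1 KB = 1024 bytes, 1 MB = 1024 KB); the post does not state which convention it uses, so treat that as an assumption. The function name is hypothetical.

```rust
/// Throughput in MiB/s for `bytes` of input parsed in `micros` microseconds.
/// Assumes binary units, which is a guess about the table's convention.
fn throughput_mib_per_s(bytes: f64, micros: f64) -> f64 {
    let mib = bytes / (1024.0 * 1024.0);
    let seconds = micros / 1_000_000.0;
    mib / seconds
}
```

Feeding in the rounded table values, for example `throughput_mib_per_s(194.0 * 1024.0, 16_400.0)`, lands within a few percent of the reported throughput. Small differences are expected because the sizes and times in the table are themselves rounded.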

Batch Parallel Performance

Files	Total Size	Server Time	Files/sec	Throughput
10	3.4 KB	0.9 ms	11,682	3.8 MB/s
50	42 KB	2.1 ms	24,050	19.1 MB/s
100	145 KB	6.3 ms	15,855	21.9 MB/s

Sequential vs Parallel

With Rayon-based parallel processing across all available CPU cores:

Sequential: 949 files/sec
Parallel: 12,380 files/sec
Speedup: 13x
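
The sequential and parallel modes compared above can be sketched with the standard library alone. This is not the Rayon-based code behind the measurements: it swaps in `std::thread::scope` with fixed-size chunks, so there is no work stealing, and `parse_stub` is a hypothetical stand-in for the real parser.

```rust
use std::thread;

/// Hypothetical stand-in for parsing one file: counts non-empty lines.
fn parse_stub(source: &str) -> usize {
    source.lines().filter(|l| !l.trim().is_empty()).count()
}

/// Sequential baseline: one file after another on the calling thread.
fn parse_all_sequential(files: &[String]) -> Vec<usize> {
    files.iter().map(|f| parse_stub(f)).collect()
}

/// Parallel version: one contiguous chunk of files per worker thread.
/// Results come back in the same order as the input.
fn parse_all_parallel(files: &[String], workers: usize) -> Vec<usize> {
    let workers = workers.max(1);
    // Ceiling division, and never a zero chunk length.
    let chunk_len = ((files.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = files
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|f| parse_stub(f)).collect::<Vec<_>>())
            })
            .collect();
        // Join in spawn order so output order matches input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}
```

With the Rayon crate, the parallel function typically collapses to `files.par_iter().map(...).collect()`, and work stealing rebalances the load when file sizes are uneven, which static chunks do not. The reported speedup follows directly from the two rates: 12,380 / 949 is roughly 13.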

Why Rust + Tree-sitter Wins

Zero-copy parsing: tree-sitter operates directly on the input buffer without allocating intermediate strings
Generated state machine: the parser is a table-driven LR automaton generated ahead of time and compiled to native code, not a grammar interpreted at runtime
Rust's zero-cost abstractions: no garbage collection pauses, no interpreter overhead
Rayon work-stealing: automatic load balancing across CPU cores for batch processing
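
The zero-copy point in the list above can be illustrated with a small sketch: nodes record byte ranges into the source buffer, and text is borrowed from that buffer only when asked for. The `Span` type and `word_spans` function are hypothetical and much simpler than tree-sitter's own node API; they only show the idea.

```rust
use std::ops::Range;

/// A node that stores where its text lives, not the text itself.
/// Hypothetical type for illustration; not tree-sitter's API.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    kind: &'static str,
    bytes: Range<usize>,
}

impl Span {
    /// Borrows the node's text from the original buffer; nothing is copied.
    fn text<'a>(&self, source: &'a str) -> &'a str {
        &source[self.bytes.clone()]
    }
}

/// Splits `source` into whitespace-separated word spans.
/// Only byte offsets are stored; no substring is allocated.
fn word_spans(source: &str) -> Vec<Span> {
    let mut spans = Vec::new();
    let mut start = None;
    for (i, ch) in source.char_indices() {
        match (ch.is_whitespace(), start) {
            (false, None) => start = Some(i),
            (true, Some(s)) => {
                spans.push(Span { kind: "word", bytes: s..i });
                start = None;
            }
            _ => {}
        }
    }
    // Close a word that runs to the end of the buffer.
    if let Some(s) = start {
        spans.push(Span { kind: "word", bytes: s..source.len() });
    }
    spans
}
```

For the input `"MOVE A TO B."`, calling `text` on the first span returns a slice that points into the original buffer. The only allocation is the vector of spans itself.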

The result is a parser that can analyze an entire mainframe codebase (10,000 COBOL programs) in under a minute on a single machine. No cluster required.
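
The 10,000-program figure can be sanity-checked against the measurements above. The sketch below is back-of-envelope arithmetic, not a new benchmark. Both function names are hypothetical, and the pessimistic case rests on three assumptions: every program is as large as the largest corpus file (194 KB), units are binary, and the 13x parallel speedup carries over to large files.

```rust
/// Seconds to process `files` files at a measured rate of `files_per_sec`.
fn seconds_by_file_rate(files: u64, files_per_sec: f64) -> f64 {
    files as f64 / files_per_sec
}

/// Seconds to process `total_mib` of source at `mib_per_sec`, divided by a
/// parallel `speedup` factor (pass 1.0 for a sequential estimate).
fn seconds_by_throughput(total_mib: f64, mib_per_sec: f64, speedup: f64) -> f64 {
    total_mib / mib_per_sec / speedup
}
```

At the sequential rate of 949 files/sec, `seconds_by_file_rate(10_000, 949.0)` gives about 10.5 seconds for corpus-sized files. In the pessimistic case, `seconds_by_throughput(10_000.0 * 194.0 / 1024.0, 11.3, 1.0)` gives about 168 seconds sequentially, and about 13 seconds with a speedup of 13.0, so the under-a-minute claim for large programs depends on parallel execution.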