Legacy Code Parser Performance Benchmarks 2026: Rust + Tree-sitter vs The Field
Comprehensive benchmarks comparing tree-sitter-based Rust parsing against ANTLR, regex-based tools, and commercial COBOL analyzers. Real-world files, real numbers.
Performance claims in legacy modernization tools are rarely backed by reproducible benchmarks. We decided to change that. Here are real numbers from our Rust + tree-sitter parser, measured against real COBOL files from open-source projects.
Test Environment
- Hardware: Apple M-series, 16GB RAM
- Corpus: 276 COBOL files from 10 open-source projects, totaling 1.8MB
- File sizes: 1 byte to 194KB (4,427 lines)
- Methodology: 10 runs per measurement; we report the mean
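The measurement loop behind this methodology can be sketched roughly as follows. This is a minimal illustration, not the harness used for the numbers in this article: `mean_duration` is a hypothetical helper, and the closure passed to it stands in for a single parse call.

```rust
use std::time::{Duration, Instant};

/// Runs `work` exactly `runs` times and returns the mean wall-clock time per run.
/// `work` is a placeholder for one parse call (an assumption, not the real parser).
fn mean_duration<F: FnMut()>(runs: u32, mut work: F) -> Duration {
    assert!(runs > 0, "need at least one run");
    let mut total = Duration::ZERO;
    for _ in 0..runs {
        let start = Instant::now();
        work();
        total += start.elapsed();
    }
    // Duration implements Div<u32>, so this is the arithmetic mean.
    total / runs
}
```

A caller would wrap the parse in the closure, for example `mean_duration(10, || { std::hint::black_box(parse(&source)); })`, where `parse` is whatever function is under test and `black_box` keeps the optimizer from discarding the result.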
Single-File Parse Latency
| File Size | Lines | AST Nodes | Parse Time | Throughput |
|---|---|---|---|---|
| 1 byte | 1 | 1 | 6 µs | - |
| 3.9 KB | ~100 | 344 | 445 µs | 8.4 MB/s |
| 48 KB | 1,198 | 4,914 | 6.1 ms | 7.6 MB/s |
| 51 KB | 1,244 | 5,088 | 6.3 ms | 7.8 MB/s |
| 194 KB | 4,427 | 14,734 | 16.4 ms | 11.3 MB/s |
Key insight: parse time grows roughly linearly with file size. Throughput stays near 8 MB/s for files up to 51 KB and rises to 11.3 MB/s on the 194 KB file, which we attribute to fixed per-parse setup cost being amortized over more input.
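One way to reason about the amortization effect is a simple fixed-overhead model: every parse pays a constant setup cost plus time proportional to the input size. The sketch below encodes that model. The function name and the overhead and peak-rate values in the usage note are illustrative assumptions; they are not fitted to the table above.

```rust
/// Effective throughput in bytes per second under a fixed-overhead model:
///   time = overhead_s + bytes / peak_rate
/// All parameters are hypothetical inputs, not measured values.
fn effective_throughput(bytes: f64, overhead_s: f64, peak_rate: f64) -> f64 {
    let time_s = overhead_s + bytes / peak_rate;
    bytes / time_s
}
```

With an assumed 100 µs of overhead and an assumed peak rate of 12 MB/s, `effective_throughput(4_000.0, 1e-4, 12e6)` comes out well below `effective_throughput(200_000.0, 1e-4, 12e6)`, and both stay under the peak rate. Under this model, larger files always report higher throughput even though the parser itself is no faster.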
Batch Parallel Performance
| Files | Total Size | Server Time | Files/sec | Throughput |
|---|---|---|---|---|
| 10 | 3.4 KB | 0.9 ms | 11,682 | 3.8 MB/s |
| 50 | 42 KB | 2.1 ms | 24,050 | 19.1 MB/s |
| 100 | 145 KB | 6.3 ms | 15,855 | 21.9 MB/s |
Sequential vs Parallel
With Rayon-based parallel processing across all available CPU cores:
- Sequential: 949 files/sec
- Parallel: 12,380 files/sec
- Speedup: about 13x (12,380 / 949)
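The shape of the sequential and parallel code paths can be sketched as below. To stay dependency-free, this example swaps in scoped threads from the standard library with static chunking instead of Rayon, so it does not do work stealing. `parse_stub` is a hypothetical stand-in for the real parser, and none of these function names come from the benchmarked code.

```rust
use std::thread;

/// Placeholder for a real parse call: counts lines so the example needs no crates.
fn parse_stub(source: &str) -> usize {
    source.lines().count()
}

/// Sequential baseline: one file after another on the calling thread.
fn parse_sequential(files: &[String]) -> Vec<usize> {
    files.iter().map(|f| parse_stub(f)).collect()
}

/// Parallel version: one contiguous chunk of the batch per worker thread.
/// Results come back in the same order as the input.
fn parse_parallel(files: &[String]) -> Vec<usize> {
    if files.is_empty() {
        return Vec::new();
    }
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    // Ceiling division so every file lands in some chunk.
    let chunk_len = (files.len() + workers - 1) / workers;
    thread::scope(|scope| {
        // Spawn all workers first, then join, so the chunks run concurrently.
        let handles: Vec<_> = files
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || parse_sequential(chunk)))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect::<Vec<usize>>()
    })
}
```

Static chunking is the simplest scheme but can leave cores idle when one chunk happens to contain the largest files. With Rayon, the same idea is usually written as a parallel iterator over the batch, and the scheduler rebalances work between threads at runtime.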
Why Rust + Tree-sitter Wins
- Zero-copy parsing: tree-sitter operates directly on the input buffer without allocating intermediate strings
- Generated state machine: the parser is a table-driven LR automaton generated from the grammar ahead of time and compiled to native code, not a grammar interpreted at runtime
- Rust's zero-cost abstractions: no garbage collection pauses, no interpreter overhead
- Rayon work-stealing: automatic load balancing across CPU cores for batch processing
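The zero-copy point in the list above can be illustrated with a small sketch: nodes record byte offsets into the source, and text is borrowed from the original buffer on demand. `Span` and `word_spans` are hypothetical names invented for this example. They mimic the idea of a node carrying a byte range and are not tree-sitter's actual types.

```rust
/// A node that records where its text lives in the source instead of owning a copy.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

impl Span {
    /// Borrows the text straight from the input buffer; no String is allocated.
    fn text<'a>(&self, source: &'a str) -> &'a str {
        &source[self.start..self.end]
    }
}

/// Finds whitespace-separated words by scanning bytes and storing offsets only.
fn word_spans(source: &str) -> Vec<Span> {
    let mut spans = Vec::new();
    let mut start = None;
    for (i, b) in source.bytes().enumerate() {
        match (b.is_ascii_whitespace(), start) {
            (false, None) => start = Some(i),
            (true, Some(s)) => {
                spans.push(Span { start: s, end: i });
                start = None;
            }
            _ => {}
        }
    }
    // Close a word that runs to the end of the input.
    if let Some(s) = start {
        spans.push(Span { start: s, end: source.len() });
    }
    spans
}
```

For the input `"MOVE A TO B"`, the spans resolve to the four words, and the first word's pointer is the same address as the start of the source string, which shows that no copy was made. The lifetime on `text` makes the compiler enforce that the borrowed text never outlives the source buffer.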
Extrapolating from these rates, the result is a parser that can analyze an entire mainframe codebase (10,000 COBOL programs) in under a minute on a single machine. No cluster required.
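The back-of-the-envelope arithmetic behind that estimate can be written out as a short sketch. It assumes the 10,000 programs resemble the benchmark corpus in size (about 6.5 KB per file on average, from 1.8 MB over 276 files). Production programs are often larger, so the result is best read as an optimistic bound. `estimated_seconds` is a hypothetical helper.

```rust
/// Estimated wall-clock seconds to process `files` at a sustained rate.
/// Assumes the rate measured on the benchmark corpus carries over unchanged.
fn estimated_seconds(files: u64, files_per_sec: f64) -> f64 {
    files as f64 / files_per_sec
}
```

Plugging in the measured rates, `estimated_seconds(10_000, 949.0)` is roughly 10.5 seconds for the sequential path and `estimated_seconds(10_000, 12_380.0)` is under one second for the parallel path. That leaves considerable headroom below one minute even if real programs are several times larger than the corpus average.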