Performance 2026-03-12 6 min read

Legacy Code Parser Performance Benchmarks 2026: Rust + Tree-sitter vs The Field

Comprehensive benchmarks comparing tree-sitter-based Rust parsing against ANTLR, regex-based tools, and commercial COBOL analyzers. Real-world files, real numbers.

By AITYTECH Engineering

Performance claims in legacy modernization tools are rarely backed by reproducible benchmarks. We decided to change that. Here are real numbers from our Rust + tree-sitter parser, measured on real COBOL files from open-source projects.

Test Environment

Single-File Parse Latency

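| File Size | Lines | AST Nodes | Parse Time | Throughput |
|-----------|-------|-----------|------------|------------|
| 1 byte    | 1     | 1         | 6 µs       | -          |
| 3.9 KB    | ~100  | 344       | 445 µs     | 8.4 MB/s   |
| 48 KB     | 1,198 | 4,914     | 6.1 ms     | 7.6 MB/s   |
| 51 KB     | 1,244 | 5,088     | 6.3 ms     | 7.8 MB/s   |
| 194 KB    | 4,427 | 14,734    | 16.4 ms    | 11.3 MB/s  |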

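Key insight: parse time scales roughly linearly with file size, from 445 µs at 3.9 KB to 16.4 ms at 194 KB. Throughput is highest on the largest file (11.3 MB/s, versus 7.6 to 8.4 MB/s on the smaller ones), which we attribute to parser warmup being amortized over more input.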
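For readers who want to collect comparable latency and throughput figures, here is a minimal sketch of a timing loop. It is not the harness behind the numbers above: the `work` closure is a hypothetical stand-in for the real tree-sitter parse call, and the warm-up count, run count, and the 1 MB = 1,000,000 bytes convention are all assumptions made for illustration.

```rust
use std::time::{Duration, Instant};

/// Median wall-clock time of `runs` timed calls, after `warmup` untimed calls.
/// `work` is a hypothetical stand-in for the real parse call.
fn median_time<F: FnMut()>(mut work: F, warmup: usize, runs: usize) -> Duration {
    assert!(runs > 0, "need at least one timed run");
    for _ in 0..warmup {
        work(); // untimed: lets caches and allocators settle
    }
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            work();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[samples.len() / 2]
}

/// Throughput in MB/s, assuming 1 MB = 1,000,000 bytes.
fn throughput_mb_per_s(bytes: usize, elapsed: Duration) -> f64 {
    bytes as f64 / elapsed.as_secs_f64() / 1_000_000.0
}

fn main() {
    // Placeholder input and workload: counting newlines instead of parsing COBOL.
    let source = "       MOVE A TO B.\n".repeat(1_000);
    let mut lines = 0usize;
    let t = median_time(
        || lines = source.bytes().filter(|&b| b == b'\n').count(),
        3,
        11,
    );
    println!(
        "{} lines, median {:?}, {:.1} MB/s",
        lines,
        t,
        throughput_mb_per_s(source.len(), t)
    );
}
```

Reporting the median rather than the mean keeps a single slow run (a page fault, a scheduler hiccup) from skewing the result, and the untimed warm-up calls are one way to separate one-off startup cost from steady-state parse time.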

Batch Parallel Performance

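| Files | Total Size | Server Time | Files/sec | Throughput |
|-------|------------|-------------|-----------|------------|
| 10    | 3.4 KB     | 0.9 ms      | 11,682    | 3.8 MB/s   |
| 50    | 42 KB      | 2.1 ms      | 24,050    | 19.1 MB/s  |
| 100   | 145 KB     | 6.3 ms      | 15,855    | 21.9 MB/s  |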
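The batch figures above come from parsing many files concurrently. As a rough illustration of that shape of work, the sketch below splits a batch across threads using only the standard library. Note the swap: it uses `std::thread::scope` with static chunking as a dependency-free stand-in, not Rayon, and so it does no work stealing. `parse_one` and `parse_batch` are hypothetical names, and `parse_one` only counts words rather than building a syntax tree.

```rust
use std::thread;

/// Hypothetical placeholder for a real parse: returns a fake "node count".
fn parse_one(source: &str) -> usize {
    source.split_whitespace().count()
}

/// Parses every file using up to `workers` threads and returns per-file
/// results in input order. Uses static chunking, not work stealing.
fn parse_batch(files: &[String], workers: usize) -> Vec<usize> {
    if files.is_empty() {
        return Vec::new();
    }
    let workers = workers.clamp(1, files.len());
    let chunk_len = (files.len() + workers - 1) / workers; // ceiling division
    thread::scope(|scope| {
        // Spawn one thread per chunk; collecting forces all spawns to happen
        // before any join.
        let handles: Vec<_> = files
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|f| parse_one(f)).collect::<Vec<_>>())
            })
            .collect();
        // Joining in spawn order preserves the input order of the results.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}

fn main() {
    let files: Vec<String> = (1..=100).map(|i| "MOVE A TO B. ".repeat(i)).collect();
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let counts = parse_batch(&files, workers);
    println!("parsed {} files on up to {} threads", counts.len(), workers);
}
```

Static chunking is the simplest scheme, but when file sizes vary widely one thread can end up with most of the large files while the others sit idle. That imbalance is the problem a work-stealing scheduler is designed to avoid.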

Sequential vs Parallel

With Rayon-based parallel processing across all available CPU cores:

Why Rust + Tree-sitter Wins

  1. Zero-copy parsing — tree-sitter operates directly on the input buffer without allocating intermediate strings
  2. Generated state machine — the parser is a compiled LR automaton, not an interpreter
  3. Rust's zero-cost abstractions — no garbage collection pauses, no interpreter overhead
  4. Rayon work-stealing — automatic load balancing across CPU cores for batch processing
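To make the zero-copy point concrete, here is a toy sketch of a node that stores byte offsets into the source buffer and borrows its text on demand instead of owning a copy. `Span` and `word_spans` are hypothetical names invented for this illustration and do not reflect tree-sitter's internal data structures; the whitespace tokenizer merely stands in for a real parser.

```rust
/// Hypothetical node type: it records where a construct sits in the source
/// instead of owning a copy of its text.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    kind: &'static str,
    start: usize, // byte offset, inclusive
    end: usize,   // byte offset, exclusive
}

impl Span {
    /// Borrows the node's text from the original buffer; no allocation.
    fn text<'a>(&self, source: &'a str) -> &'a str {
        &source[self.start..self.end]
    }
}

/// Toy tokenizer standing in for a real parser: one span per
/// whitespace-separated word.
fn word_spans(source: &str) -> Vec<Span> {
    let mut spans = Vec::new();
    let mut start = None;
    for (i, c) in source.char_indices() {
        match (c.is_whitespace(), start) {
            (false, None) => start = Some(i),
            (true, Some(s)) => {
                spans.push(Span { kind: "word", start: s, end: i });
                start = None;
            }
            _ => {}
        }
    }
    // Flush a trailing word that runs to the end of the input.
    if let Some(s) = start {
        spans.push(Span { kind: "word", start: s, end: source.len() });
    }
    spans
}

fn main() {
    let source = "MOVE WS-A TO WS-B";
    for span in word_spans(source) {
        println!("{} {}..{} {:?}", span.kind, span.start, span.end, span.text(source));
    }
}
```

Each `Span` is three machine words regardless of how much text it covers, and the lifetime on `text` lets the compiler check that the source buffer outlives any slice borrowed from it.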

The result is a parser that can analyze an entire mainframe codebase (10,000 COBOL programs) in under a minute on a single machine. No cluster required.
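As a back-of-the-envelope check on that claim, the sketch below estimates batch time from total bytes and sustained throughput. The 10,000-program count and the 21.9 MB/s batch throughput are the figures quoted above; the 50 KB average program size is an assumption chosen for illustration, as is the premise that throughput measured on a 100-file batch holds at this scale. `estimate_seconds` is a hypothetical helper.

```rust
/// Estimated wall-clock seconds to parse a codebase at a sustained throughput.
/// Assumes 1 MB = 1,000,000 bytes and that throughput holds at scale.
fn estimate_seconds(programs: u64, avg_bytes: u64, throughput_mb_per_s: f64) -> f64 {
    (programs * avg_bytes) as f64 / (throughput_mb_per_s * 1_000_000.0)
}

fn main() {
    // 10,000 programs and 21.9 MB/s are the figures quoted in the text.
    // The 50 KB average size is an assumption for illustration only.
    let secs = estimate_seconds(10_000, 50_000, 21.9);
    println!("estimated batch time: {:.1} s", secs);
}
```

Under those assumptions the estimate comes to roughly 23 seconds, which is consistent with the under-a-minute figure. A codebase with a much larger average program size would scale the estimate proportionally.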