Automated Business Rule Extraction from COBOL Using AST Analysis
How to automatically extract business rules, data flows, and program dependencies from COBOL source code using Abstract Syntax Tree analysis and tree-sitter queries.
One of the biggest challenges in legacy modernization is understanding what the code actually does. A 10,000-line COBOL program may contain hundreds of business rules (validation logic, calculation formulas, conditional workflows) buried in deeply nested IF-ELSE structures and PERFORM chains. Extracting these rules manually takes weeks. AST analysis can surface candidate rules in seconds, leaving analysts to review them rather than hunt for them.
What Makes COBOL Business Rule Extraction Hard
COBOL business logic is notoriously difficult to extract because:
- GOTO spaghetti — older programs use ALTER and GO TO extensively, creating non-linear control flow
- COPY members — shared data definitions are included via COPY statements, splitting context across files
- Level-88 conditions — COBOL's unique condition names create implicit business rules in the DATA DIVISION
- PERFORM THRU — paragraph ranges make it hard to determine execution scope
- Implicit type coercion — PIC clauses define data types, but MOVE operations perform silent conversions
AST-Based Approach
Using tree-sitter, we can parse COBOL into a structured AST and then run queries to extract specific patterns. Here are the key extraction techniques:
1. Variable Discovery
Tree-sitter query to find all data items with their PIC clauses:
(data_description
  (level_number) @level
  (entry_name) @name
  (picture_clause (pic_string) @pic)?
) @item
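This gives us every data item with its level number, name, and PIC string where one is present, which is the foundation for understanding data flow. Group items carry no PIC clause, which is why that part of the pattern is marked optional with ?.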
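To turn a captured PIC string into something an analysis layer can reason about, it helps to decode it into a type and size. The following is a minimal sketch, not part of any parser API: the names PicInfo and parse_pic are hypothetical, and it only handles simple numeric (9, S, V) and alphanumeric (X, A) pictures, labelling everything else "edited".

```python
import re
from dataclasses import dataclass


@dataclass
class PicInfo:
    kind: str            # "numeric", "alphanumeric", or "edited"
    signed: bool
    integer_digits: int
    decimal_digits: int
    length: int          # number of character or digit positions


def _expand(pic: str) -> str:
    """Expand repeat counts, e.g. 9(3) -> 999 and X(2) -> XX."""
    return re.sub(
        r"([9XAZ])\((\d+)\)",
        lambda m: m.group(1) * int(m.group(2)),
        pic.upper(),
    )


def parse_pic(pic: str) -> PicInfo:
    """Decode a simple PIC string (hypothetical helper, illustration only)."""
    expanded = _expand(pic.strip())
    signed = expanded.startswith("S")
    body = expanded[1:] if signed else expanded
    if body and set(body) <= set("9V"):
        # V marks the implied decimal point and occupies no storage position.
        integer_part, _, decimal_part = body.partition("V")
        digits = len(integer_part) + len(decimal_part)
        return PicInfo("numeric", signed, len(integer_part), len(decimal_part), digits)
    if body and set(body) <= set("XA"):
        return PicInfo("alphanumeric", False, 0, 0, len(body))
    # Edited pictures (Z, comma, period, CR, ...) are not decoded in this sketch.
    return PicInfo("edited", signed, 0, 0, len(body))


if __name__ == "__main__":
    print(parse_pic("S9(5)V99"))
    print(parse_pic("X(30)"))
```

Knowing that a field is a signed number with two implied decimals, for example, is often the first hint that it holds a monetary amount.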
2. Conditional Logic Mapping
Business rules typically live inside IF statements and EVALUATE statements (COBOL's equivalent of CASE/SWITCH):
(if_statement
  (condition) @condition
) @rule

(evaluate_statement
  (evaluate_subject) @subject
  (evaluate_when
    (evaluate_object) @when_value
  ) @branch
) @switch
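Once condition text has been captured, a useful first pass is to index each condition by the data items it references, so that related rules end up side by side. The sketch below assumes the conditions are already available as plain strings (for example, the text of each @condition capture); the function names and the reserved-word list are illustrative and deliberately incomplete.

```python
import re
from collections import defaultdict

# Partial list of words that can appear in conditions but are not data items.
RESERVED = {
    "AND", "OR", "NOT", "IS", "EQUAL", "TO", "GREATER", "LESS", "THAN",
    "ZERO", "ZEROS", "ZEROES", "SPACE", "SPACES", "NUMERIC", "ALPHABETIC",
    "POSITIVE", "NEGATIVE", "OF", "IN",
}


def referenced_items(condition: str) -> set:
    """Return the data-item names mentioned in a condition string."""
    # Drop quoted literals first so their contents are not mistaken for names.
    text = re.sub(r"'[^']*'|\"[^\"]*\"", " ", condition.upper())
    words = re.findall(r"[A-Z][A-Z0-9-]*", text)
    return {w for w in words if w not in RESERVED}


def group_by_item(conditions: list) -> dict:
    """Map each data-item name to the conditions that reference it."""
    index = defaultdict(list)
    for condition in conditions:
        for item in sorted(referenced_items(condition)):
            index[item].append(condition)
    return dict(index)


if __name__ == "__main__":
    sample = [
        "WS-BALANCE > 1000 AND ACCT-TYPE = 'SAV'",
        "ACCT-TYPE = 'CHK'",
        "WS-BALANCE IS NEGATIVE",
    ]
    for item, conds in group_by_item(sample).items():
        print(item, "->", conds)
```

A text-based scan like this is only an approximation. With a real AST, the identifier nodes inside each condition would be a more reliable source of names.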
3. Calculation Rules
Financial calculations are often the most critical business rules:
(compute_statement) @calc
(add_statement) @calc
(subtract_statement) @calc
(multiply_statement) @calc
(divide_statement) @calc
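COBOL spreads arithmetic across five verbs, so it can help to normalize captured statements into a single "target = expression" form before presenting them as rules. The following is a rough sketch under stated assumptions: it works on statement text rather than AST nodes, the normalize function is hypothetical, and it covers only the simplest single-operand forms (no GIVING, ROUNDED, REMAINDER, or ON SIZE ERROR).

```python
import re
from typing import Optional

# Each entry: (pattern, builder returning (target, expression)).
PATTERNS = [
    (re.compile(r"^COMPUTE\s+(\S+)\s*=\s*(.+)$"),
     lambda m: (m[1], m[2])),
    (re.compile(r"^ADD\s+(\S+)\s+TO\s+(\S+)$"),
     lambda m: (m[2], f"{m[2]} + {m[1]}")),
    (re.compile(r"^SUBTRACT\s+(\S+)\s+FROM\s+(\S+)$"),
     lambda m: (m[2], f"{m[2]} - {m[1]}")),
    (re.compile(r"^MULTIPLY\s+(\S+)\s+BY\s+(\S+)$"),
     lambda m: (m[2], f"{m[2]} * {m[1]}")),
    (re.compile(r"^DIVIDE\s+(\S+)\s+INTO\s+(\S+)$"),
     lambda m: (m[2], f"{m[2]} / {m[1]}")),
]


def normalize(statement: str) -> Optional[tuple]:
    """Return (target, expression) for a simple arithmetic statement, else None."""
    # Collapse whitespace, upper-case, and drop the terminating period.
    text = " ".join(statement.upper().split()).rstrip(".").strip()
    for pattern, build in PATTERNS:
        match = pattern.match(text)
        if match:
            return build(match)
    return None  # Form not covered by this sketch.


if __name__ == "__main__":
    for stmt in ("ADD WS-FEE TO WS-TOTAL.",
                 "COMPUTE WS-INT = WS-BAL * WS-RATE / 100",
                 "DIVIDE 12 INTO WS-ANNUAL"):
        print(stmt, "=>", normalize(stmt))
```

Returning None for unsupported forms keeps the output honest: statements the sketch cannot interpret are flagged for manual review instead of being guessed at.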
4. External Dependencies
CALL statements and COPY members reveal program dependencies:
(call_statement) @external_call
(copy_statement) @copy_include
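Captured CALL statements become more useful once they are assembled into a program-to-program dependency map. The sketch below is an illustration only: it scans fixed-format source text with a regular expression instead of using AST captures, and the names call_targets and build_graph are hypothetical. It separates static calls (a literal program name) from dynamic calls (a data item holding the name), since the latter cannot be resolved without data flow analysis.

```python
import re

# Matches CALL 'NAME', CALL "NAME", or CALL DATA-ITEM (dynamic call).
CALL_RE = re.compile(
    r"(?<![A-Z0-9-])CALL\s+(?:'([^']+)'|\"([^\"]+)\"|([A-Z0-9-]+))",
    re.IGNORECASE,
)


def call_targets(source: str) -> tuple:
    """Return (static_targets, dynamic_targets) found in one program's source."""
    static, dynamic = set(), set()
    for line in source.splitlines():
        # Assumes fixed-format source: '*' or '/' in column 7 marks a comment.
        if len(line) > 6 and line[6] in "*/":
            continue
        for single, double, identifier in CALL_RE.findall(line):
            if identifier:
                dynamic.add(identifier.upper())
            else:
                static.add((single or double).upper())
    return static, dynamic


def build_graph(programs: dict) -> dict:
    """Map each program name to the set of programs it calls statically."""
    return {name: call_targets(src)[0] for name, src in programs.items()}


if __name__ == "__main__":
    demo = (
        "       CALL 'ACCTVAL' USING WS-ACCT.\n"
        "      * CALL 'OLDPROG' is retired\n"
        "       CALL WS-PROG-NAME USING WS-ACCT.\n"
    )
    print(call_targets(demo))
```

Dynamic targets are reported separately so they can be followed up by tracing which literals are moved into the named data item.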
From AST to Business Rules
The raw AST gives us structure. The next step is semantic analysis:
- Data flow tracing — follow MOVE statements to track how values propagate through variables
- Condition grouping — cluster related IF/EVALUATE blocks that reference the same variables
- Cross-reference — link PERFORM targets to paragraph definitions to understand call chains
- Rule annotation — match patterns to known business rule templates (validation, calculation, routing)
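Two of the steps above, data flow tracing and cross-referencing PERFORM targets, reduce to the same graph traversal. The following is a minimal sketch, assuming the MOVE pairs and PERFORM targets have already been extracted from the AST; all function names are hypothetical. The reachability check deliberately ignores sequential fall-through, GO TO, and PERFORM THRU ranges, so its output is a list of candidates for review, not a definitive dead code report.

```python
from collections import defaultdict, deque


def reachable(graph: dict, start: str) -> set:
    """Breadth-first search: every node reachable from start, including start."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def move_graph(moves: list) -> dict:
    """Build a source -> {targets} graph from (source, target) MOVE pairs."""
    graph = defaultdict(set)
    for source, target in moves:
        graph[source].add(target)
    return dict(graph)


def unreached_paragraphs(performs: dict, entry: str) -> set:
    """Paragraphs not reachable from entry via PERFORM alone."""
    return set(performs) - reachable(performs, entry)


if __name__ == "__main__":
    moves = [("IN-AMOUNT", "WS-AMOUNT"), ("WS-AMOUNT", "OUT-TOTAL"),
             ("WS-RATE", "WS-TEMP")]
    print(reachable(move_graph(moves), "IN-AMOUNT"))

    performs = {"MAIN": {"VALIDATE", "POST"}, "VALIDATE": set(),
                "POST": {"LOG"}, "LOG": set(), "OLD-REPORT": {"LOG"}}
    print(unreached_paragraphs(performs, "MAIN"))
```

In the example, a value read into IN-AMOUNT can end up in OUT-TOTAL, and OLD-REPORT is flagged because no PERFORM chain starting at MAIN reaches it.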
Our parser service provides the AST foundation. The analysis layer can be built on top using the query API to extract exactly the patterns relevant to your modernization project.
Real-World Impact
In a recent analysis of a banking COBOL system (42 programs, ~15,000 lines each), AST-based extraction identified:
- 847 business rules across all programs
- 234 cross-program dependencies via CALL chains
- 156 shared data structures through COPY members
- 23 dead code paragraphs never reached by any PERFORM
What would have taken a team of 4 analysts approximately 3 months was completed in under 2 hours of automated analysis plus 2 days of human review.