Japanese Enterprise 2026-03-20 9 min read

Parsing Japanese Enterprise COBOL: Shift-JIS, EBCDIC, and DBCS Challenges

Japanese mainframe COBOL systems present unique parsing challenges: mixed single/double-byte character sets, EBCDIC encoding, and industry-specific extensions for banking and insurance.

By AITYTECH Engineering

Japan runs one of the world's largest installed bases of COBOL systems. Major banks, insurance companies, and government agencies process trillions of yen daily through COBOL programs, many of them originally written in the 1970s and 1980s. But parsing Japanese COBOL is fundamentally different from parsing COBOL written for single-byte, English-language environments, and most analysis tools can't handle it.

The Encoding Challenge

Japanese COBOL source can arrive in several character encodings, and mainframe source routinely mixes single-byte and double-byte text within the same file:

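EBCDIC Katakana: IBM's Japanese single-byte mainframe code page, which differs from the Latin EBCDIC code pages (the same byte can decode to a half-width Katakana character or to a lowercase Latin letter depending on the code page)
Shift-JIS: mixed single-byte (ASCII-compatible, plus half-width Katakana) and double-byte (Kanji and Kana) encoding, typically seen in comments and literals
EUC-JP: Extended Unix Code, used in open-system COBOL (GnuCOBOL)
DBCS (Double-Byte Character Set): IBM's double-byte encoding for Kanji in COBOL data items and literals, bracketed by shift-out (X'0E') and shift-in (X'0F') control bytes when mixed with single-byte text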

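The critical issue: COBOL's fixed, column-based source format interacts badly with double-byte characters. In Shift-JIS and mainframe DBCS, a full-width Kanji character occupies two bytes and two display columns, yet it becomes a single character once decoded to Unicode (and three bytes in UTF-8). On the mainframe, each shift-out and shift-in control byte takes up a column position as well. A parser that counts decoded characters, or bytes in a converted file, instead of the original column positions will misidentify column boundaries, treating program text as comments or sequence numbers.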
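
One way to keep the column boundaries stable is to count display columns rather than string indices. The following is a minimal sketch, not our parser's implementation. It assumes the line has already been decoded to a Python string and that full-width characters occupy two columns, as reported by the standard library's unicodedata.east_asian_width. The function names, the sample statement, and the PROG0001 identification field are made up for illustration.

```python
import unicodedata


def display_width(ch: str) -> int:
    """Columns a character occupies: 2 for East Asian wide/full-width, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def split_fixed_format(line: str) -> dict:
    """Split a decoded fixed-format COBOL line by display column, not by index."""
    # Areas of the COBOL reference format (1-based, inclusive column ranges).
    areas = {
        "sequence": (1, 6),
        "indicator": (7, 7),
        "text": (8, 72),
        "identification": (73, 80),
    }
    out = {name: "" for name in areas}
    col = 1
    for ch in line.rstrip("\r\n"):
        # A character belongs to the area that contains its starting column.
        for name, (first, last) in areas.items():
            if first <= col <= last:
                out[name] += ch
                break
        col += display_width(ch)
    return out


# Hypothetical source line: two Kanji inside a literal, identification in 73-80.
line = "000100" + " " + "    MOVE '顧客' TO WS-NAME.".ljust(63) + "PROG0001"

parts = split_fixed_format(line)
print(parts["identification"])  # → PROG0001
print(line[72:80])              # naive index slice → OG0001 (shifted by two)
```

The naive slice is off by exactly the number of full-width characters on the line, which is why the error tends to go unnoticed on lines that contain no Kanji.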

Japanese COBOL Language Extensions

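IBM Enterprise COBOL for z/OS includes extensions that Japanese-language programs rely on heavily:

PIC N: data items holding national (UTF-16) or DBCS characters, for example 01 WS-NAME PIC N(10). (whether a PIC N item without a USAGE clause is national or DBCS depends on the NSYMBOL compiler option)
USAGE DISPLAY-1: DBCS display format
NATIONAL-OF / DISPLAY-OF: intrinsic functions that convert between national (UTF-16) data and a given code page
SHIFT-OUT / SHIFT-IN: special registers for the shift-out (X'0E') and shift-in (X'0F') control characters, the inline switches between single-byte and double-byte text in mixed literals and data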

These extensions are not part of the COBOL-85 standard and are missing from most open-source COBOL parsers. Our tree-sitter grammar includes support for IBM Enterprise COBOL extensions, including Japanese-specific constructs.
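
To make the shift-out/shift-in bracketing concrete, here is a small sketch of a byte-level scanner that separates single-byte runs from double-byte runs in mixed mainframe data. It is an illustration under stated assumptions, not part of our grammar: the function name split_mixed is hypothetical, the sample bytes are arbitrary placeholders rather than real text, and the segments are left undecoded because the right code page depends on the installation.

```python
SO, SI = 0x0E, 0x0F  # shift-out opens a DBCS run, shift-in closes it


def split_mixed(data: bytes) -> list[tuple[str, bytes]]:
    """Split mixed SBCS/DBCS bytes into ("sbcs" | "dbcs", raw bytes) segments."""
    segments = []
    mode, start = "sbcs", 0
    for i, b in enumerate(data):
        if mode == "sbcs" and b == SO:
            if i > start:
                segments.append(("sbcs", data[start:i]))
            mode, start = "dbcs", i + 1
        elif mode == "dbcs" and b == SI:
            # A DBCS run must contain whole two-byte characters.
            if (i - start) % 2 != 0:
                raise ValueError(f"odd-length DBCS run ending at offset {i}")
            if i > start:
                segments.append(("dbcs", data[start:i]))
            mode, start = "sbcs", i + 1
    if mode == "dbcs":
        raise ValueError("shift-out without a matching shift-in")
    if start < len(data):
        segments.append(("sbcs", data[start:]))
    return segments


# Placeholder bytes: two single-byte characters, a two-character DBCS run
# wrapped in SO/SI, then one more single-byte character.
sample = bytes([0xC1, 0xC2, SO, 0x45, 0x66, 0x45, 0x67, SI, 0xC3])
for kind, raw in split_mixed(sample):
    print(kind, raw.hex())
```

Raising on an unterminated or odd-length run is a deliberate choice in this sketch: silently continuing would shift every later byte into the wrong mode.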

Industry-Specific Patterns

Japanese enterprise COBOL follows conventions published by IPA (Information-technology Promotion Agency, Japan) and its Software Engineering Center (SEC):

ETSS documentation format — standardized program documentation headers
ESCR naming conventions — variable and paragraph naming rules for maintainability
Waterfall documentation structure — basic design, detailed design, test specification, and test report templates

Our parser can extract these structural elements from the AST, making it possible to generate IPA/SEC-compliant documentation automatically from source code.

HULFT and DataSpider Integration

Japanese enterprise systems commonly use HULFT (used by 8,700+ companies) for file transfer and DataSpider Servista for system integration. COBOL programs that interact with these systems have specific patterns:

HULFT-triggered batch jobs defined in JCL
File format definitions matching HULFT transfer configurations
DataSpider API calls through COBOL CALL statements

Parsing both the COBOL programs and their associated JCL jobs reveals the complete data flow through HULFT transfer chains — critical information for modernization planning.