Creating a simple programming language
The first step is to generate the custom_arc_lexer.g4 file:
lexer grammar custom_arc_lexer;

// Add arc language keywords (subset example)
FN: 'fn';
AND: 'and';
OR: 'or';
IF: 'if';
COND: 'cond';
CASE: 'case';
DEF: 'def';
LET: 'let';
WITH: 'with';
MAC: 'mac';
DO: 'do';
DELAY: 'delay';
SETBANG: 'set!';
ELSE: 'else';
ARROW: '=>';

// Add Chinese keywords example (using English rule names)
HANSHU: '函数';
RUGUO: '如果';
FOUZE: '否则';
XUNHUAN: '循环';
FANHUI: '返回';
DINGYI: '定义';
SHEZHI: '设置';
DAYIN: '打印';
TRUE: 'true' | '#t';
FALSE: 'false' | '#f';

// Identifiers
ID: [a-zA-Z_][a-zA-Z0-9_-]*;

// Numbers (simplified)
NUMBER: '-'? [0-9]+ ('.' [0-9]+)?;

// Whitespace and comments
WS: [ \t\r\n]+ -> skip;
COMMENT: ';' ~[\r\n]* -> skip;

// Symbols
LPAREN: '(';
RPAREN: ')';
LBRACK: '[';
RBRACK: ']';
QUOTE: '\'';
BACKQUOTE: '`';
COMMA: ',';
COMMA_AT: ',@';

// Operators
PLUS: '+';
MINUS: '-';
MULTIPLY: '*';
DIVIDE: '/';
MODULO: '%';
EQUAL: '=';
GT: '>';
LT: '<';
GTE: '>=';
LTE: '<=';

// String literal (simplified)
STRING: '"' ( ESC_SEQ | ~["\\] )* '"';
fragment ESC_SEQ
    : '\\' [btnfr\\\"']   // \b \t \n \f \r \\ \" \'
    | '\\' 'x' HEX_DIGIT HEX_DIGIT
    ;
fragment HEX_DIGIT : [0-9a-fA-F];
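Keywords such as FN and the catch-all ID rule overlap, so rule order matters. The sketch below is not ANTLR-generated code: it is a hand-written approximation using only java.util.regex that mimics the two disambiguation rules an ANTLR lexer applies (the longest match wins, and on a tie the rule listed first wins). The class name MiniArcLexer, the "RULE:text" output format and the small subset of rules are assumptions made for illustration. The Chinese keyword is written with Unicode escapes so the source compiles under any source encoding.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MiniArcLexer {
    // Insertion order is the rule order: on equal-length matches the earlier rule wins.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("FN", Pattern.compile("fn"));
        RULES.put("HANSHU", Pattern.compile("\u51fd\u6570")); // the HANSHU keyword
        RULES.put("ID", Pattern.compile("[a-zA-Z_][a-zA-Z0-9_-]*"));
        RULES.put("NUMBER", Pattern.compile("-?[0-9]+(\\.[0-9]+)?"));
        RULES.put("WS", Pattern.compile("[ \\t\\r\\n]+"));
        RULES.put("LPAREN", Pattern.compile("\\("));
        RULES.put("RPAREN", Pattern.compile("\\)"));
        RULES.put("MINUS", Pattern.compile("-"));
    }

    /** Splits the input into "RULE:text" entries; WS is skipped like "-> skip". */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Strictly longer only, so an earlier rule keeps a tie.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            if (!bestRule.equals("WS")) {
                tokens.add(bestRule + ":" + input.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "fn" ties between FN and ID, so FN wins; "fnord" is longer as an ID.
        System.out.println(tokenize("(fn fnord -12)"));
        // prints [LPAREN:(, FN:fn, ID:fnord, NUMBER:-12, RPAREN:)]
    }
}
```

Moving the ID entry above FN in this sketch would turn every keyword into an ID, which is why the grammar lists its keywords before ID.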
Installing the parser tools
pip install antlr4-tools
The installation completed:
pip install antlr4-tools
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting antlr4-tools
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/76/e9/f3d327df348a906a83201d2a5e559194945135879b6920dbbce9c1d6fe79/antlr4_tools-0.2.2-py3-none-any.whl (4.4 kB)
Collecting install-jdk (from antlr4-tools)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/7e/5e/af84054b0ff9f9fbe49a7079d46ba8b4ee7ab6192a0310d4bd2c91254626/install_jdk-1.1.0-py3-none-any.whl (15 kB)
Installing collected packages: install-jdk, antlr4-tools
Successfully installed antlr4-tools-0.2.2 install-jdk-1.1.0
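Debugging
Problem: the \" inside the character class [btnfr\\\"'] is rejected.
Reportedly some ANTLR versions only emit a warning for this, but in my case it was an error: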
antlr4 custom_arc_lexer.g4
error(156): custom_arc_lexer.g4:69:11: invalid escape sequence \"
Removing one backslash inside the brackets, so that the class reads [btnfr\\"'], resolved the error:
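The string rule can also be sanity-checked outside ANTLR. The sketch below hand-translates STRING and ESC_SEQ into a single java.util.regex pattern, assuming the corrected character class is [btnfr\\"']. The class name StringLiteralCheck and the method isStringLiteral are hypothetical, and the regex is an approximation of the grammar, not something ANTLR produces.

```java
import java.util.regex.Pattern;

final class StringLiteralCheck {
    // Hand translation of the grammar rules into one regex:
    //   ESC_SEQ : backslash + one of b t n f r \ " '  |  backslash + x + two hex digits
    //   STRING  : '"' ( ESC_SEQ | any char except " and backslash )* '"'
    private static final String ESC_SEQ = "\\\\[btnfr\\\\\"']|\\\\x[0-9a-fA-F]{2}";
    private static final Pattern STRING =
            Pattern.compile("\"(?:" + ESC_SEQ + "|[^\"\\\\])*\"");

    /** Returns true when the whole text is one string literal under the rules above. */
    static boolean isStringLiteral(String text) {
        return STRING.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isStringLiteral("\"hello\""));   // true
        System.out.println(isStringLiteral("\"a\\\"b\""));  // true: escaped quote inside
        System.out.println(isStringLiteral("\"\\x4F\""));   // true: hex escape
        System.out.println(isStringLiteral("\"\\q\""));     // false: q is not a listed escape
        System.out.println(isStringLiteral("\"open"));      // false: missing closing quote
    }
}
```

Inside the character class the double quote needs no backslash of its own, which is the same point the ANTLR error makes about the grammar.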
E:\work\arc>antlr4 custom_arc_lexer.g4
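No errors this time, and the command printed nothing at all (antlr4 is silent on success and simply writes the generated files, such as custom_arc_lexer.java).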
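Running antlr4-parse custom_arc_lexer.g4 prog -tree fails. The likely cause is that antlr4-parse expects a combined or parser grammar that contains the start rule, whereas this file is a lexer-only grammar with no prog rule: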
antlr4-parse custom_arc_lexer.g4 prog -tree
Exception in thread "main" java.lang.ClassCastException: class org.antlr.v4.gui.Interpreter$IgnoreTokenVocabGrammar cannot be cast to class org.antlr.v4.tool.LexerGrammar (org.antlr.v4.gui.Interpreter$IgnoreTokenVocabGrammar and org.antlr.v4.tool.LexerGrammar are in unnamed module of loader 'app')
    at org.antlr.v4.semantics.SymbolChecks.checkForModeConflicts(SymbolChecks.java:313)
    at org.antlr.v4.semantics.SemanticPipeline.process(SemanticPipeline.java:110)
    at org.antlr.v4.Tool.processNonCombinedGrammar(Tool.java:371)
    at org.antlr.v4.Tool.process(Tool.java:359)
    at org.antlr.v4.tool.Grammar.<init>(Grammar.java:364)
    at org.antlr.v4.gui.Interpreter$IgnoreTokenVocabGrammar.<init>(Interpreter.java:42)
    at org.antlr.v4.gui.Interpreter.interp(Interpreter.java:141)
    at org.antlr.v4.gui.Interpreter.main(Interpreter.java:277)
Compiling with javac *.java fails:
javac *.java
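ArcLexerDemo.java:14: error: unmappable character (0x81) for encoding GBK
// 创建字符?
^
ArcLexerDemo.java:17: error: unmappable character (0xA8) for encoding GBK
// 实例化我们生成的词法分析?
^
ArcLexerDemo.java:20: error: unmappable character (0x8A) for encoding GBK
// 通过 CommonTokenStream ? token 收集起来
^
ArcLexerDemo.java:23: error: unmappable character (0x89) for encoding GBK
// 这里我们只打印所? token(调试用?
^
ArcLexerDemo.java:23: error: unmappable character (0x89) for encoding GBK
// 这里我们只打印所? token(调试用?
^
ArcLexerDemo.java:24: error: unmappable character (0x89) for encoding GBK
tokens.fill(); // 把所? token 拉进?
^
ArcLexerDemo.java:24: error: unmappable character (0xA5) for encoding GBK
tokens.fill(); // 把所? token 拉进?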
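In PowerShell the same command reports the following (javac echoes the UTF-8 comments decoded as GBK, so they appear garbled):
javac *.java
ArcLexerDemo.java:14: error: unmappable character (0x81) for encoding GBK
// 鍒涘缓瀛楃娴?
^
ArcLexerDemo.java:17: error: unmappable character (0xA8) for encoding GBK
// 瀹炰緥鍖栨垜浠敓鎴愮殑璇嶆硶鍒嗘瀽鍣?
^
ArcLexerDemo.java:20: error: unmappable character (0x8A) for encoding GBK
// 閫氳繃 CommonTokenStream 鎶? token 鏀堕泦璧锋潵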
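Root cause
- Encoding mismatch: javac falls back to the system encoding when no -encoding option is given (the default behavior of JDKs before 18; on Chinese-language Windows this is usually GBK), while modern development environments such as IDEs save source files as UTF-8.
- Symptom: the UTF-8 bytes of some characters (for example Chinese punctuation or less common Chinese characters) do not form valid GBK sequences, so javac reports an "unmappable character" error.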
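The mismatch described above can be reproduced in a few lines. This is a minimal sketch, assuming the running JDK ships the GBK charset (standard JDK builds do, stripped-down runtimes may not). The class name MojibakeDemo and both helper methods are hypothetical. The sample text is the two-character HANSHU keyword, written with Unicode escapes so that this file itself compiles under any source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Assumption: the GBK charset is available in this JDK.
    private static final Charset GBK = Charset.forName("GBK");

    /** Decodes UTF-8 bytes as GBK, which is what javac does to a UTF-8 file on a GBK system. */
    static String readUtf8AsGbk(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), GBK);
    }

    /** Reverses the damage; this only works when every byte pair mapped to a GBK character. */
    static String repair(String garbled) {
        return new String(garbled.getBytes(GBK), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u51fd\u6570"; // 6 bytes in UTF-8, read back as 3 GBK characters
        String garbled = readUtf8AsGbk(original);

        System.out.println(garbled.equals(original));         // false
        System.out.println(garbled.length());                 // 3
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

When the UTF-8 byte count is odd, the final byte has no partner to form a GBK pair, which is consistent with javac flagging a single byte such as 0x81 at the end of a comment.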
Try this command instead: javac -encoding UTF-8 *.java
javac -encoding UTF-8 *.java reports a different error:
javac -encoding UTF-8 *.java
ArcLexerDemo.java:2: error: package org.antlr.v4.runtime does not exist
import org.antlr.v4.runtime.*;
^
ArcLexerDemo.java:3: error: package org.antlr.v4.runtime.tree does not exist
import org.antlr.v4.runtime.tree.*;
^
custom_arc_lexer.java:2: error: package org.antlr.v4.runtime does not exist
import org.antlr.v4.runtime.Lexer;
^
custom_arc_lexer.java:3: error: package org.antlr.v4.runtime does not exist
import org.antlr.v4.runtime.CharStream;
^
custom_arc_lexer.java:4: error: package org.antlr.v4.runtime does not exist
import org.antlr.v4.runtime.Token;
^
custom_arc_lexer.java:5: error: package org.antlr.v4.runtime does not exist
import org.antlr.v4.runtime.TokenStream;
^
custom_arc_lexer.java:8: error: package org.antlr.v4.runtime.dfa does not exist
import org.antlr.v4.runtime.dfa.DFA;