parsing - Require newline or EOF after statement match
I'm just looking for a simple way of getting ANTLR4 to generate a parser that handles the following (everything after the ; is ignored):

    int #i ; defines an int
    int #j ; see how I have to go to the next line for each statement?
My parser is the following:

    compilationUnit: (statement END?)* statement END? EOF ;
    statement: intdef | WS ;

    // 10 - 1F block.
    intdef: 'intdef' Identifier ;

    // Lexer.
    Identifier: '#' Letter LetterOrDigit*;
    fragment Letter: [a-zA-Z_];
    fragment LetterOrDigit: [a-zA-Z0-9$_];

    // Whitespace, fragments, and terminals.
    WS: [ \t\r\n\u000C]+ -> skip;
    //COMMENT: '/*' .*? '*/' -> channel(HIDDEN);
    END: (';' ~[\r\n]*) | '\n';
In essence, any time I have a statement, I need to require a newline before the next one is entered. I don't care if there are 3 newlines and a bunch of tabs persist on the second one, as long as there's a newline.
The issue is that the ANTLR4 parse tree seems to be giving me errors for inputs such as:

    .

(pretend the dot isn't there; it's literally no input)

    int #i int #j

Whoops, we got 2 on the same line!
Any ideas on how I can achieve this? I appreciate the help.
I've simplified the grammar a bit and made it require an end-of-line sequence after each statement for it to parse correctly.

    grammar testnl;

    program: (statement)* EOF ;
    statement: 'int' Identifier EOL;

    Identifier: '#' Letter LetterOrDigit*;
    fragment Letter: [a-zA-Z_];
    fragment LetterOrDigit: [a-zA-Z0-9$_];

    EOL: ';' .*? '\r\n'
       | ';' .*? '\n'
       ;
    WS: [ \t\r\n\u000C]+ -> skip;
It parses

    int #i ;
    int #j;

into the following tokens:

    [@0,0:2='int',<'int'>,1:0]
    [@1,4:5='#i',<Identifier>,1:4]
    [@2,7:9=';\r\n',<EOL>,1:7]
    [@3,10:12='int',<'int'>,2:0]
    [@4,14:15='#j',<Identifier>,2:4]
    [@5,16:18=';\r\n',<EOL>,2:6]
    [@6,19:18='<EOF>',<EOF>,3:0]
It will also ignore anything after the semicolon, because that text becomes part of the EOL token:

    [@0,0:2='int',<'int'>,1:0]
    [@1,4:5='#i',<Identifier>,1:4]
    [@2,7:20='; ignore this\n',<EOL>,1:7]
    [@3,21:23='int',<'int'>,2:0]
    [@4,25:26='#j',<Identifier>,2:4]
    [@5,27:28=';\n',<EOL>,2:6]
    [@6,29:28='<EOF>',<EOF>,3:0]
Using either a linefeed or a carriage return plus linefeed is fine. Is this what you're looking for?
Edit

Per the OP's comment, I made a small change to allow consecutive EOL tokens, and also moved the EOL token out of the statement rule to reduce repetition:
    grammar testnl;

    program: ( statement EOL )* EOF ;
    statement: 'int' Identifier;

    Identifier: '#' Letter LetterOrDigit*;
    fragment Letter: [a-zA-Z_];
    fragment LetterOrDigit: [a-zA-Z0-9$_];

    EOL: ';' .*? ('\r\n')+
       | ';' .*? ('\n')+
       ;
    WS: [ \t\r\n\u000C]+ -> skip;
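
The grammar above only ends a statement when a ; is present. If the ; is meant purely as a comment marker and the newline (or end of input) is the real terminator, as the question title suggests, one possible variation is sketched below. This is an untested sketch and not the grammar from the answer: the grammar name testnl2 and the COMMENT rule are assumptions made for illustration, and it relies on removing \r and \n from the skipped whitespace so that the parser can see line breaks.

```antlr
grammar testnl2;   // hypothetical name, chosen for this sketch

// Statements must be separated by at least one EOL; blank lines are allowed
// anywhere, and the last statement may end at EOF without a trailing newline.
program: EOL* (statement (EOL+ statement)* EOL*)? EOF ;
statement: 'int' Identifier ;

Identifier: '#' Letter LetterOrDigit* ;
fragment Letter: [a-zA-Z_] ;
fragment LetterOrDigit: [a-zA-Z0-9$_] ;

// Assumption: ';' starts a comment that runs to the end of the line.
COMMENT: ';' ~[\r\n]* -> skip ;
// The line break is a real token, so the parser can require it.
EOL: '\r'? '\n' ;
// Spaces, tabs, and form feeds are skipped; \r and \n are deliberately excluded.
WS: [ \t\u000C]+ -> skip ;
```

Under these assumptions, empty input and lines containing only tabs should be accepted, while int #i int #j on a single line should be reported as a syntax error, because no EOL token separates the two statements.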