digitalmars.D.announce - Darser: An LL(1) to Recursive Descent Parser/AST/Visitor Generator

Robert Schadek <rschadek symmetryinvestments.com> writes:
To get graphqld up and running I needed a parser/AST/visitor.
Being lazy, I created a parser/AST/visitor generator for that.

Darser is the result.

Given a language BNF, e.g. as yaml, darser will generate a recursive 
descent parser, a set of classes making up the AST, a visitor 
class and an AST printer class. The parser, AST, and visitor can 
be extended with hand-written extensions.

Given a yaml file like this:
```
PrimaryExpression:
     Identifier: [ identifier#value ]
     Float: [ float64#value ]
     Integer: [ integer#value ]
     Parenthesis: [lparen, Expression#expr, rparen]
```

darser will create a parser function as such:
```D
PrimaryExpression parsePrimaryExpressionImpl() {
	string[] subRules;
	subRules = ["Identifier"];
	if(this.lex.front.type == TokenType.identifier) {
		Token value = this.lex.front;
		this.lex.popFront();

		return new PrimaryExpression(PrimaryExpressionEnum.Identifier
			, value
		);
	} else if(this.lex.front.type == TokenType.float64) {
		Token value = this.lex.front;
		this.lex.popFront();

		return new PrimaryExpression(PrimaryExpressionEnum.Float
			, value
		);
	} else if(this.lex.front.type == TokenType.integer) {
		Token value = this.lex.front;
		this.lex.popFront();

		return new PrimaryExpression(PrimaryExpressionEnum.Integer
			, value
		);
	} else if(this.lex.front.type == TokenType.lparen) {
		this.lex.popFront();
		subRules = ["Parenthesis"];
		if(this.firstExpression()) {
			Expression expr = this.parseExpression();
			subRules = ["Parenthesis"];
			if(this.lex.front.type == TokenType.rparen) {
				this.lex.popFront();

				return new PrimaryExpression(PrimaryExpressionEnum.Parenthesis
					, expr
				);
			}
			auto app = appender!string();
			formattedWrite(app,
				"Found a '%s' while looking for",
				this.lex.front
			);
			throw new ParseException(app.data,
				__FILE__, __LINE__,
				subRules,
				["rparen"]
			);

		}
		auto app = appender!string();
		formattedWrite(app,
			"Found a '%s' while looking for",
			this.lex.front
		);
		throw new ParseException(app.data,
			__FILE__, __LINE__,
			subRules,
			["float64 -> PostfixExpression",
			 "identifier -> PostfixExpression",
			 "integer -> PostfixExpression",
			 "lparen -> PostfixExpression"]
		);

	}
	auto app = appender!string();
	formattedWrite(app,
		"Found a '%s' while looking for",
		this.lex.front
	);
	throw new ParseException(app.data,
		__FILE__, __LINE__,
		subRules,
		["identifier","float64","integer","lparen"]
	);

}
```

and an AST class like this:
```D
enum PrimaryExpressionEnum {
	Identifier,
	Float,
	Integer,
	Parenthesis,
}

class PrimaryExpression {
	PrimaryExpressionEnum ruleSelection;
	Token value;
	Expression expr;

	this(PrimaryExpressionEnum ruleSelection, Token value) {
		this.ruleSelection = ruleSelection;
		this.value = value;
	}

	this(PrimaryExpressionEnum ruleSelection, Expression expr) {
		this.ruleSelection = ruleSelection;
		this.expr = expr;
	}

	void visit(Visitor vis) {
		vis.accept(this);
	}

	void visit(Visitor vis) const {
		vis.accept(this);
	}
}
```
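
To illustrate how the `visit`/`accept` pair in the class above can be used, here is a minimal, self-contained sketch. The `Visitor` base class with an empty `accept`, the `IdentifierCollector` subclass, and the `Token` layout are assumptions made for this example; the visitor class that darser actually generates may look different.

```D
// Hypothetical stand-ins for the generated types, trimmed for brevity.
enum TokenType { identifier, float64, integer }

struct Token {
	TokenType type;
	string value;
}

enum PrimaryExpressionEnum { Identifier, Float, Integer }

// Assumed base visitor: one overridable accept() per AST class.
class Visitor {
	void accept(const(PrimaryExpression) p) {}
}

class PrimaryExpression {
	PrimaryExpressionEnum ruleSelection;
	Token value;

	this(PrimaryExpressionEnum ruleSelection, Token value) {
		this.ruleSelection = ruleSelection;
		this.value = value;
	}

	// Only the const overload is kept in this sketch.
	void visit(Visitor vis) const {
		vis.accept(this);
	}
}

// Hand-written visitor that records the name of every identifier it sees.
class IdentifierCollector : Visitor {
	string[] names;

	override void accept(const(PrimaryExpression) p) {
		if(p.ruleSelection == PrimaryExpressionEnum.Identifier) {
			this.names ~= p.value.value;
		}
	}
}
```

Calling `node.visit(collector)` on an AST node dispatches to the `accept` overload for that node type, so the traversal logic stays out of the AST classes.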

The lexer has to be written by hand.
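
As a rough idea of what such a lexer could look like, here is a minimal sketch for the token types used in the grammar above. The `Token` layout, the extra `eof` token type, and the `Lexer` struct itself are assumptions for illustration, not something darser ships; the only interface taken from the generated parser is that it reads `lex.front.type` and calls `lex.popFront()`.

```D
import std.ascii : isAlpha, isAlphaNum, isDigit, isWhite;

// Token kinds named after the terminals in the grammar;
// `eof` is an assumed extra marker for the end of the input.
enum TokenType { identifier, float64, integer, lparen, rparen, eof }

struct Token {
	TokenType type;
	string value;
}

struct Lexer {
	string input;
	Token front;

	this(string input) {
		this.input = input;
		this.popFront(); // load the first token
	}

	bool empty() const {
		return this.front.type == TokenType.eof;
	}

	void popFront() {
		// skip leading whitespace
		while(this.input.length > 0 && isWhite(this.input[0])) {
			this.input = this.input[1 .. $];
		}
		if(this.input.length == 0) {
			this.front = Token(TokenType.eof, "");
			return;
		}
		size_t len = 1;
		TokenType type;
		immutable char c = this.input[0];
		if(c == '(') {
			type = TokenType.lparen;
		} else if(c == ')') {
			type = TokenType.rparen;
		} else if(isAlpha(c) || c == '_') {
			type = TokenType.identifier;
			while(len < this.input.length
					&& (isAlphaNum(this.input[len]) || this.input[len] == '_'))
			{
				++len;
			}
		} else if(isDigit(c)) {
			type = TokenType.integer;
			while(len < this.input.length && isDigit(this.input[len])) {
				++len;
			}
			// a '.' followed by a digit turns the integer into a float
			if(len + 1 < this.input.length && this.input[len] == '.'
					&& isDigit(this.input[len + 1]))
			{
				type = TokenType.float64;
				++len;
				while(len < this.input.length && isDigit(this.input[len])) {
					++len;
				}
			}
		} else {
			throw new Exception("Unexpected character '"
				~ this.input[0 .. 1] ~ "'");
		}
		this.front = Token(type, this.input[0 .. len]);
		this.input = this.input[len .. $];
	}
}
```

Because the struct exposes `front`, `popFront` and `empty`, it also works as a plain D input range, e.g. in a `foreach` over the tokens.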
Mar 20
Robert Schadek <rschadek symmetryinvestments.com> writes:
https://code.dlang.org/packages/darser

https://github.com/burner/Darser
Mar 20
Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 20 March 2019 at 17:22:07 UTC, Robert Schadek wrote:
 https://code.dlang.org/packages/darser

 https://github.com/burner/Darser
Have you had a look at fancypars? If not, you might want to look at its lexer generation and at the way it represents the grammar.
Mar 20
Cym13 <cpicard openmailbox.org> writes:
On Wednesday, 20 March 2019 at 17:20:48 UTC, Robert Schadek wrote:
 To get graphqld up and running I needed a parser/ast/visitor.
 Being lazy, I created parser/ast/visitor generated for that.

 [...]
This looks nice! I'm familiar with Pegged, which uses PEG grammars. Could you maybe comment on the differences and possible benefits of Darser in comparison?
Mar 20
Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Wednesday, 20 March 2019 at 21:30:29 UTC, Cym13 wrote:
 On Wednesday, 20 March 2019 at 17:20:48 UTC, Robert Schadek 
 wrote:
 To get graphqld up and running I needed a parser/ast/visitor.
 Being lazy, I created parser/ast/visitor generated for that.

 [...]
 This looks nice! I'm familiar with pegged which uses PEG grammars, could you maybe comment on the differences and possible benefits of Darser in comparison?
I'm interested in that too. I suppose it doesn't support left-recursive grammars, like Pegged does?
Mar 20
Robert Schadek <rschadek symmetryinvestments.com> writes:
On Wednesday, 20 March 2019 at 21:30:29 UTC, Cym13 wrote:

 This looks nice! I'm familiar with pegged which uses PEG 
 grammars, could you maybe comment on the differences and 
 possible benefits of Darser in comparison?
Pegged can recognise a lot more than LL(1) (left recursion, retry, ...); Darser cannot. Pegged is really smart, Darser is really stupid. Pegged's error messages are really bad, Darser's are really good. The Darser AST consists of actual classes you can set breakpoints on; Pegged's does not. Darser generates a built-in visitor class, Pegged does not. Stepping through a parse of some input is really easy in Darser: just set your breakpoints inside the parser class. In Pegged that is not possible. Pegged runs at compile time (CT), whereas Darser emits source files that you have to compile.
Mar 21
drug <drug2004 bk.ru> writes:
On 21.03.2019 12:06, Robert Schadek wrote:
 [...]
This really should be somewhere in Darser readme
Mar 21
Doc Andrew <x x.com> writes:
I've been looking for something exactly like this - thanks! It 
seems like the code it generates is very clean too.

-Doc
Mar 20