www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Compile Time D Expression Parser?

reply d coder <dlang.coder gmail.com> writes:
Greetings

I need to parse simple D expressions at compile time. I was wondering if
somebody on the list has some example code that could be of help to me.

I am working on an opensource constraint solver  and expressions that I
need to parse can be reasonably complex such as "x + y*n < 32 && x > 4". I
want to code a string mixin that parses such expressions and writes out
code that creates a parse tree for the given expression.

Thanks and Regards
- Puneet
Feb 25 2012
next sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
Am 26.02.2012 03:25, schrieb d coder:
 Greetings

 I need to parse simple D expressions at compile time. I was wondering if
 somebody on the list has some example code that could be of help to me.

 I am working on an opensource constraint solver  and expressions that I
 need to parse can be reasonably complex such as "x + y*n < 32 && x > 4".
 I want to code a string mixin that parses such expressions and writes
 out code that creates a parse tree for the given expression.

 Thanks and Regards
 - Puneet
How different is what you want to do from CTFE? http://dlang.org/function.html
Feb 25 2012
next sibling parent d coder <dlang.coder gmail.com> writes:
 How different is what you want to do from CTFE?
 http://dlang.org/function.html
I do not want to evaluate the expression in the string at compile time. I want to parse it at compile time and later at run time, I am required to process the parse tree and evaluate it in a different fashion (using binary decision diagrams). In summary, I need CTFE to parse the expression and create a parse tree from it. I think this requires a string mixin that takes the expression string as input and outputs code that builds the corresponding parse tree. Regards - Puneet
Feb 26 2012
prev sibling next sibling parent reply kenji hara <k.hara.pg gmail.com> writes:
Hisayuki Mima's ctpg is compile-time parser generater, and the
generated parser works in compile time!
https://github.com/youkei/ctpg

Kenji Hara

2012/2/26 d coder <dlang.coder gmail.com>:
 How different is what you want to do from CTFE?
 http://dlang.org/function.html
I do not want to evaluate the expression in the string at compile time. I want to parse it at compile time and later at run time, I am required to process the parse tree and evaluate it in a different fashion (using binary decision diagrams). In summary, I need CTFE to parse the expression and create a parse tree from it. I think this requires a string mixin that takes the expression string as input and outputs code that builds the corresponding parse tree. Regards - Puneet
Feb 26 2012
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/26/12 5:07 AM, kenji hara wrote:
 Hisayuki Mima's ctpg is compile-time parser generater, and the
 generated parser works in compile time!
 https://github.com/youkei/ctpg

 Kenji Hara
This seems to be great work, but a 30-seconds skim did not reveal to me how exactly it works. Are there some examples or more documentation? Thanks, Andrei
Feb 26 2012
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 26.02.2012 15:07, kenji hara wrote:
 Hisayuki Mima's ctpg is compile-time parser generater, and the
 generated parser works in compile time!
 https://github.com/youkei/ctpg

 Kenji Hara
Nice! I'm curious as to which parsing method the generated parser does employ (it isn't immediately obvious ;) ). Examples look good, but I don't seem to get how recursive sample fails match(?) e.g. "aaaaa" given the grammar: recursive -> A $ A -> a A a | a I mean it's any odd-length sequence of a. -- Dmitry Olshansky
Feb 27 2012
parent reply Hisayuki Mima <youxkei gmail.com> writes:
(2012/02/27 19:42), Dmitry Olshansky wrote:
 On 26.02.2012 15:07, kenji hara wrote:
 Hisayuki Mima's ctpg is compile-time parser generater, and the
 generated parser works in compile time!
 https://github.com/youkei/ctpg

 Kenji Hara
Nice! I'm curious as to which parsing method the generated parser does employ (it isn't immediately obvious ;) ). Examples look good, but I don't seem to get how recursive sample fails match(?) e.g. "aaaaa" given the grammar: recursive -> A $ A -> a A a | a I mean it's any odd-length sequence of a.
Parsing method which the generated parser employs is Recursive Descent Parsing. And the behavior of the parsers recursive and A is a little bit complicated. The parser recursive matches the sequence of 'a' whose length is 2^n-1 such as 1, 3, 7 and 15. Anyway, I'm going to write more documentation of ctpg within a few days. I hope it'll help you. Hisayuki Mima
Feb 27 2012
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 27.02.2012 20:36, Hisayuki Mima wrote:
 (2012/02/27 19:42), Dmitry Olshansky wrote:
 On 26.02.2012 15:07, kenji hara wrote:
 Hisayuki Mima's ctpg is compile-time parser generater, and the
 generated parser works in compile time!
 https://github.com/youkei/ctpg

 Kenji Hara
Nice! I'm curious as to which parsing method the generated parser does employ (it isn't immediately obvious ;) ). Examples look good, but I don't seem to get how recursive sample fails match(?) e.g. "aaaaa" given the grammar: recursive -> A $ A -> a A a | a I mean it's any odd-length sequence of a.
Parsing method which the generated parser employs is Recursive Descent Parsing.
OK, does it check for Left recursive grammars?
 And the behavior of the parsers recursive and A is a little bit
 complicated.
 The parser recursive matches the sequence of 'a' whose length is 2^n-1
 such as 1, 3, 7 and 15.
Well I'm missing something about that BNF-grammar(right?) but undoubtedly: recursive -> A $ -> a A a $ -> a a A a a $ - > a a a a a $ As for task of parsing only 2^n-1 sequences of "a" by CFG that's news for me.
 Anyway, I'm going to write more documentation of ctpg within a few days.
 I hope it'll help you.
Please do, the project with such potential should not stay undocumented. -- Dmitry Olshansky
Feb 27 2012
parent reply Hisayuki Mima <youxkei gmail.com> writes:
(2012/02/28 2:22), Dmitry Olshansky wrote:
 OK, does it check for Left recursive grammars?
No, it doesn't. So currently left recursive grammar causes infinite loop. I know the way to deal with the left recursive grammar well which is using memoization, but memoization causes very bad performance.
 Well I'm missing something about that BNF-grammar(right?) but undoubtedly:
 recursive -> A $ -> a A a $ -> a a A a a $ - > a a a a a $
 As for task of parsing only 2^n-1 sequences of "a" by CFG that's news
 for me.
If it is BNF, the expansion you said (recursive -> A $ -> a A a $ -> a a A a a $ - > a a a a a $) is right. But the DSL which ctpg uses to generate parser is based on PEG. Unlike BNF's choice operator "|", PEG's choice operator "/" is ordered. For details, see: http://en.wikipedia.org/wiki/Parsing_expression_grammar#Definition . Thanks, Hisayuki Mima
Feb 27 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 28.02.2012 9:19, Hisayuki Mima wrote:
 (2012/02/28 2:22), Dmitry Olshansky wrote:
 OK, does it check for Left recursive grammars?
No, it doesn't. So currently left recursive grammar causes infinite loop. I know the way to deal with the left recursive grammar well which is using memoization, but memoization causes very bad performance.
I haven't thought that was PEG, as these grammars are harder to rewrite automatically.
 Well I'm missing something about that BNF-grammar(right?) but
 undoubtedly:
 recursive -> A $ -> a A a $ -> a a A a a $ - > a a a a a $
 As for task of parsing only 2^n-1 sequences of "a" by CFG that's news
 for me.
If it is BNF, the expansion you said (recursive -> A $ -> a A a $ -> a a A a a $ - > a a a a a $) is right. But the DSL which ctpg uses to generate parser is based on PEG. Unlike BNF's choice operator "|", PEG's choice operator "/" is ordered. For details, see: http://en.wikipedia.org/wiki/Parsing_expression_grammar#Definition .
Ordered is fine, but it still means that once first alternative fails (in this case when we finally hit the end of input) engine has to backtrack(!) then try second, then 3rd etc. More then that, if all of this fails it has to "backtrack" through all choices made on the way and pick other alternatives. This backtracking is what packrat parser optimizes via memoization. By the way this is what makes doing semantic actions problematic in packrat parsing as they have to be 'reverted' once it backtracks.
 Thanks,
 Hisayuki Mima
-- Dmitry Olshansky
Feb 28 2012
prev sibling parent reply d coder <dlang.coder gmail.com> writes:
 Parsing method which the generated parser employs is Recursive Descent
 Parsing.
 And the behavior of the parsers recursive and A is a little bit
 complicated.
 The parser recursive matches the sequence of 'a' whose length is 2^n-1
 such as 1, 3, 7 and 15.
 Anyway, I'm going to write more documentation of ctpg within a few days.
 I hope it'll help you.
It sure will. Meanwhile, it would help if you can share some example code. I surely do have an EBNF notation for the expressions I want to parse. Is it possible to parse any EBNF of arbitrary complexity using your parser generator? Regards - Puneet
Feb 27 2012
parent Hisayuki Mima <youxkei gmail.com> writes:
(2012/02/28 2:24), d coder wrote:
 Is it possible to parse any EBNF of arbitrary complexity using your
 parser generator?
Yes, but some rewrites is required. 1) If your EBNF have (indirect) left recursion, you have to eliminate it because the DSL which ctpg uses to generate parser is based on PEG, parsing expression grammar. For details, see: http://en.wikipedia.org/wiki/Parsing_expression_grammar . 2) As the example of parsing simple arithmetic expression shows, the parser generated by ctpg have type such as int and string. EBNF doesn't have type system so you have to add appropriate type conversion into your EBNF. Thanks, Hisayuki Mima
Feb 27 2012
prev sibling parent reply d coder <dlang.coder gmail.com> writes:
 Hisayuki Mima's ctpg is compile-time parser generater, and the
 generated parser works in compile time!
 https://github.com/youkei/ctpg

 Kenji Hara
Thanks Kenji This is very interesting. Also I think ctpg is being actively developed. I will try to figure out how to make use of it. Regards - Puneet
Feb 26 2012
parent Hisayuki Mima <youxkei gmail.com> writes:
(2012/02/26 20:56), d coder wrote:
     Hisayuki Mima's ctpg is compile-time parser generater, and the
     generated parser works in compile time!
     https://github.com/youkei/ctpg

     Kenji Hara


 Thanks Kenji

 This is very interesting. Also I think ctpg is being actively developed.
 I will try to figure out how to make use of it.

 Regards
 - Puneet
Hello, Puneet I'm writing ctpg and really sorry about too little documentation of it. I'm going to at least some sample codes which sketchily shows how to use ctpg within a few days. By the way, ctpg is only parser (and converter) generator. If you want to parse some expressions, you have to write DSL ,which is like PEG or BNF, in order to generate parsers. If that helps, generated parser works in both compile time and run time. Hisayuki Mima
Feb 26 2012
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 02/26/2012 03:25 AM, d coder wrote:
 Greetings

 I need to parse simple D expressions at compile time. I was wondering if
 somebody on the list has some example code that could be of help to me.

 I am working on an opensource constraint solver  and expressions that I
 need to parse can be reasonably complex such as "x + y*n < 32 && x > 4".
 I want to code a string mixin that parses such expressions and writes
 out code that creates a parse tree for the given expression.

 Thanks and Regards
 - Puneet
Operator precedence parsers are simple to implement: http://effbot.org/zone/simple-top-down-parsing.htm
Feb 26 2012
parent reply d coder <dlang.coder gmail.com> writes:
 Operator precedence parsers are simple to implement:

 http://effbot.org/zone/simple-**top-down-parsing.htm<http://effbot.org/zone/simple-top-down-parsing.htm>
Timon I want to do all this parsing at compile time in D (using mixins). I have just started working on this. I am not sure if CTFE allows so much flexibility. I believe I will have to use functional style as in the link provided by Kenji. Let me know if I am missing something. Regards - Puneet
Feb 26 2012
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 02/26/2012 01:00 PM, d coder wrote:
     Operator precedence parsers are simple to implement:

     http://effbot.org/zone/simple-__top-down-parsing.htm
     <http://effbot.org/zone/simple-top-down-parsing.htm>


 Timon

 I want to do all this parsing at compile time in D (using mixins). I
 have just started working on this. I am not sure if CTFE allows so much
 flexibility. I believe I will have to use functional style as in the
 link provided by Kenji. Let me know if I am missing something.

 Regards
 - Puneet
I know. CTFE is flexible enough and implementing this would be quite simple. However, if ctpg serves your needs well, just use that.
Feb 26 2012
parent reply d coder <dlang.coder gmail.com> writes:
 I know. CTFE is flexible enough and implementing this would be quite
 simple. However, if ctpg serves your needs well, just use that.
Thanks Timon You mean I do not need to use function style when using CTFE? I will try parsing myself as you suggested. This approach would give me a better handle. Regards - Puneet
Feb 26 2012
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 02/26/2012 01:55 PM, d coder wrote:
     I know. CTFE is flexible enough and implementing this would be quite
     simple. However, if ctpg serves your needs well, just use that.


 Thanks Timon

    You mean I do not need to use function style when using CTFE?
    I will try parsing myself as you suggested. This approach would give
 me a better handle.

 Regards
 - Puneet
Almost the complete language is available in CTFE, therefore classes could be used to implement the parse tree representation. However, a limitation that does exist is that classes that were created in CTFE cannot yet be stored in static variables or enums. How will the interface to your library look like?
Feb 26 2012
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 02/26/2012 02:07 PM, Timon Gehr wrote:
 On 02/26/2012 01:55 PM, d coder wrote:
 I know. CTFE is flexible enough and implementing this would be quite
 simple. However, if ctpg serves your needs well, just use that.


 Thanks Timon

 You mean I do not need to use function style when using CTFE?
 I will try parsing myself as you suggested. This approach would give
 me a better handle.

 Regards
 - Puneet
Almost the complete language is available in CTFE, therefore classes could be used to implement the parse tree representation. However, a limitation that does exist is that classes
objects, obviously
 that were created in CTFE
 cannot yet be stored in static variables or enums. How will the
 interface to your library look like?
Feb 26 2012
prev sibling parent Philippe Sigaud <philippe.sigaud gmail.com> writes:
 Almost the complete language is available in CTFE, therefore classes
could be used to implement the parse tree representation. However, a limitation that does exist is that classes that were created in CTFE cannot yet be stored in static variables or enums. How will the interface to your library look like? I recently wrote a parsing expression grammar module in D, also to create grammars and parsers at compile-time and parse inputs at CT. (PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar) Usage is like this: mixin Grammar( "JSON <- '{' ( Pair ( ',' Pair )* )? '}'" , "Pair <- String ':' Value" , "Value <- String / Number / JSON / Array / True / False / Null" , `True <- "true"` (..., rest of JSON grammar) ); enum result = JSON.parse( `{ "Hello":42, "World!": { "a":[0,1,2,3] } }`); I deliberatly used classes to construct the parsers, for I wanted an extended class template example in a big tutorial on templates I'm writing. For now, the resulting grammars are space-insensitive, because I grew tired of always inserting Spaces parsers everywhere. The parse tree is done with a tree struct. Since it holds strings (captures), it can be manipulated at CT to recreate new code. Any tree-walking function can collect the captures to build a D code string which can then be mixed in. For exampe, last week, I created a template constraints parser, to then test the resulting tree with template parameters. It tests the entire constraint with passed parameters and, if it fails, it recursively tests the sub-constraints to find ones that return false. So, given "if (isIntegral!T && !(isFloatingPoint!(U) || is(U : W)))", it will test "isIntegral!T" and so on. All in all, it's quite fun and works OK, it just slows down compilation a bit. What was just an example in a tutorial is slowly becoming its own project. I think I'll put it on Github in a week or two. Philippe
Feb 26 2012