digitalmars.D - [GSoC’11] Lexing and parsing

Ilya Pupatenko (65/65) Mar 22 2011 Hi,

Robert Jacques (34/79) Mar 22 2011 I'm not qualified to speak on Spirits internal architecture; I've only

spir (8/86) Mar 23 2011 How does one solve the issues of self/mutual/circular pattern recursion ...

Robert Jacques (8/128) Mar 23 2011 How do you solve it at runtime? Then apply CTFE. Alternatively, how woul...

Ilya Pupatenko (7/39) Mar 23 2011 Ok, it sounds good. But still in most cases we are not interesting only

Robert Jacques (31/70) Mar 23 2011 I don't have any experience with using parser generators, but using arra...

BlazingWhitester (5/79) Mar 23 2011 Mimicking spirit might not be a good idea. It looks sort of like BNF
spir (25/30) Mar 23 2011 Do you mean the grammer itself to be D source code? (Instead of EBNF-lik...

Ilya Pupatenko <pupatenko gmail.com> writes:

Hi,

First of all, I want to be polite, so I have to introduce myself (you can
skip this paragraph if you are tired of newcomer students’ posts). My
name is Ilya; I’m a Master’s student in the IT department of Novosibirsk State
University (Novosibirsk, Russia). In the Soviet period Novosibirsk became one
of the most important science centers in the country, and there are now
very close relations between the University and the Academy of Science. That’s
why it’s difficult and very interesting to study here. I’m not
planning to study or work this summer, so I’ll be able to work (nearly)
full time on a GSoC project. My primary specialization is seismic
tomography inverse problems, but I’m also interested in programming
language implementation and compilation theory. I have good knowledge of
[...] “intermediate” knowledge of the D language,
knowledge of compilation theory, some experience in implementing lexers,
parsers and translators, basic knowledge of lex/yacc/antlr and some
knowledge of the Boost.Spirit library. I’m not an expert in D yet, but I am
willing to learn and to solve difficult tasks; that’s why I decided to
apply to GSoC.

I’m still working on my proposal (for the “Lexing and Parsing” task), but I
want to share some general ideas and ask some questions.

1. It is said that “it is possible to write a highly-integrated
lexer/parser generator in D without resorting to additional tools”. As I
understand it, the library should allow the programmer to write a grammar
directly in D (ideally, the syntax should be somewhat similar to EBNF),
and the resulting parser will be generated by the D compiler while compiling
the program. This method allows parsing to be integrated into D code; it can
make code simpler and sometimes even more efficient.
There is a library for C++ (named Boost.Spirit) that follows the same
idea. It provides a (probably not ideal but very nice) “EBNF-like” syntax
for writing a grammar; it’s quite powerful, fast and flexible. There are
three parts in this library (actually there are four, but we’re not
interested in Spirit.Classic now):
• Spirit.Qi (a parser library for building recursive descent parsers);
• Spirit.Karma (a generator library);
• Spirit.Lex (a library for creating tokenizers).
The Spirit library uses “C++ template black magic” heavily (for example,
via Boost.Fusion). But D has greater metaprogramming abilities, so it should be
possible to implement the same functionality in an easier and “cleaner” way.
So, the question is: is it a good idea for at least the parser library’s
architecture to be somewhat similar to Spirit’s? Of course it is not
about “blind” copying, but creating the architecture for such a big system
completely from scratch is quite difficult indeed. To be exact, I
like the idea of parser attributes, I like the way semantic actions are
described, and the “auto-rules” seem really useful.

2. Boost.Spirit is a really large and complicated library, and I doubt
that it is possible to implement a library of comparable level in three
months. That’s why it is extremely important to have a plan (which
features should be implemented and how much time each will take). I’m
still working on it, but I have some preliminary questions.
Should I have a library that is proposed and accepted into Phobos before
the end of GSoC? Or is there no such strict timeframe, so that I can propose
a library once all the features I want to see are implemented and well tested?
And another question: is it ok to concentrate first on the parser library
and then “move” to the other parts? Of course I can choose another part to
start working on, but it seems to me that the parser is the most useful and
interesting part.

3. Finally, what comes next. I’ll try to make a plan (which parts
should be implemented and when). Then I guess I need to describe the
proposed architecture in more detail, and probably provide some usage
examples(?). Is it ok if I publish my ideas here to get reviews?
Anyway, I’ll need some time to work on it.

Ilya.


trying (just for fun) to implement some tiny part of Spirit in D. 
Submitting bugs seems to be an important part of the task too.

Mar 22 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Tue, 22 Mar 2011 18:27:51 -0400, Ilya Pupatenko <pupatenko gmail.com>  
wrote:

 So, the question is: is it a good idea for at least the parser library’s
 architecture to be somewhat similar to Spirit’s? Of course it is not
 about “blind” copying, but creating the architecture for such a big system
 completely from scratch is quite difficult indeed. To be exact, I
 like the idea of parser attributes, I like the way semantic actions are
 described, and the “auto-rules” seem really useful.

I'm not qualified to speak on Spirit's internal architecture; I've only
used it once for something very simple and ran into a one-liner bug which
remains unfixed 7+ years later. But the basic API of Spirit would be wrong
for D. “It is possible to write a highly-integrated lexer/parser generator
in D without resorting to additional tools” does not mean "the library
should allow programmer to write grammar directly in D (ideally, the
syntax should be somehow similar to EBNF)"; it means that the library
should allow you to write a grammar in EBNF and then, through a combination
of templates, string mixins and compile-time function evaluation, generate
the appropriate (hopefully optimal) parser. D's compile-time programming
abilities are strong enough to do the code generation job usually left to
separate tools. Ultimately, a user of the library should be able to declare
a parser something like this:

// Declare a parser for Wikipedia's EBNF sample language
Parser!`
(* a simple program syntax in EBNF − Wikipedia *)
program = 'PROGRAM' , white space , identifier , white space ,
            'BEGIN' , white space ,
            { assignment , ";" , white space } ,
            'END.' ;
identifier = alphabetic character , { alphabetic character | digit } ;
number = [ "-" ] , digit , { digit } ;
string = '"' , { all characters − '"' } , '"' ;
assignment = identifier , ":=" , ( number | identifier | string ) ;
alphabetic character = "A" | "B" | "C" | "D" | "E" | "F" | "G"
                      | "H" | "I" | "J" | "K" | "L" | "M" | "N"
                      | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
                      | "V" | "W" | "X" | "Y" | "Z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
white space = ? white space characters ? ;
all characters = ? all visible characters ? ;
` wikiLangParser;
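
The mechanism named above (compile-time function evaluation feeding a string mixin) can be sketched in a few lines. This is only an illustration of the technique, not the proposed Parser template: the names splitAlternatives, generateMatcher, digitRule and isDigit are assumptions made for the example, and it handles just one flat rule of quoted terminals separated by "|".

```d
// Split the right-hand side of a rule on '|' and drop spaces.
// Plain loops only, so the function can run at compile time (CTFE).
// Assumption: terminals contain no spaces and no '|' characters.
string[] splitAlternatives(string rhs)
{
    string[] parts;
    string current;
    foreach (char c; rhs)
    {
        if (c == '|')
        {
            parts ~= current;
            current = null;
        }
        else if (c != ' ')
            current ~= c;
    }
    parts ~= current;
    return parts;
}

// Build the source text of `bool <name>(string s)`, a function that
// accepts exactly the quoted terminals listed in `rhs`.
string generateMatcher(string name, string rhs)
{
    string code = "bool " ~ name ~ "(string s)\n{\n";
    foreach (alt; splitAlternatives(rhs))
        code ~= "    if (s == " ~ alt ~ ") return true;\n";
    code ~= "    return false;\n}\n";
    return code;
}

// One rule of the grammar, written as plain text.
enum digitRule = `"0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"`;

// The compiler evaluates generateMatcher during compilation and then
// compiles the returned string as ordinary D code.
mixin(generateMatcher("isDigit", digitRule));

// The generated function is itself usable at compile time.
static assert(isDigit("7"));
static assert(!isDigit("x"));

void main()
{
    assert(isDigit("0"));
    assert(!isDigit("42"));
}
```

A real generator would of course need a proper EBNF reader in place of splitAlternatives; the point here is only that no external tool is involved between the grammar text and the compiled matcher.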

Mar 22 2011

spir <denis.spir gmail.com> writes:

On 03/23/2011 05:39 AM, Robert Jacques wrote:
 D's compile-time programming abilities are strong enough to do the code
 generation job usually left to separate tools.

How does one solve the issues of self/mutual/circular pattern recursion at 
compile-time?

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

Mar 23 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Wed, 23 Mar 2011 08:22:10 -0400, spir <denis.spir gmail.com> wrote:
 How does one solve the issues of self/mutual/circular pattern recursion  
 at compile-time?

 Denis

How do you solve it at runtime? Then apply CTFE. Alternatively, how would
you solve it in a functional language? Then apply templates. I think you
missed a point, though: Parser generates a parser given an EBNF grammar, and
would therefore internally behave like any other DSL -> code generation
tool (except that it would be a library).

P.S. self/mutual/circular pattern recursion occurs all the time in  
templates and CTFE.
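
To make the P.S. concrete, here is a small sketch of mutually recursive functions evaluated by CTFE. The grammar and the names parseExpr, parseTerm, parseFactor and evaluate are assumptions chosen for the illustration; the code does no error handling and expects well-formed input made of single digits, "+", "*" and parentheses.

```d
// Grammar assumed for this sketch:
//   expr   = term   { "+" term }
//   term   = factor { "*" factor }
//   factor = digit | "(" expr ")"
// parseFactor calls back into parseExpr, so the three functions are
// mutually recursive, exactly like the rules they implement.

int parseExpr(string s, ref size_t pos)
{
    int value = parseTerm(s, pos);
    while (pos < s.length && s[pos] == '+')
    {
        ++pos;                          // skip '+'
        value += parseTerm(s, pos);
    }
    return value;
}

int parseTerm(string s, ref size_t pos)
{
    int value = parseFactor(s, pos);
    while (pos < s.length && s[pos] == '*')
    {
        ++pos;                          // skip '*'
        value *= parseFactor(s, pos);
    }
    return value;
}

int parseFactor(string s, ref size_t pos)
{
    if (s[pos] == '(')
    {
        ++pos;                          // skip '('
        int value = parseExpr(s, pos);  // recursion back into expr
        ++pos;                          // skip ')'
        return value;
    }
    return s[pos++] - '0';              // single decimal digit
}

int evaluate(string s)
{
    size_t pos = 0;
    return parseExpr(s, pos);
}

// Both checks are performed by the compiler, not at run time.
static assert(evaluate("2*(3+4)") == 14);
enum nested = evaluate("((1+2)*(3+4))+5");
static assert(nested == 26);

void main()
{
    assert(evaluate("1+2*3") == 7);     // the same code also runs normally
}
```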

Mar 23 2011

Ilya Pupatenko <pupatenko gmail.com> writes:

Ok, it sounds good. But in most cases we are not only interested in
whether the input text matches the specified grammar. We want to perform some
semantic actions while parsing, for example build some kind of AST, evaluate an
expression and so on. But I have no idea how I can ask this parser to
perform user-defined actions, for example for the 'string' and 'number'
"nodes" in this case.


Ilya.

Mar 23 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Wed, 23 Mar 2011 13:31:04 -0400, Ilya Pupatenko <pupatenko gmail.com>  
wrote:

 Ok, it sounds good. But in most cases we are not only interested in
 whether the input text matches the specified grammar. We want to perform some
 semantic actions while parsing, for example build some kind of AST, evaluate an
 expression and so on. But I have no idea how I can ask this parser to
 perform user-defined actions, for example for the 'string' and 'number'
 "nodes" in this case.

I don't have any experience with using parser generators, but using arrays  
of delegates works really well for GUI libraries. For example:

// Sketch only: wikiLangParser.Token, .tokens and .value are a hypothetical API.
wikiLangParser.digit ~= (ref wikiLangParser.Token digit) {
	// A digit rule matches exactly one character.
	auto tokens = digit.tokens;
	assert(tokens.length == 1);
	digit.value = tokens.front.value.get!string.front - '0';
};

wikiLangParser.number ~= (ref wikiLangParser.Token number) {
	auto tokens = number.tokens;
	assert(!tokens.empty);

	// Optional leading minus sign.
	bool negative = false;
	if(tokens.front.value.get!string == "-") {
		negative = true;
		tokens.popFront();
	}

	// Combine the values already computed by the digit action.
	int value = 0;
	foreach(token; tokens) {
		value = value * 10 + token.value.get!int;
	}

	if(negative)
		value = -value;

	number.value = value;
};

debug {
	// Several actions can be attached to the same rule.
	wikiLangParser.number ~= (ref wikiLangParser.Token number) {
		writeln("Parsed number (",number.value,")");
	};
}
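
For readers who want to try the idea without a parser generator, here is a self-contained sketch of the "array of delegates" pattern. Token, Rule, fire and runNumberActions are assumptions invented for this example and are far simpler than what a generated parser would expose; std.conv.to stands in for the digit-by-digit conversion shown above.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical token: the matched text plus a slot for a computed value.
struct Token
{
    string text;
    int value;
}

// Hypothetical rule: a name and a list of user callbacks (semantic actions).
struct Rule
{
    string name;
    void delegate(ref Token)[] actions;

    // An imaginary parser would call this whenever the rule matches.
    void fire(ref Token token)
    {
        foreach (action; actions)
            action(token);
    }
}

// Registers actions on a "number" rule, pretends the rule matched `text`,
// and returns the value computed by the actions.
int runNumberActions(string text)
{
    Rule number = Rule("number");

    // First action: convert the matched text to an integer.
    number.actions ~= delegate(ref Token t) { t.value = t.text.to!int; };

    // Second action, only in debug builds: report what was parsed.
    debug number.actions ~= delegate(ref Token t) {
        writeln("Parsed number (", t.value, ")");
    };

    auto token = Token(text);
    number.fire(token);
    return token.value;
}

void main()
{
    assert(runNumberActions("-42") == -42);
    assert(runNumberActions("7") == 7);
}
```

Actions run in registration order, so a later action (such as the debug one) can rely on values stored by an earlier one.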

Mar 23 2011

BlazingWhitester <max.klyga gmail.com> writes:

On 2011-03-23 00:27:51 +0200, Ilya Pupatenko said:

 So, the question is: is it a good idea for at least the parser library’s
 architecture to be somewhat similar to Spirit’s? Of course it is
 not about “blind” copying, but creating the architecture for such a big
 system completely from scratch is quite difficult indeed. To be
 exact, I like the idea of parser attributes, I like the way semantic
 actions are described, and the “auto-rules” seem really useful.
 

Mimicking Spirit might not be a good idea. It looks sort of like a BNF
grammar, but because of operator abuse there is just too much noise.
A better idea might be to use D's compile-time function evaluation to
parse strings containing grammars.

Mar 23 2011

spir <denis.spir gmail.com> writes:

On 03/22/2011 11:27 PM, Ilya Pupatenko wrote:
 As I understand it, the library should allow the programmer to write a grammar
 directly in D (ideally, the syntax should be somewhat similar to EBNF), and the
 resulting parser will be generated by the D compiler while compiling the program.
 This method allows parsing to be integrated into D code; it can make code simpler
 and sometimes even more efficient.

Do you mean the grammar itself to be D source code? (Instead of EBNF-like plain
text compiled by a parser generator.) If yes, then you may want to have a look at
pyparsing for a similar tool: a Python parsing lib in which the grammar is
written in Python. The particular point of pyparsing is that it is based on
PEG, which you may like or not.

I have used it for a while, and it works very well in practice. What I don't like
is its use of syntactic tricks to make pattern expressions supposedly more
"natural", which in fact creates an obstacle by being itself a parallel language
to be learnt. For this reason and some others, I wrote a variant (two, in fact)
where patterns really are plain source code without tricks, e.g.:
	digits = String(Klass("0..9"))
	sign = Choice(Char('+'), Char('-'))
	integer = Compose(Optional(sign), digits)
I have an implementation of such a parsing lib in and for D (mostly working, 
but probably with many points to enhance, in particular for performance). It 
allows associating "match actions" to patterns:
	integer = (new Compose(Optional(sign), digits)) (toInt);
	plus = (new Char('+')) (drop);
	intSum = (new Compose(integer, plus, integer)) (doSum);
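
A rough, self-contained sketch of this "patterns as plain objects" style in D follows. It is not the library mentioned above: Char, Compose and Optional borrow its names, while Match, Digits, withAction, Action and makeInteger are assumptions made for the example, and withAction stands in for the call-style syntax used to attach match actions.

```d
import std.conv : to;

// Result of a match attempt: success flag, matched text, computed value.
struct Match
{
    bool ok;
    string text;
    int value;
}

alias Action = int delegate(string text);

abstract class Pattern
{
    Action action;   // optional match action

    // Try to match at the start of `input`; implemented by subclasses.
    protected abstract Match matchRaw(string input);

    // Match, then run the action (if any) on the matched text.
    final Match match(string input)
    {
        auto m = matchRaw(input);
        if (m.ok && action !is null)
            m.value = action(m.text);
        return m;
    }

    // Attach a match action and return the pattern itself.
    final Pattern withAction(Action a)
    {
        action = a;
        return this;
    }
}

class Char : Pattern   // a single given character
{
    char expected;
    this(char c) { expected = c; }
    protected override Match matchRaw(string input)
    {
        if (input.length > 0 && input[0] == expected)
            return Match(true, input[0 .. 1]);
        return Match(false);
    }
}

class Digits : Pattern   // one or more of '0'..'9'
{
    protected override Match matchRaw(string input)
    {
        size_t n = 0;
        while (n < input.length && input[n] >= '0' && input[n] <= '9')
            ++n;
        return n > 0 ? Match(true, input[0 .. n]) : Match(false);
    }
}

class Optional : Pattern   // succeeds with empty text if `inner` fails
{
    Pattern inner;
    this(Pattern p) { inner = p; }
    protected override Match matchRaw(string input)
    {
        auto m = inner.match(input);
        return m.ok ? m : Match(true, input[0 .. 0]);
    }
}

class Compose : Pattern   // sequence: every part must match in order
{
    Pattern[] parts;
    this(Pattern[] ps...) { parts = ps.dup; }
    protected override Match matchRaw(string input)
    {
        size_t pos = 0;
        foreach (p; parts)
        {
            auto m = p.match(input[pos .. $]);
            if (!m.ok)
                return Match(false);
            pos += m.text.length;
        }
        return Match(true, input[0 .. pos]);
    }
}

// integer = optional '-' followed by digits, converted by a match action.
Pattern makeInteger()
{
    auto toInt = delegate int(string text) { return text.to!int; };
    return (new Compose(new Optional(new Char('-')), new Digits)).withAction(toInt);
}

void main()
{
    auto m = makeInteger().match("-42 rest");
    assert(m.ok && m.text == "-42" && m.value == -42);
    assert(!makeInteger().match("abc").ok);
}
```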

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

Mar 23 2011
