digitalmars.D.announce - Pry v0.3.1 is out!

Dmitry Olshansky (21/21) Jan 14 2017 Pry is a new pragmatic parser combinators library.

Dicebot (13/15) Jan 15 2017 protected-headers="v1"

Dmitry Olshansky (20/25) Jan 15 2017 Each parser has a value type that it extracts from the stream. It can be...

Dicebot (5/8) Jan 15 2017 Thanks for explanation! This is indeed very promising and much in

Dmitry Olshansky (9/12) Jan 15 2017 Actually testing the latest version with LDC I found out that
Bastiaan Veelo (4/6) Jan 15 2017 Interesting. How about left-recursion? (I added support for

Dmitry Olshansky (8/14) Jan 16 2017 I think left-recursion is better handled at the grammar level.

Bastiaan Veelo (17/20) Jan 17 2017 Handling left-recursion by grammar transformation often has

Dmitry Olshansky (6/25) Jan 17 2017 I do not suggest to change the grammar itself, I think that processing

Bastiaan Veelo (31/54) Jan 17 2017 Interesting. Would that work with indirect left-recursion? That

Bastiaan Veelo (11/13) Jan 17 2017 I didn't profile, but apart from the time-complexity that is

Dmitry Olshansky (6/16) Jan 20 2017 I considered to work with Pegged but I decided against it, because my

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

Pry is a new pragmatic parser combinators library.

https://github.com/DmitryOlshansky/pry

(also available on Dub)

It's still in the early stages but I think it might be a good time to 
gauge some interest.

Two key areas of focus are (compared to say Pegged):
- performance, on par with hand-written code or die
- versatility, generating some goofy parse tree is not a goal, the goal 
is extraction of data the way the user specifies

For now it contains a modest example of a calculator app and a benchmark 
inspired by it. The interesting tidbit is that this version already 
includes an optimization that typically prevents recursive descent 
parser from falling into exponential behavior.

Future directions:
- actually add grammar layer on top of combinators, it is fancy but useful
- more built-in parsers to play with, including some stuff inspired by 
std.regex
- some goodies for usability, e.g. shortcuts to avoid constructing 
Stream out of string etc.

---
Dmitry Olshansky

Jan 14 2017

Dicebot <public dicebot.lv> writes:

 protected-headers="v1"
From: Dicebot <public dicebot.lv>
Newsgroups: d,i,g,i,t,a,l,m,a,r,s,.,D,.,a,n,n,o,u,n,c,e
Subject: Re: Pry v0.3.1 is out!
References: <o5ej3g$27f2$1 digitalmars.com>
In-Reply-To: <o5ej3g$27f2$1 digitalmars.com>

--M290e2qNuu60taXBOLQ2DuElHvRp788KV
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Sounds intriguing!

On 01/15/2017 01:26 AM, Dmitry Olshansky wrote:
 - versatility, generating some goofy parse tree is not a goal, the goal=

 is extraction of data the way the user specifies

Can you show an example of what you have in mind for this?

--M290e2qNuu60taXBOLQ2DuElHvRp788KV--

Jan 15 2017

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 1/15/17 12:43 PM, Dicebot wrote:
 Sounds intriguing!

 On 01/15/2017 01:26 AM, Dmitry Olshansky wrote:
 - versatility, generating some goofy parse tree is not a goal, the goal
 is extraction of data the way the user specifies

 Can you show an example of what you have in mind for this?

Each parser has a value type that it extracts from the stream. It can be 
an AST node but doesn't have to. For instance, taking a page form the 
calculator example:

auto expr = dynamic!int;
auto primary = any(
	range!('0', '9').rep.map!(x => x.to!int),
	seq(tk!'(', expr, tk!')').map!(x => x[1])
);


Here 'expr' is a parser that produces an int that will be defined later. 
Then we see how digits are converted to string: 'range' has value of 
parsed digit, rep tells to apply it repeatedly and use the slice of 
input as the value. Finally map converts parser producing slice to 
parser producing int.

Similarly the result of 'seq' is a tuple and map takes the middle one 
('expr'), that produces an int.


I could have wasted time by creating nodes and assigning values in the 
map functions but if you just want the result of calculation it's all moot.

---
Dmitry Olshansky

Jan 15 2017

Dicebot <public dicebot.lv> writes:

On Sunday, 15 January 2017 at 13:14:45 UTC, Dmitry Olshansky 
wrote:
 I could have wasted time by creating nodes and assigning values 
 in the map functions but if you just want the result of 
 calculation it's all moot.

Thanks for explanation! This is indeed very promising and much in 
spirit of D selling point of zero-overhead convenience 
abstractions.

Jan 15 2017

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 1/15/17 2:26 AM, Dmitry Olshansky wrote:
 Pry is a new pragmatic parser combinators library.

[snip]
 Two key areas of focus are (compared to say Pegged):
 - performance, on par with hand-written code or die

Actually testing the latest version with LDC I found out that
handwritten code is a bit *slower*. Beats me, as I spent quite some time
laying out that handwritten stuff.

All in all, this makes me confident that I soon will never have to
write parsers by hand, the last nebulous reason is out.

---
Dmitry Olshansky

Jan 15 2017

Bastiaan Veelo <Bastiaan Veelo.net> writes:

On Sunday, 15 January 2017 at 01:26:07 UTC, Dmitry Olshansky 
wrote:
 Pry is a new pragmatic parser combinators library.

 https://github.com/DmitryOlshansky/pry

Interesting. How about left-recursion? (I added support for 
left-recursive grammars to Pegged.)

Jan 15 2017

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 1/16/17 1:29 AM, Bastiaan Veelo wrote:
 On Sunday, 15 January 2017 at 01:26:07 UTC, Dmitry Olshansky wrote:
 Pry is a new pragmatic parser combinators library.

 https://github.com/DmitryOlshansky/pry

 Interesting. How about left-recursion? (I added support for
 left-recursive grammars to Pegged.)

I think left-recursion is better handled at the grammar level.
What I currently have is parser combinators level where adding
this transformation is awkward and too much magic IMO.

However I plan to add a grammar on top of combinators, and yes handling 
left-recursive grammars is going to be an interesting challenge.

---
Dmitry Olshansky

Jan 16 2017

Bastiaan Veelo <Bastiaan Veelo.net> writes:

On Monday, 16 January 2017 at 22:29:01 UTC, Dmitry Olshansky 
wrote:
 I think left-recursion is better handled at the grammar level.
 What I currently have is parser combinators level where adding
 this transformation is awkward and too much magic IMO.

Handling left-recursion by grammar transformation often has 
unwanted side-effects (operator precedence) and eliminating 
indirect left-recursion this way can be impossible in practice. 
Depending on the complexity of the grammar, even identifying the 
recursive loop can be a challenge.

The trouble is that one can be happily implementing a parser or 
designing a grammar when suddenly for some input the parser hangs 
indefinitely. Users are likely quick to blame the parser lib, 
while in fact it is the grammar that has left-recursion. Hitting 
that roadblock is a real bummer.

In some cases the grammar is a given (by a standard for example) 
and transforming it to combat left-recursion can obfuscate it 
beyond recognition. Hardening Pegged to deal with the various 
kinds of left-recursion was very challenging, but easier than 
transforming the grammar I was dealing with (ISO 10206).

Jan 17 2017

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 1/17/17 1:16 PM, Bastiaan Veelo wrote:
 On Monday, 16 January 2017 at 22:29:01 UTC, Dmitry Olshansky wrote:
 I think left-recursion is better handled at the grammar level.
 What I currently have is parser combinators level where adding
 this transformation is awkward and too much magic IMO.

 Handling left-recursion by grammar transformation often has unwanted
 side-effects (operator precedence) and eliminating indirect
 left-recursion this way can be impossible in practice. Depending on the
 complexity of the grammar, even identifying the recursive loop can be a
 challenge.

I do not suggest to change the grammar itself, I think that processing 
of the grammar may perform hidden transformations internally.

 The trouble is that one can be happily implementing a parser or
 designing a grammar when suddenly for some input the parser hangs
 indefinitely. Users are likely quick to blame the parser lib, while in
 fact it is the grammar that has left-recursion. Hitting that roadblock
 is a real bummer.

 In some cases the grammar is a given (by a standard for example) and
 transforming it to combat left-recursion can obfuscate it beyond
 recognition. Hardening Pegged to deal with the various kinds of
 left-recursion was very challenging, but easier than transforming the
 grammar I was dealing with (ISO 10206).

Interesting, what kind of hardening?

---
Dmitry Olshansky

Jan 17 2017

Bastiaan Veelo <Bastiaan Veelo.net> writes:

On Tuesday, 17 January 2017 at 21:10:17 UTC, Dmitry Olshansky 
wrote:
 On 1/17/17 1:16 PM, Bastiaan Veelo wrote:
 On Monday, 16 January 2017 at 22:29:01 UTC, Dmitry Olshansky 
 wrote:
 I think left-recursion is better handled at the grammar level.
 What I currently have is parser combinators level where adding
 this transformation is awkward and too much magic IMO.

 Handling left-recursion by grammar transformation often has 
 unwanted
 side-effects (operator precedence) and eliminating indirect
 left-recursion this way can be impossible in practice. 
 Depending on the
 complexity of the grammar, even identifying the recursive loop 
 can be a
 challenge.

 I do not suggest to change the grammar itself, I think that 
 processing of the grammar may perform hidden transformations 
 internally.

Interesting. Would that work with indirect left-recursion? That 
would be daunting, I fear. It'l still mess with order of 
operation / associativity, won't it?

 Hardening Pegged to deal with the various kinds of
 left-recursion was very challenging, but easier than 
 transforming the
 grammar I was dealing with (ISO 10206).

 Interesting, what kind of hardening?

The trick is to recurse, but knowing when to stop. At a certain 
point, recursing once more will reduce the length of the matched 
string. So if you can break recursion when the match stops 
growing, you are good. This is implemented by memoizing previous 
recursions.
There is an explanation along with literature references in my 
first PR on left-recursion [1]. However, the PR itself was flawed 
[2] and received several makeovers [3-5]. In the end, the 
solution was considerably more complex than the initial PR, 
involving introspection of the grammar to identify left-recursive 
loops and the appropriate rules to memoize. We now have complete 
support for direct, indirect and hidden left-recursion, including 
multiple intersecting loops and mixed left- and right-recursion. 
And it does not interfere with general memoization, which is 
important to speed up PEG parsers [6]. There are some 
illustrative unit tests in [7] and further.

Bastiaan.

[1] https://github.com/PhilippeSigaud/Pegged/pull/164
[2] 
https://github.com/PhilippeSigaud/Pegged/pull/165#issuecomment-158914006
[3] https://github.com/PhilippeSigaud/Pegged/pull/168
[4] https://github.com/PhilippeSigaud/Pegged/pull/186
[5] https://github.com/PhilippeSigaud/Pegged/pull/196
[6] Pegged does not memoize at CT though, at the moment.
[7] 
https://github.com/PhilippeSigaud/Pegged/blob/master/pegged/grammar.d#L2965

Jan 17 2017

Bastiaan Veelo <Bastiaan Veelo.net> writes:

On Sunday, 15 January 2017 at 01:26:07 UTC, Dmitry Olshansky 
wrote:
 Two key areas of focus are (compared to say Pegged):
 - performance, on par with hand-written code or die

I didn't profile, but apart from the time-complexity that is 
inherent to straight forward recursive descent parsers like 
Pegged (improved by memoization), my impression is that Pegged's 
main performance problem is due to a very high constant: it is 
concatenating strings, all the time. A LOT of strings, and a LOT 
of concatenations. Almost all of them are discarded, of course. 
Finding a way to do the same without string concatenation would 
have a big impact.

Bastiaan.

Jan 17 2017

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 1/17/17 11:52 PM, Bastiaan Veelo wrote:
 On Sunday, 15 January 2017 at 01:26:07 UTC, Dmitry Olshansky wrote:
 Two key areas of focus are (compared to say Pegged):
 - performance, on par with hand-written code or die

 I didn't profile, but apart from the time-complexity that is inherent to
 straight forward recursive descent parsers like Pegged (improved by
 memoization), my impression is that Pegged's main performance problem is
 due to a very high constant: it is concatenating strings, all the time.
 A LOT of strings, and a LOT of concatenations. Almost all of them are
 discarded, of course. Finding a way to do the same without string
 concatenation would have a big impact.

I considered to work with Pegged but I decided against it, because my 
priorities are different: get it fast then get it full of features.
IMHO performance is something to think of at design time not afterwards.

---
Dmitry Olshansky

Jan 20 2017

D Programming

C/C++ Programming

Other

digitalmars.D.announce - Pry v0.3.1 is out!