digitalmars.D.learn - Reducing Pegged ASTs

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (24/24) Nov 25 2014 Is there a way to (on the fly) reduce Pegged parse results such as

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (6/8) Nov 25 2014 Correction: For consistency we probably want this example to be

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (5/8) Nov 25 2014 Correction again:

Etienne Cimon (27/28) Nov 25 2014 I've made an asn.1 parser using pegged tree map, it's not so complex and...
Philippe Sigaud via Digitalmars-d-learn (22/45) Nov 25 2014 The reducing can be done, either with operators or semantic actions.

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (15/35) Nov 26 2014 What's the name of this function?

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:

Is there a way to (on the fly) reduce Pegged parse results such as

C [0, 6]["int", "x", ";"]
  +-C.TranslationUnit [0, 6]["int", "x", ";"]
     +-C.ExternalDeclaration [0, 6]["int", "x", ";"]
        +-C.Declaration [0, 6]["int", "x", ";"]
           +-C.DeclarationSpecifiers [0, 4]["int"]
           |  +-C.TypeSpecifier [0, 4]["int"]
           +-C.InitDeclaratorList [4, 5]["x"]
              +-C.InitDeclarator [4, 5]["x"]
                 +-C.Declarator [4, 5]["x"]
                    +-C.DirectDeclarator [4, 5]["x"]
                       +-C.Identifier [4, 5]["x"]

to

C [0, 6]["int", "x", ";"]
  +-C.TranslationUnit [0, 6]["int", "x", ";"]
     +-C.ExternalDeclaration [0, 6]["int", "x", ";"]
        +-C.Declaration [0, 6]["int", "x", ";"]
           +-C.TypeSpecifier [0, 4]["int"]
           +-C.Identifier [4, 5]["x"]

and still, when needed, be able query that C.Identifier is an 
instance of C.DirectDeclarator etc?

If not this seems like a cool feature to have ;)

I guess it would reduce memory requirements by about a magnitude 
right?

Nov 25 2014

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:

On Tuesday, 25 November 2014 at 15:12:39 UTC, Nordlöw wrote:
 I guess it would reduce memory requirements by about a 
 magnitude right?

Correction: For consistency we probably want this example to be 
reduced to

+-C.Declaration [0, 6]["int", "x", ";"]
    +-C.TypeSpecifier [0, 4]["int"]
       +-C.Identifier [4, 5]["x"]

Nov 25 2014

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:

On Tuesday, 25 November 2014 at 15:15:40 UTC, Nordlöw wrote:
 +-C.Declaration [0, 6]["int", "x", ";"]
    +-C.TypeSpecifier [0, 4]["int"]
       +-C.Identifier [4, 5]["x"]

Correction again:

+-C.Declaration [0, 6]["int", "x", ";"]
    +-C.TypeSpecifier [0, 4]["int"]
    +-C.Identifier [4, 5]["x"]

Nov 25 2014

Etienne Cimon <etcimon gmail.com> writes:

On 2014-11-25 10:12, "Nordlöw" wrote:
Is there a way to (on the fly) reduce Pegged parse results such as

I've made an asn.1 parser using pegged tree map, it's not so complex and
does the reducing as well.

https://github.com/globecsys/asn1.d

Most of the meat is in asn1/generator/

In short, it's much easier when you put all the info in the same object,
in this case it's an AEntity:
https://github.com/globecsys/asn1.d/blob/master/asn1/generator/asntree.d#L239

When the whole tree is done that way, you can easily traverse it and
move nodes like a linked list.. I've made a helper function here:

https://github.com/globecsys/asn1.d/blob/master/asn1/generator/asntree.d#L10

You can see it being used here:
https://github.com/globecsys/asn1.d/blob/38bd1907498cf69a08604a96394892416f7aa3bd/asn1/generator/asntree.d#L109

and then here:

https://github.com/globecsys/asn1.d/blob/master/asn1/generator/generator.d#L500

Also, the garbage collector really makes it easy to optimize memory
usage, ie. when you use a node in multiple places and need to re-order
the tree elements.

I still have a bunch of work to do, and I intend on replacing botan's
ASN1 functionalities with this and a DER serialization module.

Beware, the pegged structure isn't insanely fast to parse because of the
recursion limits I implemented very inefficiently because I was too lazy
to translate the standard asn.1 BNF into PEG.. Also, the bigger
bottleneck would be error strings.

For a 1-2 months of work (incl. learning ASN.1), I'm very satisfied with
the technology involved and would recommend intermediate structures with
traversal helpers.

Nov 25 2014

Philippe Sigaud via Digitalmars-d-learn writes:

On Tue, Nov 25, 2014 at 4:12 PM, "Nordlöw"
<digitalmars-d-learn puremagic.com> wrote:
 Is there a way to (on the fly) reduce Pegged parse results such as

 C [0, 6]["int", "x", ";"]
  +-C.TranslationUnit [0, 6]["int", "x", ";"]
     +-C.ExternalDeclaration [0, 6]["int", "x", ";"]
        +-C.Declaration [0, 6]["int", "x", ";"]
           +-C.DeclarationSpecifiers [0, 4]["int"]
           |  +-C.TypeSpecifier [0, 4]["int"]
           +-C.InitDeclaratorList [4, 5]["x"]
              +-C.InitDeclarator [4, 5]["x"]
                 +-C.Declarator [4, 5]["x"]
                    +-C.DirectDeclarator [4, 5]["x"]
                       +-C.Identifier [4, 5]["x"]

 to

 C [0, 6]["int", "x", ";"]
  +-C.TranslationUnit [0, 6]["int", "x", ";"]
     +-C.ExternalDeclaration [0, 6]["int", "x", ";"]
        +-C.Declaration [0, 6]["int", "x", ";"]
           +-C.TypeSpecifier [0, 4]["int"]
           +-C.Identifier [4, 5]["x"]

 and still, when needed, be able query that C.Identifier is an instance of
 C.DirectDeclarator etc?

The reducing can be done, either with operators or semantic actions.
IIRC there is a free function in Pegged that does it.
I did not automate it, because every time I cut down severely a parse
tree, I later regret it because I lost information that way.

Cutting while still retaining original info (who is this node's
ancestor) is more difficult: you would have to store it somewhere
anyhow. You cannot create node classes to represent the hierarchy,
because of loops in the grammar: an Identifier can have many different
ancestors.

Note also that Pegged produces parse trees (complete parsing
information), not ASTs. ASTs would indeed be much smaller, but we
would have to define what are the master nodes in the D grammar.

 If not this seems like a cool feature to have ;)

 I guess it would reduce memory requirements by about a magnitude right?

If you want to remember the intermediate nodes you cut down, not
really, since you still need to store them somehow.

I think what's consuming memory right now is that I duplicate the
matched strings at each level, even concatenating them at the higher
levels. I should let them only in the 'leaves' of the tree (heck, like
an AST).

Halas, I've no free time to code anything in D these days... but of
course, feel free to push any pull request you might have!

Nov 25 2014

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:

On Wednesday, 26 November 2014 at 06:09:12 UTC, Philippe Sigaud 
via Digitalmars-d-learn wrote:
 IIRC there is a free function in Pegged that does it.

What's the name of this function?

 I did not automate it, because every time I cut down severely a 
 parse
 tree, I later regret it because I lost information that way.

 Cutting while still retaining original info (who is this node's
 ancestor) is more difficult: you would have to store it 
 somewhere
 anyhow. You cannot create node classes to represent the 
 hierarchy,
 because of loops in the grammar: an Identifier can have many 
 different
 ancestors.

 Note also that Pegged produces parse trees (complete parsing
 information), not ASTs. ASTs would indeed be much smaller, but 
 we
 would have to define what are the master nodes in the D grammar.

What do you mean with master nodes?

 If you want to remember the intermediate nodes you cut down, not
 really, since you still need to store them somehow.

I don't quite understand your formulation in English here. Could 
you elaborate?

 I think what's consuming memory right now is that I duplicate 
 the matched strings at each level

What do you mean with duplicate? Doesn't Pegged use string slices 
that reference the original source?

If this problem is related to (im)mutability and If I understand 
you correctly you could use something like

static if (isImmutable!Source)
     node.text = source_text[i..j];
else
     node.text = source_text[i..j].idup;

right? Where in Pegged could this logic be injected?

Nov 26 2014

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Reducing Pegged ASTs