digitalmars.D - Best practices for parsing files

reply lurker <lurker mailinator.com> writes:
Hi.

I'm new to D but not to programming. I would like to write a small scripting
engine using the great D programming language, but I'm undecided on what
techniques I should use to parse source files.

Since slices seem to be a central feature of D, I was thinking of reading the
whole file into memory and using slices to build the syntax tree.

Does anyone have examples of parsing files using this method?

Any other methods I should consider?

Thanks.
Jan 25 2007
next sibling parent reply BCS <ao pathlink.com> writes:
Reply to lurker,

 Hi.
 
 I'm new to D but not to programming. I would like to write a small
 scripting engine using the great D programming language but I'm
 undecided on what techniques should use to parse source files.
 
 Since slices seem to be a central feature of D I was thinking on
 reading the whole file in memory and use slices to build the syntax
 tree.
 
 Does anyone have examples of parsing files using this method?
 
 Any other methods I should consider?
 
 Thanks.
 

Enki would be my choice if you don't mind using a code generator:
http://www.dsource.org/projects/ddl/wiki/Enki

If you are feeling adventurous you can try dparse:
http://www.dsource.org/projects/scrapple/browser/trunk/dparser/dparse.d

It's not very mature but it's kind of fun to play with. (Full disclosure: I wrote dparse.)
Jan 25 2007
parent reply lurker <lurker mailinator.com> writes:
Both suggestions are very interesting and I'll be evaluating them, but what I
was hoping for was something more along the lines of DMD's parser (which is
insanely fast): a hand-written parser. We also thought of translating it to D
just as an exercise to learn how it works.

You see, one of my concerns (and the primary reason to use D) is parsing speed:
I'm going to parse lots and lots of those files, and memory consumption is
hardly an issue since we have plenty of it.

Also, the tasks will be executed on a thread pool and we don't want to face
locking problems with code generated by some tool. At least if we write the code
we'll know who to blame. :D

Thanks.
Jan 25 2007
next sibling parent reply Lutger <lutger.blijdestijn gmail.com> writes:
lurker wrote:
 Both suggestions are very interesting and I'll be evaluating them; but what I
 was hoping was something more on the line of DMD's parser (been insanely
 fast): A hand-written parser. We also thought of translating it to D just as
 an exercise to learn how it works.

Somebody did that already, though it hasn't been updated for a couple of months: http://www.dsource.org/projects/dparser
Jan 25 2007
parent lurker <lurker mailinator.com> writes:
Lutger wrote:
 Somebody did that already, it's not been updated for a couple of months
 though: http://www.dsource.org/projects/dparser

Didn't know that. Taking a look right now. Thanks.
Jan 25 2007
prev sibling parent reply BCS <ao pathlink.com> writes:
Reply to lurker,

 Both suggestions are very interesting and I'll be evaluating them; but
 what I was hoping was something more on the line of DMD's parser (been
 insanely fast): A hand-written parser. We also thought of translating
 it to D just as an exercise to learn how it works.
 
 You see, one of my concerns (and the primary reason to use D) is
 parsing speed: I'm going to parse lot's and lot's of those files and
 memory consumption almost isn't an issue since we have lots of it.

Ah, then I guess you won't want an LL parser.
 
 Also, the tasks will be executed on a thread pool and we don't want to
 face locking problems with code generated by some tool. At least if we
 write the code we'll know who to blame. :D
 

Both should be thread safe (if you stick to one thread per file).

As far as slicing goes, I'm working on a parser that reads a file into memory (I guess it could mmap it in as well) and converts it to an array of token structs. A parser will then walk the array. If you allocate a big array of structs in advance and have your lexer write directly to the array (slicing out of the file where the text is important), that should be fairly fast.

That's my 2 cents. I'm not sure how much help this will be (my parser is /not/ performance driven), but I hope it might help.
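
To make the idea concrete, here is a minimal sketch of a lexer that writes into a token array allocated up front, with each token's text being a slice of the source rather than a copy. This is not code from any project mentioned in this thread. The names Token, TokenKind and lex are made up for illustration, the token rules are deliberately tiny, the input is assumed to be ASCII, and the code targets current D2 with Phobos.

```d
import std.ascii : isAlpha, isAlphaNum, isDigit, isWhite;

// Hypothetical token kinds, just enough for the sketch.
enum TokenKind { identifier, number, symbol }

// Hypothetical token layout: `text` is a slice of the source, not a copy.
struct Token
{
    TokenKind kind;
    string text;
}

// Writes tokens into `tokens` (allocated by the caller) and returns the
// filled prefix. Assumes `tokens` has room for every token in `src`.
Token[] lex(string src, Token[] tokens)
{
    size_t count = 0;
    size_t i = 0;
    while (i < src.length)
    {
        if (isWhite(src[i])) { ++i; continue; }

        immutable start = i;
        TokenKind kind;
        if (isAlpha(src[i]) || src[i] == '_')
        {
            kind = TokenKind.identifier;
            while (i < src.length && (isAlphaNum(src[i]) || src[i] == '_')) ++i;
        }
        else if (isDigit(src[i]))
        {
            kind = TokenKind.number;
            while (i < src.length && isDigit(src[i])) ++i;
        }
        else
        {
            // Anything else becomes a one-byte symbol token.
            kind = TokenKind.symbol;
            ++i;
        }
        tokens[count++] = Token(kind, src[start .. i]);
    }
    return tokens[0 .. count];
}

void main()
{
    import std.stdio : writeln;

    string src = "foo = bar + 42;";
    // Every token covers at least one byte, so src.length slots always suffice.
    auto storage = new Token[src.length];
    foreach (tok; lex(src, storage))
        writeln(tok.kind, " '", tok.text, "'");
}
```

Sizing the array to src.length is a worst-case bound that trades memory for never having to grow the array while lexing. With one source buffer and one token array per file, nothing is shared between files, which fits the one-thread-per-file approach.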
Jan 25 2007
parent reply lurker <lurker mailinator.com> writes:
BCS wrote:
 As far as slicing goes, I'm working on a parser that read a file into memory
 (I guess it could mmap it in as well) and converts it to an array of token
 structs. A parser will then walk on the array. If you new a big array of
 struct in advance and have your lexer write directly to the array (slicing
 out of the file where the text is important, that should be fairly fast.

Excellent! Is any of your code available? We would really like to take a look at your code (if possible). We are a little lost right now, and by your description it seems very much like what we want to build.

Thanks
Jan 25 2007
parent BCS <ao pathlink.com> writes:
Reply to lurker,

 BCS wrote:
 
 As far as slicing goes, I'm working on a parser that read a file into
 memory (I guess it could mmap it in as well) and converts it to an
 array of token structs. A parser will then walk on the array. If you
 new a big array of struct in advance and have your lexer write
 directly to the array (slicing out of the file where the text is
 important, that should be fairly fast.
 

 Excellent! Is any of your code available? We really would like take a look at your code (If possible). We are a little lost right now and by your description It seams very much like what we want to build. Thanks

That isn't how my lexer works (at the moment); I was just saying I think it could be done. In fact, my app copies everything to make sure that it doesn't stomp on itself. OTOH, it wouldn't be too hard to port it to what I described above, and I plan on posting the code when I get a bit closer to done.
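
For what it's worth, here is a small sketch of the kind of stomping that copying guards against. It is only an illustration of slice aliasing in current D2, not a description of how my app actually does it, and the variable names are made up: a slice of a mutable buffer sees later writes to that buffer, while a .dup copy does not.

```d
void main()
{
    import std.stdio : writeln;

    // A mutable buffer, e.g. one that gets reused or edited during parsing.
    char[] buffer = "first".dup;

    char[] view = buffer[0 .. 5];     // aliases the buffer's memory
    char[] safe = buffer[0 .. 5].dup; // independent copy on the GC heap

    buffer[0] = 'F'; // a later write into the buffer

    assert(view == "First"); // the slice sees the change
    assert(safe == "first"); // the copy does not

    writeln(view, " ", safe);
}
```

If the source is held as an immutable string for the whole parse, this hazard goes away and slices are safe to keep around.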
Jan 26 2007
prev sibling parent Sean Kelly <sean f4.ca> writes:
lurker wrote:
 Hi.
 
 I'm new to D but not to programming. I would like to write a small scripting
 engine using the great D programming language but I'm undecided on what
 techniques should use to parse source files.
 
 Since slices seem to be a central feature of D I was thinking on reading the
 whole file in memory and use slices to build the syntax tree.
 
 Does anyone have examples of parsing files using this method?

The DMD lexer works pretty much this way, and it's available in every DMD distribution :-)

 Any other methods I should consider?

This is the method I've used in the past, even in C++. It seems to make for cleaner code than the allocate/copy method, and it's faster to boot.

Sean
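
A rough sketch of the difference, assuming current D2 (the variable names are only for illustration): a slice is a pointer and a length into the original text, while the allocate/copy approach creates a new array for every piece.

```d
void main()
{
    import std.stdio : writeln;

    string source = "let answer = 42";

    // Slicing: no bytes are copied, `name` points into `source`.
    string name = source[4 .. 10];
    assert(name == "answer");
    assert(name.ptr == source.ptr + 4);

    // Allocate/copy: `.idup` makes a fresh immutable copy on the GC heap.
    string copied = source[4 .. 10].idup;
    assert(copied == name);
    assert(copied.ptr != name.ptr);

    writeln(name, " ", copied);
}
```

The slice keeps the whole source buffer alive for as long as it is referenced, which is usually fine when memory is not the constraint.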
Jan 25 2007