digitalmars.D - Best practices for parsing files

lurker <lurker mailinator.com> writes:

Hi.

I'm new to D but not to programming. I would like to write a small scripting
engine using the great D programming language, but I'm undecided on what
techniques I should use to parse source files.

Since slices seem to be a central feature of D, I was thinking of reading the
whole file into memory and using slices to build the syntax tree.

Does anyone have examples of parsing files using this method?

Any other methods I should consider?

Thanks.

Jan 25 2007

BCS <ao pathlink.com> writes:

Reply to lurker,

 Hi.
 
 I'm new to D but not to programming. I would like to write a small
 scripting engine using the great D programming language, but I'm
 undecided on what techniques I should use to parse source files.
 
 Since slices seem to be a central feature of D, I was thinking of
 reading the whole file into memory and using slices to build the syntax
 tree.
 
 Does anyone have examples of parsing files using this method?
 
 Any other methods I should consider?
 
 Thanks.
 

Enki would be my choice if you don't mind using a code generator

http://www.dsource.org/projects/ddl/wiki/Enki

If you are feeling adventurous you can try dparse

http://www.dsource.org/projects/scrapple/browser/trunk/dparser/dparse.d

It's not very mature, but it's kind of fun to play with.
(full disclosure: I wrote dparse)

Jan 25 2007

lurker <lurker mailinator.com> writes:

Both suggestions are very interesting and I'll be evaluating them, but what I
was hoping for was something more along the lines of DMD's parser (which is
insanely fast): a hand-written parser. We also thought of translating it to D
just as an exercise to learn how it works.

You see, one of my concerns (and the primary reason to use D) is parsing speed:
I'm going to parse lots and lots of those files, and memory consumption is
hardly an issue since we have plenty of it.

Also, the tasks will be executed on a thread pool, and we don't want to face
locking problems with code generated by some tool. At least if we write the code
ourselves, we'll know who to blame. :D

Thanks.

Jan 25 2007

Lutger <lutger.blijdestijn gmail.com> writes:

lurker wrote:
 Both suggestions are very interesting and I'll be evaluating them, but what I
 was hoping for was something more along the lines of DMD's parser (which is
 insanely fast): a hand-written parser. We also thought of translating it to D
 just as an exercise to learn how it works.

Somebody did that already; it hasn't been updated for a couple of months,
though:
http://www.dsource.org/projects/dparser

Jan 25 2007

lurker <lurker mailinator.com> writes:

Lutger wrote:
 Somebody did that already; it hasn't been updated for a couple of months,
 though:
 http://www.dsource.org/projects/dparser

Didn't know that. Taking a look right now.

Thanks.

Jan 25 2007

BCS <ao pathlink.com> writes:

Reply to lurker,

 Both suggestions are very interesting and I'll be evaluating them, but
 what I was hoping for was something more along the lines of DMD's parser
 (which is insanely fast): a hand-written parser. We also thought of
 translating it to D just as an exercise to learn how it works.
 
 You see, one of my concerns (and the primary reason to use D) is
 parsing speed: I'm going to parse lots and lots of those files, and
 memory consumption is hardly an issue since we have plenty of it.

Ah, then I guess you won't want an LL parser. 

 
 Also, the tasks will be executed on a thread pool and we don't want to
 face locking problems with code generated by some tool. At least if we
 write the code we'll know who to blame. :D
 

Both should be thread safe (if you stick to one thread per file).


As far as slicing goes, I'm working on a parser that reads a file into memory
(I guess it could mmap it in as well) and converts it to an array of token
structs. A parser will then walk over the array. If you new a big array of
structs in advance and have your lexer write directly to the array (slicing
out of the file where the text is important), that should be fairly fast.

That's my 2 cents, I'm not sure how much help this will be (my parser is 
/not/ performance driven) but I hope it might help.
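
A minimal sketch of the token-array idea described above, assuming a toy token set. It is written in C++17, with std::string_view standing in for a D slice, and it is not taken from any of the projects mentioned in this thread. The names Token, TokenKind, lex and offsetOf, the three token kinds, and the size of the up-front reservation are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical token kinds for a toy scripting language.
enum class TokenKind { Identifier, Number, Punct };

// A token owns no text. `text` is a view (the C++ analogue of a D slice)
// into the source buffer, so the buffer must outlive the tokens.
struct Token {
    TokenKind kind;
    std::string_view text;
};

// Lex the whole in-memory source into one contiguous array of tokens.
std::vector<Token> lex(std::string_view src) {
    std::vector<Token> tokens;
    // Allocate the array in advance. The size is only a guess (an
    // assumption of this sketch); the vector still grows if it is too small.
    tokens.reserve(src.size() / 2 + 1);

    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }

        std::size_t start = i;
        TokenKind kind;
        if (std::isalpha(c) || c == '_') {
            kind = TokenKind::Identifier;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
        } else if (std::isdigit(c)) {
            kind = TokenKind::Number;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
                ++i;
        } else {
            kind = TokenKind::Punct;  // any other character is a one-char token
            ++i;
        }
        // No copy: the token text is sliced straight out of the source buffer.
        tokens.push_back(Token{kind, src.substr(start, i - start)});
    }
    return tokens;
}

// Because a token is a view into the buffer, its source offset (useful for
// error messages) can be recovered with pointer arithmetic.
std::size_t offsetOf(std::string_view src, const Token& tok) {
    assert(tok.text.data() >= src.data() &&
           tok.text.data() + tok.text.size() <= src.data() + src.size());
    return static_cast<std::size_t>(tok.text.data() - src.data());
}
```

Calling lex on "x1 = foo(42);" yields seven tokens (x1, =, foo, (, 42, ), ;), and offsetOf reports 5 for the foo token. A parser can then walk the returned array by index without touching the allocator again.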

Jan 25 2007

lurker <lurker mailinator.com> writes:

BCS wrote:
 As far as slicing goes, I'm working on a parser that reads a file into memory
 (I guess it could mmap it in as well) and converts it to an array of token
 structs. A parser will then walk over the array. If you new a big array of
 structs in advance and have your lexer write directly to the array (slicing
 out of the file where the text is important), that should be fairly fast.

Excellent! Is any of your code available? We would really like to take a look
at your code (if possible). We are a little lost right now, and from your
description it seems very much like what we want to build.

Thanks

Jan 25 2007

BCS <ao pathlink.com> writes:

Reply to lurker,

 BCS wrote:
 
 As far as slicing goes, I'm working on a parser that reads a file into
 memory (I guess it could mmap it in as well) and converts it to an
 array of token structs. A parser will then walk over the array. If you
 new a big array of structs in advance and have your lexer write
 directly to the array (slicing out of the file where the text is
 important), that should be fairly fast.
 

 Excellent! Is any of your code available? We would really like to take a
 look at your code (if possible). We are a little lost right now, and from
 your description it seems very much like what we want to build.
 
 Thanks
 

That isn't how my lexer works (at the moment); I was just saying I think
it could be done. In fact, my app copies everything to make sure that it
doesn't stomp on itself.

OTOH, it wouldn't be too hard to port it to what I described above, and I
plan on posting the code when I get a bit closer to done.

Jan 26 2007

Sean Kelly <sean f4.ca> writes:

lurker wrote:
 Hi.
 
 I'm new to D but not to programming. I would like to write a small scripting
 engine using the great D programming language, but I'm undecided on what
 techniques I should use to parse source files.
 
 Since slices seem to be a central feature of D, I was thinking of reading the
 whole file into memory and using slices to build the syntax tree.
 
 Does anyone have examples of parsing files using this method?

The DMD lexer works pretty much this way, and it's available in every 
DMD distribution :-)

 Any other methods I should consider?

This is the method I've used in the past, even in C++.  It seems to make 
for cleaner code than the allocate/copy method, and it's faster to boot.
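
To make the slice-versus-copy point concrete, here is a rough sketch in C++17 of a tiny hand-written parser whose syntax tree nodes hold views into the source buffer instead of copied strings. This is not the DMD lexer or parser; the grammar (atoms joined by '+') and the names Node and Parser are assumptions made for illustration, and std::string_view plays the role of a D slice.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <string_view>
#include <utility>

// Hypothetical AST node for a toy grammar: expr := atom ('+' atom)*
struct Node {
    std::string_view text;           // view into the source, never a copy
    std::unique_ptr<Node> lhs, rhs;  // both null for atoms
};

struct Parser {
    std::string_view src;  // the whole file, already in memory
    std::size_t pos = 0;

    void skipSpace() {
        while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos])))
            ++pos;
    }

    // atom := [A-Za-z0-9_]+ ; returns null if no atom starts here.
    std::unique_ptr<Node> parseAtom() {
        skipSpace();
        std::size_t start = pos;
        while (pos < src.size() &&
               (std::isalnum(static_cast<unsigned char>(src[pos])) || src[pos] == '_'))
            ++pos;
        if (pos == start) return nullptr;
        auto node = std::make_unique<Node>();
        node->text = src.substr(start, pos - start);  // slice, no string copy
        return node;
    }

    // Builds a left-associative tree; returns null on a syntax error.
    std::unique_ptr<Node> parseExpr() {
        auto left = parseAtom();
        if (!left) return nullptr;
        for (;;) {
            skipSpace();
            if (pos >= src.size() || src[pos] != '+') return left;
            std::size_t opPos = pos++;
            auto right = parseAtom();
            if (!right) return nullptr;  // dangling '+'
            auto op = std::make_unique<Node>();
            op->text = src.substr(opPos, 1);
            op->lhs = std::move(left);
            op->rhs = std::move(right);
            left = std::move(op);
        }
    }
};
```

The only allocations are the nodes themselves; every name and operator in the tree points back into the original buffer, so the buffer has to stay alive for as long as the tree is in use.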



Sean

Jan 25 2007
