
digitalmars.D.learn - Parser

Cecil Ward <cecil cecilward.com> writes:
I’m thinking that I might have to end up writing a partial, rather 
rough parser for parts of the D language. Could I get some 
suggestions for software components that might help? D has a very 
powerful regex module, I believe.

I have been writing inline asm library routines for GDC as a 
learning exercise and unfortunately I can’t build them under LDC 
because LDC does not yet offer full support for the GCC inline 
asm grammar, specifically named asm operands such as "mov 
%[dest], %[src]", where you see the names enclosed in [ ]. I’m 
thinking that I might have to work around this deficiency myself. 
There’s no way that I can enhance LDC itself as I wouldn’t even 
know where to start.

I could pre-process the string expressions used in inline asm so 
that LDC sees an alternative, simpler grammar, one where there 
are numbers instead of "[names]", e.g. "%0" instead of the 
meaningful "%[dest]". It seems that the compilers take string 
_expressions_ everywhere rather than just simple literal strings.
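
The renaming step described above could be sketched roughly like this. It is only a sketch under assumptions: numberOperands and operandIndex are hypothetical names, the list of operand names is assumed to have been collected already (in declaration order), and operand modifiers such as "%w[dest]" are not handled. It avoids std.regex so that it can also run at compile time.

```d
import std.conv : to;
import std.stdio : writeln;

/// Index of `name` in `names`, or -1 if it was never declared.
ptrdiff_t operandIndex(const string[] names, string name)
{
    foreach (i, n; names)
        if (n == name)
            return cast(ptrdiff_t) i;
    return -1;
}

/// Rewrites each %[name] in an asm template as %N, where N is the
/// position of `name` in `names`.
string numberOperands(string tmpl, const string[] names)
{
    string result;
    size_t i = 0;
    while (i < tmpl.length)
    {
        const hasNext = i + 1 < tmpl.length;
        if (tmpl[i] == '%' && hasNext && tmpl[i + 1] == '%')
        {
            // "%%" is a literal percent sign; copy it through untouched.
            result ~= "%%";
            i += 2;
        }
        else if (tmpl[i] == '%' && hasNext && tmpl[i + 1] == '[')
        {
            // Find the closing ']' of the operand name.
            size_t end = i + 2;
            while (end < tmpl.length && tmpl[end] != ']')
                ++end;
            assert(end < tmpl.length, "unterminated %[ in asm template");
            string name = tmpl[i + 2 .. end];
            const index = operandIndex(names, name);
            assert(index >= 0, "undeclared asm operand: " ~ name);
            result ~= "%" ~ index.to!string;
            i = end + 1;
        }
        else
        {
            result ~= tmpl[i];
            ++i;
        }
    }
    return result;
}

// Evaluated by the compiler, so the result is usable as a string expression.
static assert(numberOperands("mov %[dest], %[src]", ["dest", "src"]) == "mov %0, %1");

void main()
{
    writeln(numberOperands("mov %[dest], %[src]", ["dest", "src"]));
}
```

Because the function is evaluable at compile time, its result can in principle be used wherever a constant string expression is accepted.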

Can I generate fragments of D and inject them into the rest of 
the code using mixin? Not really sure how to use it.
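
For what it's worth, a string mixin takes any string the compiler can evaluate at compile time and compiles it as D code at the point where the mixin appears. A minimal sketch, where declareInt is a hypothetical helper invented purely for illustration:

```d
import std.conv : to;
import std.stdio : writeln;

/// Builds D source text for a variable declaration.
/// It is an ordinary function; the compiler runs it at compile time
/// when its result is needed inside mixin().
string declareInt(string name, int value)
{
    return "int " ~ name ~ " = " ~ value.to!string ~ ";";
}

void main()
{
    // Expands to: int answer = 42;
    mixin(declareInt("answer", 42));
    writeln(answer); // prints 42
}
```

The same pattern applies to larger fragments: a function returns the generated source as a string, and mixin() injects it where a statement, declaration or expression is expected.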

There are three string expressions involved. The first is the 
string containing the asm, which needs to be scanned for 
%[ names ]; each of these needs to be replaced with a number given 
by the order in which the names are declared. The other two are an 
outputs section and an inputs section, both of which can contain 
declarations of these names, e.g. ‘: 
[ dest ] "=r" ( d-expression ) ,’ … ‘: [ src ]’…. The arbitrary 
fragment of D in d-expression can unfortunately be anything, and 
there’s no way I can write a full D lexer/parser to scan that 
properly, but luckily I just have to pass over it to find its 
terminator, which is either a ‘,’ or a ‘:’. (There might be a case 
where there is a ‘;’ as a terminator instead of a ‘:’; I’m not 
sure if that’s permitted in the grammar immediately after the 
inputs section.)
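
One caveat with searching for a bare ‘,’ or ‘:’ is that the d-expression can itself contain them (function-call arguments, the ?: operator). Since the expression sits inside parentheses, passing over it could instead mean finding the matching ‘)’. A rough sketch, assuming only ordinary "..." and '...' literals and // and /* */ comments occur; matchingParen is a hypothetical name, and WYSIWYG strings, q{...} token strings and nesting /+ +/ comments are not handled:

```d
import std.stdio : writeln;

/// Returns the index of the ')' matching the '(' at src[open], or -1 if none.
ptrdiff_t matchingParen(string src, size_t open)
{
    assert(open < src.length && src[open] == '(');
    size_t depth = 0;
    size_t i = open;
    while (i < src.length)
    {
        const c = src[i];
        const next = i + 1 < src.length ? src[i + 1] : '\0';
        if (c == '"' || c == '\'')
        {
            // Skip a quoted literal, stepping over backslash escapes.
            ++i;
            while (i < src.length && src[i] != c)
                i += (src[i] == '\\') ? 2 : 1;
            ++i; // step past the closing quote
        }
        else if (c == '/' && next == '/')
        {
            // Line comment: skip to the end of the line.
            while (i < src.length && src[i] != '\n')
                ++i;
        }
        else if (c == '/' && next == '*')
        {
            // Block comment: skip to just past the closing */.
            i += 2;
            while (i + 1 < src.length && !(src[i] == '*' && src[i + 1] == '/'))
                ++i;
            i += 2;
        }
        else
        {
            if (c == '(')
                ++depth;
            else if (c == ')' && --depth == 0)
                return cast(ptrdiff_t) i;
            ++i;
        }
    }
    return -1;
}

// The ',' and ':' inside the expression, and the ')' inside the string, are skipped.
enum operand = `(a ? f(b, ")") : c), [src] "r" (s)`;
static assert(operand[0 .. matchingParen(operand, 0) + 1] == `(a ? f(b, ")") : c)`);

void main()
{
    writeln(operand[0 .. matchingParen(operand, 0) + 1]);
}
```

After the matching ‘)’ is found, the next non-blank, non-comment character would be the real ‘,’, ‘:’ or ‘;’ terminator.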

But having to parse all the types of strings and operators in a 
string-expression is hard enough. I will also have to deal with 
all the possible comment types wherever they can occur, which is 
all over the place within, before and after these expressions.

Any tips, or modules that I could use, would be most welcome. I’m 
very much out of my depth here.
Jun 14 2023
Ben Jones <fake fake.fake> writes:
On Wednesday, 14 June 2023 at 09:28:57 UTC, Cecil Ward wrote:
 I’m thinking that I might have to end up writing a partial, 
 rather rough parser for parts of the D language. Could I get 
 some suggestions for software components that might help? D has 
 a very powerful regex module, I believe.
A couple of pointers for general parsers:

The Pegged library, https://github.com/PhilippeSigaud/Pegged/tree/master, is pretty popular for building general parsers.

I have a rough implementation of a similar idea here: https://github.com/benjones/autoparsed (definitely less polished and probably buggier than Pegged). In mine you annotate your types with their syntax and can then call parse!MyType on a token stream to get a MyType.
Jun 15 2023