digitalmars.D.learn - RegEx for a simple Lexer

Tim Holzschuh via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
Hi there,
I read a book about an introduction to creating programming languages 
(really basic).

The sample code is written in Ruby, but I want to rewrite the examples in D.

However, the Lexer uses Ruby's regex features to scan the code.

I'm not very familiar with D's regex system (or with any other), so it 
would be very helpful to receive some tips on how to "translate" the 
Ruby regexes to D's implementation.

In Ruby, if I have a string called src, I can just use this: 
src[/\A([A-Z]\w*)/, 1].

Would match( src, r"([A-Z]\w*)" ); essentially do the same?
(I know I have to use .captures to receive the found expression)

If I also want to create a RegEx to filter string-expressions a la " xyz 
", how would I do this?

At least match( src, r"^\" (.*) $\" " ); doesn't seem to work and I 
couldn't find in the Library Reference how to change it..

Sorry if these questions seem dumb to you..

Ahh, I forgot one:
In the book a parser generator like Yacc is used to create a suitable 
parser.
Is there an equivalent for D?
Or if not: is it really that hard to create a parser that is able to 
parse something like this:

// Example
class Foo:
     def name:
         "name"

     def asdf:
         100

foo = Foo.new

print( foo.name )
print( foo.asdf )


Thank you for helping,
     Tim
May 13 2014
"anonymous" <anonymous example.com> writes:
On Tuesday, 13 May 2014 at 19:53:17 UTC, Tim Holzschuh via
Digitalmars-d-learn wrote:
 If I also want to create a RegEx to filter string-expressions a 
 la " xyz ", how would I do this?

 At least match( src, r"^\" (.*) $\" " ); doesn't seem to work 
 and I couldn't find in the Library Reference how to change it..
That string literal is malformed. WYSIWYG strings (r"...") don't know escape sequences. So, the string ends at the second quote, and the rest is syntactical garbage to the compiler.

"^\" (.*) $\" " would be a proper D string literal. You could also use the alternative WYSIWYG syntax: `^" (.*) $" `

That dollar sign looks off, though. It matches the end of the input. You probably want to put that at the end of the regex: "^\" (.*) \"$"

Meaning: The match has to start at the beginning of the input (^). Matches a quote, then a space, then anything (.*), then a space, then a quote. The match has to end at the end of the input ($).

Then again, when you're writing a tokenizer/parser, you usually don't require an expression to span the whole input, but just match as far as it goes. In that case, drop the dollar sign. And think about what happens when there are quotes in the payload.
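
A minimal sketch of the points above, assuming a recent Phobos where std.regex provides regex and matchFirst. The helper name leadingString is made up for illustration. It checks that the escaped literal and the backtick literal spell the same pattern, that a pattern without the trailing dollar sign matches only as far as it goes, and what a greedy .* does when the input contains more quotes.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper (not part of Phobos): returns capture group 1 of
// `pattern` matched against `src`, or null if there is no match.
string leadingString(string src, string pattern)
{
    auto m = matchFirst(src, regex(pattern));
    return m.empty ? null : m[1];
}

void main()
{
    // Both literals contain the same characters: ^" (.*) "
    string escaped = "^\" (.*) \"";
    string wysiwyg = `^" (.*) "`;
    assert(escaped == wysiwyg);

    // No trailing $, so the match may stop before the end of the input.
    assert(leadingString(`" xyz " + 1`, wysiwyg) == "xyz");

    // Greedy .* runs up to the last closing quote in the input.
    assert(leadingString(`" a " + " b "`, wysiwyg) == `a " + " b`);

    // Excluding quotes from the payload stops at the first closing quote.
    assert(leadingString(`" a " + " b "`, `^" ([^"]*) "`) == "a");

    writeln("ok");
}
```

Whether [^"]* is the right payload pattern depends on how the language being lexed escapes quotes inside strings, which the thread does not specify.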
May 13 2014
Ary Borenszweig <ary esperanto.org.ar> writes:
On 5/13/14, 5:43 PM, anonymous wrote:
 On Tuesday, 13 May 2014 at 19:53:17 UTC, Tim Holzschuh via
 Digitalmars-d-learn wrote:
 If I also want to create a RegEx to filter string-expressions a la "
 xyz ", how would I do this?

 At least match( src, r"^\" (.*) $\" " ); doesn't seem to work and I
 couldn't find in the Library Reference how to change it..
I think he's confusing r"..." with a regular expression literal (I also confused them)
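
To make the distinction concrete, here is a small sketch, assuming std.regex from a recent Phobos (leadingConstant is a made-up name): r"..." only yields a string, and it is the call to regex() that compiles it. The function is meant as a rough counterpart to the Ruby src[/\A([A-Z]\w*)/, 1] from the original question, with ^ standing in for the start of the input.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// r"..." is an ordinary string literal; only the escaping rules differ.
static assert(is(typeof(r"\w") == string));
static assert(r"\w" == "\\w");

// Hypothetical helper (not part of Phobos): capture group 1 of
// ^([A-Z]\w*), or null if src does not start with a capitalized identifier.
string leadingConstant(string src)
{
    auto re = regex(r"^([A-Z]\w*)"); // the string becomes a regex here
    auto m = matchFirst(src, re);
    return m.empty ? null : m[1];
}

void main()
{
    writeln(leadingConstant("Foo.new"));
    assert(leadingConstant("Foo.new") == "Foo");
    // Anchored at the start, so a later "Foo" is not picked up.
    assert(leadingConstant("foo = Foo.new") is null);
}
```

Without the leading ^, the pattern from the question would also match an identifier further into the input, which is usually not what a lexer wants.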
May 13 2014
"Brian Schott" <briancschott gmail.com> writes:
On Tuesday, 13 May 2014 at 19:53:17 UTC, Tim Holzschuh via 
Digitalmars-d-learn wrote:
 Hi there,
 I read a book about an introduction to creating programming 
 languages (really basic).

 The sample code is written in Ruby, but I want to rewrite the 
 examples in D.

 However, the Lexer uses Ruby's regex features to scan the code.

 I'm not very familiar with D's RegEx system (nor with 
 another..), so it would be very helpful to receive some tips on 
 how to "translate" the ruby RegEx's to D's implementation.
You may find the following useful:
http://hackerpilot.github.io/experimental/std_lexer/phobos/lexer.html

The source of the lexer generator is located here:
https://github.com/Hackerpilot/Dscanner/blob/master/std/lexer.d

D lexer:
https://github.com/Hackerpilot/Dscanner/blob/master/std/d/lexer.d

There's also a parser and AST library for D in that same project. The lexer generator may not be as simple as what you're using right now, but it is very fast.
May 13 2014
"Kagamin" <spam here.lot> writes:
On Tuesday, 13 May 2014 at 20:02:59 UTC, Tim Holzschuh via 
Digitalmars-d-learn wrote:
 Still: Would it be very difficult to write a suitable parser 
 from scratch?
See http://forum.dlang.org/post/lbnheh$2ssm$1 digitalmars.com with discussion about parsers on reddit.
May 14 2014