digitalmars.D.learn - std.regex literal syntax (the \Q…\E escape sequenc

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

I'm reading through http://www.regular-expressions.info, and there's a
feature that's missing from std.regex, quoted:

-----
All the characters between the \Q and the \E are interpreted as
literal characters. E.g. \Q*\d+*\E matches the literal text *\d+*. The
\E may be omitted at the end of the regex, so \Q*\d+* is the same as
\Q*\d+*\E.
-----

This would translate to the following needing to work (which fails at
runtime with an exception):

writeln(r"*\d+*".match(r"\Q*\d+*\E"));

Should this feature be added? I guess there are probably more regex
features missing (I just began reading the page), but I'm not sure how
Dmitry feels about adding X number of features.

Dec 18 2013
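
A possible workaround for the missing \Q...\E, sketched under the assumption that the installed Phobos provides std.regex.escaper (a function this thread does not mention). The helper name findLiteral is hypothetical and only serves the illustration:

```d
import std.array : array;
import std.conv : to;
import std.regex : escaper, matchFirst, regex;
import std.stdio : writeln;

/// Hypothetical helper: finds the first occurrence of `literal` in `input`,
/// treating every character of `literal` as plain text.
string findLiteral(string input, string literal)
{
    // escaper lazily backslash-escapes regex metacharacters;
    // collect the result into a string before compiling the pattern.
    auto pattern = regex(literal.escaper.array.to!string);
    auto m = matchFirst(input, pattern);
    return m.empty ? null : m.hit;
}

void main()
{
    // The metacharacters in *\d+* are matched literally here.
    writeln(findLiteral(`abc *\d+* def`, `*\d+*`));
}
```

This escapes the whole pattern up front instead of marking a literal region inside it, so it is not a drop-in replacement for \Q...\E in the middle of a larger expression.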

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

18-Dec-2013 22:33, Andrej Mitrovic wrote:
 I'm reading through http://www.regular-expressions.info, and there's a
 feature that's missing from std.regex,
 quoted:

 -----
 All the characters between the \Q and the \E are interpreted as
 literal characters. E.g. \Q*\d+*\E matches the literal text *\d+*. The
 \E may be omitted at the end of the regex, so \Q*\d+* is the same as
 \Q*\d+*\E.

[snip]
 Should this feature be added? I guess there's probably more regex
 features missing (I just began reading the page), I'm not sure how
 Dmitry feels about adding X number of features though.

All in all I wanted to be principled about what set of features to 
support. The initial design was:
1. Choose a syntax flavor (ECMAScript).
2. Add some powerful stuff (e.g. unlimited lookbehind, full Unicode support).
3. Add some convenient stuff that is popular enough/easy to implement 
(named captures).
4. Avoid extensions that complicate the engine and preclude optimizations, 
or heavily depend on implementation. (So no recursion and similar madness.)

In that light, 'missing' might be on purpose. For instance, std.regex 
doesn't provide 'atomic' (possessive) groups, simply because they are a 
kludge invented for the poor performance of backtracking engines.

At the end of the day, any feature is interesting as long as we carefully weigh:

- how useful a feature is
- how widespread the syntax/how many precedents in other libraries

against

- how difficult to implement
- does it affect backwards compatibility
- any other hidden costs

I'd be glad to implement well-motivated enhancement requests.

P.S. This reminds me to put together a roadmap of sorts on where 
std.regex is going and what to expect.

-- 
Dmitry Olshansky

Dec 18 2013

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 12/18/13, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 At the end of the day, any feature is interesting as long as we carefully
 weigh:

 - how useful a feature is
 - how widespread the syntax/how many precedents in other libraries

 against

 - how difficult to implement
 - does it affect backwards compatibility
 - any other hidden costs

 I'd be glad to implement well motivated enhancement requests.

Excellent, that's what I'm hoping for from any library dev. Weigh the
odds before adding random features. :)

Dec 18 2013

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 12/18/13, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 P.S. This reminds me to put a roadmap of sorts on where std.regex is
 going and what to expect.

Btw one thing I'm not fond of is the format specifiers, in particular:

$` 	part of input preceding the match.
$' 	part of input following the match.

` and ' are very hard to tell apart. But I guess this was based on an
existing standard? Personally I'd prefer $< and $>.

Dec 18 2013

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

18-Dec-2013 23:54, Andrej Mitrovic wrote:
 On 12/18/13, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 P.S. This reminds me to put a roadmap of sorts on where std.regex is
 going and what to expect.

 Btw one thing I'm not fond of is the format specifiers, in particular:

 $` 	part of input preceding the match.
 $' 	part of input following the match.

 ` and ' are very hard to tell apart. But I guess this was based on an
 existing standard? Personally I'd prefer $< and $>.

The precedent is Perl, a heavy influence on the (former) std.regex design.
http://perldoc.perl.org/perlre.html#Capture-groups
(grep for $')

Personally I'd prefer both simply gone :) The reasoning is that you can't 
support these while pattern matching on the fly (say, on a network 
stream). Since we can't remove them, any better syntax that is popular 
enough is acceptable.

-- 
Dmitry Olshansky

Dec 18 2013
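
For code that inspects a match directly rather than through a replacement format string, the pre, hit and post properties of a std.regex Captures object expose the same pieces as $`, $& and $' under more readable names. A minimal sketch; the helper name splitAtMatch is hypothetical:

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

/// Hypothetical helper: returns [text before match, match, text after match],
/// or an empty array when the pattern does not match.
string[] splitAtMatch(string input, string pattern)
{
    auto m = matchFirst(input, regex(pattern));
    if (m.empty)
        return [];
    // pre and post correspond to the $` and $' format specifiers.
    return [m.pre, m.hit, m.post];
}

void main()
{
    writeln(splitAtMatch("key=value", "="));
}
```

Like the format specifiers, these properties need the whole input to be available, so they share the limitation for on-the-fly matching described above.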

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 12/18/13, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 The precedent is Perl, a heavy influence on the (former) std.regex design.
 http://perldoc.perl.org/perlre.html#Capture-groups
 (grep for $')

Ah, classic Perl. Write once - don't bother to read ever again. :p

Dec 18 2013

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

19-Dec-2013 01:05, Andrej Mitrovic wrote:
 On 12/18/13, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 The precedent is Perl, a heavy influence on the (former) std.regex design.
 http://perldoc.perl.org/perlre.html#Capture-groups
 (grep for $')

 Ah, classic Perl. Write once - don't bother to read ever again. :p

Or rather - if it's so fast to (re)write, why bother reading at all? :)

-- 
Dmitry Olshansky

Dec 18 2013
