digitalmars.D.learn - std.regex literal syntax (the \Q…\E escape sequenc

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
I'm reading through http://www.regular-expressions.info, and there's a
feature that's missing from std.regex, quoted:

-----
All the characters between the \Q and the \E are interpreted as
literal characters. E.g. \Q*\d+*\E matches the literal text *\d+*. The
\E may be omitted at the end of the regex, so \Q*\d+* is the same as
\Q*\d+*\E.
-----

In std.regex terms, the following would need to work (it currently fails
at runtime with an exception):

writeln(r"*\d+*".match(r"\Q*\d+*\E"));

Should this feature be added? There are probably more regex features
missing (I've only just begun reading the site), and I'm not sure how
Dmitry feels about adding any number of features.
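
Until something like \Q…\E exists, one workaround is to escape the literal text before compiling the pattern. A minimal sketch, assuming a Phobos release that provides `std.regex.escaper`; the helper name `quoteLiteral` is made up for illustration and is not part of std.regex:

```d
import std.array : array;
import std.conv : to;
import std.regex : escaper, matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper (not part of Phobos): returns `s` with regex
// metacharacters backslash-escaped, so it can be used as a literal pattern.
string quoteLiteral(string s)
{
    // escaper is lazy; materialize the result into a string for regex().
    return s.escaper.array.to!string;
}

void main()
{
    // Roughly what \Q*\d+*\E would mean: match the literal text *\d+*.
    auto re = regex(quoteLiteral(r"*\d+*"));
    auto m = matchFirst(r"before *\d+* after", re);
    writeln(m.hit);
}
```

This only covers the case where the whole pattern (or a concatenated piece of it) is literal; it is a substitute for the escape sequence, not an implementation of it.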
Dec 18 2013
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 18-Dec-2013 22:33, Andrej Mitrovic wrote:
 I'm reading through http://www.regular-expressions.info, and there's a
 feature that's missing from std.regex,
 quoted:

 -----
 All the characters between the \Q and the \E are interpreted as
 literal characters. E.g. \Q*\d+*\E matches the literal text *\d+*. The
 \E may be omitted at the end of the regex, so \Q*\d+* is the same as
 \Q*\d+*\E.
[snip]
 Should this feature be added? I guess there's probably more regex
 features missing (I just began reading the page), I'm not sure how
 Dmitry feels about adding X number of features though.
All in all I wanted to be principled about what set of features to support.
The initial design was:

1. Choose a syntax flavor (ECMAScript).
2. Add some powerful stuff (e.g. unlimited lookbehind, full Unicode support).
3. Add some convenient stuff that is popular enough or easy to implement
   (named captures).
4. Avoid extensions that complicate the engine and preclude optimizations,
   or that depend heavily on the implementation. (So no recursion and
   similar madness.)

In that light, 'missing' might be on purpose. For instance, std.regex doesn't
provide 'atomic' (possessive) groups simply because they are a kludge
invented for backtracking engines with poor performance.

At the end of the day, any feature is interesting as long as we carefully
weigh:

- how useful a feature is
- how widespread the syntax is / how many precedents there are in other
  libraries

against

- how difficult it is to implement
- whether it affects backwards compatibility
- any other hidden costs

I'd be glad to implement well-motivated enhancement requests.

P.S. This reminds me to put together a roadmap of sorts on where std.regex
is going and what to expect.

-- Dmitry Olshansky
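
As an illustration of the "convenient stuff" in item 3, here is a minimal sketch of named captures in std.regex. The date pattern and the group names are made up for the example:

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

void main()
{
    // (?P<name>...) labels a group so it can be looked up by name
    // instead of by position.
    auto re = regex(`(?P<year>\d{4})-(?P<month>\d{2})`);
    auto m = matchFirst("released 2013-12", re);
    writeln(m["year"], " / ", m["month"]);
}
```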
Dec 18 2013
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 12/18/13, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 By the end of day any feature is interesting as long as we carefully
 weigh:

 - how useful a feature is
 - how widespread the syntax/how many precedents in other libraries

 against

 - how difficult to implement
 - does it affect backwards compatibility
 - any other hidden costs

 I'd be glad to implement well motivated enhancement requests.
Excellent, that's what I'm hoping for from any library dev. Weigh the odds before adding random features. :)
Dec 18 2013
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 12/18/13, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 P.S. This reminds me to put a roadmap of sorts on where std.regex is
 going and what to expect.
Btw, one thing I'm not fond of is the format specifiers, in particular:

$`  part of input preceding the match.
$'  part of input following the match.

` and ' are very hard to tell apart. But I guess this was based on an
existing standard? Personally I'd prefer $< and $>.
Dec 18 2013
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 18-Dec-2013 23:54, Andrej Mitrovic wrote:
 On 12/18/13, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 P.S. This reminds me to put a roadmap of sorts on where std.regex is
 going and what to expect.
 Btw one thing I'm not fond of is the format specifiers, in particular:

 $`  part of input preceding the match.
 $'  part of input following the match.

 ` and ' are very hard to tell apart. But I guess this was based on an
 existing standard? Personally I'd prefer $< and $>.
The precedent is Perl, a heavy influence on the (former) std.regex design.
http://perldoc.perl.org/perlre.html#Capture-groups
(grep for $')

Personally I'd prefer both simply gone :)
The reasoning is that you can't support these while pattern matching on the
fly (say, on a network stream). Since we can't do that, anything better that
is popular enough is acceptable.

-- Dmitry Olshansky
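
For reference, a small sketch of what the two specifiers produce, next to the `pre` and `post` properties of the match object, which give the same pieces of input without the look-alike punctuation. The input string and pattern are made up for illustration:

```d
import std.regex : matchFirst, regex, replaceFirst;
import std.stdio : writeln;

void main()
{
    auto re = regex(`\d+`);
    string input = "abc123def";

    auto m = matchFirst(input, re);
    writeln(m.pre);   // input preceding the match
    writeln(m.hit);   // the match itself
    writeln(m.post);  // input following the match

    // The same pieces through the replacement format string:
    // $` is the text before the match, $& the match, $' the text after it.
    writeln(replaceFirst(input, re, "[$`|$&|$']"));
}
```

Both properties refer back to the input around the match, which fits the point above: they assume the whole input is available, so they do not carry over to matching on a stream.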
Dec 18 2013
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 12/18/13, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 The precedent is Perl. A heavy influencer on the (former) std.regex design.
 http://perldoc.perl.org/perlre.html#Capture-groups
 (grep for $')
Ah, classic Perl. Write once - don't bother to read ever again. :p
Dec 18 2013
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 19-Dec-2013 01:05, Andrej Mitrovic wrote:
 On 12/18/13, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 The precedent is Perl. A heavy influencer on the (former) std.regex design.
 http://perldoc.perl.org/perlre.html#Capture-groups
 (grep for $')
 Ah, classic Perl. Write once - don't bother to read ever again. :p
Or rather: if it's so fast to (re)write, why bother reading at all? :)

-- Dmitry Olshansky
Dec 18 2013