digitalmars.D.learn - std.regex literal syntax (the \Q…\E escape sequenc
- Andrej Mitrovic (14/14) Dec 18 2013 I'm reading through http://www.regular-expressions.info, and there's a
- Dmitry Olshansky (25/36) Dec 18 2013 All in all I wanted to be principled about what set of features to
- Andrej Mitrovic (3/12) Dec 18 2013 Excellent, that's what I'm hoping for from any library dev. Weigh the
- Andrej Mitrovic (6/8) Dec 18 2013 Btw one thing I'm not fond of is the format specifiers, in particular:
- Dmitry Olshansky (10/18) Dec 18 2013 The precedent is Perl. A heavy influencer on the (former) std.regex desi...
- Andrej Mitrovic (2/5) Dec 18 2013 Ah, classic Perl. Write once - don't bother to read ever again. :p
- Dmitry Olshansky (4/9) Dec 18 2013 Or rather - if it's so fast to (re)write, why bother reading at all? :)
I'm reading through http://www.regular-expressions.info, and there's a feature that's missing from std.regex, quoted:

-----
All the characters between the \Q and the \E are interpreted as literal characters. E.g. \Q*\d+*\E matches the literal text *\d+*. The \E may be omitted at the end of the regex, so \Q*\d+* is the same as \Q*\d+*\E.
-----

This would translate to the following needing to work (which fails at runtime with an exception):

    writeln(r"*\d+*".match(r"\Q*\d+*\E"));

Should this feature be added? I guess there's probably more regex features missing (I just began reading the page), I'm not sure how Dmitry feels about adding X number of features though.
Dec 18 2013
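Editor's sketch, not part of the original thread: without \Q…\E, a similar effect can be had by escaping the literal text before compiling it. This assumes a Phobos release that provides std.regex.escaper and matchFirst, which may be newer than the std.regex discussed here; the helper name literalRegex is hypothetical.

```d
import std.array : array;
import std.conv : to;
import std.regex : escaper, matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: compile `text` so that it matches itself literally,
// roughly what wrapping the whole pattern in \Q...\E would do.
auto literalRegex(string text)
{
    // escaper lazily backslash-escapes regex metacharacters such as * + \
    return regex(text.escaper.array.to!string);
}

void main()
{
    auto m = matchFirst(r"abc *\d+* def", literalRegex(r"*\d+*"));
    assert(!m.empty);
    assert(m.hit == r"*\d+*"); // the literal text, not "digits between stars"
    writeln(m.hit);
}
```

Escaping up front only covers the case where the whole pattern (or a concatenated piece of it) is literal; it is a workaround, not an implementation of the \Q…\E syntax itself.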
18-Dec-2013 22:33, Andrej Mitrovic wrote:
> I'm reading through http://www.regular-expressions.info, and there's a feature that's missing from std.regex, quoted:
> -----
> All the characters between the \Q and the \E are interpreted as literal characters. E.g. \Q*\d+*\E matches the literal text *\d+*. The \E may be omitted at the end of the regex, so \Q*\d+* is the same as \Q*\d+*\E.
> [snip]
> Should this feature be added? I guess there's probably more regex features missing (I just began reading the page), I'm not sure how Dmitry feels about adding X number of features though.

All in all I wanted to be principled about what set of features to support. The initial design was:

1. Choose a syntax flavor (ECMAScript).
2. Add some powerful stuff (e.g. unlimited lookbehind, full Unicode support).
3. Add some convenient stuff that is popular enough/easy to implement (named captures).
4. Avoid extensions that complicate the engine and preclude optimizations, or heavily depend on implementation. (So no recursion and similar madness.)

In that light 'missing' might be on purpose. For instance, std.regex doesn't provide 'atomic' (possessive) groups simply because it's a kludge invented for the poor performance of backtracking engines.

By the end of the day any feature is interesting as long as we carefully weigh:
- how useful a feature is
- how widespread the syntax is/how many precedents there are in other libraries
against
- how difficult it is to implement
- whether it affects backwards compatibility
- any other hidden costs

I'd be glad to implement well-motivated enhancement requests.

P.S. This reminds me to put together a roadmap of sorts on where std.regex is going and what to expect.

--
Dmitry Olshansky
Dec 18 2013
On 12/18/13, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
> By the end of the day any feature is interesting as long as we carefully weigh:
> - how useful a feature is
> - how widespread the syntax is/how many precedents there are in other libraries
> against
> - how difficult it is to implement
> - whether it affects backwards compatibility
> - any other hidden costs
>
> I'd be glad to implement well-motivated enhancement requests.

Excellent, that's what I'm hoping for from any library dev. Weigh the odds before adding random features. :)
Dec 18 2013
On 12/18/13, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
> P.S. This reminds me to put a roadmap of sorts on where std.regex is going and what to expect.

Btw one thing I'm not fond of is the format specifiers, in particular:

    $`  part of input preceding the match.
    $'  part of input following the match.

` and ' are very hard to tell apart. But I guess this was based on an existing standard? Personally I'd prefer $< and $>.
Dec 18 2013
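Editor's sketch, not part of the original thread: outside of format strings, the same two pieces of the input are available under more readable names as the pre and post properties of a match result. The helper name around is hypothetical, and matchFirst assumes a reasonably recent std.regex.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: returns the text before and after the first match,
// i.e. what the $` and $' format specifiers stand for.
string[2] around(string input, string pattern)
{
    auto m = matchFirst(input, regex(pattern));
    assert(!m.empty, "pattern did not match");
    return [m.pre, m.post];
}

void main()
{
    auto parts = around("key=value", "=");
    assert(parts[0] == "key");   // part of input preceding the match
    assert(parts[1] == "value"); // part of input following the match
    writeln(parts[0], " | ", parts[1]);
}
```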
18-Dec-2013 23:54, Andrej Mitrovic wrote:
> On 12/18/13, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
>> P.S. This reminds me to put a roadmap of sorts on where std.regex is going and what to expect.
>
> Btw one thing I'm not fond of is the format specifiers, in particular: $` part of input preceding the match. $' part of input following the match. ` and ' are very hard to tell apart. But I guess this was based on an existing standard? Personally I'd prefer $< and $>.

The precedent is Perl, a heavy influence on the (former) std.regex design.
http://perldoc.perl.org/perlre.html#Capture-groups (grep for $')

Personally I'd prefer both simply gone :) The reasoning is that you can't support these while pattern matching on the fly (say, on a network stream). Since we can't do that, anything better that is popular enough is acceptable.

--
Dmitry Olshansky
Dec 18 2013
On 12/18/13, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
> The precedent is Perl. A heavy influencer on the (former) std.regex design. http://perldoc.perl.org/perlre.html#Capture-groups (grep for $')

Ah, classic Perl. Write once - don't bother to read ever again. :p
Dec 18 2013
19-Dec-2013 01:05, Andrej Mitrovic wrote:
> On 12/18/13, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
>> The precedent is Perl. A heavy influencer on the (former) std.regex design. http://perldoc.perl.org/perlre.html#Capture-groups (grep for $')
>
> Ah, classic Perl. Write once - don't bother to read ever again. :p

Or rather - if it's so fast to (re)write, why bother reading at all? :)

--
Dmitry Olshansky
Dec 18 2013