digitalmars.D - Poll of the week: How should std.regex handle unknown escape
- "Marco Leise" <Marco.Leise gmx.de> Dec 02 2011
- "Jonathan M Davis" <jmdavisProg gmx.com> Dec 02 2011
- Jesse Phillips <jessekphillips+d gmail.com> Dec 02 2011
- Jonathan M Davis <jmdavisProg gmx.com> Dec 02 2011
- Xinok <xinok live.com> Dec 02 2011
- Kagamin <spam here.lot> Dec 03 2011
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Dec 03 2011
- Dmitry Olshansky <dmitry.olsh gmail.com> Dec 03 2011
- Dmitry Olshansky <dmitry.olsh gmail.com> Dec 03 2011
- Dmitry Olshansky <dmitry.olsh gmail.com> Dec 03 2011
- David Nadlinger <see klickverbot.at> Dec 03 2011
- David Nadlinger <see klickverbot.at> Dec 03 2011
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Dec 03 2011
- Michel Fortin <michel.fortin michelf.com> Dec 03 2011
- "Vladimir Panteleev" <vladimir thecybershadow.net> Dec 03 2011
- "Vladimir Panteleev" <vladimir thecybershadow.net> Dec 03 2011
- "Martin Nowak" <dawg dawgfoto.de> Dec 04 2011
- Jesse Phillips <jessekphillips+d gmail.com> Dec 04 2011
- Walter Bright <newshound2 digitalmars.com> Dec 04 2011
On Friday, December 02, 2011 23:33:34 Marco Leise wrote: http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2
Why wouldn't std.regex accept an escape sequence such as "\."? I thought that the whole point of something like "\." was to make it so that you could use "." directly in spite of the fact that it means something special in regexes. Or is it something special to do with the fact that it's between brackets? I'd still have thought that it would just escape it, since it _is_ an escape sequence. Or is it that the escape sequence isn't necessary in between the brackets, and so the question is how to handle it, since it isn't necessary? - Jonathan M Davis
Dec 02 2011
On Fri, 02 Dec 2011 17:59:59 -0500, Jonathan M Davis wrote: On Friday, December 02, 2011 23:33:34 Marco Leise wrote: http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2
Why wouldn't std.regex accept an escape sequence such as "\."? I thought that the whole point of something like "\." was to make it so that you could use "." directly in spite of the fact that it means something special in regexes. Or is it something special to do with the fact that it's between brackets? I'd still have thought that it would just escape it, since it _is_ an escape sequence. Or is it that the escape sequence isn't necessary in between the brackets, and so the question is how to handle it, since it isn't necessary? - Jonathan M Davis
Brackets being a character class, dot is used literally. So in this case, was it meant to be [\\.] or [.]?
Dec 02 2011
On Saturday, December 03, 2011 02:35:21 Jesse Phillips wrote: On Fri, 02 Dec 2011 17:59:59 -0500, Jonathan M Davis wrote: On Friday, December 02, 2011 23:33:34 Marco Leise wrote: http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2
Brackets being a character class, dot is used literally. So in this case, was it meant to be [\\.] or [.]?
Well, then if \. is not legal, I'd expect a static assertion failure or a template constraint failure if the string were given as a compile-time argument, and an exception if it were given as a runtime argument. - Jonathan M Davis
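Jonathan's suggestion can be sketched in D as one validation function used two ways: evaluated under CTFE for a compile-time pattern (so a bad escape fails a `static assert`) and checked with `enforce` for a runtime pattern (so it throws). This is a minimal sketch under stated assumptions: `isKnownClassEscape`, `classEscapesValid` and `checkAtRuntime` are hypothetical names, the escape table is invented for illustration rather than taken from std.regex, and, following Jonathan's premise, `\.` is treated as not legal inside brackets.

```d
import std.exception : enforce;
import std.stdio : writeln;

// Hypothetical table of escapes allowed inside [...]; not std.regex's real rules.
bool isKnownClassEscape(char c) pure nothrow @safe
{
    switch (c)
    {
        case 'd', 'D', 'w', 'W', 's', 'S', 'n', 't', 'r', '\\', ']', '-', '^':
            return true;
        default:
            return false;
    }
}

// Simplified scan: true if every backslash escape inside [...] is in the table.
// Ignores nested brackets and a leading ']' inside a class.
bool classEscapesValid(string pattern) pure nothrow @safe
{
    bool inClass = false;
    for (size_t i = 0; i < pattern.length; ++i)
    {
        immutable c = pattern[i];
        if (c == '\\')
        {
            if (i + 1 >= pattern.length)
                return false; // dangling backslash at the end
            if (inClass && !isKnownClassEscape(pattern[i + 1]))
                return false;
            ++i; // skip the escaped character
        }
        else if (c == '[')
            inClass = true;
        else if (c == ']')
            inClass = false;
    }
    return true;
}

// Compile-time pattern: a bad escape would stop compilation right here.
enum compileTimePattern = r"[\d\-]";
static assert(classEscapesValid(compileTimePattern));

// Runtime pattern: a bad escape throws instead.
void checkAtRuntime(string pattern)
{
    enforce(classEscapesValid(pattern), "unknown escape in character class: " ~ pattern);
}

void main()
{
    checkAtRuntime(r"[\w]"); // accepted
    try
    {
        checkAtRuntime(r"[\.]"); // '.' is not in the hypothetical table
    }
    catch (Exception e)
    {
        writeln(e.msg);
    }
}
```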
Dec 02 2011
On 12/2/2011 5:33 PM, Marco Leise wrote: http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2
I prefer that regexp engines are as consistent as possible. Everything I tested accepts this as a valid regular expression, so I think std.regex should as well.
Dec 02 2011
Marco Leise Wrote: http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2
Erm... but "\." is a perfectly known escape sequence, so the question should be "How should std.regex handle known escape sequences as in: "[\.]"".
Dec 03 2011
On 12/3/11 6:54 AM, Kagamin wrote: Marco Leise Wrote: http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2
Erm... but "\." is a perfectly known escape sequence, so the question should be "How should std.regex handle known escape sequences as in: "[\.]"".
The dot inside a character set must not be escaped. Andrei
Dec 03 2011
On 03.12.2011 19:48, Andrei Alexandrescu wrote: On 12/3/11 6:54 AM, Kagamin wrote: Marco Leise Wrote: http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2
The dot inside a character set must not be escaped. Andrei
And that breaks, ehm... e.g. rdmd ;) Anyhow, I'm trying to pick a reasonable rule. 100% compatibility with the old regex would mean ignoring '\' where it is not applicable. My only concern with that is future extensibility via \<character>. -- Dmitry Olshansky
Dec 03 2011
On 03.12.2011 16:54, Kagamin wrote: Marco Leise Wrote: http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2
Erm... but "\." is a perfectly known escape sequence, so the question should be "How should std.regex handle known escape sequences as in: "[\.]"".
Let's clarify this a bit. Well \. is more or less common outside of []. The question is more like: treat every \<something> as plain <something> (ignoring \) inside character classes [] if it's not a known escape sequence like \w, \d, \uXXXX, \W, \cA-\cZ and so on.
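Dmitry's lenient rule (the one that keeps compatibility with the old std.regex) can be sketched as a tiny parser for the body of a character class: a backslash in front of a character with no special meaning is simply dropped. Everything here is hypothetical and simplified: the `isSpecialClassEscape` table, the `ClassItem` type and `parseClassMembers` are illustrative names, and ranges, `\uXXXX` and `\cA`-`\cZ` are not handled.

```d
import std.stdio : writeln;

// Hypothetical set of escapes that keep a special meaning inside [...].
bool isSpecialClassEscape(char c) pure nothrow @safe
{
    switch (c)
    {
        case 'd', 'D', 'w', 'W', 's', 'S':
            return true;
        default:
            return false;
    }
}

struct ClassItem
{
    bool shorthand; // true for \d, \w, ...; false for a literal character
    char ch;
}

// Lenient rule: inside a class, "\x" for an unknown x is read as plain x.
ClassItem[] parseClassMembers(string members) pure nothrow @safe
{
    ClassItem[] items;
    for (size_t i = 0; i < members.length; ++i)
    {
        if (members[i] == '\\' && i + 1 < members.length)
        {
            ++i; // drop the backslash
            items ~= ClassItem(isSpecialClassEscape(members[i]), members[i]);
        }
        else
            items ~= ClassItem(false, members[i]); // includes a trailing lone '\'
    }
    return items;
}

void main()
{
    // Under this rule the bodies of "[\.]" and "[.]" describe the same class.
    writeln(parseClassMembers(r"\.") == parseClassMembers("."));
    writeln(parseClassMembers(r"\d\("));
}
```

The drawback Dmitry mentions is visible in this design: once `\q` quietly means `q`, giving `\q` a meaning later would change what existing patterns match.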
Dec 03 2011
On 03.12.2011 21:00, Vladimir Panteleev wrote: On Sat, 03 Dec 2011 17:51:13 +0200, Dmitry Olshansky <dmitry.olsh gmail.com> wrote: treat every \<something> as plain <something> (ignoring \) inside character classes [] if it's not a known escape sequence like \w, \d, \uXXXX, \W, \cA-\cZ and so on.
I think the common intuitive rules regarding escapes in regexes are as follows:
1) Unescaped punctuation usually has special meaning (so people often escape all punctuation literals)
2) Unescaped letters are literal
3) Escaped punctuation is literal
4) Escaped letters have special meaning
Therefore, I think that std.regex should throw on unrecognized *letter* escapes. It's very likely that the user might be trying to use a character class or feature from another regex engine, but unsupported by std.regex.
-- Dmitry Olshansky
Dec 03 2011
On 12/3/11 6:00 PM, Vladimir Panteleev wrote: I think the common intuitive rules regarding escapes in regexes are as follows:
1) Unescaped punctuation usually has special meaning (so people often escape all punctuation literals)
I am only a causal user of regexen, and I agree that this is what seems intuitive to me – in fact, I just used [^\(] to match non-brackets in my editor today. David
Dec 03 2011
On 12/3/11 8:14 PM, David Nadlinger wrote: I am only a causal user […]
Oh well, typing is hard.
Dec 03 2011
On 12/3/11 9:51 AM, Dmitry Olshansky wrote: On 03.12.2011 16:54, Kagamin wrote: Marco Leise Wrote: http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2
Let's clarify this a bit. Well \. is more or less common outside of []. The question is more like: treat every \<something> as plain <something> (ignoring \) inside character classes [] if it's not a known escape sequence like \w, \d, \uXXXX, \W, \cA-\cZ and so on.
Probably this is not a place to get innovative. Let's do what gramps Perl does. Andrei
Dec 03 2011
On 2011-12-02 22:33:34 +0000, "Marco Leise" <Marco.Leise gmx.de> said: http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2
I'd say, go with what other engines are doing. PCRE accepts them. I think POSIX does not, but then POSIX does not allow any escaping inside a character class at all. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
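Michel's POSIX remark is easy to miss: inside a POSIX bracket expression the backslash loses its special meaning, so `[\.]` is a class of two characters, whereas PCRE reads `\.` as an escaped dot. The sketch below contrasts the two readings of the same class body; `classMembers` is a hypothetical helper that ignores ranges and shorthands, and its non-POSIX branch is a simplified PCRE-like reading, not PCRE's full rules.

```d
import std.stdio : writeln;

// Literal members of a simple bracket-expression body (no ranges or shorthands).
// posix = true : '\' is an ordinary character inside brackets, as in POSIX.
// posix = false: '\' escapes the next character (simplified PCRE-like reading).
string classMembers(string members, bool posix) pure @safe
{
    string result;
    for (size_t i = 0; i < members.length; ++i)
    {
        if (!posix && members[i] == '\\' && i + 1 < members.length)
            ++i; // drop the backslash, keep the escaped character
        result ~= members[i];
    }
    return result;
}

void main()
{
    writeln(classMembers(r"\.", true));  // \.  : a backslash or a dot
    writeln(classMembers(r"\.", false)); // .   : only a dot
}
```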
Dec 03 2011
On Sat, 03 Dec 2011 17:51:13 +0200, Dmitry Olshansky <dmitry.olsh gmail.com> wrote: treat every \<something> as plain <something> (ignoring \) inside character classes [] if it's not a known escape sequence like \w, \d, \uXXXX, \W, \cA-\cZ and so on.
I think the common intuitive rules regarding escapes in regexes are as follows:
1) Unescaped punctuation usually has special meaning (so people often escape all punctuation literals)
2) Unescaped letters are literal
3) Escaped punctuation is literal
4) Escaped letters have special meaning
Therefore, I think that std.regex should throw on unrecognized *letter* escapes. It's very likely that the user might be trying to use a character class or feature from another regex engine, but unsupported by std.regex.
-- Best regards, Vladimir mailto:vladimir thecybershadow.net
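Vladimir's rules boil down to a classification of whatever follows a backslash. A minimal sketch, assuming a hypothetical table of supported letter escapes (`isKnownLetterEscape`); digits land in the literal bucket here purely for simplicity, and `EscapeKind`/`classifyEscape` are illustrative names, not std.regex API.

```d
import std.ascii : isAlpha;
import std.stdio : writeln;

enum EscapeKind { special, literal, invalid }

// Hypothetical list of letter escapes the engine understands.
bool isKnownLetterEscape(char c) pure nothrow @safe
{
    switch (c)
    {
        case 'd', 'D', 'w', 'W', 's', 'S', 'n', 't', 'r':
            return true;
        default:
            return false;
    }
}

// Escaped letters are special (an unknown one is an error);
// anything else after a backslash is taken literally.
EscapeKind classifyEscape(char c) pure nothrow @safe
{
    if (isAlpha(c))
        return isKnownLetterEscape(c) ? EscapeKind.special : EscapeKind.invalid;
    return EscapeKind.literal;
}

void main()
{
    foreach (c; ".(qd")
        writeln('\\', c, " -> ", classifyEscape(c)); // \q is reported as invalid
}
```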
Dec 03 2011
On Sat, 03 Dec 2011 19:00:44 +0200, Vladimir Panteleev <vladimir thecybershadow.net> wrote: escapes
For the context of my post, I meant "escaped" as "prefixed by a backslash". -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Dec 03 2011
On Fri, 02 Dec 2011 23:33:34 +0100, Marco Leise <Marco.Leise gmx.de> wrote: http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2
auto s = "[\.]"; => Error: undefined escape sequence. Do you actually mean r"[\.]" or `[\.]`?
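Martin's point is about D's own lexer, before std.regex ever sees the pattern. The sketch shows three spellings that all produce the same four-character pattern `[\.]`; the enum names are purely illustrative.

```d
import std.stdio : writeln;

// auto s = "[\.]";  // rejected by the compiler: \. is not a D string escape

// Three spellings of the same 4-character regex pattern [\.]
enum viaEscape   = "[\\.]"; // backslash escaped for the D lexer
enum viaWysiwyg  = r"[\.]"; // r"..." string: no escape processing
enum viaBacktick = `[\.]`;  // backtick string: also no escape processing

static assert(viaEscape == viaWysiwyg && viaWysiwyg == viaBacktick);
static assert(viaWysiwyg.length == 4);

void main()
{
    writeln(viaWysiwyg); // prints [\.]
}
```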
Dec 04 2011
On Sun, 04 Dec 2011 19:53:59 +0100, Martin Nowak wrote: On Fri, 02 Dec 2011 23:33:34 +0100, Marco Leise <Marco.Leise gmx.de> wrote: http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2
auto s = "[\.]"; => Error: undefined escape sequence. Do you actually mean r"[\.]" or `[\.]`?
He was referring to invalid regular expression escape sequences, so yes, r"[\.]" would build a proper string and an invalid regex (or rather, whether it is invalid is yet to be determined).
Dec 04 2011
On 12/2/2011 2:33 PM, Marco Leise wrote: http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2
In general, behavior for things we don't know what to do with should be "failure". Then, when we do figure out what to do with it, we don't break existing code.
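Walter's argument can be made concrete by comparing two hypothetical engines that later gain a new escape. Everything here is invented for illustration: the `Meaning` enum, the strict and lenient "version 1" tables, and `\h` as the escape added in "version 2". The check asks whether any escape that version 1 accepted changes meaning after the upgrade.

```d
import std.stdio : writeln;

enum Meaning { literal, shorthand, error }

alias EscapeTable = Meaning function(char) pure nothrow @safe;

// Strict "version 1": unknown escapes are rejected (failure by default).
Meaning strictV1(char c) pure nothrow @safe
{
    if (c == 'd' || c == 'w') return Meaning.shorthand;
    if (c == '.' || c == '\\' || c == ']' || c == '-') return Meaning.literal;
    return Meaning.error;
}

// Lenient "version 1": unknown escapes silently become literals.
Meaning lenientV1(char c) pure nothrow @safe
{
    if (c == 'd' || c == 'w') return Meaning.shorthand;
    return Meaning.literal;
}

// Hypothetical "version 2" of either engine: identical, except \h gains a meaning.
Meaning upgraded(EscapeTable old, char c) pure nothrow @safe
{
    return c == 'h' ? Meaning.shorthand : old(c);
}

// True if every escape the old version accepted still means the same thing.
bool upgradeIsSafe(EscapeTable old) pure nothrow @safe
{
    foreach (i; 0 .. 128)
    {
        immutable c = cast(char) i;
        immutable before = old(c);
        if (before != Meaning.error && before != upgraded(old, c))
            return false;
    }
    return true;
}

void main()
{
    writeln("strict:  ", upgradeIsSafe(&strictV1));  // true
    writeln("lenient: ", upgradeIsSafe(&lenientV1)); // false
}
```

Under the strict table, `\h` was an error before the upgrade, so no previously accepted pattern changes meaning; under the lenient table, a pattern that relied on `\h` meaning a plain `h` silently changes behavior.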
Dec 04 2011