digitalmars.D - Poll of the week: How should std.regex handle unknown escape
- Marco Leise (1/1) Dec 02 2011 http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2
- Jonathan M Davis (9/10) Dec 02 2011 Why wouldn't std.regex accept an escaped sequence such as "\."? I though...
- Jesse Phillips (4/17) Dec 02 2011 Brackets being a character class, dot is used literally. So in this case...
- Jonathan M Davis (5/24) Dec 02 2011 Well, then if \. is not legal, I'd expect a static assertion failure or ...
- Xinok (4/5) Dec 02 2011 I prefer that regexp engines are as consistent as possible. Everything I...
- Kagamin (2/3) Dec 03 2011 Erm... but "\." is a perfectly known escape sequence, so the question sh...
- Andrei Alexandrescu (3/6) Dec 03 2011 The dot inside a character set must not be escaped.
- Dmitry Olshansky (7/17) Dec 03 2011 And that breaks ehm ... e.g. rdmd ;)
- Dmitry Olshansky (6/9) Dec 03 2011 Let's clarify this a bit.
- Vladimir Panteleev (15/18) Dec 03 2011 I think the common intuitive rules regarding escapes in regexes are as
- Vladimir Panteleev (6/7) Dec 03 2011 For the context of my post, I meant "escaped" as "prefixed by a backslas...
- Dmitry Olshansky (5/21) Dec 03 2011 Yes, another point for not going for fully blind approach.
- David Nadlinger (5/9) Dec 03 2011 I am only a causal user of regexen, and I agree that this is what seems
- David Nadlinger (2/3) Dec 03 2011 Oh well, typing is hard.
- Andrei Alexandrescu (4/17) Dec 03 2011 Probably this is not a place to get innovative. Let's do what gramps
- Michel Fortin (8/9) Dec 03 2011 I'd say, go with what other engines are doing. PCRE accepts them. I
- Martin Nowak (3/4) Dec 04 2011 auto s = "[\.]"; => Error: undefined escape sequence.
- Jesse Phillips (4/11) Dec 04 2011 He was referring to invalid regular expression escape sequences, so yes
- Walter Bright (4/5) Dec 04 2011 In general, behavior for things we don't know what to do with should be
http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2
Dec 02 2011
On Friday, December 02, 2011 23:33:34 Marco Leise wrote:
> http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2

Why wouldn't std.regex accept an escaped sequence such as "\."? I thought that the whole point of something like "\." was to let you use "." literally, even though it means something special in regexes. Or is it something special to do with the fact that it's between brackets? I'd still have thought that it would just escape it, since it _is_ an escape sequence. Or is it that the escape sequence isn't necessary between the brackets, and so the question is how to handle it, since it isn't necessary?

- Jonathan M Davis
Dec 02 2011
On Fri, 02 Dec 2011 17:59:59 -0500, Jonathan M Davis wrote:
> On Friday, December 02, 2011 23:33:34 Marco Leise wrote:
>> http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2
>
> Why wouldn't std.regex accept an escaped sequence such as "\."? [...] Or is that the escape sequence isn't necessary in between the brackets, and so the question is how to handle it, since it isn't necessary?

Brackets being a character class, the dot is used literally. So in this case, was it meant to be [\\.] or [.]?
Dec 02 2011
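Jesse's two readings can be checked directly. The sketch below assumes the current Phobos std.regex API (`regex`, `matchFirst`, and the `hit`/`empty` members of the returned captures), which may differ from the std.regex being discussed in 2011. Raw strings (`r"..."`) are used so that the regex engine sees each backslash exactly as written.

```d
import std.regex : matchFirst, regex;

void main()
{
    // Raw strings keep backslashes, so the engine sees exactly these characters.
    auto backslashOrDot = regex(r"[\\.]"); // class containing '\' and '.'
    auto dotOnly = regex(r"[.]");          // class containing '.' only; no escape needed

    // Both classes match a literal dot...
    assert(matchFirst("a.b", backslashOrDot).hit == ".");
    assert(matchFirst("a.b", dotOnly).hit == ".");

    // ...but only the first one also matches a backslash.
    assert(matchFirst(`a\b`, backslashOrDot).hit == `\`);
    assert(matchFirst(`a\b`, dotOnly).empty);

    // Inside brackets the dot is not a wildcard.
    assert(matchFirst("abc", dotOnly).empty);
}
```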
On Saturday, December 03, 2011 02:35:21 Jesse Phillips wrote:
> Brackets being a character class, dot is used literally. So in this case was it meant to be: [\\.] or [.]

Well, then if \. is not legal, I'd expect a static assertion failure or a template constraint failure if the string were given as a compile-time argument, and an exception if it were given as a runtime argument.

- Jonathan M Davis
Dec 02 2011
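A minimal sketch of the split Jonathan describes: one check that fails the build via `static assert` when the pattern is a template argument, and throws when the pattern only arrives at run time. The names `firstBadEscape`, `ctPattern` and `rtPattern` are hypothetical helpers, not part of std.regex, and the whitelist of accepted escapes is an illustrative assumption. (std.regex does provide `ctRegex` for compile-time patterns; this sketch does not use it.)

```d
import std.exception : assertThrown;
import std.format : format;

/// Hypothetical helper (not part of std.regex): index of the first escape
/// that is not in a small illustrative whitelist, or -1 if all are known.
/// A lone trailing backslash also counts as a bad escape.
ptrdiff_t firstBadEscape(string pattern) pure nothrow @safe
{
    // Illustrative whitelist only; a real engine recognizes more escapes.
    enum known = `\.-[]()^$|*+?{}dDwWsSnrt`;
    for (size_t i = 0; i < pattern.length; ++i)
    {
        if (pattern[i] != '\\')
            continue;
        if (i + 1 >= pattern.length)
            return cast(ptrdiff_t) i;
        bool recognized = false;
        foreach (k; known)
            if (k == pattern[i + 1])
                recognized = true;
        if (!recognized)
            return cast(ptrdiff_t) i;
        ++i; // skip the escaped character
    }
    return -1;
}

/// Compile-time pattern: an unrecognized escape stops compilation.
string ctPattern(string pattern)()
{
    static assert(firstBadEscape(pattern) < 0,
        "unrecognized escape in regex pattern: " ~ pattern);
    return pattern;
}

/// Run-time pattern: the same check, reported with an exception.
string rtPattern(string pattern)
{
    immutable pos = firstBadEscape(pattern);
    if (pos >= 0)
        throw new Exception(format("unrecognized escape at index %s in %s", pos, pattern));
    return pattern;
}

void main()
{
    enum p = ctPattern!(r"[\.\d]")();
    static assert(p == r"[\.\d]");
    // enum q = ctPattern!(r"\q")(); // would be rejected by the compiler

    assert(rtPattern(r"a\.b") == r"a\.b");
    assertThrown!Exception(rtPattern(r"\q"));
}
```

Running the same pure function in both places keeps the compile-time and run-time behavior from drifting apart.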
On 12/2/2011 5:33 PM, Marco Leise wrote:
> http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2

I prefer that regexp engines be as consistent as possible. Every engine I tested accepts this as a valid regular expression, so I think std.regex should as well.
Dec 02 2011
Marco Leise Wrote:
> http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2

Erm... but "\." is a perfectly known escape sequence, so the question should be "How should std.regex handle known escape sequences as in: "[\.]"".
Dec 03 2011
On 12/3/11 6:54 AM, Kagamin wrote:
> Erm... but "\." is a perfectly known escape sequence, so the question should be "How should std.regex handle known escape sequences as in: "[\.]"".

The dot inside a character set must not be escaped.

Andrei
Dec 03 2011
On 03.12.2011 19:48, Andrei Alexandrescu wrote:
> On 12/3/11 6:54 AM, Kagamin wrote:
>> Erm... but "\." is a perfectly known escape sequence, so the question should be "How should std.regex handle known escape sequences as in: "[\.]"".
>
> The dot inside a character set must not be escaped.

And that breaks, ehm... e.g. rdmd ;)

Anyhow, I'm trying to pick a reasonable rule. 100% compatibility with the old regex would mean ignoring '\' where it is not applicable. My only concern with that is future extensibility via \<character>.

--
Dmitry Olshansky
Dec 03 2011
On 03.12.2011 16:54, Kagamin wrote:
> Marco Leise Wrote:
>> http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2
> Erm... but "\." is a perfectly known escape sequence, so the question should be "How should std.regex handle known escape sequences as in: "[\.]"".

Let's clarify this a bit. Well, \. is more or less common outside of []. The question is more like: treat every \<something> as plain <something> (ignoring \) inside character classes [] if it's not a known escape sequence like \w, \d, \uXXXX, \W, \cA -\cZ and so on.
Dec 03 2011
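The lenient option Dmitry describes, where known escapes are kept and any other `\x` inside a class is read as a plain `x`, could look roughly like this. `lenientClassItems` is a hypothetical function, not std.regex's parser; it works on the text between `[` and `]`, handles ASCII only, and recognizes the escapes named in the post (`\w`, `\W`, `\d`, `\cA`-`\cZ`, `\uXXXX`) plus `\D`, `\s` and `\S`, which are added here as an assumption.

```d
/// Hypothetical sketch (not std.regex's parser) of the lenient rule for the
/// text between '[' and ']': recognized escapes are kept as-is, any other
/// `\x` becomes a plain `x`. ASCII only; returns one entry per class item.
string[] lenientClassItems(string content) pure @safe
{
    static bool allHex(string s) pure nothrow @safe @nogc
    {
        foreach (ch; s)
            if (!((ch >= '0' && ch <= '9') || (ch >= 'a' && ch <= 'f')
                    || (ch >= 'A' && ch <= 'F')))
                return false;
        return true;
    }

    string[] items;
    size_t i = 0;
    while (i < content.length)
    {
        // Ordinary character, or a lone trailing backslash kept literally.
        if (content[i] != '\\' || i + 1 >= content.length)
        {
            items ~= content[i .. i + 1];
            i += 1;
            continue;
        }
        switch (content[i + 1])
        {
        case 'w', 'W', 'd', 'D', 's', 'S':
            items ~= content[i .. i + 2]; // known class escape, kept as-is
            i += 2;
            break;
        case 'c':
            if (i + 2 < content.length && content[i + 2] >= 'A' && content[i + 2] <= 'Z')
            {
                items ~= content[i .. i + 3]; // control escape \cA .. \cZ
                i += 3;
                break;
            }
            goto default;
        case 'u':
            if (i + 6 <= content.length && allHex(content[i + 2 .. i + 6]))
            {
                items ~= content[i .. i + 6]; // \uXXXX
                i += 6;
                break;
            }
            goto default;
        default:
            items ~= content[i + 1 .. i + 2]; // unknown escape: drop the '\'
            i += 2;
            break;
        }
    }
    return items;
}

void main()
{
    assert(lenientClassItems(r"\.") == ["."]);
    assert(lenientClassItems(r"a\d\.") == ["a", `\d`, "."]);
    assert(lenientClassItems(r"\cA\u00E9") == [`\cA`, `\u00E9`]);
    // The price of leniency: \q is silently read as q.
    assert(lenientClassItems(r"\q") == ["q"]);
}
```

The last assertion shows the extensibility problem: once `[\q]` quietly means `[q]`, giving `\q` a meaning later would change the behavior of existing patterns.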
On Sat, 03 Dec 2011 17:51:13 +0200, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
> treat every \<something> as plain <something> (ignoring \) inside character classes [] if it's not a known escape sequence like \w, \d, \uXXXX, \W, \cA -\cZ and so on.

I think the common intuitive rules regarding escapes in regexes are as follows:

1) Unescaped punctuation usually has special meaning (so people often escape all punctuation literals)
2) Unescaped letters are literal
3) Escaped punctuation is literal
4) Escaped letters have special meaning

Therefore, I think that std.regex should throw on unrecognized *letter* escapes. It's very likely that the user is trying to use a character class or feature from another regex engine that std.regex does not support.

--
Best regards,
Vladimir mailto:vladimir thecybershadow.net
Dec 03 2011
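Vladimir's rules 3 and 4, together with his proposal to reject unknown letter escapes, amount to a three-way classification of whatever follows a backslash. The sketch below assumes an illustrative set of known letters (not std.regex's actual list), and `classifyEscape`/`EscapeKind` are hypothetical names. Characters that are neither ASCII letters nor ASCII punctuation (digits, for example, which some engines use for back-references) fall outside the four rules, and this sketch conservatively rejects them.

```d
import std.stdio : writeln;

/// How the character after a backslash is treated under the rules above.
enum EscapeKind { literal, special, error }

/// Hypothetical classifier (not part of std.regex).
EscapeKind classifyEscape(dchar c) pure nothrow @safe @nogc
{
    // Illustrative set of letters with a defined meaning; an assumption.
    enum knownLetters = "dDwWsSbBnrtfvcux";

    immutable isLetter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    immutable isPunct = (c >= '!' && c <= '/') || (c >= ':' && c <= '@')
        || (c >= '[' && c <= '`') || (c >= '{' && c <= '~');

    if (isPunct)
        return EscapeKind.literal; // rule 3: escaped punctuation is literal
    if (isLetter)
    {
        foreach (k; knownLetters)
            if (k == c)
                return EscapeKind.special; // rule 4: known letter escape
        return EscapeKind.error; // unrecognized letter escape: reject
    }
    return EscapeKind.error; // outside the four rules: rejected in this sketch
}

void main()
{
    // Print how each escape would be treated, e.g. "\q -> error".
    foreach (dchar c; ".(dq1")
        writeln('\\', c, " -> ", classifyEscape(c));
}
```

Rejecting unknown letters while accepting all escaped punctuation keeps `[\.]` and `[^\(]` working, yet leaves every unused letter free for future meanings.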
On Sat, 03 Dec 2011 19:00:44 +0200, Vladimir Panteleev <vladimir thecybershadow.net> wrote:
> escapes

For the context of my post, I meant "escaped" as "prefixed by a backslash".

--
Best regards,
Vladimir mailto:vladimir thecybershadow.net
Dec 03 2011
On 03.12.2011 21:00, Vladimir Panteleev wrote:
> On Sat, 03 Dec 2011 17:51:13 +0200, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
>> treat every \<something> as plain <something> (ignoring \) inside character classes [] if it's not a known escape sequence like \w, \d, \uXXXX, \W, \cA -\cZ and so on.
>
> I think the common intuitive rules regarding escapes in regexes are as follows:
> 1) Unescaped punctuation usually has special meaning (so people often escape all punctuation literals)
> 2) Unescaped letters are literal
> 3) Escaped punctuation is literal
> 4) Escaped letters have special meaning

Looks quite sane.

> Therefore, I think that std.regex should throw on unrecognized *letter* escapes. It's very likely that the user might be trying to use a character class or feature from another regex engine, but unsupported by std.regex.

Yes, another point for not going for a fully blind approach.

--
Dmitry Olshansky
Dec 03 2011
On 12/3/11 6:00 PM, Vladimir Panteleev wrote:
> I think the common intuitive rules regarding escapes in regexes are as follows:
> 1) Unescaped punctuation usually has special meaning (so people often escape all punctuation literals)

I am only a causal user of regexen, and I agree that this is what seems intuitive to me; in fact, I just used [^\(] to match non-brackets in my editor today.

David
Dec 03 2011
On 12/3/11 8:14 PM, David Nadlinger wrote:
> I am only a causal user […]

Oh well, typing is hard.
Dec 03 2011
On 12/3/11 9:51 AM, Dmitry Olshansky wrote:
> Let's clarify this a bit. Well \. is more or less common outside of []. The question is more like: treat every \<something> as plain <something> (ignoring \) inside character classes [] if it's not a known escape sequence like \w, \d, \uXXXX, \W, \cA -\cZ and so on.

Probably this is not a place to get innovative. Let's do what gramps Perl does.

Andrei
Dec 03 2011
On 2011-12-02 22:33:34 +0000, "Marco Leise" <Marco.Leise gmx.de> said:
> http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2

I'd say go with what other engines are doing. PCRE accepts them. I think POSIX does not accept them, but POSIX does not allow any escaping inside a character class either.

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Dec 03 2011
On Fri, 02 Dec 2011 23:33:34 +0100, Marco Leise <Marco.Leise gmx.de> wrote:
> http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2

auto s = "[\.]"; => Error: undefined escape sequence.

Do you actually mean r"[\.]" or `[\.]`?
Dec 04 2011
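Martin's point concerns D string literals rather than regex syntax: `\.` is not a valid escape in an ordinary D string, so the pattern has to be written with a WYSIWYG literal or with a doubled backslash. All three spellings below produce the same four characters, and it is that four-character pattern that std.regex then has to interpret.

```d
void main()
{
    // "[\.]" itself does not compile: \. is not a D string escape.
    enum a = r"[\.]";  // WYSIWYG string: the backslash is kept
    enum b = `[\.]`;   // backquoted WYSIWYG string: the backslash is kept
    enum c = "[\\.]";  // ordinary string: \\ denotes one backslash

    static assert(a == b && b == c);
    static assert(a.length == 4); // '[', '\', '.', ']'
}
```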
On Sun, 04 Dec 2011 19:53:59 +0100, Martin Nowak wrote:
> auto s = "[\.]"; => Error: undefined escape sequence.
> Do you actually mean r"[\.]" or `[\.]`?

He was referring to invalid regular expression escape sequences, so yes, r"[\.]" would build a proper string but an invalid regex (or rather, whether it is invalid is yet to be determined).
Dec 04 2011
On 12/2/2011 2:33 PM, Marco Leise wrote:
> http://www.easypolls.net/poll.html?p=4ed9478e4fb7b0e4886eeea2

In general, the behavior for things we don't know what to do with should be "failure". Then, when we do figure out what to do with it, we don't break existing code.
Dec 04 2011