digitalmars.D - Formal Review of std.regex (FReD)
- Jesse Phillips <jessekphillips+d gmail.com> Oct 08 2011
- Walter Bright <newshound2 digitalmars.com> Oct 08 2011
- Dmitry Olshansky <dmitry.olsh gmail.com> Oct 08 2011
- Walter Bright <newshound2 digitalmars.com> Oct 08 2011
- Dmitry Olshansky <dmitry.olsh gmail.com> Oct 08 2011
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Oct 08 2011
- Christian Kamm <kamm-incasoftware removethis.de> Oct 09 2011
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Oct 09 2011
- Christian Kamm <kamm-incasoftware removethis.de> Oct 09 2011
- Christian Kamm <kamm-incasoftware removethis.de> Oct 09 2011
- Christian Kamm <kamm-incasoftware removethis.de> Oct 09 2011
- Jacob Carlborg <doob me.com> Oct 09 2011
- Brad Roberts <braddr puremagic.com> Oct 09 2011
- Jacob Carlborg <doob me.com> Oct 09 2011
- Dmitry Olshansky <dmitry.olsh gmail.com> Oct 09 2011
- Jacob Carlborg <doob me.com> Oct 09 2011
- Dmitry Olshansky <dmitry.olsh gmail.com> Oct 09 2011
- Jacob Carlborg <doob me.com> Oct 09 2011
- Dmitry Olshansky <dmitry.olsh gmail.com> Oct 09 2011
- Jacob Carlborg <doob me.com> Oct 09 2011
- Dmitry Olshansky <dmitry.olsh gmail.com> Oct 10 2011
- Alix Pexton <alix.DOT.pexton gmail.DOT.com> Oct 09 2011
- Jerry <jlquinn optonline.net> Oct 11 2011
- Dmitry Olshansky <dmitry.olsh gmail.com> Oct 12 2011
- Dmitry Olshansky <dmitry.olsh gmail.com> Oct 12 2011
- kennytm <kennytm gmail.com> Oct 12 2011
- Dmitry Olshansky <dmitry.olsh gmail.com> Oct 12 2011
- kennytm <kennytm gmail.com> Oct 12 2011
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Oct 12 2011
- Dmitry Olshansky <dmitry.olsh gmail.com> Oct 13 2011
- Jesse Phillips <jessekphillips+d gmail.com> Oct 12 2011
- Jacob Carlborg <doob me.com> Oct 12 2011
- Dmitry Olshansky <dmitry.olsh gmail.com> Oct 15 2011
- Jesse Phillips <jessekphillips+d gmail.com> Oct 22 2011
- Rainer Schuetze <r.sagitario gmx.de> Oct 22 2011
- Dmitry Olshansky <dmitry.olsh gmail.com> Oct 22 2011
- Rainer Schuetze <r.sagitario gmx.de> Oct 23 2011
- Dmitry Olshansky <dmitry.olsh gmail.com> Oct 23 2011
- Rainer Schuetze <r.sagitario gmx.de> Oct 24 2011
- Fawzi Mohamed <fawzi gmx.ch> Oct 22 2011
- "Marco Leise" <Marco.Leise gmx.de> Oct 24 2011
Hello everyone, I have taken the role of review manager for the std.regex replacement by Dmitry Olshansky. The review period begins now (2011-10-8) and will end on 2011-10-23 at midnight UTC. A voting thread on inclusion into Phobos will be held after the review, assuming that is appropriate. The voting period is one week. Please note that you can try FReD as part of Phobos (Code) or by itself (Package of FReD), which includes docs.
Doc: http://nascent.freeshell.org/fred/doc/
Code: https://github.com/blackwhale/phobos MASTER
Package of FReD: https://github.com/downloads/blackwhale/FReD/FReD.zip
Remember this will be replacing the current std.regex and is intended to be a drop-in replacement. This project is also part of GSoC.
Dmitry, I ask that you apply this patch to posix.mak (adding to internal modules):
--- a/posix.mak
+++ b/posix.mak
@@ -184,7 +184,8 @@ std/c/, fenv locale math process stdarg stddef stdio stdlib time wcharh)
 EXTRA_MODULES += $(EXTRA_DOCUMENTABLES) $(addprefix \
 std/internal/math/, biguintcore biguintnoasm biguintx86 \
- gammafunction errorfunction) std/internal/processinit
+ gammafunction errorfunction) std/internal/processinit \
+ std/internal/uni std/internal/uni_tab
 # Aggregate all D modules relevant to this build
 D_MODULES = crc32 $(STD_MODULES) $(EXTRA_MODULES) $(STD_NET_MODULES)
Oct 08 2011
On 10/8/2011 12:56 PM, Jesse Phillips wrote:Doc: http://nascent.freeshell.org/fred/doc/
1. There are many different regular expression syntaxes for strings. The docs should include a link to whichever one FReD uses. Feel free to crib from http://www.digitalmars.com/ctg/regular.html
2. Many of the examples can be wrapped in a void main(){ ... } so that they are compilable using cut & paste.
3. "Advanced Syntax" and other headings need to be bold faced.
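As an illustration of point 2, here is a sketch of a docs-style example wrapped in `void main()` so that it compiles as pasted. It assumes the present-day std.regex names `regex`, `matchFirst` and `replaceAll`, which may not match the API under review in this thread; `dateParts` is a hypothetical helper.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: extract the year, month and day of the first
// ISO-style date found in the text, or null if there is none.
string[] dateParts(string text)
{
    auto re = regex(`(\d{4})-(\d{2})-(\d{2})`);
    auto m = matchFirst(text, re);
    if (m.empty)
        return null;
    return [m[1], m[2], m[3]];
}

void main()
{
    writeln(dateParts("Review ends on 2011-10-23.")); // ["2011", "10", "23"]
    writeln(replaceAll("a1b22c333", regex(`\d+`), "#")); // a#b#c#
}
```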
Oct 08 2011
On 09.10.2011 0:30, Walter Bright wrote:On 10/8/2011 12:56 PM, Jesse Phillips wrote:Doc: http://nascent.freeshell.org/fred/doc/
1. There are many different regular expressions for strings. Should include a link to whichever one fred uses. Feel free to crib from http://www.digitalmars.com/ctg/regular.html
While I do mention the ECMA-262 flavor, I agree a table right there is far preferable. Will do.
2. Many of the examples can be wrapped in a void main(){ ... } so that they are compilable using cut & paste.
Indeed, I just thought it wasn't Phobos style. Looking through again, I see there are a lot of examples with void main().
3. "Advanced Syntax" and other headings need to be bold faced.
Right, thanks. -- Dmitry Olshansky
Oct 08 2011
On 10/8/2011 1:43 PM, Dmitry Olshansky wrote:Right, thanks.
Welcs. And I might add that I do greatly appreciate the work you've done on this, I think it could be a showcase for D's capabilities.
Oct 08 2011
On 08.10.2011 23:56, Jesse Phillips wrote: [...] Dmitry, I ask that you apply this patch to posix.mak (adding to internal modules). [...]
Thanks, updated, and now it works on Linux for me. It wasn't that simple, though. I've found out what caused my builds to break: both std.file and std.stdio use fully qualified std.c.stdio.func calls but never actually import std.c.stdio in any way. I wasn't even aware that was possible. So I changed std.file to use core.stdc and added a static import to std.stdio (apparently some functions from std.c are not present in core.stdc). If there is any problem with that I can revert it and investigate why it affects only me ;) -- Dmitry Olshansky
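For readers unfamiliar with the construct, here is a minimal sketch of a `static import`, assuming the druntime module `core.stdc.stdio` that Dmitry switched to. A static import makes the module available only through fully qualified names, so calls like `core.stdc.stdio.printf` stay explicit and the dependency is declared instead of relying on the package merely happening to be visible. `sayHello` is a hypothetical helper.

```d
// static import: the module's symbols are reachable only by their
// fully qualified names.
static import core.stdc.stdio;

// Hypothetical helper: prints a line through the C runtime and returns
// the number of characters printf reports as written.
int sayHello()
{
    return core.stdc.stdio.printf("hello\n");
}

void main()
{
    sayHello();
    // printf("oops\n"); // would not compile: the unqualified name is not visible
}
```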
Oct 08 2011
On 10/8/11 3:34 PM, Dmitry Olshansky wrote:I've found out what caused my builds to break. The thing is that both std.file & std.stdio use fully qualified std.c.stdio.func calls but never actually import std.c.stdio in any way. I wasn't even aware that's possible.
That may be a bug in the compiler. A symbol shouldn't be visible unless e.g. publicly imported from an imported module (could that be the case)? Andrei
Oct 08 2011
Andrei Alexandrescu wrote:On 10/8/11 3:34 PM, Dmitry Olshansky wrote:I've found out what caused my builds to break. The thing is that both std.file & std.stdio use fully qualified std.c.stdio.func calls but never actually import std.c.stdio in any way. I wasn't even aware that's possible.
That may be a bug in the compiler. A symbol shouldn't be visible unless e.g. publicly imported from an imported module (could that be the case)?
It's definitely a bug. Once an import is processed, the package is visible globally as long as the parent package is accessible. This compiles:
touch dmd2/src/phobos/std/empty.d
a.d:
import std.stdio;
b.d:
import std.empty;
void main() { std.stdio.writeln("hi!"); }
Oct 09 2011
On 10/9/11 2:26 AM, Christian Kamm wrote:
It's definitely a bug. Once an import is processed, the package is visible globally as long as the parent package is accessible. This compiles: touch dmd2/src/phobos/std/empty.d a.d: import std.stdio; b.d: import std.empty; void main() { std.stdio.writeln("hi!"); }
Hm, this is important. But what is the contribution of a.d to the example? Do you compile it together with b.d? Andrei
Oct 09 2011
Andrei Alexandrescu wrote:
Hm, this is important. But what is the contribution of a.d to the example? Do you compile it together with b.d?
Yes, 'dmd b.d' fails, 'dmd a.d b.d' succeeds.
Oct 09 2011
Brad Roberts wrote:Isn't this bug #314? Very well known, super old, highly voted for, etc, etc.
No, bug 314 is about privately imported symbols being accessible even though they shouldn't be. This problem is about modules that aren't imported at all in a file or any of its imports still being accessible. Btw: I've updated my pull request to fix #314 to apply cleanly against the current dmd/master: https://github.com/D-Programming-Language/dmd/pull/190
Oct 09 2011
Christian Kamm wrote:
It's definitely a bug. Once an import is processed, the package is visible globally as long as the parent package is accessible.
Heh, I actually reported it a while ago and then forgot about it. :) http://d.puremagic.com/issues/show_bug.cgi?id=6307
Oct 09 2011
On 2011-10-08 23:37, Andrei Alexandrescu wrote:
That may be a bug in the compiler. A symbol shouldn't be visible unless e.g. publicly imported from an imported module (could that be the case)? Andrei
I think it's a bug, but sometimes it can be useful. -- /Jacob Carlborg
Oct 09 2011
On 10/9/2011 12:30 AM, Andrei Alexandrescu wrote:
Hm, this is important. But what is the contribution of a.d to the example? Do you compile it together with b.d? Andrei
Isn't this bug #314? Very well known, super old, highly voted for, etc, etc.
Oct 09 2011
On 2011-10-08 21:56, Jesse Phillips wrote: [...] Doc: http://nascent.freeshell.org/fred/doc/
What's the difference between Regex and RegEx? I can see RegEx in the documentation but I cannot find its definition in the docs. -- /Jacob Carlborg
Oct 09 2011
On 09.10.2011 14:33, Jacob Carlborg wrote:
What's the difference between Regex and RegEx? I can see RegEx in the documentation but I cannot find its definition in the docs.
RegEx is a template parameter (it's that usual abstract 'T') that in the end is deduced as StaticRegex!Char or Regex!Char, where Char is char/wchar/dchar. -- Dmitry Olshansky
Oct 09 2011
On 2011-10-09 16:09, Dmitry Olshansky wrote:
What's the difference between Regex and RegEx? I can see RegEx in the documentation but I cannot find its definition in the docs.
RegEx is a template parameter (it's that usual abstract 'T'), that in the end deduced as StaticRegex!Char or Regex!Char where Char is char/wchar/dchar.
I don't think the documentation should refer to RegEx if it's not defined in the docs. -- /Jacob Carlborg
Oct 09 2011
On 09.10.2011 18:49, Jacob Carlborg wrote:
What's the difference between Regex and RegEx? I can see RegEx in the documentation but I cannot find its definition in the docs.
RegEx is a template parameter (it's that usual abstract 'T'), that in the end deduced as StaticRegex!Char or Regex!Char where Char is char/wchar/dchar.
I don't think the documentation should refer to RegEx if it's not defined in the docs.
-- Dmitry Olshansky
Oct 09 2011
On 2011-10-09 17:01, Dmitry Olshansky wrote:
What's the difference between Regex and RegEx? I can see RegEx in the documentation but I cannot find its definition in the docs.
RegEx is a template parameter (it's that usual abstract 'T'), that in the end deduced as StaticRegex!Char or Regex!Char where Char is char/wchar/dchar.
I don't think the documentation should refer to RegEx if it's not defined in the docs.
The second parameter type of the match function (and a couple of other functions) is RegEx, is that possible to fix as well? -- /Jacob Carlborg
Oct 09 2011
On 09.10.2011 19:09, Jacob Carlborg wrote:
What's the difference between Regex and RegEx? I can see RegEx in the documentation but I cannot find its definition in the docs.
RegEx is a template parameter (it's that usual abstract 'T'), that in the end deduced as StaticRegex!Char or Regex!Char where Char is char/wchar/dchar.
I don't think the documentation should refer to RegEx if it's not defined in the docs.
The second parameter type of the match function (and a couple of other functions) is RegEx, is that possible to fix as well?
No, that's what I tried to point out, but obviously failed. The thing is that it is a template parameter, and due to the template constraint it can be either StaticRegex!Char or Regex!Char. They represent a pattern compiled to machine code or to bytecode, respectively, for the character width of Char. The 6 resulting kinds of compiled pattern (2 representations times 3 character widths) do not have a common type, nor is one technically possible (w/o some quite bad performance trade-offs). -- Dmitry Olshansky
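To make the point concrete, here is a minimal sketch, assuming the present-day std.regex API (`regex`, `matchFirst`), of a function written the same way: `RegEx` is an ordinary template parameter, and each call site deduces its own concrete compiled-pattern type. `containsMatch` is a hypothetical helper, and the sketch only exercises the run-time `Regex!char`/`Regex!dchar` pair, not the compile-time variant.

```d
import std.regex;
import std.traits : isSomeString;

// Hypothetical helper in the style of the signatures discussed above:
// RegEx is just a template parameter, deduced at each call site.
bool containsMatch(R, RegEx)(R input, RegEx re)
    if (isSomeString!R)
{
    return !matchFirst(input, re).empty;
}

void main()
{
    auto narrow = regex(`\d+`);  // Regex!char
    auto wide = regex(`\d+`d);   // Regex!dchar
    // Different character widths yield distinct types with no common type.
    static assert(!is(typeof(narrow) == typeof(wide)));

    assert(containsMatch("abc123", narrow));
    assert(containsMatch("abc123"d, wide));
    assert(!containsMatch("abc", narrow));
}
```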
Oct 09 2011
On 2011-10-09 17:29, Dmitry Olshansky wrote:
What's the difference between Regex and RegEx? I can see RegEx in the documentation but I cannot find its definition in the docs.
RegEx is a template parameter (it's that usual abstract 'T'), that in the end deduced as StaticRegex!Char or Regex!Char where Char is char/wchar/dchar.
I don't think the documentation should refer to RegEx if it's not defined in the docs.
The second parameter type of the match function (and a couple of other functions) is RegEx, is that possible to fix as well?
No, that's what I tried to point out but failed obviously. The thing is that it is a templated parameter and due to constraint it could be either StaticRegex!Char or Regex!Char. They represent pattern compiled as machine code or bytecode respectively for character width of Char. All of the 6 versions of compiled patterns in the end do not have a common type nor one is technically possible (w/o some quite bad performance trade offs).
Aha, ok, I see. Could RegEx be explained in the docs so it won't cause further confusion? -- /Jacob Carlborg
Oct 09 2011
On 09.10.2011 22:47, Jacob Carlborg wrote:
What's the difference between Regex and RegEx? I can see RegEx in the documentation but I cannot find its definition in the docs.
RegEx is a template parameter (it's that usual abstract 'T'), that in the end deduced as StaticRegex!Char or Regex!Char where Char is char/wchar/dchar.
I don't think the documentation should refer to RegEx if it's not defined in the docs.
The second parameter type of the match function (and a couple of other functions) is RegEx, is that possible to fix as well?
No, that's what I tried to point out but failed obviously. The thing is that it is a templated parameter and due to constraint it could be either StaticRegex!Char or Regex!Char. They represent pattern compiled as machine code or bytecode respectively for character width of Char. All of the 6 versions of compiled patterns in the end do not have a common type nor one is technically possible (w/o some quite bad performance trade offs).
Aha, ok, I see. Could RegEx be explained in the docs so it won't cause further confusion?
I guess putting "The RegEx parameter can be either Regex!Char or StaticRegex!Char depending on the actual type of pattern passed" all over the place won't cut it. Placing it somewhere at the top has the disadvantage of lacking any prior context, and most users will miss it anyway. Maybe I'll just add a Params: section with a short description to all functions that still lack one. -- Dmitry Olshansky
Oct 10 2011
I've not had a proper look at the code yet, but I recall from when I read the docs during the pre-review period that the introduction was a little on the informal side. It doesn't seem to have changed since then, and IMHO the introduction/description needs a bit of a polish to bring it up to the standard that is required of official documentation. I'll be busy over the next few weeks, but I will try to make time to assemble some more specific comments. I just wanted to let you know that I thought the docs needed some work, just in case. A...
Oct 09 2011
I have 2 thoughts.
1) Minor doc typo: Long form for hex notation should be \U00YYYYYY.
2) Unicode set syntax
If you're going to provide unicode set support, why not use ICU syntax rather than invent another one?
Jerry
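A small sketch of the two hex escape forms, assuming the present-day std.regex API, which accepts `\uXXXX` and the long form `\U00YYYYYY` as Jerry corrected it. The pattern is a WYSIWYG (backquoted) string so that the escapes reach the regex parser rather than the D lexer; `hasSmileyOrEAcute` is a hypothetical helper.

```d
import std.regex;

// Hypothetical helper: true if the text contains U+1F600 (long form,
// 8 hex digits) or U+00E9 (short form, 4 hex digits).
bool hasSmileyOrEAcute(string text)
{
    // Backquoted string: the regex engine, not the D lexer, sees \U and \u.
    auto re = regex(`\U0001F600|\u00E9`);
    return !matchFirst(text, re).empty;
}

void main()
{
    assert(hasSmileyOrEAcute("caf\u00E9"));
    assert(hasSmileyOrEAcute("hi \U0001F600"));
    assert(!hasSmileyOrEAcute("plain ascii"));
}
```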
Oct 11 2011
On 12.10.2011 0:04, Jerry wrote:I have 2 thoughts. 1) Minor doc typo: Long form for hex notation should be \U00YYYYYY.
Yeah, \U it is.
2) Unicode set syntax
If you're going to provide unicode set support, why not use ICU syntax rather than invent another one?
Looks like I was tricked by their technical standard then. I can't immediately recall where this syntax was ever used, but see: http://unicode.org/reports/tr18/#Subtraction_and_Intersection The prime reason cited there is that e.g. '--' is (almost) unambiguous next to the range notation '-', and it also allows skipping [] where applicable: [\p{letter}--a-z] vs [[\p{letter}]-[a-z]]. Come to think of it, '--' is cleaner in this case.
Jerry
-- Dmitry Olshansky
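Here is a sketch of the '--' (set difference) notation described above, assuming present-day std.regex accepts it inside a character class with a nested set as the right operand, as its documented set-operation syntax suggests. `isConsonant` is a hypothetical helper that matches exactly one lowercase ASCII consonant.

```d
import std.regex;

// Hypothetical helper: a single lowercase ASCII letter that is not a vowel,
// expressed as the set difference [a-z] minus [aeiou].
bool isConsonant(string s)
{
    auto re = regex(`^[a-z--[aeiou]]$`);
    return !matchFirst(s, re).empty;
}

void main()
{
    assert(isConsonant("b"));
    assert(!isConsonant("e"));
    assert(!isConsonant("B"));
    assert(!isConsonant("bc"));
}
```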
Oct 12 2011
Fresh version of documentation is here: http://blackwhale.github.com/ This fixes all typos reported so far, adds missing overload of replace (ouch!) and introduces a brand new syntax table. -- Dmitry Olshansky
Oct 12 2011
Dmitry Olshansky <dmitry.olsh gmail.com> wrote:Fresh version of documentation is here: http://blackwhale.github.com/ This fixes all typos reported so far, adds missing overload of replace (ouch!) and introduces a brand new syntax table.
The '.' really matches any character, including the new line '\n'?
Oct 12 2011
On 12.10.2011 23:32, kennytm wrote:Dmitry Olshansky<dmitry.olsh gmail.com> wrote:Fresh version of documentation is here: http://blackwhale.github.com/ This fixes all typos reported so far, adds missing overload of replace (ouch!) and introduces a brand new syntax table.
The '.' really matches any character, including the new line '\n'?
Hm, yes. Is that a problem? -- Dmitry Olshansky
Oct 12 2011
Dmitry Olshansky <dmitry.olsh gmail.com> wrote:On 12.10.2011 23:32, kennytm wrote:Dmitry Olshansky<dmitry.olsh gmail.com> wrote:Fresh version of documentation is here: http://blackwhale.github.com/ This fixes all typos reported so far, adds missing overload of replace (ouch!) and introduces a brand new syntax table.
The '.' really matches any character, including the new line '\n'?
Hm, yes. Is that a problem?
Most regex flavors don't match '\n' by default unless you supply the "s" flag -- including ECMAScript (well it doesn't even provide the "s" flag to allow '.' to match all characters). While I am OK with having "s" turned on by default, this should at least be documented explicitly.
Oct 12 2011
On 10/12/11 9:50 PM, Jesse Phillips wrote:On Wed, 12 Oct 2011 23:35:49 +0000, kennytm wrote:Most regex flavors don't match '\n' by default unless you supply the "s" flag -- including ECMAScript (well it doesn't even provide the "s" flag to allow '.' to match all characters).
Really? Since when? I didn't know there was any that didn't match \n. If you want to match everything but a newline, use [^\n].
Kenny's right. http://www.regular-expressions.info/dot.html Engines have special options for multiline. Andrei
Oct 12 2011
On 13.10.2011 8:38, Andrei Alexandrescu wrote:On 10/12/11 9:50 PM, Jesse Phillips wrote:On Wed, 12 Oct 2011 23:35:49 +0000, kennytm wrote:Most regex flavors don't match '\n' by default unless you supply the "s" flag -- including ECMAScript (well it doesn't even provide the "s" flag to allow '.' to match all characters).
Really? Since when? I didn't know there was any that didn't match \n. If you want to match everything but a newline, use [^\n].
Kenny's right. http://www.regular-expressions.info/dot.html Engines have special options for multiline.
The funny thing is that multiline mode affects only the ^ and $ anchors, and single-line mode affects only the rule for whether . matches \r and \n. So it's entirely possible to use both at the same time. But anyway, I guess I have to bite the bullet: add an 's' option and introduce the classic semantics by default. BTW, in Unicode, end of line is much more than just \r or \n, and among other things includes the "unbreakable" two-codepoint sequence '\r\n'. I wonder whether any engine lets . match in the middle of \r\n, or whether they also stop at the other end-of-line characters. -- Dmitry Olshansky
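A minimal sketch of the semantics settled on here, assuming the present-day std.regex flags 'm' (multiline) and 's' (single-line): by default '.' does not match '\n', 's' changes only that, and 'm' changes only ^ and $, so the two combine independently. `found` is a hypothetical helper.

```d
import std.regex;

// Hypothetical helper: does `pattern`, compiled with `flags`, match
// anywhere in `text`?
bool found(string text, string pattern, string flags = "")
{
    return !matchFirst(text, regex(pattern, flags)).empty;
}

void main()
{
    // By default '.' does not match '\n'.
    assert(!found("a\nb", "a.b"));
    // 's' (single-line mode) lets '.' match '\n' as well.
    assert(found("a\nb", "a.b", "s"));
    // 'm' only changes ^ and $; both modes can be combined.
    assert(found("x\nb\nc", "^b.c$", "ms"));
    assert(!found("x\nb\nc", "^b.c$", "m"));
    assert(!found("x\nb\nc", "^b.c$", "s"));
}
```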
Oct 13 2011
On Wed, 12 Oct 2011 23:35:49 +0000, kennytm wrote:Most regex flavors don't match '\n' by default unless you supply the "s" flag -- including ECMAScript (well it doesn't even provide the "s" flag to allow '.' to match all characters).
Really? Since when? I didn't know there was any that didn't match \n. If you want to match everything but a newline, use [^\n].
Oct 12 2011
On 2011-10-12 21:41, Dmitry Olshansky wrote:On 12.10.2011 23:32, kennytm wrote:Dmitry Olshansky<dmitry.olsh gmail.com> wrote:Fresh version of documentation is here: http://blackwhale.github.com/ This fixes all typos reported so far, adds missing overload of replace (ouch!) and introduces a brand new syntax table.
The '.' really matches any character, including the new line '\n'?
Hm, yes. Is that a problem?
Shouldn't "." exclude newlines? I think this is a good reference: http://www.regular-expressions.info/reference.html Which says: Matches any single character except line break characters \r and \n. Most regex flavors have an option to make the dot match line break characters too. -- /Jacob Carlborg
Oct 12 2011
On 12.10.2011 22:17, Dmitry Olshansky wrote:Fresh version of documentation is here: http://blackwhale.github.com/ This fixes all typos reported so far, adds missing overload of replace (ouch!) and introduces a brand new syntax table.
Updated, with single-line mode and a few documentation fixes. Source code is still here: https://github.com/blackwhale/phobos -- Dmitry Olshansky
Oct 15 2011
Please note that the review will be ending this weekend, in just 32 hours, at which point voting will begin. Please do not wait for the vote to criticize the library. Updated documentation: http://blackwhale.github.com/ On Sat, 08 Oct 2011 19:56:32 +0000, Jesse Phillips wrote: [...]
Oct 22 2011
I haven't followed the discussion closely, and I cannot really comment on the core regex functionality, but I did actually use FReD once as a replacement for a buggy std.regex. In that case I wanted a lazily created static regex, but I did not find an official way to test whether a Regex has been initialized:
static Regex!char re;
if(!isInitializedRE(re))
    re = regex(r"^(.*)\(([0-9]+)\):(.*)$");
So I implemented isInitializedRE() as "re.ir !is null" for std.regex and "re.captures() > 0" for FReD, but that defeats the purpose of a "drop-in replacement". I think both versions use implementation specifics; maybe there should be a documented way to test for being initialized. I also noticed that "auto match(R, RegEx)(R input, RegEx re);" appears twice in the documentation, and the same for "bmatch". I guess they should not appear together with the string versions.
Rainer
On 22.10.2011 18:21, Jesse Phillips wrote: [...]
Oct 22 2011
On 22.10.2011 20:56, Rainer Schuetze wrote:
> I haven't followed the discussion closely, and I cannot really comment on the core regex functionality, but I did actually use FReD as a replacement for a buggy std.regex once. In that case I wanted a lazily created static regex, but I did not find an official way to test whether a Regex has been initialized:
>
> static Regex!char re;
> if(!isInitializedRE(re))
>     re = regex(r"^(.*)\(([0-9]+)\):(.*)$");
>
> So I implemented isInitializedRE() as "re.ir !is null" for std.regex and as "re.captures() > 0" for FReD, but that defeats the purpose of a "drop-in replacement".

Coincidentally, you can still access the re.ir property in the same way. Wow, I wonder how far I can go with backwards compatibility :) In both cases this relies on undocumented features. Even now I can suggest a more portable and entirely generic way:

if(re == Regex!(char).init) {
    // create re
}

Though that risks doing more work than needed.

> I think both versions rely on implementation specifics; maybe there should be a documented way to test for being initialized.

Definitely. How about adding an empty property + opCast to bool? With that you'd get:

if(!re) {
    // create re
}

and, a bit more verbose:

if(re.empty) {
    // create re
}

> I also noticed that "auto match(R, RegEx)(R input, RegEx re);" appears twice in the documentation, and the same goes for "bmatch". I guess they should not appear together with the string versions.

I gather that happens because there is another overload specifically for compile-time (C-T) regexes. Its docs state just that, but without the template constraints the signatures look the same, so it can indeed cause some confusion. Maybe it would be better to combine the docs and leave one overload undocumented.

--
Dmitry Olshansky
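A minimal D sketch of the "empty property + opCast to bool" proposal, assuming a hypothetical wrapper: LazyRegex, compile() and get() are illustrative names, not part of std.regex or FReD. The wrapper tracks initialization with its own flag instead of peeking at internals such as re.ir or re.captures(). The usage in main() assumes a current Phobos that provides matchFirst.

```d
import std.regex;

// Hypothetical wrapper (not a std.regex or FReD API): it only illustrates
// a documented "has a pattern been compiled?" check.
struct LazyRegex
{
    private Regex!char re;
    private bool compiled; // false until compile() has been called

    // Explicit check: true while no pattern has been compiled.
    @property bool empty() const { return !compiled; }

    // Enables `if (x)` / `if (!x)` through D's opCast!bool rewrite.
    bool opCast(T : bool)() const { return compiled; }

    void compile(string pattern)
    {
        re = regex(pattern);
        compiled = true;
    }

    // Returns the compiled pattern; must not be called while empty.
    Regex!char get()
    {
        assert(compiled, "pattern not compiled yet");
        return re;
    }
}

// Lazily compiles the pattern from Rainer's example on the first call only.
Regex!char errorLineRegex()
{
    static LazyRegex cache; // thread-local, starts out empty
    if (!cache)
        cache.compile(r"^(.*)\(([0-9]+)\):(.*)$");
    return cache.get();
}

void main()
{
    LazyRegex fresh;
    assert(fresh.empty); // nothing compiled yet
    assert(!fresh);      // same check via opCast!bool

    auto m = matchFirst("file.d(42): error", errorLineRegex());
    assert(m && m[2] == "42"); // second group captures the line number
}
```

Given the later objections that an implicit bool conversion makes re look like a pointer or boolean, dropping the opCast member and writing if(cache.empty) gives the explicit-only variant.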
Oct 22 2011
On 22.10.2011 21:05, Dmitry Olshansky wrote:
> On 22.10.2011 20:56, Rainer Schuetze wrote:
>> I haven't followed the discussion closely, and I cannot really comment on the core regex functionality, but I did actually use FReD as a replacement for a buggy std.regex once. In that case I wanted a lazily created static regex, but I did not find an official way to test whether a Regex has been initialized:
>>
>> static Regex!char re;
>> if(!isInitializedRE(re))
>>     re = regex(r"^(.*)\(([0-9]+)\):(.*)$");
>>
>> So I implemented isInitializedRE() as "re.ir !is null" for std.regex and as "re.captures() > 0" for FReD, but that defeats the purpose of a "drop-in replacement".
>
> Coincidentally, you can still access the re.ir property in the same way. Wow, I wonder how far I can go with backwards compatibility :) In both cases this relies on undocumented features. Even now I can suggest a more portable and entirely generic way:
>
> if(re == Regex!(char).init) {
>     // create re
> }
>
> Though that risks doing more work than needed.
>
>> I think both versions rely on implementation specifics; maybe there should be a documented way to test for being initialized.
>
> Definitely. How about adding an empty property + opCast to bool? With that you'd get:
>
> if(!re) {
>     // create re
> }
>
> and, a bit more verbose:
>
> if(re.empty) {
>     // create re
> }

I think this might be confused with normal usage, as in "is this regex the empty string?" (Is "" a valid regex?). Maybe a more explicit "valid()" predicate would be better.

>> I also noticed that "auto match(R, RegEx)(R input, RegEx re);" appears twice in the documentation, and the same goes for "bmatch". I guess they should not appear together with the string versions.
>
> I gather that happens because there is another overload specifically for compile-time (C-T) regexes. Its docs state just that, but without the template constraints the signatures look the same, so it can indeed cause some confusion. Maybe it would be better to combine the docs and leave one overload undocumented.

As RegEx is a template argument here, it can stand for both Regex and StaticRegex, and that should be mentioned. Whether there are two different implementations is an implementation detail that need not bother the user. If you want to keep the second entries, I'd recommend renaming the argument to StaticRegEx.
Oct 23 2011
On 23.10.2011 11:28, Rainer Schuetze wrote:
> On 22.10.2011 21:05, Dmitry Olshansky wrote:
>> On 22.10.2011 20:56, Rainer Schuetze wrote:
>>> I haven't followed the discussion closely, and I cannot really comment on the core regex functionality, but I did actually use FReD as a replacement for a buggy std.regex once. In that case I wanted a lazily created static regex, but I did not find an official way to test whether a Regex has been initialized:
>>>
>>> static Regex!char re;
>>> if(!isInitializedRE(re))
>>>     re = regex(r"^(.*)\(([0-9]+)\):(.*)$");
>>>
>>> So I implemented isInitializedRE() as "re.ir !is null" for std.regex and as "re.captures() > 0" for FReD, but that defeats the purpose of a "drop-in replacement".
>>
>> Coincidentally, you can still access the re.ir property in the same way. Wow, I wonder how far I can go with backwards compatibility :) In both cases this relies on undocumented features. Even now I can suggest a more portable and entirely generic way:
>>
>> if(re == Regex!(char).init) {
>>     // create re
>> }
>>
>> Though that risks doing more work than needed.
>>
>>> I think both versions rely on implementation specifics; maybe there should be a documented way to test for being initialized.
>>
>> Definitely. How about adding an empty property + opCast to bool? With that you'd get:
>>
>> if(!re) {
>>     // create re
>> }
>>
>> and, a bit more verbose:
>>
>> if(re.empty) {
>>     // create re
>> }
>
> I think this might be confused with normal usage, as in "is this regex the empty string?" (Is "" a valid regex?). Maybe a more explicit "valid()" predicate would be better.

"" is a valid regex that matches anywhere; with the global flag it will match before every codepoint and once more at the end. I'm not sure using 'valid' is a good idea; it may mislead users into checking it all over the place, e.g.:

auto r = regex("blah");
if(r.valid()) ...

>>> I also noticed that "auto match(R, RegEx)(R input, RegEx re);" appears twice in the documentation, and the same goes for "bmatch". I guess they should not appear together with the string versions.
>>
>> I gather that happens because there is another overload specifically for compile-time (C-T) regexes. Its docs state just that, but without the template constraints the signatures look the same, so it can indeed cause some confusion. Maybe it would be better to combine the docs and leave one overload undocumented.
>
> As RegEx is a template argument here, it can stand for both Regex and StaticRegex, and that should be mentioned. Whether there are two different implementations is an implementation detail that need not bother the user.

OK, will do.

> If you want to keep the second entries, I'd recommend renaming the argument to StaticRegEx.

--
Dmitry Olshansky
Oct 23 2011
On 23.10.2011 17:46, Dmitry Olshansky wrote:
> On 23.10.2011 11:28, Rainer Schuetze wrote:
>> On 22.10.2011 21:05, Dmitry Olshansky wrote:
>>> On 22.10.2011 20:56, Rainer Schuetze wrote:
>>>> I haven't followed the discussion closely, and I cannot really comment on the core regex functionality, but I did actually use FReD as a replacement for a buggy std.regex once. In that case I wanted a lazily created static regex, but I did not find an official way to test whether a Regex has been initialized:
>>>>
>>>> static Regex!char re;
>>>> if(!isInitializedRE(re))
>>>>     re = regex(r"^(.*)\(([0-9]+)\):(.*)$");
>>>>
>>>> So I implemented isInitializedRE() as "re.ir !is null" for std.regex and as "re.captures() > 0" for FReD, but that defeats the purpose of a "drop-in replacement".
>>>
>>> Coincidentally, you can still access the re.ir property in the same way. Wow, I wonder how far I can go with backwards compatibility :) In both cases this relies on undocumented features. Even now I can suggest a more portable and entirely generic way:
>>>
>>> if(re == Regex!(char).init) {
>>>     // create re
>>> }
>>>
>>> Though that risks doing more work than needed.
>>>
>>>> I think both versions rely on implementation specifics; maybe there should be a documented way to test for being initialized.
>>>
>>> Definitely. How about adding an empty property + opCast to bool? With that you'd get:
>>>
>>> if(!re) {
>>>     // create re
>>> }
>>>
>>> and, a bit more verbose:
>>>
>>> if(re.empty) {
>>>     // create re
>>> }
>>
>> I think this might be confused with normal usage, as in "is this regex the empty string?" (Is "" a valid regex?). Maybe a more explicit "valid()" predicate would be better.
>
> "" is a valid regex that matches anywhere; with the global flag it will match before every codepoint and once more at the end. I'm not sure using 'valid' is a good idea; it may mislead users into checking it all over the place, e.g.:
>
> auto r = regex("blah");
> if(r.valid()) ...

You may be right. Maybe 'initialized'; otherwise 'empty' isn't too bad either. But I think it should be explicit, so I would not add opCast to bool.
Oct 24 2011
On Oct 22, 2011, at 12:05 PM, Dmitry Olshansky wrote:
> On 22.10.2011 20:56, Rainer Schuetze wrote:
>> […] I think both versions rely on implementation specifics; maybe there should be a documented way to test for being initialized.
>
> Definitely. How about adding an empty property + opCast to bool? With that you'd get:
>
> if(!re) {
>     // create re
> }

I think this is better, should one ever want to switch to a plain pointer… Also, you need to think less if it works the same way as for classes.

> and, a bit more verbose:
>
> if(re.empty) {
>     // create re
> }
Oct 22 2011
On 22.10.2011, 21:05, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
> Definitely. How about adding an empty property + opCast to bool? With that you'd get:
>
> if(!re) {
>     // create re
> }

It is nice that you *can* do this,

> and, a bit more verbose:
>
> if(re.empty) {
>     // create re
> }

but I prefer a descriptive name here. Otherwise I'd assume 're' is a pointer or a boolean, and it is also harder to look up in the documentation.
Oct 24 2011

Walter Bright <newshound2 digitalmars.com> 