digitalmars.D - Formal Review of std.regex (FReD)

reply Jesse Phillips <jessekphillips+d gmail.com> writes:
Hello everyone,

I have taken the role of review manager of the std.regex replacement by 
Dmitry Olshansky. The review period begins now 2011-10-8 and will end on 
2011-10-23 at midnight UTC. A voting thread to include into Phobos will 
be held after review assuming such is appropriate. The Voting period is 
one week.

Please note that you can try FReD as part of Phobos (Code) or by itself 
(Package of FReD) which includes docs.

Doc:

http://nascent.freeshell.org/fred/doc/

Code:

https://github.com/blackwhale/phobos MASTER

Package of FReD:

https://github.com/downloads/blackwhale/FReD/FReD.zip

Remember this will be replacing the current std.regex and is intended to 
be a drop in replacement. This project is also part of GSoC.

Dmitry, I ask that you apply this patch to posix.mak (adding to internal 
modules).

--- a/posix.mak
+++ b/posix.mak
@@ -184,7 +184,8 @@ std/c/, fenv locale math process stdarg stddef stdio stdlib
 time wcharh)
 EXTRA_MODULES += $(EXTRA_DOCUMENTABLES) $(addprefix                    \
        std/internal/math/, biguintcore biguintnoasm biguintx86 \
-       gammafunction errorfunction) std/internal/processinit
+       gammafunction errorfunction) std/internal/processinit \
+       std/internal/uni std/internal/uni_tab
 
 # Aggregate all D modules relevant to this build
 D_MODULES = crc32 $(STD_MODULES) $(EXTRA_MODULES) $(STD_NET_MODULES)
Oct 08 2011
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/8/2011 12:56 PM, Jesse Phillips wrote:
 Doc:

 http://nascent.freeshell.org/fred/doc/

1. There are many different regular expressions for strings. Should include a link to whichever one fred uses. Feel free to crib from http://www.digitalmars.com/ctg/regular.html

2. Many of the examples can be wrapped in a void main(){ ... } so that they are compilable using cut & paste.

3. "Advanced Syntax" and other headings need to be bold faced.
Oct 08 2011
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 09.10.2011 0:30, Walter Bright wrote:
 On 10/8/2011 12:56 PM, Jesse Phillips wrote:
 Doc:

 http://nascent.freeshell.org/fred/doc/

1. There are many different regular expressions for strings. Should include a link to whichever one fred uses. Feel free to crib from http://www.digitalmars.com/ctg/regular.html

While I do mention the ECMA-262 flavor, I agree that a table right there is far preferable. Will do.
 2. Many of the examples can be wrapped in a void main(){ ... } so that
 they are compilable using cut & paste.

Indeed, I just thought it wasn't Phobos style. Now, looking through again, I see there are a lot of examples with void main().
 3. "Advanced Syntax" and other headings need to be bold faced.

Right, thanks. -- Dmitry Olshansky
Oct 08 2011
parent Walter Bright <newshound2 digitalmars.com> writes:
On 10/8/2011 1:43 PM, Dmitry Olshansky wrote:
 Right, thanks.

Welcs. And I might add that I do greatly appreciate the work you've done on this, I think it could be a showcase for D's capabilities.
Oct 08 2011
prev sibling next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 08.10.2011 23:56, Jesse Phillips wrote:
 Hello everyone,

 I have taken the role of review manager of the std.regex replacement by
 Dmitry Olshansky. The review period begins now 2011-10-8 and will end on
 2011-10-23 at midnight UTC. A voting thread to include into Phobos will
 be held after review assuming such is appropriate. The Voting period is
 one week.

 Please note that you can try FRed as part of Phobos (Code) or by itself
 (Package of FReD) which includes docs.

 Doc:

 http://nascent.freeshell.org/fred/doc/

 Code:

 https://github.com/blackwhale/phobos MASTER

 Package of FReD:

 https://github.com/downloads/blackwhale/FReD/FReD.zip

 Remember this will be replacing the current std.regex and is intended to
 be a drop in replacement. This project is also part of GSoC.

 Dmitry, I ask that you apply this patch to posix.mak (adding to internal
 modules).

 --- a/posix.mak
 +++ b/posix.mak
    -184,7 +184,8    std/c/, fenv locale math process stdarg stddef stdio
 stdlib
   time wcharh)
   EXTRA_MODULES += $(EXTRA_DOCUMENTABLES) $(addprefix                    \
          std/internal/math/, biguintcore biguintnoasm biguintx86 \
 -       gammafunction errorfunction) std/internal/processinit
 +       gammafunction errorfunction) std/internal/processinit \
 +       std/internal/uni std/internal/uni_tab

   # Aggregate all D modules relevant to this build
   D_MODULES = crc32 $(STD_MODULES) $(EXTRA_MODULES) $(STD_NET_MODULES)

Thanks, updated, and now it works on Linux for me. It wasn't that simple, though. I've found out what caused my builds to break: both std.file & std.stdio use fully qualified std.c.stdio.func calls but never actually import std.c.stdio in any way. I wasn't even aware that's possible. So I changed it to core.stdc in std.file and added a static import to std.stdio (some functions from std.c are apparently not present in core.stdc). If there is any problem with that I can revert it and investigate why it affects only me ;) -- Dmitry Olshansky
Oct 08 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/8/11 3:34 PM, Dmitry Olshansky wrote:
 I've found out what caused my builds to break. The thing is that both
 std.file & std.stdio use fully qualified std.c.stdio.func calls but
 never actually import std.c.stdio in any way. I wasn't even aware that's
 possible.

That may be a bug in the compiler. A symbol shouldn't be visible unless e.g. publicly imported from an imported module (could that be the case)? Andrei
Oct 08 2011
next sibling parent reply Christian Kamm <kamm-incasoftware removethis.de> writes:
Andrei Alexandrescu wrote:

 On 10/8/11 3:34 PM, Dmitry Olshansky wrote:
 I've found out what caused my builds to break. The thing is that both
 std.file & std.stdio use fully qualified std.c.stdio.func calls but
 never actually import std.c.stdio in any way. I wasn't even aware that's
 possible.

That may be a bug in the compiler. A symbol shouldn't be visible unless e.g. publicly imported from an imported module (could that be the case)?

It's definitely a bug. Once an import is processed, the package is visible globally as long as the parent package is accessible. This compiles:

touch dmd2/src/phobos/std/empty.d

a.d:
import std.stdio;

b.d:
import std.empty;
void main() { std.stdio.writeln("hi!"); }
Oct 09 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/9/11 2:26 AM, Christian Kamm wrote:
 Andrei Alexandrescu wrote:

 On 10/8/11 3:34 PM, Dmitry Olshansky wrote:
 I've found out what caused my builds to break. The thing is that both
 std.file&  std.stdio use fully qualified std.c.stdio.func calls but
 never actually import std.c.stdio in any way. I wasn't even aware that's
 possible.

That may be a bug in the compiler. A symbol shouldn't be visible unless e.g. publicly imported from an imported module (could that be the case)?

It's definitely a bug. Once an import is processed, the package is visible globally as long as the parent package is accessible. This compiles: touch dmd2/src/phobos/std/empty.d a.d: import std.stdio; b.d: import std.empty; void main() { std.stdio.writeln("hi!"); }

Hm, this is important. But what is the contribution of a.d to the example? Do you compile it together with b.d? Andrei
Oct 09 2011
next sibling parent Christian Kamm <kamm-incasoftware removethis.de> writes:
Andrei Alexandrescu wrote:

 On 10/9/11 2:26 AM, Christian Kamm wrote:
 Andrei Alexandrescu wrote:

 On 10/8/11 3:34 PM, Dmitry Olshansky wrote:
 I've found out what caused my builds to break. The thing is that both
 std.file&  std.stdio use fully qualified std.c.stdio.func calls but
 never actually import std.c.stdio in any way. I wasn't even aware
 that's possible.

That may be a bug in the compiler. A symbol shouldn't be visible unless e.g. publicly imported from an imported module (could that be the case)?

It's definitely a bug. Once an import is processed, the package is visible globally as long as the parent package is accessible. This compiles: touch dmd2/src/phobos/std/empty.d a.d: import std.stdio; b.d: import std.empty; void main() { std.stdio.writeln("hi!"); }

Hm, this is important. But what is the contribution of a.d to the example? Do you compile it together with b.d?

Yes, 'dmd b.d' fails, 'dmd a.d b.d' succeeds.
Oct 09 2011
prev sibling parent Christian Kamm <kamm-incasoftware removethis.de> writes:
Brad Roberts wrote:
 Isn't this bug #314?  Very well known, super old, highly voted for, etc,
 etc.

No, bug 314 is about privately imported symbols being accessible even though they shouldn't be. This problem is about modules that aren't imported at all in a file or any of its imports still being accessible. Btw: I've updated my pull request to fix #314 to apply cleanly against the current dmd/master: https://github.com/D-Programming-Language/dmd/pull/190
Oct 09 2011
prev sibling parent Christian Kamm <kamm-incasoftware removethis.de> writes:
Christian Kamm wrote:

 Andrei Alexandrescu wrote:
 
 On 10/8/11 3:34 PM, Dmitry Olshansky wrote:
 I've found out what caused my builds to break. The thing is that both
 std.file & std.stdio use fully qualified std.c.stdio.func calls but
 never actually import std.c.stdio in any way. I wasn't even aware that's
 possible.

That may be a bug in the compiler. A symbol shouldn't be visible unless e.g. publicly imported from an imported module (could that be the case)?

It's definitely a bug. Once an import is processed, the package is visible globally as long as the parent package is accessible.

Heh, I actually reported it a while ago and then forgot about it. :) http://d.puremagic.com/issues/show_bug.cgi?id=6307
Oct 09 2011
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2011-10-08 23:37, Andrei Alexandrescu wrote:
 On 10/8/11 3:34 PM, Dmitry Olshansky wrote:
 I've found out what caused my builds to break. The thing is that both
 std.file & std.stdio use fully qualified std.c.stdio.func calls but
 never actually import std.c.stdio in any way. I wasn't even aware that's
 possible.

That may be a bug in the compiler. A symbol shouldn't be visible unless e.g. publicly imported from an imported module (could that be the case)? Andrei

I think it's a bug, but sometimes it can be useful. -- /Jacob Carlborg
Oct 09 2011
prev sibling parent Brad Roberts <braddr puremagic.com> writes:
On 10/9/2011 12:30 AM, Andrei Alexandrescu wrote:
 On 10/9/11 2:26 AM, Christian Kamm wrote:
 Andrei Alexandrescu wrote:

 On 10/8/11 3:34 PM, Dmitry Olshansky wrote:
 I've found out what caused my builds to break. The thing is that both
 std.file&  std.stdio use fully qualified std.c.stdio.func calls but
 never actually import std.c.stdio in any way. I wasn't even aware that's
 possible.

That may be a bug in the compiler. A symbol shouldn't be visible unless e.g. publicly imported from an imported module (could that be the case)?

It's definitely a bug. Once an import is processed, the package is visible globally as long as the parent package is accessible. This compiles: touch dmd2/src/phobos/std/empty.d a.d: import std.stdio; b.d: import std.empty; void main() { std.stdio.writeln("hi!"); }

Hm, this is important. But what is the contribution of a.d to the example? Do you compile it together with b.d? Andrei

Isn't this bug #314? Very well known, super old, highly voted for, etc, etc.
Oct 09 2011
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2011-10-08 21:56, Jesse Phillips wrote:
 Hello everyone,

 I have taken the role of review manager of the std.regex replacement by
 Dmitry Olshansky. The review period begins now 2011-10-8 and will end on
 2011-10-23 at midnight UTC. A voting thread to include into Phobos will
 be held after review assuming such is appropriate. The Voting period is
 one week.

 Please note that you can try FRed as part of Phobos (Code) or by itself
 (Package of FReD) which includes docs.

 Doc:

 http://nascent.freeshell.org/fred/doc/

What's the difference between Regex and RegEx? I can see RegEx in the documentation but I cannot find its definition in the docs. -- /Jacob Carlborg
Oct 09 2011
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 09.10.2011 14:33, Jacob Carlborg wrote:
 On 2011-10-08 21:56, Jesse Phillips wrote:
 Hello everyone,

 I have taken the role of review manager of the std.regex replacement by
 Dmitry Olshansky. The review period begins now 2011-10-8 and will end on
 2011-10-23 at midnight UTC. A voting thread to include into Phobos will
 be held after review assuming such is appropriate. The Voting period is
 one week.

 Please note that you can try FRed as part of Phobos (Code) or by itself
 (Package of FReD) which includes docs.

 Doc:

 http://nascent.freeshell.org/fred/doc/

What's the difference between Regex and RegEx? I can see RegEx in the documentation but I cannot find its definition in the docs.

RegEx is a template parameter (it's that usual abstract 'T') that is in the end deduced as StaticRegex!Char or Regex!Char, where Char is char/wchar/dchar. -- Dmitry Olshansky
Oct 09 2011
parent reply Jacob Carlborg <doob me.com> writes:
On 2011-10-09 16:09, Dmitry Olshansky wrote:
 On 09.10.2011 14:33, Jacob Carlborg wrote:
 On 2011-10-08 21:56, Jesse Phillips wrote:
 Hello everyone,

 I have taken the role of review manager of the std.regex replacement by
 Dmitry Olshansky. The review period begins now 2011-10-8 and will end on
 2011-10-23 at midnight UTC. A voting thread to include into Phobos will
 be held after review assuming such is appropriate. The Voting period is
 one week.

 Please note that you can try FRed as part of Phobos (Code) or by itself
 (Package of FReD) which includes docs.

 Doc:

 http://nascent.freeshell.org/fred/doc/

What's the difference between Regex and RegEx? I can see RegEx in the documentation but I cannot find its definition in the docs.

RegEx is a template parameter (it's that usual abstract 'T'), that in the end deduced as StaticRegex!Char or Regex!Char where Char is char/wchar/dchar.

I don't think the documentation should refer to RegEx if it's not defined in the docs. -- /Jacob Carlborg
Oct 09 2011
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 09.10.2011 18:49, Jacob Carlborg wrote:
 On 2011-10-09 16:09, Dmitry Olshansky wrote:
 On 09.10.2011 14:33, Jacob Carlborg wrote:
 On 2011-10-08 21:56, Jesse Phillips wrote:
 Hello everyone,

 I have taken the role of review manager of the std.regex replacement by
 Dmitry Olshansky. The review period begins now 2011-10-8 and will
 end on
 2011-10-23 at midnight UTC. A voting thread to include into Phobos will
 be held after review assuming such is appropriate. The Voting period is
 one week.

 Please note that you can try FRed as part of Phobos (Code) or by itself
 (Package of FReD) which includes docs.

 Doc:

 http://nascent.freeshell.org/fred/doc/

What's the difference between Regex and RegEx? I can see RegEx in the documentation but I cannot find its definition in the docs.

RegEx is a template parameter (it's that usual abstract 'T'), that in the end deduced as StaticRegex!Char or Regex!Char where Char is char/wchar/dchar.

I don't think the documentation should refer to RegEx if it's not defined in the docs.

-- Dmitry Olshansky
Oct 09 2011
parent reply Jacob Carlborg <doob me.com> writes:
On 2011-10-09 17:01, Dmitry Olshansky wrote:
 On 09.10.2011 18:49, Jacob Carlborg wrote:
 On 2011-10-09 16:09, Dmitry Olshansky wrote:
 On 09.10.2011 14:33, Jacob Carlborg wrote:
 On 2011-10-08 21:56, Jesse Phillips wrote:
 Hello everyone,

 I have taken the role of review manager of the std.regex
 replacement by
 Dmitry Olshansky. The review period begins now 2011-10-8 and will
 end on
 2011-10-23 at midnight UTC. A voting thread to include into Phobos
 will
 be held after review assuming such is appropriate. The Voting
 period is
 one week.

 Please note that you can try FRed as part of Phobos (Code) or by
 itself
 (Package of FReD) which includes docs.

 Doc:

 http://nascent.freeshell.org/fred/doc/

What's the difference between Regex and RegEx? I can see RegEx in the documentation but I cannot find its definition in the docs.

RegEx is a template parameter (it's that usual abstract 'T'), that in the end deduced as StaticRegex!Char or Regex!Char where Char is char/wchar/dchar.

I don't think the documentation should refer to RegEx if it's not defined in the docs.


The second parameter type of the match function (and a couple of other functions) is RegEx, is that possible to fix as well? -- /Jacob Carlborg
Oct 09 2011
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 09.10.2011 19:09, Jacob Carlborg wrote:
 On 2011-10-09 17:01, Dmitry Olshansky wrote:
 On 09.10.2011 18:49, Jacob Carlborg wrote:
 On 2011-10-09 16:09, Dmitry Olshansky wrote:
 On 09.10.2011 14:33, Jacob Carlborg wrote:
 On 2011-10-08 21:56, Jesse Phillips wrote:
 Hello everyone,

 I have taken the role of review manager of the std.regex
 replacement by
 Dmitry Olshansky. The review period begins now 2011-10-8 and will
 end on
 2011-10-23 at midnight UTC. A voting thread to include into Phobos
 will
 be held after review assuming such is appropriate. The Voting
 period is
 one week.

 Please note that you can try FRed as part of Phobos (Code) or by
 itself
 (Package of FReD) which includes docs.

 Doc:

 http://nascent.freeshell.org/fred/doc/

What's the difference between Regex and RegEx? I can see RegEx in the documentation but I cannot find its definition in the docs.

RegEx is a template parameter (it's that usual abstract 'T'), that in the end deduced as StaticRegex!Char or Regex!Char where Char is char/wchar/dchar.

I don't think the documentation should refer to RegEx if it's not defined in the docs.


The second parameter type of the match function (and a couple of other functions) is RegEx, is that possible to fix as well?

No, that's what I tried to point out but obviously failed. The thing is that it is a templated parameter, and due to the constraint it could be either StaticRegex!Char or Regex!Char. They represent a pattern compiled as machine code or bytecode respectively, for the character width of Char. In the end, the 6 versions of compiled patterns do not have a common type, nor is one technically possible (without some quite bad performance trade-offs). -- Dmitry Olshansky
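
A minimal sketch of what this deduction looks like from the caller's side, written against the std.regex API as it exists in later Phobos releases (regex, ctRegex, match). The helper hasMatch is hypothetical and not part of FReD; it only mirrors the shape of the match signature shown in the docs, and the exact types produced may differ between Phobos versions.

```d
import std.regex;

// Hypothetical helper (not part of FReD): like match(), it is generic over
// the pattern type, so RegEx is deduced separately at each call site.
bool hasMatch(R, RegEx)(R input, RegEx re)
{
    return !match(input, re).empty;
}

void main()
{
    auto runtimeRe = regex(r"[0-9]+");     // pattern compiled at run time
    auto staticRe  = ctRegex!(r"[0-9]+");  // pattern compiled at compile time
    auto wideRe    = regex(r"[0-9]+"w);    // same pattern over wchar input

    // One generic function accepts every flavor of compiled pattern.
    assert(hasMatch("abc 123", runtimeRe));
    assert(hasMatch("abc 123", staticRe));
    assert(hasMatch("abc 123"w, wideRe));
    assert(!hasMatch("abc", runtimeRe));
}
```

Because the pattern type is a template parameter, no common base type is needed; the cost is that the documentation has to name the parameter generically.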
Oct 09 2011
parent reply Jacob Carlborg <doob me.com> writes:
On 2011-10-09 17:29, Dmitry Olshansky wrote:
 On 09.10.2011 19:09, Jacob Carlborg wrote:
 On 2011-10-09 17:01, Dmitry Olshansky wrote:
 On 09.10.2011 18:49, Jacob Carlborg wrote:
 On 2011-10-09 16:09, Dmitry Olshansky wrote:
 On 09.10.2011 14:33, Jacob Carlborg wrote:
 On 2011-10-08 21:56, Jesse Phillips wrote:
 Hello everyone,

 I have taken the role of review manager of the std.regex
 replacement by
 Dmitry Olshansky. The review period begins now 2011-10-8 and will
 end on
 2011-10-23 at midnight UTC. A voting thread to include into Phobos
 will
 be held after review assuming such is appropriate. The Voting
 period is
 one week.

 Please note that you can try FRed as part of Phobos (Code) or by
 itself
 (Package of FReD) which includes docs.

 Doc:

 http://nascent.freeshell.org/fred/doc/

What's the difference between Regex and RegEx? I can see RegEx in the documentation but I cannot find its definition in the docs.

RegEx is a template parameter (it's that usual abstract 'T'), that in the end deduced as StaticRegex!Char or Regex!Char where Char is char/wchar/dchar.

I don't think the documentation should refer to RegEx if it's not defined in the docs.


The second parameter type of the match function (and a couple of other functions) is RegEx, is that possible to fix as well?

No, that's what I tried to point out but failed obviously. The thing is that it is a templated parameter and due to constraint it could be either StaticRegex!Char or Regex!Char. They represent pattern compiled as machine code or bytecode respectively for character width of Char. All of the 6 versions of compiled patterns in the end do not have a common type nor one is technically possible (w/o some quite bad performance trade offs).

Aha, ok, I see. Could RegEx be explained in the docs so it won't cause further confusion? -- /Jacob Carlborg
Oct 09 2011
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 09.10.2011 22:47, Jacob Carlborg wrote:
 On 2011-10-09 17:29, Dmitry Olshansky wrote:
 On 09.10.2011 19:09, Jacob Carlborg wrote:
 On 2011-10-09 17:01, Dmitry Olshansky wrote:
 On 09.10.2011 18:49, Jacob Carlborg wrote:
 On 2011-10-09 16:09, Dmitry Olshansky wrote:
 On 09.10.2011 14:33, Jacob Carlborg wrote:
 On 2011-10-08 21:56, Jesse Phillips wrote:
 Hello everyone,

 I have taken the role of review manager of the std.regex
 replacement by
 Dmitry Olshansky. The review period begins now 2011-10-8 and will
 end on
 2011-10-23 at midnight UTC. A voting thread to include into Phobos
 will
 be held after review assuming such is appropriate. The Voting
 period is
 one week.

 Please note that you can try FRed as part of Phobos (Code) or by
 itself
 (Package of FReD) which includes docs.

 Doc:

 http://nascent.freeshell.org/fred/doc/

What's the difference between Regex and RegEx? I can see RegEx in the documentation but I cannot find its definition in the docs.

RegEx is a template parameter (it's that usual abstract 'T'), that in the end deduced as StaticRegex!Char or Regex!Char where Char is char/wchar/dchar.

I don't think the documentation should refer to RegEx if it's not defined in the docs.


The second parameter type of the match function (and a couple of other functions) is RegEx, is that possible to fix as well?

No, that's what I tried to point out but failed obviously. The thing is that it is a templated parameter and due to constraint it could be either StaticRegex!Char or Regex!Char. They represent pattern compiled as machine code or bytecode respectively for character width of Char. All of the 6 versions of compiled patterns in the end do not have a common type nor one is technically possible (w/o some quite bad performance trade offs).

Aha, ok, I see. Could RegEx be explained in the docs so it won't cause further confusion?

I guess putting "The RegEx parameter can be either Regex!Char or StaticRegex!Char depending on the actual type of pattern passed" all over the place won't cut it. Placing it somewhere at the top has the disadvantage of lacking any prior context, and most users will miss it anyway. Maybe I'll just add a Params: section with a short description to all functions that still lack one. -- Dmitry Olshansky
Oct 10 2011
prev sibling next sibling parent Alix Pexton <alix.DOT.pexton gmail.DOT.com> writes:
I've not had a proper look at the code yet, but I recall from when I 
read the docs during the pre-review period that the introduction was a 
little on the informal side. It doesn't seem to have changed since then, 
and IMHO the introduction/description needs a bit of a polish to bring 
it up to the standard that is required of official documentation.

I'll be busy over the next few weeks, but I will try to make time to 
assemble some more specific comments. I just wanted to let you know that 
I thought the docs needed some work, just in case.

A...
Oct 09 2011
prev sibling next sibling parent reply Jerry <jlquinn optonline.net> writes:
I have 2 thoughts.

1) Minor doc typo:

Long form for hex notation should be \U00YYYYYY.

2) Unicode set syntax

If you're going to provide unicode set support, why not use ICU syntax
rather than invent another one?

Jerry
Oct 11 2011
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12.10.2011 0:04, Jerry wrote:
 I have 2 thoughts.

 1) Minor doc typo:

 Long form for hex notation should be \U00YYYYYY.

Yeah, \U it is.
 2) Unicode set syntax

 If you're going to provide unicode set support, why not use ICU syntax
 rather than invent another one?

Looks like I was tricked by their technical standard then. I can't immediately recall where this syntax was ever used, but see: http://unicode.org/reports/tr18/#Subtraction_and_Intersection The prime reason cited there is that e.g. '--' is (almost) unambiguous with the range notation '-', and it also allows skipping [] where applicable: [\p{letter}--a-z] vs [[\p{letter}]-[a-z]]. Come to think of it, '--' is cleaner in this case.
 Jerry

-- Dmitry Olshansky
Oct 12 2011
prev sibling next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
Fresh version of documentation is here:
http://blackwhale.github.com/

This fixes all typos reported so far, adds missing overload of replace 
(ouch!) and introduces a brand new syntax table.

-- 
Dmitry Olshansky
Oct 12 2011
next sibling parent reply kennytm <kennytm gmail.com> writes:
Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 Fresh version of documentation is here:
 http://blackwhale.github.com/
 
 This fixes all typos reported so far, adds missing overload of replace
 (ouch!) and introduces a brand new syntax table.

The '.' really matches any character, including the new line '\n'?
Oct 12 2011
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12.10.2011 23:32, kennytm wrote:
 Dmitry Olshansky<dmitry.olsh gmail.com>  wrote:
 Fresh version of documentation is here:
 http://blackwhale.github.com/

 This fixes all typos reported so far, adds missing overload of replace
 (ouch!) and introduces a brand new syntax table.

The '.' really matches any character, including the new line '\n'?

Hm, yes. Is that a problem? -- Dmitry Olshansky
Oct 12 2011
next sibling parent reply kennytm <kennytm gmail.com> writes:
Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 On 12.10.2011 23:32, kennytm wrote:
 Dmitry Olshansky<dmitry.olsh gmail.com>  wrote:
 Fresh version of documentation is here:
 http://blackwhale.github.com/
 
 This fixes all typos reported so far, adds missing overload of replace
 (ouch!) and introduces a brand new syntax table.

The '.' really matches any character, including the new line '\n'?

Hm, yes. Is that a problem?

Most regex flavors don't match '\n' by default unless you supply the "s" flag -- including ECMAScript (well it doesn't even provide the "s" flag to allow '.' to match all characters). While I am OK with having "s" turned on by default, this should at least be documented explicitly.
Oct 12 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/12/11 9:50 PM, Jesse Phillips wrote:
 On Wed, 12 Oct 2011 23:35:49 +0000, kennytm wrote:

 Most regex flavors don't match '\n' by default unless you supply the "s"
 flag -- including ECMAScript (well it doesn't even provide the "s" flag
 to allow '.' to match all characters).

Really? Since when? I didn't know there was any that didn't match \n. If you want to match everything that is not a new line, use [^\n].

Kenny's right. http://www.regular-expressions.info/dot.html Engines have special options for multiline. Andrei
Oct 12 2011
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 13.10.2011 8:38, Andrei Alexandrescu wrote:
 On 10/12/11 9:50 PM, Jesse Phillips wrote:
 On Wed, 12 Oct 2011 23:35:49 +0000, kennytm wrote:

 Most regex flavors don't match '\n' by default unless you supply the "s"
 flag -- including ECMAScript (well it doesn't even provide the "s" flag
 to allow '.' to match all characters).

Really? Since when? I didn't know there was any that didn't match \n. If you want to match everything that is not a new line, use [^\n].

Kenny's right. http://www.regular-expressions.info/dot.html Engines have special options for multiline.

The funny thing is that multiline mode affects only the ^ & $ anchors, and single-line mode affects only whether . matches \r and \n. So it's entirely possible to use both at the same time. But anyway, I guess I have to bite the bullet: add the 's' option and introduce the classic semantics by default. BTW, in Unicode an end of line is much more than just \r or \n, and among other things includes the "unbreakable" two-codepoint sequence '\r\n'. I wonder whether any engine matches . in the middle of \r\n, or whether they stop at any of the other end-of-line characters. -- Dmitry Olshansky
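
To illustrate that the two modes are independent, here is a minimal sketch against the std.regex API of later Phobos releases, assuming the classic default in which '.' does not match \r or \n. The helper found is hypothetical and exists only for this example.

```d
import std.regex;

// Hypothetical helper: true if `pattern`, compiled with `flags`,
// matches anywhere in `text`.
bool found(string text, string pattern, string flags = "")
{
    return !match(text, regex(pattern, flags)).empty;
}

void main()
{
    string text = "one\ntwo";

    // Default: '.' does not cross the line break.
    assert(!found(text, r"one.two"));
    // "s" (single-line mode): '.' also matches '\n'.
    assert(found(text, r"one.two", "s"));

    // Default: ^ and $ anchor only at the start and end of the input.
    assert(!found(text, r"^two$"));
    // "m" (multiline mode): ^ and $ also anchor at line boundaries.
    assert(found(text, r"^two$", "m"));

    // Both flags can be combined, since they affect different constructs.
    assert(found(text, r"^one.two$", "sm"));
}
```

The "s" flag changes only the dot and the "m" flag changes only the anchors, which is why passing both is meaningful.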
Oct 13 2011
prev sibling next sibling parent Jesse Phillips <jessekphillips+d gmail.com> writes:
On Wed, 12 Oct 2011 23:35:49 +0000, kennytm wrote:

 Most regex flavors don't match '\n' by default unless you supply the "s"
 flag -- including ECMAScript (well it doesn't even provide the "s" flag
 to allow '.' to match all characters).

Really? Since when? I didn't know there was any that didn't match \n. If you want to match everything that is not a new line, use [^\n].
Oct 12 2011
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2011-10-12 21:41, Dmitry Olshansky wrote:
 On 12.10.2011 23:32, kennytm wrote:
 Dmitry Olshansky<dmitry.olsh gmail.com> wrote:
 Fresh version of documentation is here:
 http://blackwhale.github.com/

 This fixes all typos reported so far, adds missing overload of replace
 (ouch!) and introduces a brand new syntax table.

The '.' really matches any character, including the new line '\n'?

Hm, yes. Is that a problem?

Shouldn't "." exclude newlines? I think this is a good reference: http://www.regular-expressions.info/reference.html Which says: Matches any single character except line break characters \r and \n. Most regex flavors have an option to make the dot match line break characters too. -- /Jacob Carlborg
Oct 12 2011
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12.10.2011 22:17, Dmitry Olshansky wrote:
 Fresh version of documentation is here:
 http://blackwhale.github.com/

 This fixes all typos reported so far, adds missing overload of replace
 (ouch!) and introduces a brand new syntax table.

Updated, with single-line mode and a few documentation fixes. Source code is still here: https://github.com/blackwhale/phobos -- Dmitry Olshansky
Oct 15 2011
prev sibling parent reply Jesse Phillips <jessekphillips+d gmail.com> writes:
Please note that the review will be ending this weekend, in just 32 hours, 
at which point voting will begin. Please do not wait for the voting thread 
to criticize the library.

Updated documentation: http://blackwhale.github.com/

On Sat, 08 Oct 2011 19:56:32 +0000, Jesse Phillips wrote:

 Hello everyone,
 
 I have taken the role of review manager of the std.regex replacement by
 Dmitry Olshansky. The review period begins now 2011-10-8 and will end on
 2011-10-23 at midnight UTC. A voting thread to include into Phobos will
 be held after review assuming such is appropriate. The Voting period is
 one week.
 
 Please note that you can try FRed as part of Phobos (Code) or by itself
 (Package of FReD) which includes docs.
 
 Doc:
 
 http://nascent.freeshell.org/fred/doc/
 
 Code:
 
 https://github.com/blackwhale/phobos MASTER
 
 Package of FReD:
 
 https://github.com/downloads/blackwhale/FReD/FReD.zip
 
 Remember this will be replacing the current std.regex and is intended to
 be a drop in replacement. This project is also part of GSoC.
 
 Dmitry, I ask that you apply this patch to posix.mak (adding to internal
 modules).
 
 --- a/posix.mak
 +++ b/posix.mak
 @@ -184,7 +184,8 @@ std/c/, fenv locale math process stdarg stddef stdio stdlib
  time wcharh)
  EXTRA_MODULES += $(EXTRA_DOCUMENTABLES) $(addprefix                    \
         std/internal/math/, biguintcore biguintnoasm biguintx86 \
 -       gammafunction errorfunction) std/internal/processinit
 +       gammafunction errorfunction) std/internal/processinit \
 +       std/internal/uni std/internal/uni_tab
  
  # Aggregate all D modules relevant to this build
  D_MODULES = crc32 $(STD_MODULES) $(EXTRA_MODULES) $(STD_NET_MODULES)

Oct 22 2011
next sibling parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
I haven't followed the discussion closely, and I cannot really comment 
on the core regex functionality, but I did actually use FReD as a 
replacement of a buggy std.regex once.

In that case I wanted to have a lazily created static regex, but I did 
not find an official way to test whether a Regex has been initialized:

	static Regex!char re;
	if(!isInitializedRE(re))
		re = regex(r"^(.*)\(([0-9]+)\):(.*)$");

So I implemented isInitializedRE() as "re.ir !is null" for std.regex and 
"re.captures() > 0" for fred, but that defeats the purpose of a "drop-in 
replacement".

I think both versions rely on implementation specifics; maybe there should 
be a documented way to test whether a Regex has been initialized.

I also noticed that "auto match(R, RegEx)(R input, RegEx re);" appears 
twice in the documentation, and the same goes for "bmatch". I guess they 
should not appear together with the string versions.

Rainer

Oct 22 2011
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 22.10.2011 20:56, Rainer Schuetze wrote:
 I haven't followed the discussion closely, and I cannot really comment
 on the core regex functionality, but I did actually use FReD as a
 replacement of a buggy std.regex once.

 In that case I wanted to have a lazily created static regex, but I did
 not find an official way to test whether a Regex has been initialized:

 static Regex!char re;
 if(!isInitializedRE(re))
 re = regex(r"^(.*)\(([0-9]+)\):(.*)$");

 So I implemented isInitializedRE() as "re.ir !is null" for std.regex and
 "re.captures() > 0" for fred, but that fails for being a "drop-in
 replacement".

Coincidentally, you can still access the re.ir property this way. Wow, I wonder how far I can go with backwards compatibility :)
In both cases this relies on undocumented features. Even now I can suggest a more portable and entirely generic way:

if(re == Regex!(char).init)
{
	//create re
}

Though that risks doing more work than needed.
 I think, both versions use implementation specifics, maybe there should
 be a documented way to test for being initialized.

Definitely. How about adding an empty property + opCast to bool? With that you'd get:

if(!re)
{
	//create re
}

and a bit more verbose:

if(re.empty)
{
	//create re
}
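
A minimal sketch of the lazy-initialization case, assuming Regex gets the empty property suggested above. At this point in the thread it is only a proposal, although later std.regex releases document a property of that name. The helper name lineRegex is hypothetical, and the pattern is the one from the original question:

```d
import std.regex;

// Hypothetical accessor: compiles the pattern on first use only.
ref Regex!char lineRegex()
{
    static Regex!char re;   // thread-local, starts out as Regex!char.init
    if (re.empty)           // assumed property: true while nothing is compiled
        re = regex(r"^(.*)\(([0-9]+)\):(.*)$");
    return re;
}

void main()
{
    // Typical compiler-style message: file(line): text
    auto m = match("foo.d(42): error", lineRegex());
    assert(!m.empty);
    assert(m.captures[2] == "42");
}
```

The sketch leaves out opCast to bool on purpose; it only needs the explicit property.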
 I also noticed, that "auto match(R, RegEx)(R input, RegEx re);" appears
 twice in the documentation, same for "bmatch". I guess they should not
 appear together with the string versions.

I gather that happens because there is another overload specifically for C-T regexes. Its docs state just that, but without the template constraint the signatures are the same, so it can indeed cause some confusion. Maybe it would be better to just combine the docs and leave one overload undocumented.

-- Dmitry Olshansky
Oct 22 2011
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 22.10.2011 21:05, Dmitry Olshansky wrote:

 Definitely. How about adding an empty property + opCast to bool, with
 that you'd get:
 if(!re)
 {
 //create re
 }
 and a bit more verbose:
 if(re.empty)
 {
 //create re
 }

I think this might be confused with normal usage, like "is this regex the empty string?" (Is "" a valid regex?). Maybe a more explicit "valid()" predicate would be fine.

 I gather that happens because there is another overload specifically for
 C-T regexes. Its docs state just that, but without the template constraint
 the signatures are the same, so it can indeed cause some confusion. Maybe
 it would be better to just combine the docs and leave one overload
 undocumented.

As RegEx is a template argument here, it can stand for both Regex and StaticRegex, and that should be mentioned. Whether it has two different implementations is an implementation detail that does not need to bother the user. If you want to keep the second entries, I'd recommend renaming the argument to StaticRegEx.
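
To illustrate the point, here is a short sketch showing the same match call accepting either kind of regex object. It assumes the regex and ctRegex entry points described in the FReD documentation; the pattern and input string are made up for the example:

```d
import std.regex;

void main()
{
    auto runtimeRe = regex(`([0-9]+)`);     // pattern compiled at run time
    auto staticRe  = ctRegex!(`([0-9]+)`);  // pattern compiled at compile time

    // Identical call syntax; the RegEx template parameter covers both.
    auto a = match("line 42", runtimeRe);
    auto b = match("line 42", staticRe);

    assert(a.captures[1] == "42");
    assert(b.captures[1] == "42");
}
```

From the caller's side, nothing reveals which overload was picked, which is why a single documentation entry seems sufficient.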
Oct 23 2011
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 23.10.2011 11:28, Rainer Schuetze wrote:

 I think this might be confused with normal usage, like "is this regex
 the empty string?" (Is "" a valid regex?). Maybe a more explicit "valid()"
 predicate would be fine.

"" is a valid regex that matches anywhere; with the global flag it will match before every codepoint, plus once at the end.
I'm not sure 'valid' is a good name: it may mislead users into checking it all over the place, e.g.:

auto r = regex("blah");
if(r.valid()) ...

 As RegEx is a template argument here, it can stand for both Regex and
 StaticRegex, and that should be mentioned. Whether it has two different
 implementations is an implementation detail that does not need to bother
 the user.

OK, will do.
 If you want to keep the second entries, I'd recommend renaming the
 argument to StaticRegEx.

-- Dmitry Olshansky
Oct 23 2011
parent Rainer Schuetze <r.sagitario gmx.de> writes:
On 23.10.2011 17:46, Dmitry Olshansky wrote:

 "" is a valid regex that matches anywhere, with global flag it will match
 before any codepoint + once at end. I'm not sure using 'valid' is good, it
 may mislead user to check it all over the place e.g.:
 auto r = regex("blah");
 if(r.valid()) ....

You may be right. Maybe 'initialized'; otherwise 'empty' isn't too bad either. But I think it should be explicit, so I would not add opCast to bool.
Oct 24 2011
prev sibling next sibling parent Fawzi Mohamed <fawzi gmx.ch> writes:
On Oct 22, 2011, at 12:05 PM, Dmitry Olshansky wrote:

 On 22.10.2011 20:56, Rainer Schuetze wrote:
 […]
 I think, both versions use implementation specifics, maybe there should
 be a documented way to test for being initialized.

 Definitely. How about adding an empty property + opCast to bool, with
 that you'd get:
 if(!re)
 {
 //create re
 }

I think this is better, should one ever want to switch to plain pointer…, also you need less thinking if it works like for classes.
 and a bit more verbose:
 if(re.empty)
 {
 //create re
 }

Oct 22 2011
prev sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
On 22.10.2011, at 21:05, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:

 Definitely. How about adding an empty property + opCast to bool, with  
 that you'd get:
 if(!re)
 {
 //create re
 }

It is nice that you *can* do this,
 and a bit more verbose:
 if(re.empty)
 {
 //create re
 }

but I prefer a descriptive name here. Otherwise I'd assume 're' is a pointer or a boolean, and it is harder to look up in the documentation.
Oct 24 2011