digitalmars.D - FReD is ready for review

Dmitry Olshansky (27/27) Sep 13 2011 FReD ( Fast Regular expressions for D) is a GSOC project proposed as

Vladimir Panteleev (11/16) Sep 13 2011 Hi, thanks for your hard work!

Vladimir Panteleev (7/9) Sep 13 2011 Sorry, I had -wi enabled in my build environment. (Why would -wi cause

Jonathan M Davis (5/13) Sep 13 2011 -wi shouldn't cause fatal errors, but anything going into Phobos needs t...

Dmitry Olshansky (20/33) Sep 13 2011 Arrrrgh! On serious note it is fixed in some recent pull request, but

Vladimir Panteleev (8/10) Sep 13 2011 That's really cool... I noticed that d_dna.d doesn't use regexCt, would ...

Dmitry Olshansky (11/18) Sep 13 2011 In this particular case it may very well much faster actually. The thing...

Vladimir Panteleev (7/10) Sep 13 2011 Ah, okay. I thought compile-time regex was the same as the default engin...

Dmitry Olshansky (28/35) Sep 13 2011 Let me explain a bit. The default as chosen is fairly advanced NFA

Andrei Alexandrescu (5/16) Sep 13 2011 Sounds great! I advise you to invest effort in reducing the 25% handicap...

Dmitry Olshansky (23/40) Sep 14 2011 Yes, sir! :)

Simen Kjaeraas (6/24) Sep 14 2011 Even so, that is awesome. Another point to D for its

Marco Leise (5/31) Sep 14 2011 Wait with the champaign until this doesn't use 8 GB RAM during compile a...

Martin Nowak (2/36) Sep 14 2011 It peaks at 4GB for me and takes about 20s to compile.

Andrei Alexandrescu (7/49) Sep 14 2011 Great, thanks. Could you please email me privately some more information...

bearophile (6/13) Sep 13 2011 This is not so good. Even if this problem is not solved soon, it will be...

Marco Leise (6/11) Sep 13 2011 I heard a Google engineer talk about that. In JavaScript regular

Jonathan M Davis (12/17) Sep 13 2011 I suspect that there's value in reviewing the unicode stuff separately, ...

Dmitry Olshansky (12/28) Sep 13 2011 It doesn't affect API at all, but an important implementation detail.

Jonathan M Davis (5/37) Sep 13 2011 As long as it doesn't affect the API, then we can review std.regex first...

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

FReD ( Fast Regular expressions for D) is a GSOC project proposed as 
source level compatible replacement for std.regex. Needless to say it is 
also superior in many ways ;)

Source + docs + goodies packaged here:
https://github.com/downloads/blackwhale/FReD/FReD.zip

Goodies include popular regex-dna benchmark, by my measures we should be 

linux box, on win32 sadly it runs out of memory on the same machine 
(false pointers?).

Speaking of source there are few artifacts that should end up in std.uni 
and not in std.regex:
CodepointSet and CodepointTrie, and unicode property tables.
I'd rather have them all accessible to user, but their interface is 
admittedly clunky, I'm open to ideas on better API.

Notable caveats:
	- backreferences allowed only to locally unique groups, e.g. 
\b(\w+)\b\1 is allowed, same w/o any of \b - not.
	- replace with delegeate now takes Captures!R where R is e.g. string. 
This is due to having few types of engine and RegexMatch!X respectively. 
Previous signature allowed a certain measure of abuse e.g. you could 
have done a couple of popFronts on it matcher(!) in this delegate. 
However in most cases the code inside can be left as is.

Otherwise it should work with existing code after replacing imports.

P.S. I'm taking a seat somewhere in review queue, and busying myself 
with newspaper :)

-- 
Dmitry Olshansky

Sep 13 2011

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

Hi, thanks for your hard work!

On Wed, 14 Sep 2011 00:04:08 +0300, Dmitry Olshansky  
<dmitry.olsh gmail.com> wrote:

 https://github.com/downloads/blackwhale/FReD/FReD.zip

Doesn't seem to compile on the latest DMD, due to switch case fallthrough.

 Goodies include popular regex-dna benchmark, by my measures we should be  


This is awesome, but if it's not a secret, who are the first two, and how  
far behind is FReD? Does this regard regular or compile-time regex?

 Though I used 64bit VM linux box, on win32 sadly it runs out of memory  
 on the same machine (false pointers?).

If I'll find time to port my memory debugger to D2 before someone else  
figures it out, I'll have a look at this.

-- 
Best regards,
  Vladimir                            mailto:vladimir thecybershadow.net

Sep 13 2011

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Wed, 14 Sep 2011 00:18:03 +0300, Vladimir Panteleev  
<vladimir thecybershadow.net> wrote:

 Doesn't seem to compile on the latest DMD, due to switch case  
 fallthrough.

Sorry, I had -wi enabled in my build environment. (Why would -wi cause  
fatal errors, though? Sounds like a bug.)

-- 
Best regards,
  Vladimir                            mailto:vladimir thecybershadow.net

Sep 13 2011

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Tuesday, September 13, 2011 14:25 Vladimir Panteleev wrote:
 On Wed, 14 Sep 2011 00:18:03 +0300, Vladimir Panteleev
 
 <vladimir thecybershadow.net> wrote:
 Doesn't seem to compile on the latest DMD, due to switch case
 fallthrough.

 
 Sorry, I had -wi enabled in my build environment. (Why would -wi cause
 fatal errors, though? Sounds like a bug.)

-wi shouldn't cause fatal errors, but anything going into Phobos needs to 
build with -w. So, if the proposed changes don't build with -w, that needs to 
be fixed.

- Jonathan M Davis

Sep 13 2011

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 14.09.2011 1:18, Vladimir Panteleev wrote:
 Hi, thanks for your hard work!

 On Wed, 14 Sep 2011 00:04:08 +0300, Dmitry Olshansky
 <dmitry.olsh gmail.com> wrote:

 https://github.com/downloads/blackwhale/FReD/FReD.zip

 Doesn't seem to compile on the latest DMD, due to switch case fallthrough.

Arrrrgh! On serious note it is fixed in some recent pull request, but 
that was hopelessly late for getting in this release.

 Goodies include popular regex-dna benchmark, by my measures we should


 This is awesome, but if it's not a secret, who are the first two, and
 how far behind is FReD? Does this regard regular or compile-time regex?

OK in this particular benchmark, we are slighlty behind Google inc. :)
Namely RE2 and irregex.
On my machine RE2 is ~5 sec while FReD does the job in ~7sec.
As for irregex, ... uhm-oh I haven't actually tested it (failed to just 
compile) but it beats RE2 in this test by some ~30%.


OpenMP declarations stripped:
http://shootout.alioth.debian.org/u32/performance.php?test=regexdna


Anyway, the rest of competition is not even close, and I still haven't 
spent all of my tricks.

 Though I used 64bit VM linux box, on win32 sadly it runs out of memory
 on the same machine (false pointers?).

 If I'll find time to port my memory debugger to D2 before someone else
 figures it out, I'll have a look at this.

Thanks, by my cut and paste tests shows the problem is caused by 
allocating large buffers in replace loop. Probably a better solution to 
this is alternative replace function that doesn't allocate on each call 
but uses a OutputRange.

-- 
Dmitry Olshansky

Sep 13 2011

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Wed, 14 Sep 2011 00:37:28 +0300, Dmitry Olshansky  
<dmitry.olsh gmail.com> wrote:

 Anyway, the rest of competition is not even close, and I still haven't  
 spent all of my tricks.

That's really cool... I noticed that d_dna.d doesn't use regexCt, would  
that make any difference or am I overestimating the advantage of  
compile-time codegen?

-- 
Best regards,
  Vladimir                            mailto:vladimir thecybershadow.net

Sep 13 2011

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 14.09.2011 1:42, Vladimir Panteleev wrote:
 On Wed, 14 Sep 2011 00:37:28 +0300, Dmitry Olshansky
 <dmitry.olsh gmail.com> wrote:

 Anyway, the rest of competition is not even close, and I still haven't
 spent all of my tricks.

 That's really cool... I noticed that d_dna.d doesn't use regexCt, would
 that make any difference or am I overestimating the advantage of
 compile-time codegen?

In this particular case it may very well much faster actually. The thing 
about this benchmark is that it exposes all the nasty constant overhead 
there is in the engine.
On the other hand C-T regex doesn't have most of "what do we do next" 
overhead so it should be faster.
However when it comes to some serious business like matching charsets 
and doing ambiguous control flow it starts losing to more superior 
design of the default engine.

-- 
Dmitry Olshansky

Sep 13 2011

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Wed, 14 Sep 2011 00:49:47 +0300, Dmitry Olshansky  
<dmitry.olsh gmail.com> wrote:

 However when it comes to some serious business like matching charsets  
 and doing ambiguous control flow it starts losing to more superior  
 design of the default engine.

Ah, okay. I thought compile-time regex was the same as the default engine,  
but without the runtime parsing and bytecode interpretation(?).

-- 
Best regards,
  Vladimir                            mailto:vladimir thecybershadow.net

Sep 13 2011

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 14.09.2011 1:55, Vladimir Panteleev wrote:
 On Wed, 14 Sep 2011 00:49:47 +0300, Dmitry Olshansky
 <dmitry.olsh gmail.com> wrote:

 However when it comes to some serious business like matching charsets
 and doing ambiguous control flow it starts losing to more superior
 design of the default engine.

 Ah, okay. I thought compile-time regex was the same as the default
 engine, but without the runtime parsing and bytecode interpretation(?).

Let me explain a bit. The default as chosen is fairly advanced NFA 
engine which features:
	- never backtracks and consequently decodes each UTF8/16/32 character 
only once;
	- as side effect doesn't support fully combinatorial backreferences 
(why would someone use 'em anyway);
	- guaranteed O(N*M) time on input with M being "size" of pattern;

This also opens up matching directly on buffered streams... but back to 
the point.

There is also a slower traditional backtracking engine, it still faster 
then many other engines out there but it ebbs out on some patterns like 
.*.*.*Z in text with no 'Z', precisely because of combinatorial nature 
of backtracking.

Then C-T is built on top of backtracking, just because it was much 
easier this way :)
It's indeed replacing bytecode interpretations with hardcoded code which 
is exceptionally fast, but the prime time consumer in real patterns is 
usually this memoize/return. C-T regex is still a bit experimental 
though and going to be improved over time.

Last but not least there is also completely dumb (as in dead simple and 
fast) kickstart engine that does all of heavy lifting iff pattern starts 
with something more or less constant (e.g. it does 50% of job in regex-dna).

And you can just parse at compile time and use default run-time engine 
for matching:
enum r = regex("whatever");


-- 
Dmitry Olshansky

Sep 13 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/13/11 4:37 PM, Dmitry Olshansky wrote:
 OK in this particular benchmark, we are slighlty behind Google inc. :)
 Namely RE2 and irregex.
 On my machine RE2 is ~5 sec while FReD does the job in ~7sec.
 As for irregex, ... uhm-oh I haven't actually tested it (failed to just
 compile) but it beats RE2 in this test by some ~30%.


 OpenMP declarations stripped:
 http://shootout.alioth.debian.org/u32/performance.php?test=regexdna


 Anyway, the rest of competition is not even close, and I still haven't
 spent all of my tricks.

Sounds great! I advise you to invest effort in reducing the 25% handicap 
to <1%. As regexen have a fairly standardized interface, speed is the 
most impactful differentiatior people look at.

Andrei

Sep 13 2011

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 14.09.2011 2:44, Andrei Alexandrescu wrote:
 On 9/13/11 4:37 PM, Dmitry Olshansky wrote:
 OK in this particular benchmark, we are slighlty behind Google inc. :)
 Namely RE2 and irregex.
 On my machine RE2 is ~5 sec while FReD does the job in ~7sec.
 As for irregex, ... uhm-oh I haven't actually tested it (failed to just
 compile) but it beats RE2 in this test by some ~30%.


 OpenMP declarations stripped:
 http://shootout.alioth.debian.org/u32/performance.php?test=regexdna


 Anyway, the rest of competition is not even close, and I still haven't
 spent all of my tricks.

 Sounds great! I advise you to invest effort in reducing the 25% handicap
 to <1%.

Yes, sir! :)

As regexen have a fairly standardized interface, speed is the
 most impactful differentiatior people look at.

Well in this particular benchmark there is little I can do, some small 
scale optimizations and such, I expect no more then 10%.
The bigger picture is that our best bet here is static regex, the 
competitors use some aggressive JIT so it's a fair play. I actually 
received pull request this morning do just that  (thx, Martin) though 
dmd uses so much memory it's not for mortals yet:

I had 2G of ram + some 600mb of swap - fail on 3rd pattern of 9
+ 2 Gb of swap - 6 out of 9
Come on, it's a sport!
+ 4 Gb of swap and after few restless minutes I got it working

Fresh timings:
FReD C-T  3.4 sec
FReD R-T  6.6 sec
RE2       4.8 sec
V8 JS     3.7 sec

Look ma, we are on top! :)

So, by the end of day, D does kick ass, even though it's a bit of 
Pyrrhic victory.

---
Dmitry Olshansky

Sep 14 2011

"Simen Kjaeraas" <simen.kjaras gmail.com> writes:

On Wed, 14 Sep 2011 14:51:54 +0200, Dmitry Olshansky  
<dmitry.olsh gmail.com> wrote:

 Well in this particular benchmark there is little I can do, some small  
 scale optimizations and such, I expect no more then 10%.
 The bigger picture is that our best bet here is static regex, the  
 competitors use some aggressive JIT so it's a fair play. I actually  
 received pull request this morning do just that  (thx, Martin) though  
 dmd uses so much memory it's not for mortals yet:

 I had 2G of ram + some 600mb of swap - fail on 3rd pattern of 9
 + 2 Gb of swap - 6 out of 9
 Come on, it's a sport!
 + 4 Gb of swap and after few restless minutes I got it working

 Fresh timings:
 FReD C-T  3.4 sec
 FReD R-T  6.6 sec
 RE2       4.8 sec
 V8 JS     3.7 sec

 Look ma, we are on top! :)

 So, by the end of day, D does kick ass, even though it's a bit of  
 Pyrrhic victory.

Even so, that is awesome. Another point to D for its
metaprogramming.

-- 
   Simen

Sep 14 2011

"Marco Leise" <Marco.Leise gmx.de> writes:

Am 14.09.2011, 17:10 Uhr, schrieb Simen Kjaeraas <simen.kjaras gmail.com>:

 On Wed, 14 Sep 2011 14:51:54 +0200, Dmitry Olshansky  
 <dmitry.olsh gmail.com> wrote:

 Well in this particular benchmark there is little I can do, some small  
 scale optimizations and such, I expect no more then 10%.
 The bigger picture is that our best bet here is static regex, the  
 competitors use some aggressive JIT so it's a fair play. I actually  
 received pull request this morning do just that  (thx, Martin) though  
 dmd uses so much memory it's not for mortals yet:

 I had 2G of ram + some 600mb of swap - fail on 3rd pattern of 9
 + 2 Gb of swap - 6 out of 9
 Come on, it's a sport!
 + 4 Gb of swap and after few restless minutes I got it working

 Fresh timings:
 FReD C-T  3.4 sec
 FReD R-T  6.6 sec
 RE2       4.8 sec
 V8 JS     3.7 sec

 Look ma, we are on top! :)

 So, by the end of day, D does kick ass, even though it's a bit of  
 Pyrrhic victory.

 Even so, that is awesome. Another point to D for its
 metaprogramming.

Wait with the champaign until this doesn't use 8 GB RAM during compile any  
more ;) Other than that, congrats! This is really impressive. I think you  
do some great pioneer work, that will help other who try to embed simple  
domain specific languages.

Sep 14 2011

"Martin Nowak" <dawg dawgfoto.de> writes:

On Wed, 14 Sep 2011 17:36:25 +0200, Marco Leise <Marco.Leise gmx.de> wrote:

 Am 14.09.2011, 17:10 Uhr, schrieb Simen Kjaeraas  
 <simen.kjaras gmail.com>:

 On Wed, 14 Sep 2011 14:51:54 +0200, Dmitry Olshansky  
 <dmitry.olsh gmail.com> wrote:

 Well in this particular benchmark there is little I can do, some small  
 scale optimizations and such, I expect no more then 10%.
 The bigger picture is that our best bet here is static regex, the  
 competitors use some aggressive JIT so it's a fair play. I actually  
 received pull request this morning do just that  (thx, Martin) though  
 dmd uses so much memory it's not for mortals yet:

 I had 2G of ram + some 600mb of swap - fail on 3rd pattern of 9
 + 2 Gb of swap - 6 out of 9
 Come on, it's a sport!
 + 4 Gb of swap and after few restless minutes I got it working

 Fresh timings:
 FReD C-T  3.4 sec
 FReD R-T  6.6 sec
 RE2       4.8 sec
 V8 JS     3.7 sec

 Look ma, we are on top! :)

 So, by the end of day, D does kick ass, even though it's a bit of  
 Pyrrhic victory.

 Even so, that is awesome. Another point to D for its
 metaprogramming.

 Wait with the champaign until this doesn't use 8 GB RAM during compile  
 any more ;) Other than that, congrats! This is really impressive. I  
 think you do some great pioneer work, that will help other who try to  
 embed simple domain specific languages.

It peaks at 4GB for me and takes about 20s to compile.

Sep 14 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/14/11 7:51 AM, Dmitry Olshansky wrote:
 On 14.09.2011 2:44, Andrei Alexandrescu wrote:
 On 9/13/11 4:37 PM, Dmitry Olshansky wrote:
 OK in this particular benchmark, we are slighlty behind Google inc. :)
 Namely RE2 and irregex.
 On my machine RE2 is ~5 sec while FReD does the job in ~7sec.
 As for irregex, ... uhm-oh I haven't actually tested it (failed to just
 compile) but it beats RE2 in this test by some ~30%.


 OpenMP declarations stripped:
 http://shootout.alioth.debian.org/u32/performance.php?test=regexdna


 Anyway, the rest of competition is not even close, and I still haven't
 spent all of my tricks.

 Sounds great! I advise you to invest effort in reducing the 25% handicap
 to <1%.

 Yes, sir! :)

 As regexen have a fairly standardized interface, speed is the
 most impactful differentiatior people look at.

 Well in this particular benchmark there is little I can do, some small
 scale optimizations and such, I expect no more then 10%.
 The bigger picture is that our best bet here is static regex, the
 competitors use some aggressive JIT so it's a fair play. I actually
 received pull request this morning do just that (thx, Martin) though dmd
 uses so much memory it's not for mortals yet:

 I had 2G of ram + some 600mb of swap - fail on 3rd pattern of 9
 + 2 Gb of swap - 6 out of 9
 Come on, it's a sport!
 + 4 Gb of swap and after few restless minutes I got it working

 Fresh timings:
 FReD C-T 3.4 sec
 FReD R-T 6.6 sec
 RE2 4.8 sec
 V8 JS 3.7 sec

 Look ma, we are on top! :)

 So, by the end of day, D does kick ass, even though it's a bit of
 Pyrrhic victory.

 ---
 Dmitry Olshansky

Great, thanks. Could you please email me privately some more information 
about the benchmark and your test code? I'm considering using this in my 
upcoming talk at the Strange Loop conference.

Thanks again,

Andrei

Andrei

Sep 14 2011

bearophile <bearophileHUGS lycos.com> writes:

Dmitry Olshansky:

 Goodies include popular regex-dna benchmark, by my measures we should be 



amount of engineering efforts.


 Though I used 64bit VM 
 linux box, on win32 sadly it runs out of memory on the same machine 
 (false pointers?).

This is not so good. Even if this problem is not solved soon, it will be good
to know its real cause.


 P.S. I'm taking a seat somewhere in review queue, and busying myself 
 with newspaper :)

*Gives some fine white-golden tea*

Bye,
bearophile

Sep 13 2011

"Marco Leise" <Marco.Leise gmx.de> writes:

Am 13.09.2011, 23:22 Uhr, schrieb bearophile <bearophileHUGS lycos.com>:

 Dmitry Olshansky:

 Goodies include popular regex-dna benchmark, by my measures we should be



 significant amount of engineering efforts.

I heard a Google engineer talk about that. In JavaScript regular  
expressions are often used for a wider range of operations on strings  
(e.g. splitting), since they were generally faster than doing the same in  
pure JavaScript. It is one of those few occasions where a JS developer can  
rely on native code in a library doing most of the work.

Sep 13 2011

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Tuesday, September 13, 2011 14:04 Dmitry Olshansky wrote:
 Speaking of source there are few artifacts that should end up in std.uni
 and not in std.regex:
 CodepointSet and CodepointTrie, and unicode property tables.
 I'd rather have them all accessible to user, but their interface is
 admittedly clunky, I'm open to ideas on better API.

I suspect that there's value in reviewing the unicode stuff separately, and if 
it affects std.regex' API at all (as opposed to being simply an implementation 
detail), I'd suggest that we review and sort out the unicode stuff before we 
review then new std.regex, since that would then have definite impact on 
std.regex, and it affects a lot more than just std.regex.

If the unicode stuff is entirely an implementation detail of std.regex and is 
not in the API, then we can review the unicode stuff separately and make 
std.regex use the stuff in std.uni once it's in there, but if you're doing 
major unicode stuff, I think that that's significant enough to merit its own 
review.

- Jonathan M Davis

Sep 13 2011

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 14.09.2011 1:36, Jonathan M Davis wrote:
 On Tuesday, September 13, 2011 14:04 Dmitry Olshansky wrote:
 Speaking of source there are few artifacts that should end up in std.uni
 and not in std.regex:
 CodepointSet and CodepointTrie, and unicode property tables.
 I'd rather have them all accessible to user, but their interface is
 admittedly clunky, I'm open to ideas on better API.

 I suspect that there's value in reviewing the unicode stuff separately, and if
 it affects std.regex' API at all (as opposed to being simply an implementation
 detail), I'd suggest that we review and sort out the unicode stuff before we
 review then new std.regex, since that would then have definite impact on
 std.regex, and it affects a lot more than just std.regex.

It doesn't affect API at all, but an important implementation detail.
Particularly one can't get reasonable performance w/o unicode 
incorporated as an implementation detail yet I feel there should be a 
more general interface to it. Otherwise everyone willing to tackle 
unicode would have to, for instance, duplicate all of unicode property 
tables, and that's some 100Kb we are looking at.


 If the unicode stuff is entirely an implementation detail of std.regex and is
 not in the API, then we can review the unicode stuff separately and make
 std.regex use the stuff in std.uni once it's in there, but if you're doing
 major unicode stuff, I think that that's significant enough to merit its own
 review.

It could be done in two steps, if I was to choose which way to go I'd 
first folded in std.regex then moved all its hidden unicode stuff to 
std.uni. It's mostly about character classification and about doing it fast.


-- 
Dmitry Olshansky

Sep 13 2011

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Tuesday, September 13, 2011 14:52 Dmitry Olshansky wrote:
 On 14.09.2011 1:36, Jonathan M Davis wrote:
 On Tuesday, September 13, 2011 14:04 Dmitry Olshansky wrote:
 Speaking of source there are few artifacts that should end up in std.uni
 and not in std.regex:
 CodepointSet and CodepointTrie, and unicode property tables.
 I'd rather have them all accessible to user, but their interface is
 admittedly clunky, I'm open to ideas on better API.

 
 I suspect that there's value in reviewing the unicode stuff separately,
 and if it affects std.regex' API at all (as opposed to being simply an
 implementation detail), I'd suggest that we review and sort out the
 unicode stuff before we review then new std.regex, since that would then
 have definite impact on std.regex, and it affects a lot more than just
 std.regex.

 
 It doesn't affect API at all, but an important implementation detail.
 Particularly one can't get reasonable performance w/o unicode
 incorporated as an implementation detail yet I feel there should be a
 more general interface to it. Otherwise everyone willing to tackle
 unicode would have to, for instance, duplicate all of unicode property
 tables, and that's some 100Kb we are looking at.
 
 If the unicode stuff is entirely an implementation detail of std.regex
 and is not in the API, then we can review the unicode stuff separately
 and make std.regex use the stuff in std.uni once it's in there, but if
 you're doing major unicode stuff, I think that that's significant enough
 to merit its own review.

 
 It could be done in two steps, if I was to choose which way to go I'd
 first folded in std.regex then moved all its hidden unicode stuff to
 std.uni. It's mostly about character classification and about doing it
 fast.

As long as it doesn't affect the API, then we can review std.regex first, but 
we do definitely want to get any major unicode improvements into std.uni. And 
that will probably merit its own review.

- Jonathan M Davis

Sep 13 2011

D Programming

C/C++ Programming

Other

digitalmars.D - FReD is ready for review