digitalmars.D - earthquake changes of std.regexp to come

Andrei Alexandrescu (14/14) Feb 17 2009 I'm quite unhappy with the API of std.regexp. It's a chaotic design that...

bearophile (10/13) Feb 17 2009 I have no problems in accepting changes here too. D2 is already essentia...

Joel C. Salomon (14/15) Feb 17 2009 So steal one, rather than invent something new. My suggestion would be

Andrei Alexandrescu (4/19) Feb 17 2009 s/string/input range/

dsimcha (14/28) Feb 17 2009 As I've said before, anyone who can't stomach breaking changes w/o compl...

dsimcha (3/31) Feb 17 2009 BTW, can you elaborate on how arrays, both builtin and any library versi...

Andrei Alexandrescu (34/66) Feb 17 2009 Well finalizations hinges not only on me but on Walter (bugfixes and a

Yigal Chripun (16/101) Feb 18 2009 I've got a few questions about the proposed container value semantics:
Yigal Chripun (4/89) Feb 18 2009 Another question regarding the container design - have you considered
Georg Wrede (11/20) Feb 20 2009 I admit I'm tired right now... You mention disadvantages, the one I

Andrei Alexandrescu (13/37) Feb 20 2009 Better said, I was too tired when I posted that. I gave too little

Bill Baxter (9/13) Feb 17 2009 So what do you think it should be, a struct?

Andrei Alexandrescu (26/40) Feb 17 2009 Well you'd be surprised. The RegEx class saves the state of the last

bearophile (7/10) Feb 17 2009 (I often use xplit() that is like split but yields items lazily, for lar...
Bill Baxter (12/53) Feb 17 2009 So that sounds to me like RegEx should have a .dup, and then it would

Andrei Alexandrescu (7/13) Feb 17 2009 I lost that perspective when criticizing RegExp, you're right. But still...

Bill Baxter (10/22) Feb 17 2009 Ok. I'm certainly not in love with the API either. Though, the only

bearophile (5/6) Feb 17 2009 I agree, I too need the Python docs every time I want to use something m...

Jarrett Billingsley (5/12) Feb 17 2009 Is there ever a situation where you want to use a single regexp for

Andrei Alexandrescu (17/31) Feb 17 2009 Ehm, that's odd. You'd think that after Perl has set the precedent, it

Bill Baxter (9/41) Feb 17 2009 All I know is that I found one incantation that works and I've been

Walter Bright (5/9) Feb 20 2009 std.regexp evolved out of the ECMAscript regex functions - they have the...

Andrei Alexandrescu (3/13) Feb 20 2009 s/good \(bad\?\)/REALLY BAD/

Denis Koroskin (4/16) Feb 20 2009 Backward compatibility is almost always a bad thing.

Andrei Alexandrescu (4/25) Feb 20 2009 In this case it's even worse, as I don't think anyone expects to paste

Jarrett Billingsley (20/33) Feb 17 2009 Well I don't mean to, uh, toot my own horn but.. I recently bound

Bill Baxter (9/22) Feb 17 2009 Btw, I've got no problems with you breaking the API of 2.0 either.

Andrei Alexandrescu (3/29) Feb 17 2009 I was thinking of moving older stuff to etc, is that ok?

Walter Bright (3/4) Feb 17 2009 Yes. But you should also rename the new one, perhaps to std.regex. That

Andrei Alexandrescu (5/10) Feb 17 2009 Terrific. I prefer "regex" to "regexp" because it's easier to pronounce,...

bearophile (4/7) Feb 17 2009 I'd like std.re :-)
Chris Nicholson-Sauls (7/20) Feb 17 2009 It sounds to me like a frog who, immediately post-utterance, just got

Leandro Lucarella (11/20) Feb 19 2009 What's the rationale for "etc"? Why not "deprecated", o something shorte...

Andrei Alexandrescu (3/17) Feb 19 2009 In the words of George Costanza: "Because it's there!"

Ellery Newcomer (2/21) Feb 19 2009 Shouldn't that be George Mallory?

Andrei Alexandrescu (14/37) Feb 19 2009 No, he said "because it is there". George said "because it's there":

Georg Wrede (4/12) Feb 20 2009 With the critique you've given to the existing regexp stuff, deprecated

Bill Baxter (10/23) Feb 20 2009 Agreed.
Leandro Lucarella (9/21) Feb 20 2009 Why not "misc" for that? =)

BCS (4/24) Feb 17 2009 For what it's worth, I have a partial clone of the .NET API built on top...

Daniel de Kok (8/11) Feb 17 2009 Actually, I was wondering why nobody is considering real regular

Andrei Alexandrescu (6/16) Feb 17 2009 I am considering that. One nice feature of "classic" regexes is that

Jarrett Billingsley (12/21) Feb 17 2009 Tango's regex engine is just that. It uses a tagged NFA method.

bearophile (4/5) Feb 17 2009 A modern CPU is able to do something like 60*2*2E9 operations in that ti...
BCS (14/31) Feb 17 2009 could this be transitioned to CTFE? you could even have a debug mode tha...

Jarrett Billingsley (6/19) Feb 17 2009 For what it's worth the Tango regexes actually have a method to output

BCS (4/28) Feb 17 2009 For any kind of debug, yeah, that's a problem. OTOH for release, as long...
Chris Nicholson-Sauls (5/28) Feb 17 2009 I feature which I *adore* by the way. So long as the precompiled regex

Daniel de Kok (3/6) Feb 17 2009 I have only been tinkering with Phobos, but that's good to hear, thanks!

Daniel de Kok (13/20) Feb 17 2009 Hmmm, define "complex", I suppose it's ok for the general
Jarrett Billingsley (5/6) Feb 17 2009 \w+([\-+.]\w+)*@\w+([\-.]\w+)*\.\w+([\-.]\w+)*

BCS (4/16) Feb 17 2009 I wonder how well it would work on this:

Daniel de Kok (17/23) Feb 17 2009 Hmm, odd. I have translated that regexp to the syntax of the tool that
Andrei Alexandrescu (4/30) Feb 17 2009 That would be cool; I find the engine in std.regexp rather hard to

Derek Parnell (8/13) Feb 17 2009 If your changes are going to make things better for coding and maintenan...

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

I'm quite unhappy with the API of std.regexp. It's a chaotic design that 
provides a hodgepodge of functionality and tries to use as many synonyms 
of "to find" in the dictionary (e.g. search, match). I could swear 
Walter never really cared for using regexps, and that is felt throughout 
the design: it fills the bullet point but it's asinine to use.

Besides, std.regexp only works with (narrow) strings and we want it to 
work on streams of all widths and structures. One pet complaint I have 
is that std.regexp puts a class around it all as if everybody's favorite 
pastime would be to inherit Regexp and override some random function in it.

In the upcoming releases of D 2.0 there will be rather dramatic breaking 
changes to Phobos. I just wanted to ask whether y'all could stomach yet 
another rewritten API or you'd rather use std.regexp as it is for the 
time being.


Andrei

Feb 17 2009

bearophile <bearophileHUGS lycos.com> writes:

Don't be too hard on the good Walter, please :-) One good thing about his
designs (in D1) is that they are often simple to use: they give you back much
more than you give them. D2 seems to ask much more from the programmer.

I agree that the API of regexes in Phobos is not very good, but I think
designing a good API for it is quite hard.


 I just wanted to ask whether y'all could stomach yet 
 another rewritten API or you'd rather use std.regexp as it is for the 
 time being.

I have no problems in accepting changes here too. D2 is already essentially
another language compared to D1.

Regarding the regexes of D1 Phobos, they have problems bigger than just the API:
in the past I have found some common cases where matching is O(n^2) or worse.

You can see a case of such behaviour here (look at my comments that show which
parts are slow; I have also commented out versions that are more logical but
much slower):
http://shootout.alioth.debian.org/debian/benchmark.php?test=regexdna&lang=gdc&id=4

If you want to test that code you can generate test data with this other code:
http://shootout.alioth.debian.org/debian/benchmark.php?test=fasta&lang=dlang&id=1

Bye,
bearophile

Feb 17 2009

"Joel C. Salomon" <joelcsalomon gmail.com> writes:

bearophile wrote:
 I agree that the API of regexes in Phobos is not much good, but I think
designing a good API for it is quite hard.

So steal one, rather than invent something new. My suggestion would be
to expose the DFA object, as in Plan 9’s library (documentation at
<http://plan9.bell-labs.com/magic/man2html/2/regexp>, implementation at
<http://plan9.bell-labs.com/sources/plan9/sys/src/libregexp/>,
discussion and links to a Unix implementation at
<http://swtch.com/~rsc/regexp/>).

Simple API:
• regcomp: Compile a regexp DFA;
• regexec: Apply it to a string, returning a slice of the string that
matches the first hit (or an array of slices if parenthesized
expressions are used); and
• regsub: Apply substitutions to subexpressions of the matching slice.

—Joel Salomon
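
An editorial sketch of the three-call shape described above, written against the std.regex module that Phobos ships today. That module postdates this thread, so using it here is an assumption and not something proposed in the post. Roughly, `regex` plays the role of regcomp, `matchFirst` of regexec, and `replaceFirst` of regsub; the helper name `swapFirstPair` and the sample pattern are made up for illustration.

```d
import std.regex : regex, matchFirst, replaceFirst;
import std.stdio : writeln;

/// Compile, match, substitute: the three steps of a Plan 9-style API,
/// expressed with present-day std.regex calls (an assumption, see above).
string swapFirstPair(string text)
{
    // "regcomp": compile the pattern into a reusable object.
    auto re = regex(`(\w+)=(\w+)`);

    // "regexec": find the first hit; captures are slices of `text`.
    auto m = matchFirst(text, re);
    if (m.empty)
        return text; // nothing matched, leave the input alone

    // "regsub": rewrite the first hit using the captured groups.
    return replaceFirst(text, re, "$2=$1");
}

void main()
{
    writeln(swapFirstPair("key=value; other=thing")); // value=key; other=thing
}
```

Only the first `name=value` pair is rewritten; the rest of the string is passed through unchanged.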

Feb 17 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Joel C. Salomon wrote:
 bearophile wrote:
 I agree that the API of regexes in Phobos is not much good, but I think
designing a good API for it is quite hard.

 
 So steal one, rather than invent something new. My suggestion would be
 to expose the DFA object, as in Plan 9’s library (documentation at
 <http://plan9.bell-labs.com/magic/man2html/2/regexp>, implementation at
 <http://plan9.bell-labs.com/sources/plan9/sys/src/libregexp/>,
 discussion and links to a Unix implementation at
 <http://swtch.com/~rsc/regexp/>).
 
 Simple API:
 • regcomp: Compile a regexp DFA;
 • regexec: Apply it to a string, returning a slice of the string that
 matches the first hit (or an array of slices if parenthesized
 expressions are used); and

s/string/input range/

Also returning a range instead of an array of slices is more flexible.
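
To illustrate why a range result is more flexible than an array of slices, here is a small sketch using `matchAll` from today's std.regex (an assumption: that module did not exist when this was posted). `matchAll` hands back a lazy range of matches, so a caller can stop after the first few hits without the rest being computed. The function name `firstNumbers` is hypothetical. Note that, as far as I know, the input side still has to be a string type, so this covers only the "return a range" half of the suggestion.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.range : take;
import std.regex : regex, matchAll;
import std.stdio : writeln;

/// Return at most `n` runs of digits found in `text`.
/// Matching is lazy: the search stops once `n` hits have been consumed.
string[] firstNumbers(string text, size_t n)
{
    auto re = regex(`[0-9]+`);
    return matchAll(text, re)   // lazy range of captures
        .map!(m => m.hit)       // keep only the matched slice
        .take(n)                // stop early
        .array;                 // materialize just what was asked for
}

void main()
{
    writeln(firstNumbers("a1 b22 c333 d4444", 2)); // ["1", "22"]
}
```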

Andrei

Feb 17 2009

dsimcha <dsimcha yahoo.com> writes:

== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 I'm quite unhappy with the API of std.regexp. It's a chaotic design that
 provides a hodgepodge of functionality and tries to use as many synonyms
 of "to find" in the dictionary (e.g. search, match). I could swear
 Walter never really cared for using regexps, and that is felt throughout
 the design: it fills the bullet point but it's asinine to use.
 Besides std.regexp only works with (narrow) strings and we want it to
 work on streams of all widths and structures. One pet complaint I have
 is that std.regexp puts a class around it all as if everybody's favorite
 pastime would be to inherit Regexp and override some random function in it.
 In the upcoming releases of D 2.0 there will be rather dramatic breaking
 changes of phobos. I just wanted to ask whether y'all could stomach yet
 another rewritten API or you'd rather use std.regexp as it is for the
 time being.
 Andrei

As I've said before, anyone who can't stomach breaking changes w/o complaining
has no business using D2 at this point.  I'd rather deal with the aggravation of
stuff breaking in the short run to have a nice language and libraries to go with
it in the long run.

This whole concept of ranges as you've created them seems to have achieved the
holy grail of both making simple things simple and complex things possible,
where "complex things" includes needing code to be efficient, so I can see your
reason for wanting to redo all kinds of stuff in them.  This compares favorably
to C++ STL iterators, which are very flexible and efficient but a huge PITA to
use for simple things because the syntax is so low-level and ugly, and to the
D1/early D2 way, which gives beautiful, simple notation for the more common
cases (basic dynamic arrays), at the expense of flexibility when doing more
complicated things like streams, chaining, strides, etc.

Feb 17 2009

dsimcha <dsimcha yahoo.com> writes:

BTW, can you elaborate on how arrays, both builtin and any library versions,
will work when everything is finalized?

Feb 17 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

dsimcha wrote:
 BTW, can you elaborate on how arrays, both builtin and any library versions,
will
 work when everything is finalized?

Well, finalization hinges not only on me but on Walter (bugfixes and a 
couple of new features) and on all of you with the continuous stream of 
great suggestions and ideas. Again, without being able to experiment 
much I don't have a clear idea of what arrays/containers should best 
look like. The interesting challenge is accommodating good, precise 
semantics with the freedom given by garbage collection. Here are some 
highlights:

* Today's T[] will be firmly an incarnation of the random-access range 
concept, to the extent that all code expecting a random-access range can 
always be passed a T[] without any impedance adaptation.

* $ will be generalized to mean "end of range" even for infinite ranges.

* We don't have a solution to address the perils of extending a slice by 
using ~=. We're considering adding the type T[new], but I'm not sure we 
should take the hit of a new built-in type constructor, particularly 
when it's implementable as a library.

* Fixed-size arrays will in all likelihood be value types. We couldn't 
find any other semantics that works.

* Containers will have value semantics.

* "Resources come and go; memory is forever" is the likely default in D 
resource management. This means that destroying e.g. an array of File 
objects will close the underlying files, but will not deallocate the 
memory allocated for them. In essence, destroying values means calling 
the destructor but not delete-ing them (unless of course they're on the 
stack). This approach has a number of disadvantages, but plenty of 
advantages that compensate for them in most applications.

* std.matrix will define memory layouts for a variety of popular 
libraries and also the common means to iterate said layouts.

* For those who want containers with reference semantics, they can use 
the type Class!(T) for any value type T. That includes built-in value 
types (int, float...) and whichever value containers we define. It's 
unclear to me whether this is enough to satisfy those in need of 
complex container hierarchies.


Andrei
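
A few of the highlights above can be checked against the D that exists today. The sketch below is an editorial illustration, not part of the plan as posted: it assumes a current compiler and Phobos, and the function names are made up. It shows that `int[]` satisfies the random-access range check, that fixed-size arrays copy by value while slices share storage, and that in current runtimes appending to a slice that stops short of its block's end reallocates instead of overwriting the neighbour's data (the `~=` peril the post refers to).

```d
import std.range.primitives : isRandomAccessRange;
import std.stdio : writeln;

// A built-in slice passes the random-access range check as-is.
static assert(isRandomAccessRange!(int[]));

/// Fixed-size arrays are value types: assignment copies every element.
bool fixedArraysCopyByValue()
{
    int[3] a = [1, 2, 3];
    int[3] b = a;
    b[0] = 99;
    return a[0] == 1;
}

/// Slices are views: two slices of the same data see each other's writes.
bool slicesShareStorage()
{
    int[] a = [1, 2, 3];
    int[] b = a;
    b[0] = 99;
    return a[0] == 99;
}

/// Appending to a shorter slice must not clobber the element after it.
bool appendDoesNotStomp()
{
    int[] a = [1, 2, 3];
    int[] b = a[0 .. 2];
    b ~= 42;            // b gets fresh storage; a[2] is left untouched
    return a[2] == 3 && b == [1, 2, 42];
}

void main()
{
    writeln(fixedArraysCopyByValue(), " ", slicesShareStorage(), " ",
            appendDoesNotStomp());
}
```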

Feb 17 2009

Yigal Chripun <yigal100 gmail.com> writes:

Andrei Alexandrescu wrote:
 dsimcha wrote:
 BTW, can you elaborate on how arrays, both builtin and any library
 versions, will
 work when everything is finalized?

 Well finalizations hinges not only on me but on Walter (bugfixes and a
 couple of new features) and on all of you with the continuous stream of
 great suggestions and ideas. Again, without being able to experiment
 much I don't have a clear idea on how arrays/containers should at best
 look like. The interesting challenge is accommodating good, precise
 semantics with the freedom given by garbage collection. Here are some
 highlights:

 * Today's T[] will be firmly an incarnation of the random-access range
 concept, to the extent that all code expecting a random-access range can
 always be passed a T[] without any impedance adaptation.

 * $ will be generalized to mean "end of range" even for infinite ranges.

 * We don't have a solution to address the perils of extending a slice by
 using ~=. We're considering adding the type T[new], but I'm not sure we
 should take the hit of a new built-in type constructor, particularly
 when it's implementable as a library.

 * Fixed-size arrays will in all likelihood be value types. We couldn't
 find any other semantics that works.

 * Containers will have value semantics.

 * "Resources come and go; memory is forever" is the likely default in D
 resource management. This means that destroying e.g. an array of File
 objects will close the underlying files, but will not deallocate the
 memory allocated for them. In essence, destroying values means calling
 the destructor but not delete-ing them (unless of course they're on the
 stack). This approach has a number of disadvantages, but plenty of
 advantages that compensate them in most applications.

 * std.matrix will define memory layouts for a variety of popular
 libraries and also the common means to iterate said layouts.

 * For those who want containers with reference semantics, they can use
 the type Class!(T) for any value type T. That includes built-in value
 types (int, float...) and whichever value containers we define. It's
 unclear to me whether this is enough to satisfy those in need for
 complex container hierarchies.


 Andrei

I've got a few questions about the proposed container value semantics:

a) I'd like to be able to do for instance:
List lst = new LinkedList();
i.e. use interfaces everywhere and especially in functions so that I can 
switch implementations easily when the need arises. In the above I can 
choose to use a singly or doubly linked list without making changes 
throughout the code by using the List interface. Will this be possible 
and how? Is D going to get proper struct interfaces?

b) It is sometimes useful to have a container!(Base) store references to 
instances of derived classes. A canonical example of this is a container 
of Widget class in a UI framework, where you can, for instance, iterate 
over the container and paint all the different kinds of widgets on the 
screen by calling the virtual paint method of the base class. How can 
this be implemented with your proposed Class template?

-- Yigal
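
For reference, the behaviour asked about in (b) looks like this with plain class references in a built-in array. This is an editorial sketch of the baseline only: it does not use the proposed Class!(T) template, whose design is not spelled out in this thread, and the `Widget`, `Button`, `Label` and `paintAll` names are hypothetical.

```d
import std.stdio : writeln;

abstract class Widget
{
    abstract string paint();
}

final class Button : Widget
{
    override string paint() { return "button"; }
}

final class Label : Widget
{
    override string paint() { return "label"; }
}

/// Call the virtual paint() on every widget, whatever its concrete class.
string[] paintAll(Widget[] widgets)
{
    string[] painted;
    foreach (w; widgets)
        painted ~= w.paint();
    return painted;
}

void main()
{
    Widget[] widgets;
    widgets ~= new Button;   // stored as a Widget reference
    widgets ~= new Label;
    writeln(paintAll(widgets)); // ["button", "label"]
}
```

The open question in the post is whether a container with value semantics can offer the same dispatch; the sketch only fixes what "the same" means.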

Feb 18 2009

Yigal Chripun <yigal100 gmail.com> writes:

Another question regarding the container design - have you considered 
mutable containers vs. functional-style immutable containers? Does it 
make sense to provide both options?

Feb 18 2009

Georg Wrede <georg.wrede iki.fi> writes:

Andrei Alexandrescu wrote:
 * "Resources come and go; memory is forever" is the likely default in D 
 resource management. This means that destroying e.g. an array of File 
 objects will close the underlying files, but will not deallocate the 
 memory allocated for them. In essence, destroying values means calling 
 the destructor but not delete-ing them (unless of course they're on the 
 stack). This approach has a number of disadvantages, but plenty of 
 advantages that compensate them in most applications.

I admit I'm tired right now... You mention disadvantages, the one I 
can't avoid thinking of is memory leak! Which means you can't write e.g. 
a simple web server that opens and closes files, instead of creating and 
managing a file object pool? Eventually it'll run out of memory, unless 
I'm way too tired now...

 * std.matrix will define memory layouts for a variety of popular 
 libraries and also the common means to iterate said layouts.

I assume this is for handy and practical rectangular (and cubic, etc.) 
"arrays". Which would be most welcome.


This "memory is forever" philosophy, is this discussed in depth 
somewhere? (With the current amount of traffic here, I simply can't 
follow every thread anymore. :-( )

Feb 20 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 * "Resources come and go; memory is forever" is the likely default in 
 D resource management. This means that destroying e.g. an array of 
 File objects will close the underlying files, but will not deallocate 
 the memory allocated for them. In essence, destroying values means 
 calling the destructor but not delete-ing them (unless of course 
 they're on the stack). This approach has a number of disadvantages, 
 but plenty of advantages that compensate them in most applications.

 
 I admit I'm tired right now... You mention disadvantages, the one I 
 can't avoid thinking of is memory leak! Which means you can't write e.g. 
 a simple web server that opens and closes files, instead of creating and 
 managing a file object pool? Eventually it'll run out of memory, unless 
 I'm way too tired now...

Better said, I was too tired when I posted that. I gave too little 
detail. Files are resources, so they will "come and go", i.e. will be 
under deterministic control; there's no need to worry. Only memory will 
have a "lives forever" regime for safety reasons. It's not really 
forever as the GC collects it. In short, my proposed system is to admit 
that GC is good _only_ for memory, and that deterministic management 
must prevail for other resources. I'll get back later on this.
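
The split described here, a destructor that runs deterministically while the memory is left to the collector, can be sketched with the `destroy` function available in today's D. That is an assumption: the post does not say what mechanism the final design would use, and `Handle` with its `openCount` counter is a made-up stand-in for a file-like resource.

```d
import std.stdio : writeln;

/// Stand-in for a resource such as an open file.
final class Handle
{
    static int openCount;        // how many handles are currently "open"
    this()  { ++openCount; }     // acquire the resource
    ~this() { --openCount; }     // release the resource
}

/// The resource is released the moment destroy() is called;
/// the object's memory is reclaimed later by the GC, not here.
bool destructorRunsOnDestroy()
{
    immutable before = Handle.openCount;
    auto h = new Handle;
    immutable during = Handle.openCount;
    destroy(h);                  // runs ~this() now, frees no memory
    return during == before + 1 && Handle.openCount == before;
}

void main()
{
    writeln(destructorRunsOnDestroy()); // true
}
```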

 * std.matrix will define memory layouts for a variety of popular 
 libraries and also the common means to iterate said layouts.

 
 I assume this is for handy and practical rectangular (and cubic, etc.) 
 "arrays". Which would be most welcome.
 
 
 This "memory is forever" philosophy, is this discussed in depth 
 somewhere? (With the current amount of traffic here, I simply can't 
 follow every thread anymore. :-( )

I decided to curb my posting as well. Beyond a point even passable 
content becomes just white noise. Also since we don't have an off-topic 
group, off-topic discussions tend to carry on here as well and are not 
trivial to ignore. I'm happy they are civilized (congrats to all involved).


Andrei

Feb 20 2009

Bill Baxter <wbaxter gmail.com> writes:

On Wed, Feb 18, 2009 at 3:36 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 Besides std.regexp only works with (narrow) strings and we want it to work
 on streams of all widths and structures. One pet complaint I have is that
 std.regexp puts a class around it all as if everybody's favorite pastime
 would be to inherit Regexp and override some random function in it.

So what do you think it should be, a struct?
That would imply to me that everybody's favorite pastime is making
value copies of regex structures, when in fact nobody does that.

Regex is a class in order to give it reference semantics and provide
encapsulation of some re-usable state.  Maybe it should be a final
class, but my impression is "final class" doesn't really work in D.

--bb

Feb 17 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Wed, Feb 18, 2009 at 3:36 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Besides std.regexp only works with (narrow) strings and we want it to work
 on streams of all widths and structures. One pet complaint I have is that
 std.regexp puts a class around it all as if everybody's favorite pastime
 would be to inherit Regexp and override some random function in it.

 
 So what do you think it should be, a struct?

Yes.

 That would imply to me that everybody's favorite pastime is making
 value copies of regex structures, when in fact nobody does that.

Well you'd be surprised. The RegEx class saves the state of the last 
search, which is a sensible thing to do. But then consider a simple 
range Splitter that, when iterated, nicely gives you...

string a = ",a,  bcd, def,gh,";
foreach (e; splitter(a, pattern(", *")))
     writeln("[", e, "]");

writes

[]
[a]
[bcd]
[def]
[gh]

This is similar to the function std.regexp.split, with the notable 
difference that no extra memory is allocated. Now Splitter is an input 
range. This means that after copying a Splitter you wouldn't expect 
iterating the original to affect the copy. Well, that's exactly 
what happens when you use the "good" reference semantics of the RegExp 
stored inside Splitter. Worse, RegExp has no cloning primitive, so I 
need to resort to storing the pattern and recompiling it from scratch at 
every copy of Splitter. So essentially the "good" semantics of RegExp are 
useless when it comes to composing it in larger objects.
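
The aliasing problem described above can be reproduced without any regex machinery. What follows is only a minimal sketch, not the actual Splitter or RegExp code: ClassCursor, StructCursor, and Walker are hypothetical names invented for illustration, and the cursor stands in for the "state of the last search".

```d
import std.stdio;

// Hypothetical matcher state modelled as a class (reference semantics).
class ClassCursor
{
    size_t pos;
}

// The same state modelled as a struct (value semantics).
struct StructCursor
{
    size_t pos;
}

// A toy range over the characters of a string; the cursor type is a parameter
// so both flavours can be compared side by side.
struct Walker(Cursor)
{
    string text;
    Cursor cur;

    bool empty() { return cur.pos >= text.length; }
    char front() { return text[cur.pos]; }
    void popFront() { ++cur.pos; }
}

void main()
{
    auto a = Walker!ClassCursor("abc", new ClassCursor);
    auto b = a;          // copies the reference, so a and b share one cursor
    a.popFront();
    writeln(b.front);    // b was advanced as a side effect

    auto c = Walker!StructCursor("abc");
    auto d = c;          // copies the cursor itself
    c.popFront();
    writeln(d.front);    // d still sits at the start
}
```

With the class-based cursor the copy prints b, because advancing the original moved the shared state; with the struct-based cursor the copy prints a.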

 Regex is a class in order to give it reference semantics and provide
 encapsulation of some re-usable state.  Maybe it should be a final
 class, but my impression is "final class" doesn't really work in D.

Re-usable state is provided by structs too. In addition they can choose 
value vs. reference semantics with ease.


Andrei

Feb 17 2009

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:
 string a = ",a,  bcd, def,gh,";
 foreach (e; splitter(a, pattern(", *")))
      writeln("[", e, "]");

(I often use xsplit(), which is like split but yields items lazily; for larger
strings it's much faster.)

A better approach is to fuse xsplit and such a splitter function into a single
lazy generator that can take a string, a char, or an RE pattern as its second
argument.
A 3rd optional argument can be the max number of splits (so after such max it
yields all the rest of the string).

You can then add an eager splitter function with the same signature, that
outputs an array.
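
A rough sketch of such a lazy splitter with an optional maximum number of splits, restricted to a single-character separator to keep it short. LazySplit and its fields are made-up names, not xsplit or any Phobos function; a string or pattern separator would need a different search step.

```d
import std.stdio;
import std.string : indexOf;

// Hypothetical lazy splitter: yields slices of the input one at a time.
struct LazySplit
{
    string rest;                       // part of the input not yet yielded
    char sep;                          // separator character
    size_t splitsLeft = size_t.max;    // remaining splits allowed
    bool done;

    // Index of the next separator, or -1 if no further split may happen.
    private ptrdiff_t cut()
    {
        return splitsLeft == 0 ? -1 : rest.indexOf(sep);
    }

    bool empty() { return done; }

    string front()
    {
        auto i = cut();
        return i < 0 ? rest : rest[0 .. i];
    }

    void popFront()
    {
        auto i = cut();
        if (i < 0) { done = true; return; }   // last piece was just consumed
        rest = rest[i + 1 .. $];
        --splitsLeft;
    }
}

void main()
{
    // At most two splits: the third piece is the untouched remainder.
    foreach (piece; LazySplit("a,b,c,d", ',', 2))
        writeln("[", piece, "]");
}
```

The pieces are slices of the original string, so nothing is allocated while iterating. An eager version with the same signature could simply append each piece to an array.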

Bye,
bearophile

Feb 17 2009

Bill Baxter <wbaxter gmail.com> writes:

On Wed, Feb 18, 2009 at 6:56 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Wed, Feb 18, 2009 at 3:36 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Besides std.regexp only works with (narrow) strings and we want it to
 work
 on streams of all widths and structures. One pet complaint I have is that
 std.regexp puts a class around it all as if everybody's favorite pastime
 would be to inherit Regexp and override some random function in it.

 So what do you think it should be, a struct?

 Yes.

 That would imply to me that everybody's favorite pastime is making
 value copies of regex structures, when in fact nobody does that.

 Well you'd be surprised. The RegEx class saves the state of the last search,
 which is a sensible thing to do. But then consider a simple range Splitter
 that, when iterated, nicely gives you...

 string a = ",a,  bcd, def,gh,";
 foreach (e; splitter(a, pattern(", *")))
    writeln("[", e, "]");

 writes

 []
 [a]
 [bcd]
 [def]
 [gh]

 This is similar to the function std.regex.split with the notable difference
 that no extra memory is allocated. Now Splitter is an input range. This
 means you wouldn't expect that you copy a Splitter and then have iterating
 the original value affect the copy. Well, that's exactly what happens when
 you use the "good" reference semantics of the RegEx stored inside splitter.
 Worse, RegExp has no cloning primitive, so I need to resort to storing the
 pattern and recompiling it from scratch at every copy of Splitter. So
 essentially the "good" semantics of RegEx are useless when it comes to
 composing it in larger objects.

So that sounds to me like RegEx should have a .dup, and then it would
be fine, no?  I agree it should have a dup for the odd occasion when
you do want to make a copy for some reason.

 Regex is a class in order to give it reference semantics and provide
 encapsulation of some re-usable state.  Maybe it should be a final
 class, but my impression is "final class" doesn't really work in D.


 Re-usable state is provided by structs too. In addition they can choose
 value vs. reference semantics with ease.

I think this choice is not so much available with D1, plus the
constructor situation with D1 is less than ideal.  Given that, I think
the choice of class for RegEx was appropriate.  But if the struct
problems are all going away in D2, then that's great.  Sounds like
you're saying we'll really be able to use D structs just like one uses
a non-polymorphic C++ class.  If so, then that's super.

--bb

Feb 17 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 I think this choice is not so much available with D1, plus the
 constructor situation with D1 is less than ideal.  Given that, I think
 the choice of class for RegEx was appropriate.  But if the struct
 problems are all going away in D2, then that's great.  Sounds like
 you're saying we'll really be able to use D structs just like one uses
 a non-polymorphic C++ class.  If so, then that's super.

I lost that perspective when criticizing RegExp, you're right. But still 
the API is lousy - every single time I am using a RegExp, I find myself 
fumbling through the thoroughly overlapping primitives in the 
documentation, and never seem to find an idiom that's simple, 
comfortable, and memorable.

Andrei

Feb 17 2009

Bill Baxter <wbaxter gmail.com> writes:

On Wed, Feb 18, 2009 at 7:44 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 I think this choice is not so much available with D1, plus the
 constructor situation with D1 is less than ideal.  Given that, I think
 the choice of class for RegEx was appropriate.  But if the struct
 problems are all going away in D2, then that's great.  Sounds like
 you're saying we'll really be able to use D structs just like one uses
 a non-polymorphic C++ class.  If so, then that's super.

 I lost that perspective when criticizing RegExp, you're right. But still the
 API is lousy - every single time I am using a RegExp, I find myself fumbling
 through the thoroughly overlapping primitives in the documentation, and
 never seem to find an idiom that's simple, comfortable, and memorable.

Ok.  I'm certainly not in love with the API either.  Though, the only
RegEx API I've ever felt totally comfortable with was
Perl's, which in large part is syntax instead of an API.  With Python's,
I have to look over the documentation every time I use it, too.
Maybe it's because of the "matching" vs "searching" distinction that
I find impossible to remember.
(http://docs.python.org/library/re.html)

--bb

Feb 17 2009

bearophile <bearophileHUGS lycos.com> writes:

Bill Baxter:
Python's syntax I have to look over the documentation every time I use it, too.
Maybe it's because of the "matching" vs "searching" distinction that I find
impossible to remember.

I agree, I too need the Python docs every time I want to use something more
than the basics. The syntax for capturing groups is bad too (groups? group?
itersomething? etc). I have proposed an improvement (using [5] to grab the 5th
group()), but it was not implemented. Such syntax is possible in D too *hint*.
It's because of situations like this that I say that designing a good API for
std.re isn't easy at all. It will require care, brain, and maybe two or more
tries :-)

Bye,
bearophile

Feb 17 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Tue, Feb 17, 2009 at 7:13 PM, Bill Baxter <wbaxter gmail.com> wrote:
 Ok.  I'm certainly not in love with the API either.  Though, the only
 RegEx API I've ever used that felt totally comfortable with was
 Perl's, which in large part is syntax instead of an API.  Python's
 syntax I have to look over the documentation every time I use it, too.
  Maybe it's because of the "matching" vs "searching" distinction that
 I find impossible to remember.
 (http://docs.python.org/library/re.html)

Is there ever a situation where you want to use a single regexp for
both matching _and_ searching?  And if not, couldn't you just use ^ to
anchor it?  I never understood why Python's API makes such a
distinction.
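
The point about anchoring can be sketched in D. This assumes the present-day Phobos module std.regex and its matchFirst function rather than the std.regexp API under discussion; found is a hypothetical helper name.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: true if the pattern matches somewhere in the subject.
bool found(string subject, string pattern)
{
    return !matchFirst(subject, regex(pattern)).empty;
}

void main()
{
    // Unanchored pattern: behaves like "searching", a hit anywhere counts.
    writeln(found("say hello", "hello"));

    // Same pattern anchored with ^: behaves like "matching" at the start only.
    writeln(found("say hello", "^hello"));
    writeln(found("hello there", "^hello"));
}
```

A single search primitive plus the ^ anchor covers both cases here, which is the simplification the question hints at.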

Feb 17 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Jarrett Billingsley wrote:
 On Tue, Feb 17, 2009 at 7:13 PM, Bill Baxter <wbaxter gmail.com> wrote:
 Ok.  I'm certainly not in love with the API either.  Though, the only
 RegEx API I've ever used that felt totally comfortable with was
 Perl's, which in large part is syntax instead of an API.  Python's
 syntax I have to look over the documentation every time I use it, too.
  Maybe it's because of the "matching" vs "searching" distinction that
 I find impossible to remember.
 (http://docs.python.org/library/re.html)

 
 Is there ever a situation where you want to use a single regexp for
 both matching _and_ searching?  And if not, couldn't you just use ^ to
 anchor it?  I never understood why Python's API makes such a
 distinction.

Ehm, that's odd. You'd think that after Perl has set the precedent, it 
would be hard to do major goofs in designing a regex API.

By the way, the more I dig into std.regexp, the stiffer the hair on my 
neck gets. Get this: the API offers both global functions and member 
functions, with both RegExp and plain string arguments. The latter are 
carefully designed to maximize the number of clashes, potential 
confusions, and errors when using both std.string and std.regex.

But wait, there's more. The API defines the following functions that all 
ostensibly do some sort of mattern patching (sic): find, search, test, 
match, and exec. I wish I were kidding. There's some opIndex and 
opEquals thrown in for good measure. Knuth wouldn't know what each of 
them does after studying them for a week and then watching an episode 
from "The Bachelor". And get this: global search() does not do what 
member search() does. Nope. Global search() does what member test() 
does. I have only contempt for such designs.


Andrei

Feb 17 2009

Bill Baxter <wbaxter gmail.com> writes:

On Wed, Feb 18, 2009 at 11:38 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Jarrett Billingsley wrote:
 On Tue, Feb 17, 2009 at 7:13 PM, Bill Baxter <wbaxter gmail.com> wrote:
 Ok.  I'm certainly not in love with the API either.  Though, the only
 RegEx API I've ever used that felt totally comfortable with was
 Perl's, which in large part is syntax instead of an API.  Python's
 syntax I have to look over the documentation every time I use it, too.
  Maybe it's because of the "matching" vs "searching" distinction that
 I find impossible to remember.
 (http://docs.python.org/library/re.html)

 Is there ever a situation where you want to use a single regexp for
 both matching _and_ searching?  And if not, couldn't you just use ^ to
 anchor it?  I never understood why Python's API makes such a
 distinction.

 Ehm, that's odd. You'd think that after Perl has set the precedent, it would
 be hard to do major goofs in designing a regex API.

 By the way, the more I dig into std.regexp, the stiffer the hair on my neck
 gets. Get this: the API offers both global functions and member functions,
 with both RegExp and plain string arguments. The latter are carefully
 designed to maximize the number of clashes, potential confusions, and errors
 when using both std.string and std.regex.

All I know is that I found one incantation that works and I've been
copy-pasting that ever since. :-)

 But wait, there's more. The API defines the following functions that all
 ostensibly do some sort of mattern patching (sic): find, search, test,
 match, and exec. I wish I were kidding. There's some opIndex and opEquals
 thrown in for good measure. Knuth wouldn't know what each of them does after
 studying them for a week and then watching an episode from "The Bachelor".
 And get this: global search() does not do what member search() does. Nope.
 Global search() does what member test() does. I have only contempt for such
 designs.

Maybe "design" is too strong a word.  Most Phobos modules seem to have
been put together rather hastily in order to fill a pressing need.
Often *something* is better than nothing at all, even if the something
is not so great.

--bb

Feb 17 2009

Walter Bright <newshound1 digitalmars.com> writes:

Bill Baxter wrote:
 Maybe "design" is too strong a word.  Most Phobos modules seem to have
 been put together rather hastily in order to fill a pressing need.
 Often *something* is better than nothing at all, even if the something
 is not so great.

std.regexp evolved out of the ECMAScript regex functions - they have the 
same names and functionality. Layered on top of that were Ruby-like names 
and functionality. It's a good (bad?) example of an API evolving without 
sacrificing backwards compatibility.

Feb 20 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Walter Bright wrote:
 Bill Baxter wrote:
 Maybe "design" is too strong a word.  Most Phobos modules seem to have
 been put together rather hastily in order to fill a pressing need.
 Often *something* is better than nothing at all, even if the something
 is not so great.

 
 std.regexp evolved out of the ECMAscript regex functions - they have the 
 same names and functionality. Layered on top of that was ruby-like names 
 and functionality. It's a good (bad?) example of an api evolving without 
 sacrificing backwards compatibility.

s/good \(bad\?\)/REALLY BAD/


Andrei

Feb 20 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Fri, 20 Feb 2009 16:35:54 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Walter Bright wrote:
 Bill Baxter wrote:
 Maybe "design" is too strong a word.  Most Phobos modules seem to have
 been put together rather hastily in order to fill a pressing need.
 Often *something* is better than nothing at all, even if the something
 is not so great.

  std.regexp evolved out of the ECMAscript regex functions - they have  
 the same names and functionality. Layered on top of that was ruby-like  
 names and functionality. It's a good (bad?) example of an api evolving  
 without sacrificing backwards compatibility.

 s/good \(bad\?\)/REALLY BAD/


 Andrei

Backward compatibility is almost always a bad thing.
Look what's happened to C++ and OpenGL.

Feb 20 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Denis Koroskin wrote:
 On Fri, 20 Feb 2009 16:35:54 +0300, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Walter Bright wrote:
 Bill Baxter wrote:
 Maybe "design" is too strong a word.  Most Phobos modules seem to have
 been put together rather hastily in order to fill a pressing need.
 Often *something* is better than nothing at all, even if the something
 is not so great.

  std.regexp evolved out of the ECMAscript regex functions - they have 
 the same names and functionality. Layered on top of that was 
 ruby-like names and functionality. It's a good (bad?) example of an 
 api evolving without sacrificing backwards compatibility.

 s/good \(bad\?\)/REALLY BAD/


 Andrei

 
 Backward compatibility is almost always a bad thing.
 Look what's happened to C++ and OpenGL.

In this case it's even worse, as I don't think anyone expects to paste 
their Ruby code and compile it with dmd.

Andrei

Feb 20 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Tue, Feb 17, 2009 at 9:38 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 By the way, the more I dig into std.regexp, the stiffer the hair on my neck
 gets. Get this: the API offers both global functions and member functions,
 with both RegExp and plain string arguments. The latter are carefully
 designed to maximize the number of clashes, potential confusions, and errors
 when using both std.string and std.regex.

 But wait, there's more. The API defines the following functions that all
 ostensibly do some sort of mattern patching (sic): find, search, test,
 match, and exec. I wish I were kidding. There's some opIndex and opEquals
 thrown in for good measure. Knuth wouldn't know what each of them does after
 studying them for a week and then watching an episode from "The Bachelor".
 And get this: global search() does not do what member search() does. Nope.
 Global search() does what member test() does. I have only contempt for such
 designs.

Well I don't mean to, uh, toot my own horn but.. I recently bound
libpcre to MiniD and came up with a relatively simple but powerful and
orthogonal API.

http://www.dsource.org/projects/minid/wiki/Addons/PcreLib#LibraryReference

The regex object has a single "subject" string at a time, the string
that it's matching against. The subject is set with "search" and
"test" does everything.  All other functions are basically defined in
terms of those two.  "test" looks for the next match of the regex in
the subject and returns true if it matched.  "match" returns match
groups (0 for the whole regex and 1..n for subgroups, as well as
string indices for named subgroups).   opApply is just a quicker way
of writing something like:

re.search(someSubject)

while(re.test())
    // use re.match to get matches

You'll notice that opApply is also just defined in terms of test.

I've found it far more intuitive than other APIs.  I've never used
Perl and I doubt I ever will, though.
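
The search/test/match shape described above can be approximated in D. This is only a sketch under assumptions: it is not the MiniD binding, it uses the present-day std.regex module instead of libpcre, Matcher is a hypothetical name, and it assumes the pattern never matches the empty string (otherwise test() would not advance).

```d
import std.regex;
import std.stdio;

// Hypothetical wrapper; method names mirror the post, not any Phobos API.
struct Matcher
{
    private Regex!char re;
    private string rest;      // part of the subject not yet scanned
    private string[] groups;  // captures of the most recent successful test()

    this(string pattern) { re = regex(pattern); }

    // Set the subject string and forget any previous match.
    void search(string subject)
    {
        rest = subject;
        groups = null;
    }

    // Advance to the next match; returns false once none is left.
    bool test()
    {
        auto m = matchFirst(rest, re);
        if (m.empty) return false;
        groups = null;
        foreach (i; 0 .. m.length) groups ~= m[i];
        rest = m.post;        // continue after this match next time
        return true;
    }

    // Group 0 is the whole match, 1..n are the subgroups.
    string match(size_t group = 0) { return groups[group]; }
}

void main()
{
    auto re = Matcher(`(\w+)=(\d+)`);
    re.search("a=1, bb=22");
    while (re.test())
        writeln(re.match(1), " -> ", re.match(2));
}
```

Everything else (iteration, splitting, replacing) could be layered on top of test(), which is what makes the design feel orthogonal.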

Feb 17 2009

Bill Baxter <wbaxter gmail.com> writes:

On Wed, Feb 18, 2009 at 3:36 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 I'm quite unhappy with the API of std.regexp. It's a chaotic design that
 provides a hodgepodge of functionality and tries to use as many synonyms of
 "to find" in the dictionary (e.g. search, match). I could swear Walter never
 really cared for using regexps, and that is felt throughout the design: it
 fills the bullet point but it's asinine to use.

 Besides std.regexp only works with (narrow) strings and we want it to work
 on streams of all widths and structures. One pet complaint I have is that
 std.regexp puts a class around it all as if everybody's favorite pastime
 would be to inherit Regexp and override some random function in it.

 In the upcoming releases of D 2.0 there will be rather dramatic breaking
 changes of phobos. I just wanted to ask whether y'all could stomach yet
 another rewritten API or you'd rather use std.regexp as it is for the time
 being.

Btw, I've got no problems with you breaking the API of 2.0 either.
Though you might consider moving the current implementation to
std.deprecated.regex and leaving it there for a year with a
pragma(msg, "This module is deprecated").

That way making a quick fix to broken code is just a matter of
inserting ".deprecated" into your import statements.

--bb

Feb 17 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Wed, Feb 18, 2009 at 3:36 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 I'm quite unhappy with the API of std.regexp. It's a chaotic design that
 provides a hodgepodge of functionality and tries to use as many synonyms of
 "to find" in the dictionary (e.g. search, match). I could swear Walter never
 really cared for using regexps, and that is felt throughout the design: it
 fills the bullet point but it's asinine to use.

 Besides std.regexp only works with (narrow) strings and we want it to work
 on streams of all widths and structures. One pet complaint I have is that
 std.regexp puts a class around it all as if everybody's favorite pastime
 would be to inherit Regexp and override some random function in it.

 In the upcoming releases of D 2.0 there will be rather dramatic breaking
 changes of phobos. I just wanted to ask whether y'all could stomach yet
 another rewritten API or you'd rather use std.regexp as it is for the time
 being.

 
 Btw, I've got no problems with you breaking the API of 2.0 either.
 Though you might consider moving the current implementation to
 std.deprecated.regex and leaving it there for a year with a
 pragma(msg, "This module is deprecated").
 
 That way making a quick fix to broken code is just a matter of
 inserting ".deprecated" into your import statements.


I was thinking of moving older stuff to etc, is that ok?

Andrei

Feb 17 2009

Walter Bright <newshound1 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 I was thinking of moving older stuff to etc, is that ok?

Yes. But you should also rename the new one, perhaps to std.regex. That 
way, legacy code will refuse to compile, rather than compile wrongly.

Feb 17 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Walter Bright wrote:
 Andrei Alexandrescu wrote:
 I was thinking of moving older stuff to etc, is that ok?

 
 Yes. But you should also rename the new one, perhaps to std.regex. That 
 way, legacy code will refuse to compile, rather than compile wrongly.

Terrific. I prefer "regex" to "regexp" because it's easier to pronounce, 
particularly if you're a foreigner. "Regex" sounds like a frog utterance 
by a forest lake, "regexp" sounds like nothing in particular.

Andrei

Feb 17 2009

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:
 Terrific. I prefer "regex" to "regexp" because it's easier to pronounce, 
 particularly if you're a foreigner. "Regex" sounds like a frog utterance 
 by a forest lake, "regexp" sounds like nothing in particular.

I'd like std.re :-)

Bye,
bearophile

Feb 17 2009

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
 I was thinking of moving older stuff to etc, is that ok?

 Yes. But you should also rename the new one, perhaps to std.regex. 
 That way, legacy code will refuse to compile, rather than compile 
 wrongly.

 
 Terrific. I prefer "regex" to "regexp" because it's easier to pronounce, 
 particularly if you're a foreigner. "Regex" sounds like a frog utterance 
 by a forest lake, "regexp" sounds like nothing in particular.
 
 Andrei

It sounds to me like a frog who, immediately post-utterance, just got 
gigged.  I guess that makes "regex" sound even better... as it's still 
alive (sounding).

-- Chris Nicholson-Sauls
-- Who so far agrees with pretty much everything you've said, and 
therefore has no real contribution...

Feb 17 2009

Leandro Lucarella <llucax gmail.com> writes:

Andrei Alexandrescu, on February 17 at 13:56 you wrote:
Btw, I've got no problems with you breaking the API of 2.0 either.
Though you might consider moving the current implementation to
std.deprecated.regex and leaving it there for a year with a
pragma(msg, "This module is deprecated").
That way making a quick fix to broken code is just a matter of
inserting ".deprecated" into your import statements.

 
 
 I was thinking of moving older stuff to etc, is that ok?

What's the rationale for "etc"? Why not "deprecated", or something shorter
like "old", or "d1" (this last one could be good for future deprecated
libraries, like when D3 is available there will probably be a "d2" too).


Feb 19 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Leandro Lucarella wrote:
 Andrei Alexandrescu, on February 17 at 13:56 you wrote:
 Btw, I've got no problems with you breaking the API of 2.0 either.
 Though you might consider moving the current implementation to
 std.deprecated.regex and leaving it there for a year with a
 pragma(msg, "This module is deprecated").
 That way making a quick fix to broken code is just a matter of
 inserting ".deprecated" into your import statements.

 I was thinking of moving older stuff to etc, is that ok?

 
 What's the rationale for "etc"? Why not "deprecated", or something shorter
 like "old", or "d1" (this last one could be good for future deprecated
 libraries, like when D3 is available there probably be a "d2" too).
 

In the words of George Costanza: "Because it's there!"

Andrei

Feb 19 2009

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

Andrei Alexandrescu wrote:
 Leandro Lucarella wrote:
 Andrei Alexandrescu, on February 17 at 13:56 you wrote:
 Btw, I've got no problems with you breaking the API of 2.0 either.
 Though you might consider moving the current implementation to
 std.deprecated.regex and leaving it there for a year with a
 pragma(msg, "This module is deprecated").
 That way making a quick fix to broken code is just a matter of
 inserting ".deprecated" into your import statements.

 I was thinking of moving older stuff to etc, is that ok?

 What's the rationale for "etc"? Why not "deprecated", or something shorter
 like "old", or "d1" (this last one could be good for future deprecated
 libraries, like when D3 is available there probably be a "d2" too).

 
 In the words of George Costanza: "Because it's there!"
 
 Andrei

Shouldn't that be George Mallory?

Feb 19 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Ellery Newcomer wrote:
 Andrei Alexandrescu wrote:
 Leandro Lucarella wrote:
 Andrei Alexandrescu, on February 17 at 13:56 you wrote:
 Btw, I've got no problems with you breaking the API of 2.0 either.
 Though you might consider moving the current implementation to
 std.deprecated.regex and leaving it there for a year with a
 pragma(msg, "This module is deprecated").
 That way making a quick fix to broken code is just a matter of
 inserting ".deprecated" into your import statements.

 I was thinking of moving older stuff to etc, is that ok?

 What's the rationale for "etc"? Why not "deprecated", or something 
 shorter
 like "old", or "d1" (this last one could be good for future deprecated
 libraries, like when D3 is available there probably be a "d2" too).

 In the words of George Costanza: "Because it's there!"

 Andrei

 
 Shouldn't that be George Mallory?

No, he said "because it is there". George said "because it's there":

http://www.classictvquotes.com/quotes/characters/george-costanza/page_14.html

George: So, she fell, and then she started screaming, "My back! My 
back!" So, I picked her up and took her to the hospital.
Elaine: How is she?
George: She's in traction.
Elaine: Okay, I'm sorry.
George: It's not funny, Elaine.
Elaine: I know. I'm sorry. I'm serious.
George: Her back went out. She's gotta be there for a couple of days. 
All she said on the way over in the car was, "Why, George, why?!" I 
said, "Because it's there!"


Andrei

Feb 19 2009

Georg Wrede <georg.wrede iki.fi> writes:

Andrei Alexandrescu wrote:
 That way making a quick fix to broken code is just a matter of
 inserting ".deprecated" into your import statements.

 I was thinking of moving older stuff to etc, is that ok?

 What's the rationale for "etc"? Why not "deprecated"

 
 In the words of George Costanza: "Because it's there!"

With the critique you've given to the existing regexp stuff, deprecated 
would be the obvious choice.

Then we could have etc for Miscellaneous Stuff.

Feb 20 2009

Bill Baxter <wbaxter gmail.com> writes:

On Sat, Feb 21, 2009 at 2:38 AM, Georg Wrede <georg.wrede iki.fi> wrote:
 Andrei Alexandrescu wrote:
 That way making a quick fix to broken code is just a matter of
 inserting ".deprecated" into your import statements.

 I was thinking of moving older stuff to etc, is that ok?

 What's the rationale for "etc"? Why not "deprecated"

 In the words of George Costanza: "Because it's there!"

 With the critique you've given to the existing regexp stuff, deprecated
 would be the obvious choice.

 Then we could have etc for Miscellaneous Stuff.

Agreed.
etc implies to me that it's stuff that might be useful sometimes but
not very often.
It does not suggest to me that you shouldn't use it if you can avoid it.

Or how about make it   std.etc.deprecated.regexp
That way it's clear that it's *both* something that might be useful
occasionally but something that you should avoid if possible.

... Deprecated is a keyword though, isn't it.  Dang.  :-P

--bb

Feb 20 2009

Leandro Lucarella <llucax gmail.com> writes:

Georg Wrede, on February 20 at 19:38 you wrote:
 Andrei Alexandrescu wrote:
That way making a quick fix to broken code is just a matter of
inserting ".deprecated" into your import statements.

I was thinking of moving older stuff to etc, is that ok?

What's the rationale for "etc"? Why not "deprecated"

In the words of George Costanza: "Because it's there!"

 
 With the critique you've given to the existing regexp stuff, deprecated would
be the obvious choice.
 
 Then we could have etc for Miscellaneous Stuff.

Why not "misc" for that? =)


Feb 20 2009

BCS <ao pathlink.com> writes:

Reply to Andrei,

 I'm quite unhappy with the API of std.regexp. It's a chaotic design
 that provides a hodgepodge of functionality and tries to use as many
 synonyms of "to find" in the dictionary (e.g. search, match). I could
 swear Walter never really cared for using regexps, and that is felt
 throughout the design: it fills the bullet point but it's asinine to
 use.
 
 Besides std.regexp only works with (narrow) strings and we want it to
 work on streams of all widths and structures. One pet complaint I have
 is that std.regexp puts a class around it all as if everybody's
 favorite pastime would be to inherit Regexp and override some random
 function in it.
 
 In the upcoming releases of D 2.0 there will be rather dramatic
 breaking changes of phobos. I just wanted to ask whether y'all could
 stomach yet another rewritten API or you'd rather use std.regexp as it
 is for the time being.
 
 Andrei
 

For what it's worth, I have a partial clone of the .NET API built on top 
of PCRE. I would have to ask my boss, but I expect I could donate it if anyone 
wants to use it as a basis.

Feb 17 2009

Daniel de Kok <me danieldk.org> writes:

On Tue, Feb 17, 2009 at 8:39 PM, BCS <ao pathlink.com> wrote:
 For what it's worth, I have a partial clone of the .NET API built on top of
 PCRE. I would have to ask my boss but I expect I could donate it if anyone
 want to use it as a basis.

Actually, I was wondering why nobody is considering real regular
languages anymore, that can be compiled to a normal finite state
recognizer or transducer. While this may not be as fancy as Perl-like
extensions, they are much faster, and it's easier to do fun stuff such
as composition.

Take care,
Daniel

Feb 17 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Daniel de Kok wrote:
 On Tue, Feb 17, 2009 at 8:39 PM, BCS <ao pathlink.com> wrote:
 For what it's worth, I have a partial clone of the .NET API built on top of
 PCRE. I would have to ask my boss but I expect I could donate it if anyone
 want to use it as a basis.

 
 Actually, I was wondering why nobody is considering real regular
 languages anymore, that can be compiled to a normal finite state
 recognizer or transducer. While this may not be as fancy as Perl-like
 extensions, they are much faster, and it's easier to do fun stuff such
 as composition.

I am considering that. One nice feature of "classic" regexes is that 
they never backtrack, so they work with pure input iterators. This has 
crucial consequences with regard to where and how regexes fit the range 
concept hierarchy.
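
To make the input-iterator point concrete, here is a minimal sketch (not anything from Phobos): a hand-written DFA for the regular language [0-9]+(\.[0-9]+)? run over a range that has no save() and therefore cannot be rewound. The names step, matchesDecimal and OnePass are made up for illustration.

```d
import std.range.primitives : empty, front, popFront,
                              isInputRange, isForwardRange;

enum dead = -1;

/// Transition function of a hand-built DFA for [0-9]+(\.[0-9]+)?
/// States: 0 start, 1 integer digits (accepting), 2 just saw '.',
/// 3 fraction digits (accepting).
int step(int state, dchar c)
{
    immutable digit = c >= '0' && c <= '9';
    switch (state)
    {
        case 0: return digit ? 1 : dead;
        case 1: return digit ? 1 : (c == '.' ? 2 : dead);
        case 2: return digit ? 3 : dead;
        case 3: return digit ? 3 : dead;
        default: return dead;
    }
}

/// Looks at every element exactly once and never goes back.
bool matchesDecimal(R)(R input)
    if (isInputRange!R)
{
    int state = 0;
    for (; !input.empty && state != dead; input.popFront())
        state = step(state, input.front);
    return state == 1 || state == 3;
}

/// A single-pass range: it has no save(), so it is not a forward range.
struct OnePass
{
    string data;
    size_t pos;
    @property bool empty() const { return pos >= data.length; }
    @property char front() const { return data[pos]; }
    void popFront() { ++pos; }
}

static assert(isInputRange!OnePass && !isForwardRange!OnePass);

unittest
{
    assert(matchesDecimal(OnePass("3.14")));
    assert(!matchesDecimal(OnePass("3.")));
    assert(matchesDecimal("42"));
}
```

A backtracking matcher would need to return to an earlier position after a failed alternative, which a range like OnePass cannot offer.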


Andrei

Feb 17 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Tue, Feb 17, 2009 at 2:47 PM, Daniel de Kok <me danieldk.org> wrote:
 On Tue, Feb 17, 2009 at 8:39 PM, BCS <ao pathlink.com> wrote:
 For what it's worth, I have a partial clone of the .NET API built on top of
 PCRE. I would have to ask my boss but I expect I could donate it if anyone
 wants to use it as a basis.

 Actually, I was wondering why nobody is considering real regular
 languages anymore, that can be compiled to a normal finite state
 recognizer or transducer. While this may not be as fancy as Perl-like
 extensions, they are much faster, and it's easier to do fun stuff such
 as composition.

Tango's regex engine is just that.  It uses a tagged NFA method.
http://www.dsource.org/projects/tango/docs/current/tango.text.Regex.html

The problem with this method is that while it's certainly faster to
match, it's MUCH slower to compile.  There are no pathological
matches; only pathological compiles ;)  I'm talking 60-70 seconds to
compile a more complex regex.  This might be an acceptable tradeoff
for when you need to compile a regex in a long-running app like a
server, but it's completely unacceptable for most small, Perl-like
text munging programs.

Unless of course this slowdown is unique to Tango's implementation of
this method!
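
One well-known way a DFA-style engine can become slow to build, which may or may not be what happens inside Tango, is state blow-up during subset construction. The sketch below is purely illustrative and does not use Tango's code: it counts the DFA states reachable for the textbook pattern (a|b)*a(a|b){n}, whose NFA has only n + 2 states. The function name dfaStateCount is hypothetical.

```d
/// Counts the DFA states that subset construction reaches for the NFA of
/// (a|b)*a(a|b){n}. NFA states 0 .. n+1 are stored as bits of a ulong.
size_t dfaStateCount(uint n)
{
    assert(n <= 61, "state set must fit in 64 bits");
    immutable ulong mask = (1UL << (n + 2)) - 1;

    // One NFA step for a whole set of states at once.
    ulong move(ulong set, bool isA)
    {
        ulong next = 1;                 // state 0 loops on both symbols
        if (isA) next |= 2;             // 0 --a--> 1
        next |= (set & ~1UL) << 1;      // i --any--> i+1 for i >= 1
        return next & mask;             // state n+1 has no outgoing edges
    }

    bool[ulong] seen;                   // DFA states discovered so far
    ulong[] work = [1UL];               // start subset {0}
    seen[1UL] = true;
    while (work.length)
    {
        immutable cur = work[$ - 1];
        work = work[0 .. $ - 1];
        foreach (isA; [false, true])
        {
            immutable nxt = move(cur, isA);
            if (nxt !in seen)
            {
                seen[nxt] = true;
                work ~= nxt;
            }
        }
    }
    return seen.length;
}

unittest
{
    // n + 2 NFA states turn into 2^(n+1) DFA states.
    assert(dfaStateCount(3) == 16);
    assert(dfaStateCount(9) == 1024);
}
```

Each extra repetition doubles the number of DFA states, so compile time can grow exponentially with the pattern even though matching stays linear in the input.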

Feb 17 2009

bearophile <bearophileHUGS lycos.com> writes:

Jarrett Billingsley:
I'm talking 60-70 seconds to compile a more complex regex.<

A modern CPU is able to do something like 60*2*2E9 operations in that time, DMD
needs 6 seconds or less to compile about 60000-80000 lines of my D code, so I
think it's a bit too much time (probably 100 or 1000 times too much).

Bye,
bearophile

Feb 17 2009

BCS <ao pathlink.com> writes:

Reply to Jarrett,

 On Tue, Feb 17, 2009 at 2:47 PM, Daniel de Kok <me danieldk.org>
 wrote:
 
 Actually, I was wondering why nobody is considering real regular
 languages anymore, that can be compiled to a normal finite state
 recognizer or transducer. While this may not be as fancy as Perl-like
 extensions, they are much faster, and it's easier to do fun stuff
 such as composition.
 

 Tango's regex engine is just that.  It uses a tagged NFA method.
 http://www.dsource.org/projects/tango/docs/current/tango.text.Regex.html
 
 The problem with this method is that while it's certainly faster to
 match, it's MUCH slower to compile.  There are no pathological
 matches; only pathological compiles ;)  I'm talking 60-70 seconds to
 compile a more complex regex.

Could this be transitioned to CTFE? You could even have a debug mode that 
delays the compile until runtime:

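RegEx matcher = new CTFERegEx!("some regex");


class CTFERegEx(char[] regex) : RegEx
{
       // With -debug=NoCTFE the pattern is compiled lazily at runtime;
       // otherwise the initializer forces CTFECompile to run at compile time.
       // (CTFECompile and the RegEx constructor taking its result are hypothetical.)
       debug(NoCTFE)  static char[] done;
       else     static const char[] done = CTFECompile(regex);

       public this()
       {
          debug(NoCTFE) if(done is null) done = CTFECompile(regex);

          super(done);
       }
}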
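
For anyone who wants to see the mechanism in isolation, here is a small self-contained sketch of the same idea. It is only an illustration under stated assumptions: compileClass is a made-up stand-in for an expensive pattern compiler (it expands a character-class body into a lookup table), and CharClass is a hypothetical wrapper. The point is that a static immutable initializer has to be a compile-time constant, so the function is run through CTFE.

```d
/// Hypothetical stand-in for an expensive pattern compiler: expands a
/// character-class body such as "a-z0-9_" into a 256-entry lookup table.
bool[256] compileClass(string spec)
{
    bool[256] table;
    for (size_t i = 0; i < spec.length; ++i)
    {
        if (i + 2 < spec.length && spec[i + 1] == '-')
        {
            // A range like "a-z": mark every character from lo to hi.
            for (uint c = spec[i]; c <= spec[i + 2]; ++c)
                table[c] = true;
            i += 2;
        }
        else
            table[spec[i]] = true;
    }
    return table;
}

struct CharClass(string spec)
{
    // The initializer must be known at compile time, so the compiler
    // evaluates compileClass(spec) during compilation.
    static immutable bool[256] table = compileClass(spec);

    static bool matches(char c) { return table[c]; }
}

// Evaluated entirely at compile time; a wrong table would fail the build.
static assert(compileClass("a-z")['q']);

unittest
{
    alias Word = CharClass!"a-z0-9_";
    assert(Word.matches('7'));
    assert(Word.matches('_'));
    assert(!Word.matches('-'));
}
```

The cost moves from program startup to the build, which is the trade-off being discussed here.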

Feb 17 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Tue, Feb 17, 2009 at 3:16 PM, BCS <ao pathlink.com> wrote:
 could this be transitioned to CTFE? you could even have a debug mode that
 delays till runtime

 RegEx matcher = new CTFERegEx!("some regex");


 class CTFERegEx(char[] regex) : RegEx
 {
      debug(NoCTFE)  static char[] done;
      else     static const char[] done = CTFECompile(regex);

      public this()
      {
         debug(NoCTFE) if(done == null) done = CTFECompile(regex);

         super(done);
      }
 }

For what it's worth the Tango regexes actually have a method to output
a D function that will implement the regex after it's compiled.  So
you _could_ precompile the regex into D code and use that.

But seriously, man - if something takes 60 seconds to complete at
_runtime_, making it CTFE would simply make your computer explode.

Feb 17 2009

BCS <ao pathlink.com> writes:

Reply to Jarrett,

 On Tue, Feb 17, 2009 at 3:16 PM, BCS <ao pathlink.com> wrote:
 
 could this be transitioned to CTFE? you could even have a debug mode
 that delays till runtime
 
 RegEx matcher = new CTFERegEx!("some regex");
 
 class CTFERegEx(char[] regex) : RegEx
 {
 debug(NoCTFE)  static char[] done;
 else     static const char[] done = CTFECompile(regex);
 public this()
 {
 debug(NoCTFE) if(done == null) done = CTFECompile(regex);
 super(done);
 }
 }

 For what it's worth the Tango regexes actually have a method to output
 a D function that will implement the regex after it's compiled.  So
 you _could_ precompile the regex into D code and use that.
 
 But seriously, man - if something takes 60 seconds to complete at
 _runtime_, making it CTFE would simply make your computer explode.
 

For any kind of debug, yeah, that's a problem. OTOH for release, as long 
as it /does/ compile, who cares? How many real release builds does anyone 
do a week?

Feb 17 2009

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

Jarrett Billingsley wrote:
 On Tue, Feb 17, 2009 at 3:16 PM, BCS <ao pathlink.com> wrote:
 could this be transitioned to CTFE? you could even have a debug mode that
 delays till runtime

 RegEx matcher = new CTFERegEx!("some regex");


 class CTFERegEx(char[] regex) : RegEx
 {
      debug(NoCTFE)  static char[] done;
      else     static const char[] done = CTFECompile(regex);

      public this()
      {
         debug(NoCTFE) if(done == null) done = CTFECompile(regex);

         super(done);
      }
 }

 
 For what it's worth the Tango regexes actually have a method to output
 a D function that will implement the regex after it's compiled.  So
 you _could_ precompile the regex into D code and use that.

A feature which I *adore* by the way.  So long as the precompiled regex 
is "guaranteed" to run at best possible performance (hand-rolled, 
hand-optimized solutions notwithstanding) I for one prefer them.

-- Chris Nicholson-Sauls

Feb 17 2009

Daniel de Kok <me danieldk.org> writes:

On Tue, Feb 17, 2009 at 9:26 PM, Jarrett Billingsley
<jarrett.billingsley gmail.com> wrote:
 For what it's worth the Tango regexes actually have a method to output
 a D function that will implement the regex after it's compiled.  So
 you _could_ precompile the regex into D code and use that.

I have only been tinkering with Phobos, but that's good to hear, thanks!

Feb 17 2009

Daniel de Kok <me danieldk.org> writes:

On Tue, Feb 17, 2009 at 8:57 PM, Jarrett Billingsley
<jarrett.billingsley gmail.com> wrote:
 The problem with this method is that while it's certainly faster to
 match, it's MUCH slower to compile.  There are no pathological
 matches; only pathological compiles ;)  I'm talking 60-70 seconds to
 compile a more complex regex. This might be an acceptable tradeoff
 for when you need to compile a regex in a long-running app like a
 server, but it's completely unacceptable for most small, Perl-like
 text munging programs.

Hmmm, define "complex". I suppose it's OK for the general
line-splitting/matching stuff? I got into trouble (time-wise) when we
compiled a part of speech tagger into a transducer. In those cases we
generally pre-compile stuff, and output it as a large struct in the
target language. Of course, it would be fun if we can do it at
compile-time ;).

Besides that, if we'd have a good general recognizer/transducer
implementation it could also be used for compact dictionary storage,
perfect hashing automata, etc.

Take care,
Daniel

Feb 17 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Tue, Feb 17, 2009 at 3:30 PM, Daniel de Kok <me danieldk.org> wrote:
 Hmmm, define "complex"

\w+([\-+.]\w+)*@\w+([\-.]\w+)*\.\w+([\-.]\w+)*

This is a simple email regexp.  This takes about 4 or 5 seconds to
compile on my lappy (Pentium M).

It only goes up from there.

Feb 17 2009

BCS <ao pathlink.com> writes:

Reply to Jarrett,

 On Tue, Feb 17, 2009 at 3:30 PM, Daniel de Kok <me danieldk.org>
 wrote:
 
 Hmmm, define "complex"
 

 \w+([\-+.]\w+)*@\w+([\-.]\w+)*\.\w+([\-.]\w+)*
 
 This is a simple email regexp.  This takes about 4 or 5 seconds to
 compile on my lappy (Pentium M).
 
 It only goes up from there.
 

I wonder how well it would work on this:

http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

:b

Feb 17 2009

Daniel de Kok <me danieldk.org> writes:

On Tue, Feb 17, 2009 at 9:50 PM, Jarrett Billingsley
<jarrett.billingsley gmail.com> wrote:
 On Tue, Feb 17, 2009 at 3:30 PM, Daniel de Kok <me danieldk.org> wrote:
 Hmmm, define "complex"

 \w+([\-+.]\w+)*@\w+([\-.]\w+)*\.\w+([\-.]\w+)*

 This is a simple email regexp.  This takes about 4 or 5 seconds to
 compile on my lappy (Pentium M).

Hmm, odd. I have translated that regexp to the syntax of the tool that
we used, that is written in Prolog (it is generally a constant factor
slower than C/C++/D equivalents). Generating a minimized DFA takes far
less than a second. I used the following expression (abstracted a bit
with macros):

---
macro(letter, {a..z, 'A'..'Z'}).
macro(punctlet,[{-,+,.},letter+]).
macro(dompunctlet,[{-,.},letter+]).
macro(email,[letter+,punctlet*,@,letter+,dompunctlet*,.,letter+,dompunctlet*]).
---

The software is available from:
http://www.let.rug.nl/~vannoord/Fsa/fsa.html

Take care,
Daniel

Feb 17 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

BCS wrote:
 Reply to Andrei,
 
 I'm quite unhappy with the API of std.regexp. It's a chaotic design
 that provides a hodgepodge of functionality and tries to use as many
 synonyms of "to find" in the dictionary (e.g. search, match). I could
 swear Walter never really cared for using regexps, and that is felt
 throughout the design: it fills the bullet point but it's asinine to
 use.

 Besides std.regexp only works with (narrow) strings and we want it to
 work on streams of all widths and structures. One pet complaint I have
 is that std.regexp puts a class around it all as if everybody's
 favorite pastime would be to inherit Regexp and override some random
 function in it.

 In the upcoming releases of D 2.0 there will be rather dramatic
 breaking changes of phobos. I just wanted to ask whether y'all could
 stomach yet another rewritten API or you'd rather use std.regexp as it
 is for the time being.

 Andrei

 
 For what it's worth, I have a partial clone of the .NET API built on top 
 of PCRE. I would have to ask my boss but I expect I could donate it if 
 anyone wants to use it as a basis.

That would be cool; I find the engine in std.regexp rather hard to 
understand.

Andrei

Feb 17 2009

Derek Parnell <derek psych.ward> writes:

On Tue, 17 Feb 2009 10:36:06 -0800, Andrei Alexandrescu wrote:

 I'm quite unhappy with the API of std.regexp.

I was so happy with using it I wrote my own simplified regex ;-)

 In the upcoming releases of D 2.0 there will be rather dramatic breaking 
 changes of phobos. I just wanted to ask whether y'all could stomach yet 
 another rewritten API or you'd rather use std.regexp as it is for the 
 time being.

If your changes are going to make things better for coding and maintenance
then go for it.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Feb 17 2009
