digitalmars.D - Splitter quiz / survey

Brad Roberts <braddr puremagic.com> writes:
Without looking at the docs, code, or compiling and running a test, what will
this do:

    foreach(x, splitter(",a,b,", ","))
        writefln("x = %s", a);

I'll make it multiple choice:

choice 1)
  x = a
  x = b

choice 2)
  x =
  x = a
  x = b

choice 3)
  x = a
  x = b
  x =

choice 4)
  x =
  x = a
  x = b
  x =

Later,
Brad
Apr 26 2009
"Lionello Lunesu" <lionello lunesu.remove.com> writes:
"Brad Roberts" <braddr puremagic.com> wrote in message 
news:mailman.1196.1240799812.22690.digitalmars-d puremagic.com...
 Without looking at the docs, code, or compiling and running a test, what 
 will
 this do:

    foreach(x, splitter(",a,b,", ","))
        writefln("x = %s", a);

Is it a trick question? Replacing , with ; and a with x: there are 3 commas, so I'd expect 4 outputs:
  x =
  x = a
  x = b
  x =

ie. choice 4?
Apr 26 2009
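The counting argument above (three commas, hence four fields) can be tried out without a D compiler. What follows is only a sketch in Python, whose `str.split` is cited later in the thread as doing (4); it is not the Phobos `splitter`, and the helper name `count_fields` is made up for illustration.

```python
def count_fields(text: str, delim: str) -> int:
    """Number of fields if every delimiter separates two fields."""
    # n non-overlapping delimiters always yield n + 1 fields.
    return text.count(delim) + 1


text = ",a,b,"
fields = text.split(",")  # Python keeps leading and trailing empty fields

for x in fields:
    print("x = %s" % x)

# The separator reading and the simple count agree.
print(len(fields) == count_fields(text, ","))
```

Under the separator reading, an empty field is simply what lies between a delimiter and the edge of the string.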
"Denis Koroskin" <2korden gmail.com> writes:
On Mon, 27 Apr 2009 06:36:33 +0400, Brad Roberts <braddr puremagic.com> wrote:

 Without looking at the docs, code, or compiling and running a test, what  
 will
 this do:

     foreach(x, splitter(",a,b,", ","))
         writefln("x = %s", a);

 I'll make it multiple choice:

 choice 1)
   x = a
   x = b

 choice 2)
   x =
   x = a
   x = b

 choice 3)
   x = a
   x = b
   x =

 choice 4)
   x =
   x = a
   x = b
   x =

 Later,
 Brad

I'd expect 4.
Apr 26 2009
Lutger <lutger.blijdestijn gmail.com> writes:
eh, x = a and x = b
Apr 26 2009
Georg Wrede <georg.wrede iki.fi> writes:
Brad Roberts wrote:
 Without looking at the docs, code, or compiling and running a test, what will
 this do:
 
     foreach(x, splitter(",a,b,", ","))
         writefln("x = %s", a);
 
 I'll make it multiple choice:
 
 choice 1)
   x = a
   x = b
 
 choice 2)
   x =
   x = a
   x = b
 
 choice 3)
   x = a
   x = b
   x =
 
 choice 4)
   x =
   x = a
   x = b
   x =

Ehhh, one would /hope/ it's 4. Of course, if you're making a fancy library, then the bells-and-whistles version might contain a way to parameterize, so that it can "skip" superfluous commas at the beginning and/or end. But that might be overkill.
Apr 26 2009
Georg Wrede <georg.wrede iki.fi> writes:
Georg Wrede wrote:
 Brad Roberts wrote:
 Without looking at the docs, code, or compiling and running a test, 
 what will
 this do:

     foreach(x, splitter(",a,b,", ","))
         writefln("x = %s", a);

 I'll make it multiple choice:

 choice 1)
   x = a
   x = b

 choice 2)
   x =
   x = a
   x = b

 choice 3)
   x = a
   x = b
   x =

 choice 4)
   x =
   x = a
   x = b
   x =

Ehhh, one would /hope/ it's 4. Of course, if you're making a fancy library, then the bells-and-whistles version might contain a way to parameterize, so that it can "skip" superfluous commas at the beginning and/or end. But that might be overkill.

Of course, the current behavior is probably related to why you can write enum { a, b, } without a compiler error. But you're right, in this case it may not be a sane behavior. And definitely *not* what the programmer expects.
Apr 27 2009
Max Samukha <samukha voliacable.com.removethis> writes:
On Sun, 26 Apr 2009 19:36:33 -0700, Brad Roberts
<braddr puremagic.com> wrote:

Without looking at the docs, code, or compiling and running a test, what will
this do:

    foreach(x, splitter(",a,b,", ","))
        writefln("x = %s", a);

I'll make it multiple choice:

choice 1)
  x = a
  x = b

choice 2)
  x =
  x = a
  x = b

choice 3)
  x = a
  x = b
  x =

choice 4)
  x =
  x = a
  x = b
  x =

Later,
Brad

I'd like 4 by default and a flag/policy for 1. Definitely not 2 or 3.
Apr 26 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Brad Roberts wrote:
 Without looking at the docs, code, or compiling and running a test, what will
 this do:
 
     foreach(x, splitter(",a,b,", ","))
         writefln("x = %s", a);
 
 I'll make it multiple choice:
 
 choice 1)
   x = a
   x = b
 
 choice 2)
   x =
   x = a
   x = b
 
 choice 3)
   x = a
   x = b
   x =
 
 choice 4)
   x =
   x = a
   x = b
   x =
 
 Later,
 Brad

Thanks for bringing this to attention, Brad. Splitter does what Perl's split does: 2. This means comma is an item terminator and not an item separator. Why did I think this is a good idea? Because in most cases, I was thankful to Perl's split that it does exactly the right thing.

Whenever I read text from linguistic corpora, I see that words (or other word properties) are separated by spaces. There is never a space before the first word on a line, but there is often a trailing space at the end of the line. Why? Because the text was processed by a program that output "word, ' '" or "tag, ' '" for each word or tag. Then if I split the text by whitespace, I'd be annoyed to see that trailing spaces do matter.

For the same reason, C accepts enum X { a, b, } but not ,a ,b. Mechanically generating enum values is easier if each value has a trailing comma.

Similarly, when you split a text by '\n', a leading empty line is important, whereas you wouldn't expect a final '\n' to introduce an empty line.

Now clearly there are cases in which leading or trailing empty items are both important. I'm just saying they are more rare. We could add an enumerated parameter to Splitter:

    enum PleaseFindAGoodName { terminator, separator }

    foreach (line; splitter(",a,b,", ","))
        ... terminator is implicit ...
    foreach (line; splitter(",a,b,", ",", PleaseFindAGoodName.separator))
        ... separator ...

We might just go with the terminator semantics and ask people who need separator semantics to use a stripl() or a munch() prior to splitting. I'd personally prefer having an enum there.

Andrei
Apr 27 2009
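The terminator/separator distinction proposed above can be sketched in a few lines. This is a Python illustration, not the Phobos code; `DelimiterRole` and `split_items` are hypothetical names standing in for the `PleaseFindAGoodName` placeholder. The sketch assumes a non-empty literal delimiter and drops at most one trailing empty item in terminator mode (Perl's own `split` goes further and discards all trailing empty fields).

```python
from enum import Enum


class DelimiterRole(Enum):
    """Hypothetical stand-in for the PleaseFindAGoodName enum."""
    TERMINATOR = "terminator"
    SEPARATOR = "separator"


def split_items(text, delim, role=DelimiterRole.TERMINATOR):
    # Separator semantics: every delimiter sits between two items,
    # so empty items at either end are kept.
    items = text.split(delim)
    # Terminator semantics: a final delimiter ends the last item
    # instead of opening a new, empty one.
    if role is DelimiterRole.TERMINATOR and len(items) > 1 and items[-1] == "":
        items.pop()
    return items


print(split_items(",a,b,", ","))                           # choice 2
print(split_items(",a,b,", ",", DelimiterRole.SEPARATOR))  # choice 4
print(split_items("word tag ", " "))                       # the corpus case
```

With the terminator default, the trailing space in the corpus-style line produces no extra item, which is the convenience argued for above; the cost is that the caller has to know which role is in effect.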
bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 Splitter does what Perl's 
 split does: 2.

Perl has to die. This is Python:

>>> ",a,b,".split(",")
['', 'a', 'b', '']

My lazy xsplit too works like that. I strongly vote for (4).

Bye,
bearophile
Apr 27 2009
Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 Perl has to die. This is Python:
 
 ",a,b,".split(",")




T-h-i-s I-s S-p-a-r-t-a: immortals.split("xiphos")
Apr 27 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 Splitter does what Perl's 
 split does: 2.

Perl has to die. This is Python:

This answer is wrong for a number of reasons. First comes the fallacy that if Perl "has to die", everything Perl did was wrong. Second comes the fallacy that if Python is overall better than Perl, everything it does is better than everything Perl does.
 ",a,b,".split(",")



My lazy xsplit too works like that. I strongly vote for (4).

Why? I'd be willing to change things no problem, but "perl must die, here's python" just doesn't seem to have much persuasive power. Andrei
Apr 27 2009
Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, on April 27 at 08:39, you wrote to me:
",a,b,".split(",")



My lazy xsplit too works like that. I strongly vote for (4).

Why? I'd be willing to change things no problem, but "perl must die, here's python" just doesn't seem to have much persuasive power.

This thread shows that 4) is the result people expect. I think removing unexpected behaviour (bugs) is a good reason to change it.

-- 
Leandro Lucarella (luca)
Apr 27 2009
Walter Bright <newshound1 digitalmars.com> writes:
Leandro Lucarella wrote:
 This thread shows that 4) is the result people expect. I think removing
 unexpected behaviour (bugs) is a good reason to change it.

Expect, yes, but Andrei made a good point that (4) is not the most useful behavior. Since Perl has been very successful in its niche of string processing, I would give a lot of weight to its behavior for basic functions.
Apr 27 2009
bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:
 Expect, yes, but Andrei made a good point that (4) is not the most 
 useful behavior.

If your language acts in an intuitive and logical way, people need less time to write programs and to debug them, and they write fewer bugs in the first place. This outweighs most other things. If you have to add a stripping step, it's not so bad.
 Since Perl has been very successful in its niche of string processing, I 
 would give a lot of weight to its behavior for basic functions.

Perl is now (correctly) dying because it looks like it was designed by an army of crazy monkeys. It was acceptable years ago, when there was no better alternative, but today it's better to look elsewhere for design ideas: at Python, Clojure, C#4, Haskell, Scala, Chapel, F#, Ruby.

If you want to see a small design error that may be partially derived from Perl, look at the chomp and chop functions in std.string of Phobos1. Their names are too similar and they do too-similar things, so you often need a manual to remember which does what.

I am not a compiler writer, but I am quite able to see what a mess Perl is. Perl is nearly never a good place to copy language design ideas from.

Bye,
bearophile
Apr 27 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Walter Bright:
 Expect, yes, but Andrei made a good point that (4) is not the most 
 useful behavior.

If your language acts in an intuitive and logical way, people need less time to write programs and to debug them, and they write fewer bugs in the first place. This outweighs most other things. If you have to add a stripping step, it's not so bad.
 Since Perl has been very successful in its niche of string processing, I 
 would give a lot of weight to its behavior for basic functions.

Perl is now (correctly) dying because it looks like it was designed by an army of crazy monkeys. It was acceptable years ago, when there was no better alternative, but today it's better to look elsewhere for design ideas: at Python, Clojure, C#4, Haskell, Scala, Chapel, F#, Ruby. If you want to see a small design error that may be partially derived from Perl, look at the chomp and chop functions in std.string of Phobos1. Their names are too similar and they do too-similar things, so you often need a manual to remember which does what. I am not a compiler writer, but I am quite able to see what a mess Perl is. Perl is nearly never a good place to copy language design ideas from.

This is again "Dogs are uncool because Hitler liked dogs." It would be great if we all steered our rhetoric off this fallacy (and generally arguments originating from and eliciting emotional response).

Perl tried many things and broke many rules. Along the way it also found many nuggets and inspirational bits for other languages. Speaking of split in particular, I find Python's behavior for split inferior to that of Perl and undesirable for Phobos. The semantics of splitting depends on the contents of the splitter, so if you split by a non-literal, it's difficult to count on the result of the splitting.

If I'm changing the behavior for Phobos' splitter, I'm doing so because I found a reason to, not because Hitler liked dogs.

Andrei
Apr 27 2009
Robert Fraser <fraserofthenight gmail.com> writes:
bearophile wrote:
 Walter Bright:
 Expect, yes, but Andrei made a good point that (4) is not the most 
 useful behavior.

If your language acts in an intuitive and logical way, people need less time to write programs and to debug them, and they write fewer bugs in the first place. This outweighs most other things. If you have to add a stripping step, it's not so bad.
 Since Perl has been very successful in its niche of string processing, I 
 would give a lot of weight to its behavior for basic functions.

Perl is now (correctly) dying because it looks like it was designed by an army of crazy monkeys. It was acceptable years ago, when there was no better alternative, but today it's better to look elsewhere for design ideas: at Python, Clojure, C#4, Haskell, Scala, Chapel, F#, Ruby. If you want to see a small design error that may be partially derived from Perl, look at the chomp and chop functions in std.string of Phobos1. Their names are too similar and they do too-similar things, so you often need a manual to remember which does what. I am not a compiler writer, but I am quite able to see what a mess Perl is. Perl is nearly never a good place to copy language design ideas from. Bye, bearophile

For quick, simple, write-once scripts, I still haven't found anything that beats Perl. I've completely forgotten shell script syntax just because of how powerful & simple Perl is for these tasks (I refuse to use Ruby because implicit variable declarations mean typos = bugs, and I make a lot of typos... apparently Python is also a good scripting language; maybe I'll give it a try some day). If your code is longer than 50 lines and/or needs to be maintained in the future, Perl becomes a lot more questionable. But Perl hate seems to go along with goto hate, Vista hate, emo hate, etc. as bandwagon-jumping.
Apr 28 2009
bearophile <bearophileHUGS lycos.com> writes:
Robert Fraser:
 For quick, simple, write-once scripts, I still haven't found anything that
 beats Perl.

Even if it is now slowly going out of fashion, Perl has been used for many years by millions of people, so it is surely usable. For small text-processing scripts it's probably "better" than Python too. It's currently widely used in bioinformatics as well; see "How Perl Saved the Human Genome Project", by Lincoln Stein: http://www.bioperl.org/wiki/How_Perl_saved_human_genome

But I suggest that the D designers not copy it while designing D, because its design is often a dangerous mess.
 But Perl hate seems to go along with goto hate, Vista hate, emo hate, etc. as
 bandwagon-jumping.

I think they don't like grizzlymorphs jump on bandwagons :o)

Bye,
bearophile
Apr 28 2009
Benji Smith <dlanguage benjismith.net> writes:
Brad Roberts wrote:
 Actually, perl is a risky language to take _syntax_ from, but _semantics_ 
 aren't nearly as dangerous.  Obviously there's some semantics that are 
 horrible (see its OOP mechanisms), but parts of the rest are quite good.  
 I gripe and groan every time I find myself having to touch perl code, but 
 it's rarely due to non-syntactical issues.

This is one of my favorite rants, anywhere on the world wide internets: http://steve.yegge.googlepages.com/ancient-languages-perl

If nothing else, at least read the "Snake Eyes" section.

It's not the syntax that makes perl so bad. Sure, it takes some getting used to. But when the rubber hits the road, it's just syntax, and anyone can learn it. The semantics, though, are a complete and utter trainwreck. Even after two years of working at a company where perl was the primary development language, I still never felt comfortable unless I had the camel book within arm's reach.

But amid that insanity there are a few gems. Most notably: regular expressions. And string splitting is largely based on the regex engine. So it's not too shocking to me that D might be influenced by it.

On the other hand, I agree with most of the other people in this thread, that option (4) was the best of the possible splitting behaviors.

--benji
Apr 28 2009
grauzone <none example.net> writes:
 This is one of my favorite rants, anywhere on the world wide internets:

I believe it is "Internet".
 http://steve.yegge.googlepages.com/ancient-languages-perl

That reminds me, will D templates ever be fixed not to auto-flatten tuples? Does D really want to be its own father's sister?

----
The mistake is that very early on, Larry decided to flatten lists by default. Hence, if you write this:

    x = (1, 2, 3, (4, 5));

It automagically turns into (1, 2, 3, 4, 5). Convenient, eh? Sure it is. If you want to be your own father's sister, it's extremely convenient.
----

(From your link.)
Apr 28 2009
Brad Roberts <braddr bellevue.puremagic.com> writes:
On Mon, 27 Apr 2009, bearophile wrote:

 Walter Bright:
 Expect, yes, but Andrei made a good point that (4) is not the most 
 useful behavior.

If your language acts in an intuitive and logical way, people need less time to write programs and to debug them, and they write fewer bugs in the first place. This outweighs most other things. If you have to add a stripping step, it's not so bad.
 Since Perl has been very successful in its niche of string processing, I 
 would give a lot of weight to its behavior for basic functions.

Perl is now (correctly) dying because it looks like it was designed by an army of crazy monkeys. It was acceptable years ago, when there was no better alternative, but today it's better to look elsewhere for design ideas: at Python, Clojure, C#4, Haskell, Scala, Chapel, F#, Ruby. If you want to see a small design error that may be partially derived from Perl, look at the chomp and chop functions in std.string of Phobos1. Their names are too similar and they do too-similar things, so you often need a manual to remember which does what. I am not a compiler writer, but I am quite able to see what a mess Perl is. Perl is nearly never a good place to copy language design ideas from. Bye, bearophile

Actually, perl is a risky language to take _syntax_ from, but _semantics_ aren't nearly as dangerous. Obviously there's some semantics that are horrible (see its OOP mechanisms), but parts of the rest are quite good. I gripe and groan every time I find myself having to touch perl code, but it's rarely due to non-syntactical issues.

Later,
Brad
Apr 27 2009
downs <default_357-line yahoo.de> writes:
Andrei Alexandrescu wrote:
 bearophile wrote:
 Andrei Alexandrescu:
 Splitter does what Perl's split does: 2.

Perl has to die. This is Python:

This answer is wrong for a number of reasons. First comes the fallacy that if Perl "has to die", everything Perl did was wrong. Second comes the fallacy that if Python is overall better than Perl, everything it does is better than everything Perl does.
 ",a,b,".split(",")



My lazy xsplit too works like that. I strongly vote for (4).

Why? I'd be willing to change things no problem, but "perl must die, here's python" just doesn't seem to have much persuasive power. Andrei

It is my strong opinion that Perl did this wrong. Why? Because it quite strongly violates user expectations. Please keep magic out of D as far as possible. If unavoidable, mark it with big warning signs. Magic is the *enemy* of comprehension. Microsoft once experimented with tools that tried to predict what you wanted instead of what you _said_. The result was *Clippy*. I don't need to tell you how that went. FWIW: If I wanted to remove trailing spaces, I'd use the well-known .strip() function.
Apr 27 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
downs wrote:
 Andrei Alexandrescu wrote:
 bearophile wrote:
 Andrei Alexandrescu:
 Splitter does what Perl's split does: 2.


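Perl has to die. This is Python:

This answer is wrong for a number of reasons. First comes the fallacy that if Perl "has to die", everything Perl did was wrong. Second comes the fallacy that if Python is overall better than Perl, everything it does is better than everything Perl does.
 ",a,b,".split(",")

My lazy xsplit too works like that. I strongly vote for (4).

Why? I'd be willing to change things no problem, but "perl must die, here's python" just doesn't seem to have much persuasive power. Andrei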

It is my strong opinion that Perl did this wrong. Why? Because it quite strongly violates user expectations. Please keep magic out of D as far as possible. If unavoidable, mark it with big warning signs. Magic is the *enemy* of comprehension. Microsoft once experimented with tools that tried to predict what you wanted instead of what you _said_. The result was *Clippy*. I don't need to tell you how that went. FWIW: If I wanted to remove trailing spaces, I'd use the well-known .strip() function.

I have been convinced. I will modify splitter to do (4), i.e., prepend or append an empty element if there's a leading, respectively trailing, separator. Thanks to all for destroying me :o).

Actually, Brad, since it was your idea, I suggest you make the change and add yourself to the list of contributors to std.algorithm.

I have constructed an information-theory-based argument that was alluded to by some posters without being clearly stated: if splitter does (4) and the separator is a character or a string, then you can reconstruct the original input from its output. So (4) is the behavior that loses the least information. (Brad was on a similar track by stating that it's desirable for splitter to produce the same number of items whether run forward or in reverse.)

Andrei
Apr 27 2009
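The reconstruction argument above can be checked mechanically. The sketch below uses Python rather than D, and the helper names `split_keep_all` and `split_drop_outer` are invented for the illustration; it assumes a plain string separator, as the argument does.

```python
def split_keep_all(text, delim):
    """Behavior (4): every delimiter separates two items, empty or not."""
    return text.split(delim)


def split_drop_outer(text, delim):
    """Behavior (1): like (4), minus one empty item at each end."""
    items = text.split(delim)
    if len(items) > 1 and items[0] == "":
        items = items[1:]
    if len(items) > 1 and items[-1] == "":
        items = items[:-1]
    return items


inputs = [",a,b,", ",a,b", "a,b,", "a,b"]

# (4) is lossless: joining the items with the delimiter restores the input.
round_trips = all(",".join(split_keep_all(t, ",")) == t for t in inputs)

# (1) is lossy: four different inputs collapse into the same output.
distinct_outputs = {tuple(split_drop_outer(t, ",")) for t in inputs}

print(round_trips)
print(len(distinct_outputs))
```

Once two different inputs map to the same list of items, no function of the output can tell them apart, which is the sense in which (4) loses the least information.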
Brad Roberts <braddr bellevue.puremagic.com> writes:
On Mon, 27 Apr 2009, Andrei Alexandrescu wrote:

 I have been convinced. I will modify splitter to do (4), i.e., prepend or
 append an empty element if there's a leading, respectively trailing,
 separator. Thanks to all for destroying me :o).
 
 Actually, Brad, since it was your idea, I suggest you make the change
 and add yourself to the list of contributors to std.algorithm.
 
 I have constructed an information-theory-based argument that was alluded to by
 some posters without being clearly stated: if splitter does (4) and the
 separator is a character or a string, then you can reconstruct the original
 input from its output. So (4) is the behavior that loses the least
 information. (Brad was on a similar track by stating that it's desirable for
 splitter to produce the same number of items whether run forward or in
 reverse.)
 
 Andrei

I'll try to get to it tonight, but no promises. Later, Brad
Apr 27 2009
Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 I have been convinced. I will modify splitter to do (4), i.e., prepend 
 or append an empty element if there's a leading, respectively trailing, 
 separator. Thanks to all for destroying me :o).

Looks like I must concede, too <g>.
Apr 27 2009
Derek Parnell <derek psych.ward> writes:
On Mon, 27 Apr 2009 15:12:59 -0700, Walter Bright wrote:

 Andrei Alexandrescu wrote:
 I have been convinced. I will modify splitter to do (4), i.e., prepend 
 or append an empty element if there's a leading, respectively trailing, 
 separator. Thanks to all for destroying me :o).

Looks like I must concede, too <g>.

A function that treats the delimiter as a terminator is still a good idea, but it would need a name that is not derived from "split". I'm struggling to think of one, but so far something like "itemize" is coming closer to my expectations.

Also, the documentation for splitter must say that the delimiter is being treated as a separator between items and thus the presence of a delimiter implies that an item exists before AND after the delimiter (because it is in between two items, by definition).

The documentation for "itemize" (or whatever) must say that the delimiter is being treated as a terminator of an item. The presence of a delimiter thus implies that an item must exist before the delimiter (because it marks the end of an item, by definition). The only weirdo situation is then what to do with an item that is not terminated by a delimiter, that is, the last one in the list. Ok, so we say that the list of items is seen as one in which each item is terminated by the delimiter or by the end of the list.

On the other hand, "itemize" is kind of like ...

    split(trim(list, delim), delim)

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Apr 27 2009
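The terminator definition above ("each item is terminated by the delimiter or by the end of the list") can be written down directly. This is a hypothetical Python sketch of the proposed `itemize`, not an existing Phobos function, and it assumes a non-empty literal delimiter.

```python
def itemize(text, delim):
    """Treat delim as an item terminator; the end of the text also
    terminates a final, unterminated item."""
    if not delim:
        raise ValueError("delimiter must not be empty")
    items = []
    start = 0
    while True:
        end = text.find(delim, start)
        if end == -1:
            break
        items.append(text[start:end])  # item closed by a delimiter
        start = end + len(delim)
    if start < len(text):
        items.append(text[start:])     # item closed by the end of the list
    return items


print(itemize(",a,b,", ","))
print(itemize("a,b", ","))
print(",a,b,".strip(",").split(","))   # the split(trim(...)) reading
```

Read this way, a leading delimiter still terminates an (empty) first item, so the strict definition gives choice 2, while the trim-then-split shorthand gives choice 1. The two descriptions are therefore only "kind of like" each other, as the post says.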
Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu Wrote:

 Brad Roberts wrote:
 Without looking at the docs, code, or compiling and running a test, what will
 this do:
 
     foreach(x, splitter(",a,b,", ","))
         writefln("x = %s", a);
 
 I'll make it multiple choice:
 
 choice 1)
   x = a
   x = b
 
 choice 2)
   x =
   x = a
   x = b
 
 choice 3)
   x = a
   x = b
   x =
 
 choice 4)
   x =
   x = a
   x = b
   x =
 
 Later,
 Brad

Thanks for bringing this to attention, Brad. Splitter does what Perl's split does: 2. This means comma is an item terminator and not an item separator. Why did I think this is a good idea? Because in most cases, I was thankful to Perl's split that it does exactly the right thing.

Before reading your post, I was going to say that I'd expect 4, would accept 1, and consider 2 or 3 to be buggy! Notice how under your new proposal everyone would still get the behavior wrong when reading the code.
 Whenever I read text from linguistic corpora, I see that words (or other 
 word properties) are separated by spaces. There is never a space before 
 the first word on a line, but there is often a trailing space at the end 
 of the line. Why? Because the text was processed by a program that 
 output "word, ' '" or "tag, ' '" for each word or tag. Then if I split 
 the text by whitespace, I'd be annoyed to see that trailing spaces do 
 matter.
 
 For the same reason, C accepts enum X { a, b, } but not ,a ,b. 
 Mechanically generating enum values is easier if each value has a 
 trailing comma.
 
 Similarly, when you split a text by '\n', a leading empty line is 
 important, whereas you wouldn't expect a final '\n' to introduce an 
 empty line.
 
 Now clearly there are cases in which leading or trailing empty items are 
 both important. I'm just saying they are more rare. We could add an 
 enumerated parameter to Splitter:
 
 enum PleaseFindAGoodName { terminator, separator }
 
 foreach (line; splitter(",a,b,", ","))
      ... terminator is implicit ...
 foreach (line; splitter(",a,b,", ",", PleaseFindAGoodName.separator))
      ... separator ...
 
 We might just go with the terminator semantics and ask people who need 
 separator semantics to use a stripl() or a munch() prior to splitting. 
 I'd personally prefer having an enum there.
 
 
 Andrei

Apr 27 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jason House wrote:
 Before reading your post, I was going to say that I'd expect 4, would
 accept 1, and consider 2 or 3 to be buggy! Notice how under your new
 proposal everyone would still get the behavior wrong when reading the
 code.

everyone posting heavily in this group != everyone

Andrei

P.S. I scrolled down your post looking for counter-evidence that you might have brought, but found only the please-don't-do-this-again empty quote. It wastes everybody's time looking in vain for nuggets of responses within the quoted text.
Apr 27 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Mon, 27 Apr 2009 09:43:40 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Jason House wrote:
 Before reading your post, I was going to say that I'd expect 4, would
 accept 1, and consider 2 or 3 to be buggy! Notice how under your new
 proposal everyone would still get the behavior wrong when reading the
 code.

everyone posting heavily in this group != everyone

Not that I care, because I don't use phobos, but you haven't really presented any good argument that your method is the most intuitive except:
 1. Some example of badly written code that outputs extra spaces (I don't consider this to be common).

No. I mentioned cases of corpora files that come from various sources.
 2. Perl does it that way.

That is not my argument. Please reread. It's simply that I've found split in Perl to do the right thing more often than not. So the presence of Perl is irrelevant, it could have been "a split function that I used..."
 The way I see it is: when I see a function named "splitter", I think the 
 function splits a string based on identified token separators.  If you 
 don't think of it that way, fine, you have every right to design Phobos 
 however you want, despite the fact that 100% of respondents surveyed (so 
 far) don't agree with your intuition.
 
 I have never thought of a list of tokens with terminators vs. 
 separators.  I think what you should have as an option to split is to be 
 able to ignore leading or trailing empty items, not "separator is really 
 terminator" enums, which would require a paragraph of explanation.

I guess we could make a good decision based on the reusability of the enum for other pieces of functionality. That suggests that "noTrailing" and "noLeading" may find more uses. Andrei
Apr 27 2009
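The "noLeading"/"noTrailing" idea above maps onto the four quiz choices quite directly. Below is a Python sketch under that assumption; `split_with_flags` and its keyword names are made up for illustration and are not a Phobos API.

```python
def split_with_flags(text, delim, no_leading=False, no_trailing=False):
    """Separator-style split, optionally ignoring one empty item at the
    start and/or the end of the result."""
    items = text.split(delim)
    if no_leading and len(items) > 1 and items[0] == "":
        items = items[1:]
    if no_trailing and len(items) > 1 and items[-1] == "":
        items = items[:-1]
    return items


sample = ",a,b,"
print(split_with_flags(sample, ","))                                    # choice 4
print(split_with_flags(sample, ",", no_trailing=True))                  # choice 2
print(split_with_flags(sample, ",", no_leading=True))                   # choice 3
print(split_with_flags(sample, ",", no_leading=True, no_trailing=True)) # choice 1
```

Two independent flags cover all four behaviors, whereas a terminator/separator enum only distinguishes choices 2 and 4, which may be part of why the flags looked more reusable.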
Robert Fraser <fraserofthenight gmail.com> writes:
Andrei Alexandrescu wrote:
 Jason House wrote:
 Before reading your post, I was going to say that I'd expect 4, would
 accept 1, and consider 2 or 3 to be buggy! Notice how under your new
 proposal everyone would still get the behavior wrong when reading the
 code.

everyone posting heavily in this group != everyone

Yes, but it's a representative (albeit small) sample of the user base.
 Andrei
 
 
 P.S. I scrolled down your post looking for counter-evidence that you 
 might have brought, but found only the please-don't-do-this-again empty 
 quote. It wastes everybody's time looking in vain for nuggets of responses 
 within the quoted text.

Is consistency a good argument? std.string.split currently does (4). Java and C#'s split() methods work like (4). strtok does (4). Is there any other language/function besides Perl that does (2)?
Apr 27 2009
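The consistency claim above is worth testing rather than taking on trust. The Python sketch below emulates, from the documented behavior as I understand it, how Java's `String.split` (default limit, literal delimiter) and repeated C `strtok` calls would treat `",a,b,"`, next to a keep-everything split. The helper names are invented, and none of this is the real Java or C code. If the emulations are right, Java lands on (2) and `strtok` on (1), not (4).

```python
def keep_all_split(text, delim):
    """Behavior (4): Python's str.split with an explicit separator."""
    return text.split(delim)


def java_like_split(text, delim):
    """Emulates Java's String.split(delim) for a non-empty literal delimiter
    and the default limit: trailing empty strings are dropped, leading kept."""
    if delim not in text:
        return [text]
    items = text.split(delim)
    while items and items[-1] == "":
        items.pop()
    return items


def strtok_like_split(text, delim_chars):
    """Emulates repeated strtok calls: runs of delimiter characters count as
    a single delimiter, so empty tokens are never returned."""
    tokens, current = [], ""
    for ch in text:
        if ch in delim_chars:
            if current:
                tokens.append(current)
            current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


sample = ",a,b,"
print(keep_all_split(sample, ","))
print(java_like_split(sample, ","))
print(strtok_like_split(sample, ","))
```

So "what other languages do" does not point in a single direction, which fits the request elsewhere in the thread for a stronger criterion than precedent.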
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Fraser wrote:
 Andrei Alexandrescu wrote:
 Jason House wrote:
 Before reading your post, I was going to say that I'd expect 4, would
 accept 1, and consider 2 or 3 to be buggy! Notice how under your new
 proposal everyone would still get the behavior wrong when reading the
 code.

everyone posting heavily in this group != everyone

Yes, but it's a representative (albeit small) sample of the user base.

That I disagree with. I mean... you're just saying it. Participation in a newsgroup is not necessarily correlated with much else than interest and available time. Besides, it's hard to say how representative a sample of 10-15 is. If you said "influential" instead of "representative" then I'd agree you're on to something.
 P.S. I scrolled down your post looking for counter-evidence that you 
 might have brought, but found only the please-don't-do-this-again 
 empty quote. It wastes everybody's time looking in vain for nuggets of 
 responses within the quoted text.

Is consistency a good argument? std.string.split currently does (4). Java and C#'s split() methods work like (4). strtok does (4). Is there any other language/function besides Perl that does (2)?

Yes, Phobos' Splitter :o). Alright, it's not like I'm fixated. I can make the change. I'd be glad to have a stronger criterion for making one choice or another. Andrei
Apr 27 2009
Robert Fraser <fraserofthenight gmail.com> writes:
Andrei Alexandrescu wrote:
 Robert Fraser wrote:
 Yes, but it's a representative (albeit small) sample of the user base.

That I disagree with. I mean... you're just saying it. Participation in a newsgroup is not necessarily correlated with much else than interest and available time.

Exactly! If it was correlated with a particular type of user (i.e. users who work a lot with linguistics corpora...) you might get a rather skewed sample.
Apr 27 2009
Georg Wrede <georg.wrede iki.fi> writes:
Andrei Alexandrescu wrote:
 Robert Fraser wrote:
 Andrei Alexandrescu wrote:
 Jason House wrote:
 Before reading your post, I was going to say that I'd expect 4, would
 accept 1, and consider 2 or 3 to be buggy! Notice how under your new
 proposal everyone would still get the behavior wrong when reading the
 code.

everyone posting heavily in this group != everyone

Yes, but it's a representative (albeit small) sample of the user base.

That I disagree with. I mean... you're just saying it. Participation to a newsgroup is not necessarily correlated with much else than interest and available time. Besides, it's hard to say how representative a sample of 10-15 is.

Of course it isn't representative. Not even necessarily influential. But it is /something/. Normally, when we have discussed "new" things, half or even more of us have held a [not final] opinion that later had to change. Interestingly, this one was pretty unanimous.

But you're right: participation in a newsgroup, happening to see a particular post, and happening to respond to it before too many others find it useless to contribute, can hardly be considered representative. And that's precisely why we've had Walter as the Dictator, and lately you (for Phobos, at least). Majority-led development is not going to lead us anywhere, and I guess we all know it. Grudgingly or not.

Instead of a voting community, the major contribution of this NG might be to bring up unexpected issues with the latest developments (both concepts and implementations). Linus Torvalds used to welcome "more eyes", under the idea that haystacks should get rid of as many needles as possible, or else you can't enjoy jumping into them.
 If you said "influential" instead of "representative" then I'd agree 
 you're on to something.
 
 P.S. I scrolled down your post looking for counter-evidence that you 
 might have brought, but found only the please-don't-do-this-again 
 empty quote. It wastes everybody time looking in vain for nuggets of 
 responses within the quoted text.

Is consistency a good argument? std.string.split currently does (4). Java and C#'s split() methods work like (4). strtok does (4). Is there any other language/function besides Perl that does (2)?

Yes, Phobos' Splitter :o). Alright, it's not like I'm fixated. I can make the change. I'd be glad to have a stronger criterion for making one choice or another. Andrei

Apr 27 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 So the default is to act like Perl, but this only applies when
 splitting on whitespace. Otherwise it acts like 4).  re.split in
 python seems to do 4) pretty much all the time.

Sheesh. DWIM taken to the extreme. That sounds pretty awful to me. Python must die :o). Andrei
Apr 27 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 Sheesh. DWIM taken to the extreme. That sounds pretty awful to me. 
 Python must die :o).

Python APIs are usually 5 times better than the ones you dream about. To design an API you have to start collecting possible (even ideal) users. Then you trim them down to the few most important people, and give a face and a name to each one of them. Then you create an API to make them happy. And then you write the code under the APIs. From some things you have currently designed in Phobos2 it seems you are doing it backwards. bearophile
Apr 27 2009
next sibling parent reply Robert Fraser <fraserofthenight gmail.com> writes:
bearophile wrote:
 To design an API you have to start collecting possible (even ideal) users.
Then you trim them down to the few most important people, and give a face and a
name to each one of them.

LOL, user personalities. I remember catering to the user "Mustafa" when I was working at MS, which I think I misunderstood to have something to do with the Circle of Life.
Apr 27 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Robert Fraser:
 bearophile wrote:
 To design an API you have to start collecting possible (even ideal) users.
Then you trim them down to the few most important people, and give a face and a
name to each one of them.

LOL, user personalities. I remember catering to the user "Mustafa" when I was working at MS, which I think I misunderstood to have something to do with the Circle of Life.

Do you mean Mufasa :-)

I was not talking about personalities, but mostly about classes of users; people all have different skills and necessities, but can often be categorized into a few groups. Then you give a name to a representative of each class. You don't need to give them detailed personalities, just the characteristics that tell the classes apart.

I haven't invented this method, it comes from Alan Cooper (Visual Basic) and Donald Norman (human-centered design). Python core developers don't use this method, by the way. I have used it to design small programs with a GUI, starting from such user classes => use cases => GUI => logic.

Bye,
bearophile
Apr 27 2009
parent Robert Fraser <fraserofthenight gmail.com> writes:
bearophile wrote:
 Robert Fraser:
 bearophile wrote:
 To design an API you have to start collecting possible (even ideal) users.
Then you trim them down to the few most important people, and give a face and a
name to each one of them.

I was working at MS, which I think I misunderstood to have something to do with the Circle of Life.

Do you mean Mufasa :-)

That name sends chills up my spine. Ooh, say it again!
 I was not talking about personalities, but mostly about classes of users;
people have all different skills and necessities, but can often be categorized
in few groups. Then you give a name to a representative of each class. You
don't need to give them detailed personalities, just the characteristics that
tell the classes apart.
 I haven't invented this method, it comes from Alan Cooper (Visual Basic) and
Donald Norman (human-centered design).
 Python core developers don't use this method, by the way. I have used it to
design small programs with a GUI, starting from such user classes => use cases
=> GUI => logic.

That's what I meant. Mufasa is a user with significant database administration experience and expanded privileges, who has lots of general computer knowledge. He knows some scripting and SQL, but doesn't understand programming. He comes from a UNIX background but recently migrated to Windows/SQL Server... or something like that, I don't remember exactly. But if I remember right, Mufasa's personality archetype also included some example personal details (he is in his early 40s and married with two kids, IIRC). I guess that helps you picture the user better or something.
Apr 27 2009
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 Sheesh. DWIM taken to the extreme. That sounds pretty awful to me.
  Python must die :o).

Python APIs are usually 5 times better than the ones you dream about.

I meant that in jest, so no need to get annoyed. I put your response down to emotional load. Andrei
Apr 27 2009
prev sibling next sibling parent Jason House <jason.james.house gnail.com> writes:
Andrei Alexandrescu Wrote:

 Jason House wrote:
 Before reading your post, I was going to say that I'd expect 4, would
 accept 1, and consider 2 or 3 to be buggy! Notice how under your new
 proposal everyone would still get the behavior wrong when reading the
 code.

everyone posting heavily in this group != everyone

Andrei

P.S. I scrolled down your post looking for counter-evidence that you might have brought, but found only the please-don't-do-this-again empty quote. It wastes everybody's time looking in vain for nuggets of responses within the quoted text.

I am limited by the iPhone interface. Sitting with my finger on the delete button for 5 minutes is really painful... Especially since it can keep deleting after I take my finger off the key and force me to start all over. I'm sorry that I include too much text, but it's the best I can do for 98% of my posts.
Apr 27 2009
prev sibling parent Georg Wrede <georg.wrede iki.fi> writes:
Steven Schveighoffer wrote:
 On Mon, 27 Apr 2009 09:43:40 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Jason House wrote:
 Before reading your post, I was going to say that I'd expect 4, would
 accept 1, and consider 2 or 3 to be buggy! Notice how under your new
 proposal everyone would still get the behavior wrong when reading the
 code.

everyone posting heavily in this group != everyone

Not that I care, because I don't use phobos, but you haven't really presented any good argument that your method is the most intuitive except:

1. Some example of badly written code that outputs extra spaces (I don't consider this to be common).
2. Perl does it that way.

The way I see it is: when I see a function named "splitter", I think the function splits a string based on identified token separators. If you don't think of it that way, fine, you have every right to design Phobos however you want, despite the fact that 100% of respondents surveyed (so far) don't agree with your intuition.

I have never thought of a list of tokens with terminators vs. separators. I think what you should have as an option to split is to be able to ignore leading or trailing empty items, not "separator is really terminator" enums, which would require a paragraph of explanation.

An after-the-fact thought: if the function is called "splitter", then one unavoidably starts thinking about separators. And not terminators. Had the function been called "separate" or something else, then the notion of "something in between" wouldn't have been that strong.
Apr 27 2009
prev sibling next sibling parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Andrei Alexandrescu wrote:
 ...
 
 We might just go with the terminator semantics and ask people who need
 separator semantics to use a stripl() or a munch() prior to splitting.
 I'd personally prefer having an enum there.
 
 
 Andrei

I'm going to invoke the principle of least surprise here; 6 out of 7 respondents said it should be #4. The dissenter voted #1. No one thought it would be the behaviour as actually implemented. [1]

The problem here is that the word 'split' to me just doesn't mesh with this terminator business. I think the default should be the behaviour that people *expect* it to have. That, or give the current Splitter a new name. Itemize, perhaps?

Also keep in mind that this discrepancy won't show up in trivial testing. It won't be until people start getting obscure bugs that they'll wonder what's going on.

</2c>

-- Daniel

[1] That's a pretty tiny sample, but you work with what you have. :P
Apr 27 2009
parent =?ISO-8859-1?Q?=22J=E9r=F4me_M=2E_Berger=22?= <jeberger free.fr> writes:
Daniel Keep wrote:
|
| Andrei Alexandrescu wrote:
|> ...
|>
|> We might just go with the terminator semantics and ask people who need
|> separator semantics to use a stripl() or a munch() prior to splitting.
|> I'd personally prefer having an enum there.
|>
|>
|> Andrei
|
| I'm going to invoke the principle of least surprise here; 6 out of 7
| respondents said it should be #4.  The dissenter voted #1.  No one
| thought it would be the behaviour as actually implemented. [1]
|
	Funny thing is, I answered before looking at the responses and
without knowing what the current behaviour was. IMO, #1 is the worst
solution because even if you know how many fields you expect, there
is no way to know which are empty. All three other solutions allow
you to find out.

	I vote #2 before #3 because in my experience, you never want to
drop the first few empty fields but you sometimes want to ignore the
trailing ones.

	Ideally, I'd say #4 (principle of least surprise) with an option to
choose #2 (probably the most useful IMO) or #1.

		Jerome

PS: I've never used Perl and I agree that it should die, but I still
vote #2 over #1...
- --
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Apr 27 2009
prev sibling next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Brad Roberts wrote:
 Without looking at the docs, code, or compiling and running a test, what will
 this do:

     foreach(x, splitter(",a,b,", ","))
         writefln("x = %s", a);

 I'll make it multiple choice:

 choice 1)
   x = a
   x = b

 choice 2)
   x =
   x = a
   x = b

 choice 3)
   x = a
   x = b
   x =

 choice 4)
   x =
   x = a
   x = b
   x =

split does: 2. This means comma is an item terminator and not an item separator.

Interesting. It never occurred to me to think of the comma as a terminator. I'm actually quite surprised by Perl's behavior.
 Why did I think this is a good idea? Because in most cases, I
 was thankful to Perl's split that it does exactly the right thing.
 Whenever I read text from linguistic corpora, I see that words (or other
 word properties) are separated by spaces. There is never a space before
 the first word on a line, but there is often a trailing space at the end
 of the line. Why? Because the text was processed by a program that
 output "word, ' '" or "tag, ' '" for each word or tag. Then if I split
 the text by whitespace, I'd be annoyed to see that trailing spaces do
 matter.

Only because the program that generated this text was doing something unexpected though, right?
 For the same reason, C accepts enum X { a, b, } but not ,a ,b.
 Mechanically generating enum values is easier if each value has a
 trailing comma.

This has always seemed weird to me. C doesn't accept a trailing comma in function parameter lists. I don't mind it accepting commas in enum blocks mostly because leaving a trailing comma in multi-line blocks can mean a smaller diff if I want to append new elements to the block later, but it certainly isn't sufficient to justify the syntax IMO.
 Similarly, when you split a text by '\n', a leading empty line is
 important, whereas you wouldn't expect a final '\n' to introduce an
 empty line.

I very well may. It really depends on the use.
 Now clearly there are cases in which leading or trailing empty items are
 both important. I'm just saying they are more rare.

I think there are two issues worth considering here. First is semantics-- the term "split" clearly suggests a division between two things. Second is that it's easier to throw out null strings than to infer their existence from a function that doesn't communicate this information. A similar issue was raised regarding readln preserving line terminators in the strings it returns. As for rarity, CSV is an extremely popular format for tabulated text files, and split seems like a natural fit for processing lines from such files. I'd think that processing such files would at least be a very common need in the business sector.
 We could add an
 enumerated parameter to Splitter:
 enum PleaseFindAGoodName { terminator, separator }
 foreach (line; splitter(",a,b,", ","))
      ... terminator is implicit ...
 foreach (line; splitter(",a,b,", ",", PleaseFindAGoodName.separator))
      ... separator ...
 We might just go with the terminator semantics and ask people who need
 separator semantics to use a stripl() or a munch() prior to splitting.

Did you perhaps mean the reverse? It would be easy enough to strip trailing whitespace from your text files to get the behavior you expect, but I don't see how this would help people who consider the trailing token significant (the separator case).
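
The asymmetry described above can be sketched in a few lines. This is a minimal illustration in Python rather than D, and the helper names `split_separator` and `drop_trailing_empty` are hypothetical, not part of Phobos or any library: starting from separator semantics (choice 4) it is easy to derive the terminator-style result (choice 2), but once the trailing empty field is gone there is no way to tell whether the input had one.

```python
def split_separator(text, sep):
    """Separator semantics (choice 4): every delimiter divides two fields."""
    return text.split(sep)


def drop_trailing_empty(fields):
    """Derive a terminator-style result (choice 2) by discarding trailing empty fields."""
    end = len(fields)
    while end > 0 and fields[end - 1] == "":
        end -= 1
    return fields[:end]


full = split_separator(",a,b,", ",")
trimmed = drop_trailing_empty(full)
print(full)     # ['', 'a', 'b', '']
print(trimmed)  # ['', 'a', 'b']

# Two different inputs collapse to the same terminator-style result, so the
# trailing empty field cannot be reconstructed after the fact.
same = drop_trailing_empty(split_separator(",a,b", ",")) == trimmed
print(same)     # True
```

The sketch only shows one direction of the argument: a caller who wants fewer fields can always discard them, while a caller who needs the empty trailing field cannot get it back.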
Apr 27 2009
parent Robert Fraser <fraserofthenight gmail.com> writes:
Steven Schveighoffer wrote:
 On Mon, 27 Apr 2009 18:36:55 -0400, Sean Kelly <sean invisibleduck.org> 
 wrote:
 
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s 
 article
 For the same reason, C accepts enum X { a, b, } but not ,a ,b.
 Mechanically generating enum values is easier if each value has a
 trailing comma.

This has always seemed weird to me. C doesn't accept a trailing comma in function parameter lists. I don't mind it accepting commas in enum blocks mostly because leaving a trailing comma in multi-line blocks can mean a smaller diff if I want to append new elements to the block later, but it certainly isn't sufficient to justify the syntax IMO.

You know, this just reminded me of something. What is the purpose of allowing trailing commas in enums in C? Mostly for this:

    enum {
        val1,
        val2,
    #ifdef INCLUDE_VAL_3
        val3
    #endif
    };

Which would require some weird preprocessor logic for val2 if a trailing comma weren't allowed.

But hasn't this behavior been *specifically* frowned upon by Walter due to its lack of maintainability? In fact, I'd say that except for C portability (which is becoming more and more a moot argument), we could get rid of allowing the comma at the end of the last enum definition. In fact, it would discourage the undesirable behavior of versioning around elements versus versioning around the enum.

I know the argument is over for splitter, but I just thought this was an interesting connection to explore.

-Steve

NO! Allowing trailing comma in stuff is great if it's being generated by CTFE, or if it's just a long list you're adding to/removing from/commenting parts out during development. I'd rather trailing commas be allowed in array literals, too.
Apr 28 2009
prev sibling parent Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 Brad Roberts wrote:
 Without looking at the docs, code, or compiling and running a test, 
 what will
 this do:

     foreach(x, splitter(",a,b,", ","))
         writefln("x = %s", a);

 I'll make it multiple choice:

 choice 1)
   x = a
   x = b

 choice 2)
   x =
   x = a
   x = b

 choice 3)
   x = a
   x = b
   x =

 choice 4)
   x =
   x = a
   x = b
   x =

 Later,
 Brad

Thanks for bringing this to attention, Brad. Splitter does what Perl's split does: 2. This means comma is an item terminator and not an item separator.

This is the problem, I think. Linguistically, "split" happens at separators (split points), so it should be 4. The argument of utility from Perl is reasonable, but with those semantics I think it needs a different name - you don't "split" at "terminators". You could get away with terminators in a function name like C's "strcol" or "collate", where the intuition for how the separation occurs isn't nearly as strong, but not for "split".
Apr 28 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 26 Apr 2009 22:36:33 -0400, Brad Roberts <braddr puremagic.com>  
wrote:

 Without looking at the docs, code, or compiling and running a test, what  
 will
 this do:

     foreach(x, splitter(",a,b,", ","))
         writefln("x = %s", a);

 I'll make it multiple choice:

 choice 1)
   x = a
   x = b

 choice 2)
   x =
   x = a
   x = b

 choice 3)
   x = a
   x = b
   x =

 choice 4)
   x =
   x = a
   x = b
   x =

Normally for standardized tests, I try to make all my pencil bubbles draw a picture, it makes tests more fun. But it seems we have no standardized form here, so I actually have to take the test :( I choose 4. -Steve
Apr 27 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 27 Apr 2009 09:43:40 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Jason House wrote:
 Before reading your post, I was going to say that I'd expect 4, would
 accept 1, and consider 2 or 3 to be buggy! Notice how under your new
 proposal everyone would still get the behavior wrong when reading the
 code.

everyone posting heavily in this group != everyone

Not that I care, because I don't use phobos, but you haven't really presented any good argument that your method is the most intuitive except:

1. Some example of badly written code that outputs extra spaces (I don't consider this to be common).
2. Perl does it that way.

The way I see it is: when I see a function named "splitter", I think the function splits a string based on identified token separators. If you don't think of it that way, fine, you have every right to design Phobos however you want, despite the fact that 100% of respondents surveyed (so far) don't agree with your intuition.

I have never thought of a list of tokens with terminators vs. separators. I think what you should have as an option to split is to be able to ignore leading or trailing empty items, not "separator is really terminator" enums, which would require a paragraph of explanation.

-Steve
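
The option suggested above (keep or ignore leading and trailing empty items) can be sketched roughly as follows. This is an illustration in Python, not the Phobos API; the function `split_fields` and its two flags are hypothetical names chosen for this example. With the four flag combinations it reproduces the four quiz choices for the input ",a,b,".

```python
def split_fields(text, sep, keep_leading_empty=True, keep_trailing_empty=True):
    """Split on sep, optionally dropping empty fields at either end.

    Interior empty fields (as in "a,,b") are always preserved.
    """
    fields = text.split(sep)
    start, end = 0, len(fields)
    if not keep_leading_empty:
        while start < end and fields[start] == "":
            start += 1
    if not keep_trailing_empty:
        while end > start and fields[end - 1] == "":
            end -= 1
    return fields[start:end]


choice1 = split_fields(",a,b,", ",", keep_leading_empty=False, keep_trailing_empty=False)
choice2 = split_fields(",a,b,", ",", keep_trailing_empty=False)
choice3 = split_fields(",a,b,", ",", keep_leading_empty=False)
choice4 = split_fields(",a,b,", ",")
print(choice1)  # ['a', 'b']
print(choice2)  # ['', 'a', 'b']
print(choice3)  # ['a', 'b', '']
print(choice4)  # ['', 'a', 'b', '']
```

Two booleans describe the behaviour in terms of the output the caller sees, which is arguably easier to explain than a terminator/separator enum; whether that trade-off suits Phobos is a design question this sketch does not settle.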
Apr 27 2009
prev sibling next sibling parent =?ISO-8859-1?Q?=22J=E9r=F4me_M=2E_Berger=22?= <jeberger free.fr> writes:
Brad Roberts wrote:
| Without looking at the docs, code, or compiling and running a test,
| what will this do:
|
|     foreach(x, splitter(",a,b,", ","))
|         writefln("x = %s", a);
|
| I'll make it multiple choice:
|
| choice 1)
|   x = a
|   x = b
|
| choice 2)
|   x =
|   x = a
|   x = b
|
| choice 3)
|   x = a
|   x = b
|   x =
|
| choice 4)
|   x =
|   x = a
|   x = b
|   x =
|
	I'd say either 2 or 4.

		Jerome
- --
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Apr 27 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Mon, Apr 27, 2009 at 9:55 AM, Robert Fraser
<fraserofthenight gmail.com> wrote:
 Andrei Alexandrescu wrote:
 Jason House wrote:
 Before reading your post, I was going to say that I'd expect 4, would
 accept 1, and consider 2 or 3 to be buggy! Notice how under your new
 proposal everyone would still get the behavior wrong when reading the
 code.

everyone posting heavily in this group != everyone

Yes, but it's a representative (albeit small) sample of the user base.
 Andrei

 P.S. I scrolled down your post looking for counter-evidence that you might have brought, but found only the please-don't-do-this-again empty quote. It wastes everybody's time looking in vain for nuggets of responses within the quoted text.

Is consistency a good argument? std.string.split currently does (4). Java and C#'s split() methods work like (4). strtok does (4). Is there any other language/function besides Perl that does (2)?

Python's split is rather more nuanced than has been intimated here:

"""
str.split([sep[, maxsplit]])

Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified, then there is no limit on the number of splits (all possible splits are made).

If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']). The sep argument may consist of multiple characters (for example, '1<>2<>3'.split('<>') returns ['1', '2', '3']). Splitting an empty string with a specified separator returns [''].

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].

For example, ' 1  2   3  '.split() returns ['1', '2', '3'], and '  1  2   3  '.split(None, 1) returns ['1', '2   3  '].
"""

So the default is to act like Perl, but this only applies when splitting on whitespace. Otherwise it acts like 4). re.split in python seems to do 4) pretty much all the time.

--bb
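
The behaviour described in the quoted documentation can be checked with a short script. This is only a sketch using the standard `str.split` and `re.split`; the variable names are arbitrary, and the quiz input ",a,b," is reused from the original question.

```python
import re

line = ",a,b,"
explicit_sep = line.split(",")      # explicit separator: empty fields are kept
regex_split = re.split(",", line)   # re.split behaves the same way here
print(explicit_sep)  # ['', 'a', 'b', '']
print(regex_split)   # ['', 'a', 'b', '']

spaced = "  a b  "
whitespace_default = spaced.split()   # no separator: whitespace runs collapse
single_space = spaced.split(" ")      # explicit " ": every space delimits a field
print(whitespace_default)  # ['a', 'b']
print(single_space)        # ['', '', 'a', 'b', '', '']
```

So with an explicit separator Python gives choice 4, and only the argument-less whitespace mode drops leading and trailing empty strings (closer to choice 1 than to Perl's choice 2).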
Apr 27 2009
prev sibling next sibling parent "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
Georg Wrede wrote:

 An after-the-fact thought: if the function is called "splitter", then  
 one unavoidably starts thinking about separators. And not terminators.

 Had the function been called "separate" or something else, then the
 notion of "something in between" wouldn't have been that strong.

So you're saying "separate" does not evoke the thought of separators? -- Simen
Apr 27 2009
prev sibling next sibling parent Derek Parnell <derek psych.ward> writes:
On Sun, 26 Apr 2009 19:36:33 -0700, Brad Roberts wrote:

 Without looking at the docs, code, or compiling and running a test, what will
 this do:
 
     foreach(x, splitter(",a,b,", ","))
         writefln("x = %s", a);

I picked (4) ... then read Andrei's response.
 choice 4)
   x =
   x = a
   x = b
   x =

I see the delimiter as a separator and not a terminator, because the function name is "splitter" which I see as meaning to separate into component parts. Thus the delimiter is a separator. -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
Apr 27 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 27 Apr 2009 18:36:55 -0400, Sean Kelly <sean invisibleduck.org>  
wrote:

 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s  
 article
 For the same reason, C accepts enum X { a, b, } but not ,a ,b.
 Mechanically generating enum values is easier if each value has a
 trailing comma.

This has always seemed weird to me. C doesn't accept a trailing comma in function parameter lists. I don't mind it accepting commas in enum blocks mostly because leaving a trailing comma in multi-line blocks can mean a smaller diff if I want to append new elements to the block later, but it certainly isn't sufficient to justify the syntax IMO.

You know, this just reminded me of something. What is the purpose of allowing trailing commas in enums in C? Mostly for this:

    enum {
        val1,
        val2,
    #ifdef INCLUDE_VAL_3
        val3
    #endif
    };

Which would require some weird preprocessor logic for val2 if a trailing comma weren't allowed.

But hasn't this behavior been *specifically* frowned upon by Walter due to its lack of maintainability? In fact, I'd say that except for C portability (which is becoming more and more a moot argument), we could get rid of allowing the comma at the end of the last enum definition. In fact, it would discourage the undesirable behavior of versioning around elements versus versioning around the enum.

I know the argument is over for splitter, but I just thought this was an interesting connection to explore.

-Steve
Apr 28 2009
prev sibling next sibling parent Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Tue, Apr 28, 2009 at 9:49 AM, grauzone <none example.net> wrote:
 This is one of my favorite rants, anywhere on the world wide internets:

I believe it is "Internet".

It's "internets."
Apr 28 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 28 Apr 2009 13:12:41 -0400, Robert Fraser  
<fraserofthenight gmail.com> wrote:

 Steven Schveighoffer wrote:
 On Mon, 27 Apr 2009 18:36:55 -0400, Sean Kelly <sean invisibleduck.org>  
 wrote:

 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s  
 article
 For the same reason, C accepts enum X { a, b, } but not ,a ,b.
 Mechanically generating enum values is easier if each value has a
 trailing comma.

This has always seemed weird to me. C doesn't accept a trailing comma in function parameter lists. I don't mind it accepting commas in enum blocks mostly because leaving a trailing comma in multi-line blocks can mean a smaller diff if I want to append new elements to the block later, but it certainly isn't sufficient to justify the syntax IMO.

You know, this just reminded me of something. What is the purpose of allowing trailing commas in enums in C? Mostly for this:

    enum {
        val1,
        val2,
    #ifdef INCLUDE_VAL_3
        val3
    #endif
    };

Which would require some weird preprocessor logic for val2 if a trailing comma weren't allowed.

But hasn't this behavior been *specifically* frowned upon by Walter due to its lack of maintainability? In fact, I'd say that except for C portability (which is becoming more and more a moot argument), we could get rid of allowing the comma at the end of the last enum definition. In fact, it would discourage the undesirable behavior of versioning around elements versus versioning around the enum.

I know the argument is over for splitter, but I just thought this was an interesting connection to explore.

-Steve

NO! Allowing trailing comma in stuff is great if it's being generated by CTFE, or if it's just a long list you're adding to/removing from/commenting parts out during development. I'd rather trailing commas be allowed in array literals, too.

I'm not strongly in favor of removing the commas, but your arguments aren't that convincing to me. How hard is it to output ", x" instead of "x, " when building an enum body, and then substring the result[2..$]? Adding to/removing from/commenting isn't that hard to deal with the comma. I think this is a small amount of work you are saving. I generally do not leave the trailing comma and it isn't that hard to deal with. It also only affects you if you are messing with the last element. -Steve
Apr 28 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Tue, Apr 28, 2009 at 10:23 AM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 On Tue, 28 Apr 2009 13:12:41 -0400, Robert Fraser
 <fraserofthenight gmail.com> wrote:

 Steven Schveighoffer wrote:
 On Mon, 27 Apr 2009 18:36:55 -0400, Sean Kelly <sean invisibleduck.org>
 wrote:

 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s
 article
 For the same reason, C accepts enum X { a, b, } but not ,a ,b.
 Mechanically generating enum values is easier if each value has a
 trailing comma.

This has always seemed weird to me. C doesn't accept a trailing comma in function parameter lists. I don't mind it accepting commas in enum blocks mostly because leaving a trailing comma in multi-line blocks can mean a smaller diff if I want to append new elements to the block later, but it certainly isn't sufficient to justify the syntax IMO.

You know, this just reminded me of something. What is the purpose of allowing trailing commas in enums in C? Mostly for this:

    enum {
        val1,
        val2,
    #ifdef INCLUDE_VAL_3
        val3
    #endif
    };

Which would require some weird preprocessor logic for val2 if a trailing comma weren't allowed.

But hasn't this behavior been *specifically* frowned upon by Walter due to its lack of maintainability? In fact, I'd say that except for C portability (which is becoming more and more a moot argument), we could get rid of allowing the comma at the end of the last enum definition. In fact, it would discourage the undesirable behavior of versioning around elements versus versioning around the enum.

I know the argument is over for splitter, but I just thought this was an interesting connection to explore.

-Steve

NO! Allowing trailing comma in stuff is great if it's being generated by CTFE, or if it's just a long list you're adding to/removing from/commenting parts out during development. I'd rather trailing commas be allowed in array literals, too.

I'm not strongly in favor of removing the commas, but your arguments aren't that convincing to me. How hard is it to output ", x" instead of "x, " when building an enum body, and then substring the result[2..$]?

.. unless your list is empty in which case you must not substring the result. Some kind of flexible "join" function in the std lib is my preferred solution.

--bb
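
The two approaches being compared can be sketched as follows. This is an illustration in Python, not D, and both function names are hypothetical. The first mirrors the "emit `, x` and cut off the first two characters" idea with the empty-list guard made explicit; the second uses a join, which needs no special case.

```python
def enum_body_by_slicing(names):
    """Emit ", x" for each name, then cut off the leading ", "."""
    body = "".join(", " + name for name in names)
    if not body:
        # Python slicing would tolerate an empty string here, but a
        # bounds-checked slice such as D's result[2..$] would be out of
        # range, so the empty case is guarded explicitly in this sketch.
        return body
    return body[2:]


def enum_body_by_join(names):
    """Let join place the separator between elements only."""
    return ", ".join(names)


members = ["val1", "val2", "val3"]
sliced = enum_body_by_slicing(members)
joined = enum_body_by_join(members)
print(sliced)                      # val1, val2, val3
print(joined)                      # val1, val2, val3
print(repr(enum_body_by_join([])))  # ''
```

Both produce the same text for a non-empty list; the join version simply has no edge case to remember.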
Apr 28 2009
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 28 Apr 2009 13:40:44 -0400, Bill Baxter <wbaxter gmail.com> wrote:

 On Tue, Apr 28, 2009 at 10:23 AM, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 On Tue, 28 Apr 2009 13:12:41 -0400, Robert Fraser
 <fraserofthenight gmail.com> wrote:

 Steven Schveighoffer wrote:
 On Mon, 27 Apr 2009 18:36:55 -0400, Sean Kelly  
 <sean invisibleduck.org>
 wrote:

 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s
 article
 For the same reason, C accepts enum X { a, b, } but not ,a ,b.
 Mechanically generating enum values is easier if each value has a
 trailing comma.

This has always seemed weird to me.  C doesn't accept a trailing comma in function parameter lists.  I don't mind it accepting commas in enum blocks mostly because leaving a trailing comma in multi-line blocks can mean a smaller diff if I want to append new elements to the block later, but it certainly isn't sufficient to justify the syntax IMO.

 You know, this just reminded me of something.  What is the purpose of allowing trailing commas in enums in C?  mostly for this:
     enum {
       val1,
       val2,
     #ifdef INCLUDE_VAL_3
       val3
     #endif
     };
 Which would require some weird preprocessor logic for val2 if a trailing comma weren't allowed
 But hasn't this behavior been *specifically* frowned upon by Walter due to its lack of maintainability?  In fact, I'd say that except for C portability (which is becoming more and more a moot argument), we could get rid of allowing the comma at the end of the last enum definition.  In fact, it would discourage the undesirable behavior of versioning around elements versus versioning around the enum.
 I know the argument is over for splitter, but I just thought this was an interesting connection to explore.
 -Steve

NO! Allowing trailing comma in stuff is great if it's being generated by CTFE, or if it's just a long list you're adding to/removing from/commenting parts out during development. I'd rather trailing commas be allowed in array literals, too.

I'm not strongly in favor of removing the commas, but your arguments aren't that convincing to me. How hard is it to output ", x" instead of "x, " when building an enum body, and then substring the result[2..$]?

.. unless your list is empty in which case you must not substring the result.

Ah yes, the extremely valuable but seldom used empty enum ;) -Steve
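
For reference, "versioning around the enum" rather than around a single element might look like the sketch below in D. The version identifier `IncludeVal3` and the enum name `Values` are hypothetical, and the trailing commas are there only to show that the current grammar accepts them.

```d
import std.stdio : writeln;

// Hypothetical version identifier; enable it with -version=IncludeVal3.
version (IncludeVal3)
{
    enum Values { val1, val2, val3, }   // trailing comma is accepted
}
else
{
    enum Values { val1, val2, }
}

void main()
{
    // Without -version=IncludeVal3 the last member is val2.
    writeln(Values.max);
}
```

The whole declaration is duplicated per version, which is the trade-off against the C idiom of wrapping one member in #ifdef.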
Apr 28 2009