digitalmars.D.bugs - inconsistent behavior of std.string.split
- zwang <nehzgnaw gmail.com> Aug 20 2005
- "Jarrett Billingsley" <kb3ctd2 yahoo.com> Aug 20 2005
- zwang <nehzgnaw gmail.com> Aug 20 2005
- "Jarrett Billingsley" <kb3ctd2 yahoo.com> Aug 20 2005
According to the documentation:
<spec>
char[][] split(char[] s)
Split s[] into an array of words, using whitespace as the delimiter.
char[][] split(char[] s, char[] delim)
Split s[] into an array of words, using delim[] as the delimiter.
</spec>
Intuitively, split(s) should be equivalent to split(s, " \t\f\r\n\v").
But the former function discards empty lines while the latter does not.
The following example demonstrates the difference.
<code>
import std.stdio;
import std.string;
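void main()
{
    // The delimiter overload keeps the empty string between consecutive delimiters.
    writefln(std.string.split("0  3", " ")); // [0,,3]  (two spaces in the input)
    // The whitespace overload collapses runs of whitespace.
    writefln(std.string.split("0  3"));      // [0,3]
    writefln(std.string.split("    ", " ")); // [,,,,]  (four spaces in the input)
    writefln(std.string.split("    "));      // []
}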
</code>
Aug 20 2005
"zwang" <nehzgnaw gmail.com> wrote in message news:de7c7e$17au$1 digitaldaemon.com...
> Intuitively, split(s) should be equivalent to split(s, " \t\f\r\n\v").
> But the former function discards empty lines while the latter does not.
Yeah, the one that takes a delimiter string should skip any zero-length strings in-between delimiters. The whitespace one will keep skipping characters until it hits a non-whitespace one, but the delimiter one will create a new string after every delimiter, when it should just keep reading delimiters until it hits a non-delimiter sequence.
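For illustration, the skipping behaviour described above could be sketched roughly as follows. This is a minimal sketch in present-day D, not the Phobos implementation: `splitSkipEmpty` is a hypothetical name, and it assumes the delimiter is matched as a whole substring rather than as a set of characters.

```d
import std.stdio : writeln;
import std.string : indexOf;

/// Hypothetical helper (not part of Phobos): split `s` on `delim`,
/// treating a run of consecutive delimiters as a single separator,
/// so no zero-length strings are produced.
string[] splitSkipEmpty(string s, string delim)
{
    assert(delim.length > 0, "delimiter must not be empty");
    string[] words;
    size_t start = 0;
    while (start <= s.length)
    {
        // Position of the next delimiter relative to `start`, or -1 if none.
        auto rel = s[start .. $].indexOf(delim);
        size_t end = (rel < 0) ? s.length : start + rel;
        // Only keep the piece if it is non-empty.
        if (end > start)
            words ~= s[start .. end];
        // Continue after the delimiter (or past the end, which ends the loop).
        start = end + delim.length;
    }
    return words;
}

void main()
{
    assert(splitSkipEmpty("0  3", " ") == ["0", "3"]);
    assert(splitSkipEmpty("    ", " ").length == 0);
    writeln(splitSkipEmpty("0  3", " "));
}
```

With this variant the two inputs from the original example give the same fields as the whitespace overload, at the cost of never reporting an empty field.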
Aug 20 2005
Jarrett Billingsley wrote:
> Yeah, the one that takes a delimiter string should skip any zero-length strings in-between delimiters. The whitespace one will keep skipping characters until it hits a non-whitespace one, but the delimiter one will create a new string after every delimiter, when it should just keep reading delimiters until it hits a non-delimiter sequence.
Keeping zero-length strings is sometimes useful, for example when parsing a CSV or tab-delimited file, where an empty field still has to occupy its column. A better solution might be two versions of split that handle consecutive delimiters differently, or another two overloaded split functions for the special case of whitespace delimiters.
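One way the "two versions" idea could look is sketched below. This is only an illustration in present-day D under stated assumptions: `splitFields` and the `KeepEmpty` flag are hypothetical names, not an existing or proposed Phobos API, and the delimiter is again matched as a whole substring.

```d
import std.algorithm.iteration : filter, splitter;
import std.array : array;
import std.stdio : writeln;

/// Hypothetical flag: should zero-length fields be kept?
enum KeepEmpty { no, yes }

/// Hypothetical helper (not part of Phobos): split `s` on `delim`,
/// either keeping or dropping the empty fields between consecutive delimiters.
string[] splitFields(string s, string delim, KeepEmpty keep)
{
    auto parts = s.splitter(delim);
    if (keep == KeepEmpty.yes)
        return parts.array;                          // CSV-style: every field counts
    return parts.filter!(p => p.length > 0).array;   // word-style: drop empty fields
}

void main()
{
    // A CSV row with an empty middle column keeps three fields.
    assert(splitFields("a,,c", ",", KeepEmpty.yes) == ["a", "", "c"]);
    // The same row split word-style has only two.
    assert(splitFields("a,,c", ",", KeepEmpty.no) == ["a", "c"]);
    writeln(splitFields("x\t\tz", "\t", KeepEmpty.yes));
}
```

A flag keeps both behaviours under one name; separate function names would serve equally well and avoid changing what existing callers of split get back.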
Aug 20 2005
"zwang" <nehzgnaw gmail.com> wrote in message news:de7e7l$18se$1 digitaldaemon.com...
> Keeping zero-length strings is sometimes useful, for example, when parsing a CSV or tab-delimited file. A better solution might be two versions of split that handle consecutive delimiters differently. Or another two overloaded split functions for the special case of whitespace delimiters.
Good point.
Aug 20 2005