digitalmars.D.bugs - inconsistent behavior of std.string.split

reply zwang <nehzgnaw gmail.com> writes:
According to the documentation:
<spec>
char[][] split(char[] s)
     Split s[] into an array of words, using whitespace as the delimiter.

char[][] split(char[] s, char[] delim)
     Split s[] into an array of words, using delim[] as the delimiter.
</spec>

Intuitively, split(s) should be equivalent to split(s, " \t\f\r\n\v").
But the former function discards empty strings while the latter does not.
The following example demonstrates the difference.

<code>
import std.stdio;
import std.string;
void main(){
	writefln(std.string.split("0  3"," ")); //[0,,3]
	writefln(std.string.split("0  3"));     //[0,3]
	writefln(std.string.split("    "," ")); //[,,,,]
	writefln(std.string.split("    "));     //[]
}
</code>
Aug 20 2005
parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
Yeah, the version that takes a delimiter string should skip any zero-length strings between delimiters. The whitespace version keeps skipping characters until it hits a non-whitespace one, but the delimiter version starts a new string after every delimiter, when it should keep reading delimiters until it hits a non-delimiter sequence.
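
The skipping behavior described above could look roughly like the sketch below. This is only an illustration, not the Phobos implementation: `splitSkipEmpty` is a hypothetical name, and the sketch assumes a non-empty delimiter and modern D `string` types rather than the `char[]` signatures quoted from the 2005 documentation.

```d
// Hypothetical helper (not part of std.string): splits s on delim and
// never produces the empty strings that consecutive delimiters would create.
string[] splitSkipEmpty(string s, string delim)
{
    assert(delim.length > 0, "delimiter must not be empty");

    // True if the delimiter starts at byte offset pos.
    bool atDelim(size_t pos)
    {
        return pos + delim.length <= s.length
            && s[pos .. pos + delim.length] == delim;
    }

    string[] words;
    size_t i = 0;
    while (i < s.length)
    {
        // Keep reading delimiters until a non-delimiter sequence starts.
        while (atDelim(i))
            i += delim.length;
        if (i >= s.length)
            break;

        // Collect everything up to the next delimiter or the end of input.
        size_t start = i;
        while (i < s.length && !atDelim(i))
            i++;
        words ~= s[start .. i];
    }
    return words;
}

unittest
{
    // With this behavior the delimiter version matches the whitespace one.
    assert(splitSkipEmpty("0  3", " ") == ["0", "3"]);
    assert(splitSkipEmpty("    ", " ").length == 0);
}
```

With this behavior, the two calls from the original example that take `" "` as the delimiter would give the same results as the whitespace-only calls.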
Aug 20 2005
parent reply zwang <nehzgnaw gmail.com> writes:
Keeping zero-length strings is sometimes useful, for example when parsing a CSV or tab-delimited file. A better solution might be two versions of split that handle consecutive delimiters differently, or two additional overloads for the special case of whitespace delimiters.
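
One way the "two versions" idea might be expressed is a single function with an explicit policy argument. The sketch below is an assumption for illustration only: `EmptyFields` and `splitFields` are hypothetical names that do not exist in std.string, and the code assumes a non-empty delimiter and modern D `string` types.

```d
// Hypothetical policy enum and function; neither exists in std.string.
enum EmptyFields { keep, skip }

string[] splitFields(string s, string delim, EmptyFields policy)
{
    assert(delim.length > 0, "delimiter must not be empty");

    string[] fields;
    size_t start = 0; // start of the field currently being read
    size_t i = 0;
    while (i + delim.length <= s.length)
    {
        if (s[i .. i + delim.length] == delim)
        {
            // A delimiter ends the current field, which may be empty.
            string field = s[start .. i];
            if (policy == EmptyFields.keep || field.length > 0)
                fields ~= field;
            i += delim.length;
            start = i;
        }
        else
        {
            i++;
        }
    }

    // The text after the last delimiter is the final field.
    string last = s[start .. $];
    if (policy == EmptyFields.keep || last.length > 0)
        fields ~= last;
    return fields;
}

unittest
{
    // CSV-style parsing: the empty middle column is preserved.
    assert(splitFields("1,,3", ",", EmptyFields.keep) == ["1", "", "3"]);
    // Word-style splitting: consecutive delimiters collapse.
    assert(splitFields("0  3", " ", EmptyFields.skip) == ["0", "3"]);
}
```

Making the policy an explicit argument keeps the CSV case and the word-splitting case on one code path, so the caller chooses the behavior instead of relying on which overload happens to be picked.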
Aug 20 2005
parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
Good point.
Aug 20 2005