digitalmars.D.learn - odd behavior of split() function

reply "Bedros" <2bedros gmail.com> writes:
I would like to split "A+B+C+D" into "A", "B", "C", "D",
but when using split() I get

"A+B+C+D", "B+C+D", "C+D", "D"

The code is below:


import std.stdio;
import std.string;
import std.array;

int main()
{
      string [] str_list;
      string test_str = "A+B+C+D";
      str_list = test_str.split("+");
      foreach(item; str_list)
              printf("%s\n", cast(char*)item);

      return 0;
}
Jun 07 2013
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
That would be because of your misuse of printf. If you used

    foreach(item; str_list)
        writeln(item);

you would have been fine. D string literals do happen to have a null character one past their end so that you can pass them directly to C functions, but D strings in general are _not_ null terminated, and printf expects strings to be null terminated. If you want to convert a D string to a null terminated string, you need to use std.string.toStringz, not a cast. You should pretty much never cast a D string to char* or const char* or any variant thereof. So, you could have done

    printf("%s\n", toStringz(item));

but I don't know why you'd want to use printf rather than writeln or writefln - both of which (unlike printf) are typesafe and understand D types.

You got "A+B+C+D", "B+C+D", "C+D", "D" because the original string (being a string literal) had a null character one past its end, and each of the strings returned by split was a slice of the original string, and printf blithely ignored the actual boundaries of the slice looking for the next null character that it happened to find in memory, which - because they were all slices of the same string literal - happened to be the end of the original string literal. And the strings printed differed, because each slice started in a different portion of the underlying array.

- Jonathan M Davis
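
The explanation above can be sketched as a small program. This is only an illustration, not code from the thread: the variable names testStr and parts are made up, and the pointer check assumes that split returns slices of its input rather than copies, which is the behavior described above for a string split on a string separator.

```d
import core.stdc.stdio : printf;
import std.array : split;
import std.stdio : writeln;
import std.string : toStringz;

void main()
{
    string testStr = "A+B+C+D";            // literal: a null follows the final 'D'
    string[] parts = testStr.split("+");

    // The split itself is correct; only the printing went wrong.
    assert(parts == ["A", "B", "C", "D"]);

    // Assumption being illustrated: parts[1] points into testStr's memory,
    // so nothing but its length marks where "B" ends.
    assert(parts[1].ptr == testStr.ptr + 2);

    foreach (item; parts)
    {
        writeln(item);                     // respects item.length
        printf("%s\n", toStringz(item));   // null-terminated string for C
    }
}
```

Both calls in the loop print the element itself, because writeln uses the slice's length and toStringz hands printf a properly terminated string.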
Jun 07 2013
parent reply "Bedros" <2bedros gmail.com> writes:
First of all, many thanks for the quick reply.

I'm learning D, and I used printf instead of writef purely out of habit.

Thanks again.

-Bedros

Jun 07 2013
parent Benjamin Thaut <code benjamin-thaut.de> writes:
You can use printf if you want to; the correct usage is not so nice, though:

    string str = "test";
    printf("%.*s", cast(int) str.length, str.ptr);

Kind Regards
Benjamin Thaut
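
Applied to the split example from the original question, the precision-based form might look like the sketch below. The helper printSlice is hypothetical, not part of Phobos or of the thread. It assumes the slice length fits in an int, since "%.*s" reads its precision as an int while a D array length is a size_t.

```d
import core.stdc.stdio : printf;
import std.array : split;

// Hypothetical helper: print one D string through C printf without
// needing a null terminator.
void printSlice(const(char)[] s)
{
    // "%.*s" takes an int precision followed by a char pointer, so printf
    // reads at most s.length characters starting at s.ptr.
    printf("%.*s\n", cast(int) s.length, s.ptr);
}

void main()
{
    foreach (item; "A+B+C+D".split("+"))
        printSlice(item);
}
```

Unlike toStringz, this form never needs to allocate a terminated copy, but writeln remains the simpler choice when C interop is not required.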
Jun 07 2013