digitalmars.D.learn - String created from buffer has wrong length and strip() result is


"Lucas Burson" <ljdelight+dlang gmail.com> writes:

When creating a string from a ubyte[], I have an invalid length 
and string.strip() doesn't strip off all whitespace. I'm new to 
the language. Is this a compiler issue?


import std.string : strip;
import std.stdio  : writefln;

int main()
{
    const string ATA_STR = " ATA ";

    // this works fine
    {
       ubyte[] buffer = [' ', 'A', 'T', 'A', ' ' ];
       string test = strip(cast(string)(buffer));
       assert(test == strip(ATA_STR));
    }

    // This is where things break
    {
       ubyte[] buff = new ubyte[16];
       buff[0..ATA_STR.length] = cast(ubyte[])(ATA_STR);

       // read the string back from the buffer, stripping whitespace
       string stringFromBuffer = strip(cast(string)(buff[0..16]));
       // this shows strip() doesn't remove all whitespace
       writefln("StrFromBuff is '%s'; length %d",
                stringFromBuffer, stringFromBuffer.length);

       // !! FAILS. stringFromBuffer is length 15, not 3.
       assert(stringFromBuffer.length == strip(ATA_STR).length);

    }

    return 0;
}

Oct 16 2014

"thedeemon" <dlang thedeemon.com> writes:

On Friday, 17 October 2014 at 06:29:24 UTC, Lucas Burson wrote:

    // This is where things break
    {
       ubyte[] buff = new ubyte[16];
       buff[0..ATA_STR.length] = cast(ubyte[])(ATA_STR);

       // read the string back from the buffer, stripping whitespace
       string stringFromBuffer = strip(cast(string)(buff[0..16]));
       // this shows strip() doesn't remove all whitespace
       writefln("StrFromBuff is '%s'; length %d",
                stringFromBuffer, stringFromBuffer.length);

       // !! FAILS. stringFromBuffer is length 15, not 3.
       assert(stringFromBuffer.length == strip(ATA_STR).length);

Unlike C, strings in D are not zero-terminated by default; they 
are just arrays, i.e. a pair of pointer and length. You create an 
array of 16 bytes and cast it to string, so now you have a 16-char 
string. You fill the first few chars with data from ATA_STR, but 
the remaining 10 bytes of the array are still part of the string; 
nothing was written to them, so they hold zeroes. Since this tail 
of zeroes is not whitespace (tabs, spaces, etc.), 'strip' doesn't 
remove it.

Oct 17 2014

"thedeemon" <dlang thedeemon.com> writes:

You fill first few chars with data from
 ATA_STR but the rest 10 bytes of the array are still part of 
 the string

Edit: you fill the first 5 chars and have 11 bytes of zeroes in 
the tail. My counting skills are too bad. ;)
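
The corrected counts can be checked directly. What follows is only a minimal sketch built on the buffer from the question; `countZeroes` is a hypothetical helper name, not something from the thread or from Phobos. It shows that the cast keeps all 16 bytes, of which 5 hold text and 11 are zero.

```d
import std.stdio : writefln;

// Hypothetical helper (not from the thread): count the zero bytes in a buffer.
size_t countZeroes(const(ubyte)[] raw)
{
    size_t n = 0;
    foreach (b; raw)
        if (b == 0)
            ++n;
    return n;
}

void main()
{
    const string ATA_STR = " ATA ";

    // new ubyte[16] is default-initialized, so all 16 bytes start as 0.
    ubyte[] buff = new ubyte[16];
    buff[0 .. ATA_STR.length] = cast(const(ubyte)[]) ATA_STR;

    // The cast reinterprets the same 16 bytes; it does not look for a terminator.
    auto asText = cast(const(char)[]) buff;

    writefln("text length: %s, zero bytes: %s", asText.length, countZeroes(buff));
    assert(asText.length == 16);
    assert(ATA_STR.length == 5);
    assert(countZeroes(buff) == 11);
}
```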

Oct 17 2014

spir via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:

On 17/10/14 09:29, thedeemon via Digitalmars-d-learn wrote:
 On Friday, 17 October 2014 at 06:29:24 UTC, Lucas Burson wrote:

    // This is where things break
    {
       ubyte[] buff = new ubyte[16];
       buff[0..ATA_STR.length] = cast(ubyte[])(ATA_STR);

       // read the string back from the buffer, stripping whitespace
       string stringFromBuffer = strip(cast(string)(buff[0..16]));
       // this shows strip() doesn't remove all whitespace
       writefln("StrFromBuff is '%s'; length %d",
                stringFromBuffer, stringFromBuffer.length);

       // !! FAILS. stringFromBuffer is length 15, not 3.
       assert(stringFromBuffer.length == strip(ATA_STR).length);

 Unlike C, strings in D are not zero-terminated by default, they are just
 arrays, i.e. a pair of pointer and size. You create an array of 16 bytes and
 cast it to string, now you have a 16-chars string. You fill first few chars
 with data from ATA_STR but the rest 10 bytes of the array are still part of
 the string, not initialized with data, so having zeroes. Since this tail of
 zeroes is not whitespace (tabs or spaces etc.) 'strip' doesn't remove it.

Side-note: since your string has those zeroes at the end, strip only removes
the space at the start (thus, final size=15), instead of at both ends.
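
To make that concrete, here is a rough sketch, assuming the text always ends at the first zero byte. `textBeforeNul` is a made-up helper name, not part of the thread or of Phobos. It cuts the buffer at the first zero and only then strips whitespace, so both spaces are removed.

```d
import std.algorithm : countUntil;
import std.string : strip;

// Hypothetical helper (not from the thread): cut the buffer at the first
// zero byte, then strip ordinary whitespace from what is left.
string textBeforeNul(const(ubyte)[] raw)
{
    // countUntil returns -1 when no zero byte is present.
    auto end = raw.countUntil(0);
    auto used = end < 0 ? raw : raw[0 .. cast(size_t) end];
    // idup copies the characters, so the result no longer aliases the buffer.
    return (cast(const(char)[]) used).strip.idup;
}

void main()
{
    string src = " ATA ";
    ubyte[] buff = new ubyte[16];
    buff[0 .. src.length] = cast(const(ubyte)[]) src;

    // Stripping the whole buffer removes only the leading space: 16 - 1 = 15.
    assert(strip(cast(const(char)[]) buff).length == 15);

    // Cutting at the first zero byte first gives the expected 3 characters.
    assert(textBeforeNul(buff) == "ATA");
}
```

The `idup` copy is a deliberate choice in this sketch: a plain cast would leave the string pointing into a buffer that may be overwritten later.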

d

Oct 17 2014

"Lucas Burson" <ljdelight+dlang gmail.com> writes:

On Friday, 17 October 2014 at 08:31:04 UTC, spir via 
Digitalmars-d-learn wrote:
 On 17/10/14 09:29, thedeemon via Digitalmars-d-learn wrote:
 On Friday, 17 October 2014 at 06:29:24 UTC, Lucas Burson wrote:

   // This is where things break
   {
      ubyte[] buff = new ubyte[16];
      buff[0..ATA_STR.length] = cast(ubyte[])(ATA_STR);

      // read the string back from the buffer, stripping whitespace
      string stringFromBuffer = strip(cast(string)(buff[0..16]));
      // this shows strip() doesn't remove all whitespace
      writefln("StrFromBuff is '%s'; length %d",
               stringFromBuffer, stringFromBuffer.length);

      // !! FAILS. stringFromBuffer is length 15, not 3.
      assert(stringFromBuffer.length == strip(ATA_STR).length);

 Unlike C, strings in D are not zero-terminated by default, they are just
 arrays, i.e. a pair of pointer and size. You create an array of 16 bytes and
 cast it to string, now you have a 16-chars string. You fill first few chars
 with data from ATA_STR but the rest 10 bytes of the array are still part of
 the string, not initialized with data, so having zeroes. Since this tail of
 zeroes is not whitespace (tabs or spaces etc.) 'strip' doesn't remove it.

 Side-note: since your string has those zeroes at the end, strip 
 only removes the space at start (thus, final size=15), instead 
 of at both ends.

 d

Okay, things are becoming clearer. The cast to string is 
nothing like the C++ string ctor; I made a bad assumption.

So given the buffer below, would I use fromStringz (is this in the 
stdlib?) to convert it from a null-terminated buffer to a proper 
string? Shouldn't the compiler give a warning about casting a 
buffer to a string without using fromStringz?

Buffer = [ 0x20, 0x41, 0x54, 0x41, 0x20, 0x00, 0x00, ...]?
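
For reference, a minimal sketch of what that could look like, assuming a Phobos release recent enough to provide `std.string.fromStringz` and a buffer that is guaranteed to contain a zero byte. `zeroTerminatedText` is a hypothetical wrapper name used only for illustration. Since `fromStringz` takes a bare pointer and scans forward until it finds a zero, a buffer without one would be read past its end.

```d
import std.string : fromStringz, strip;

// Hypothetical wrapper (the name is not from the thread).
// Assumption: raw contains at least one zero byte.
const(char)[] zeroTerminatedText(const(ubyte)[] raw)
{
    // fromStringz returns a slice of the same memory, not a copy.
    return fromStringz(cast(const(char)*) raw.ptr);
}

void main()
{
    // The buffer from the question: " ATA " followed by zero bytes.
    ubyte[] buffer = [0x20, 0x41, 0x54, 0x41, 0x20, 0x00, 0x00, 0x00];

    auto text = zeroTerminatedText(buffer);
    assert(text.length == 5);
    assert(text == " ATA ");

    // With the zeroes out of the way, strip removes both spaces.
    assert(strip(text) == "ATA");
}
```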

Oct 17 2014