digitalmars.D.learn - A use case for fromStringz

Andrej Mitrovic <none none.none> writes:
There are situations where you have to call a C dispatch function, and pass it
a void* and a selector. The selector lets you choose what the C function does,
for example an enum constant selector `kGetProductName` could ask the C
function to fill a null-terminated string at the location of the void* you've
passed in.

One way of doing this is to pass the .ptr field of a static or dynamic char
array to the C function, letting it fill the array with a null-terminated
string.

But here's the problem: If you try to print out that array in D code with e.g.
writefln, it will print out the _entire length_ of the array.

This is a problem because the array could quite likely be filled with garbage
values after the null terminator. In fact I just had that case when interfacing
with C.

to!string can convert a null-terminated C string to a D string, with the length
matching the location of the null-terminator. But for char arrays, it won't do
any checks for null terminators. It only does this if you explicitly pass it a
char*.
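
To make the difference concrete, here is a minimal sketch. The 8-byte buffer and its 'x' filler are assumptions standing in for a real C-filled buffer with garbage after the terminator:

```d
import std.conv : to;

void main()
{
    // Stand-in for a buffer a C function wrote into: every element
    // starts as 'x' (playing the role of garbage), then "abc\0" is
    // written at the front.
    char[8] buf = 'x';
    buf[0 .. 4] = "abc\0";

    // Passing the array converts all 8 elements, terminator and filler included.
    assert(to!string(buf).length == 8);

    // Passing a char* stops at the first null terminator.
    assert(to!string(buf.ptr) == "abc");
}
```

Later Phobos releases also provide `std.string.fromStringz`, which takes a `const(char)*` and returns a slice ending at the terminator.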

So I've come up with a very simple solution:

module fromStringz2;

import std.stdio;
import std.conv;
import std.traits;
import std.string;

enum
{
    kGetProductName = 1
}

// imagine this function is defined in a C DLL
extern(C) void cDispatch(void* payload, int selector)
{
    if (selector == kGetProductName)
    {
        char* val = cast(char*)payload;
        val[0] = 'a';
        val[1] = 'b';
        val[2] = 'c';
        val[3] = '\0';
    }
}

string fromStringz(T)(T value) 
{
    static if (isArray!T)
    {
        return to!string(cast(char*)value);
    }
    else
    {
        return to!string(value);
    }
}

string getNameOld()
{
    static char[256] name;
    cDispatch(name.ptr, kGetProductName);
    return to!string(name);
}

string getNameNew()
{
    static char[256] name;
    cDispatch(name.ptr, kGetProductName);
    return fromStringz(name);    
}

void main()
{
    // values after [3] could quite likely be garbage
    assert(getNameOld().length == 256);
    assert(getNameNew().length == 3);
}


I admit I didn't take Unicode into account, so it's far from being perfect or
safe.

In any case I think it's useful to have such a function, since you generally do
not want the part of a C string after the null terminator.
Mar 31 2011
Jesse Phillips <jessekphillips+D gmail.com> writes:
Why not:

 string getNameOld()
 {
     static char[256] name;
     cDispatch(name.ptr, kGetProductName);
     return to!string(name.ptr);
 }
Mar 31 2011
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 3/31/11, Jesse Phillips <jessekphillips+D gmail.com> wrote:
 Why not:

  string getNameOld()
  {
      static char[256] name;
      cDispatch(name.ptr, kGetProductName);
      return to!string(name.ptr);
  }
Nice catch! But see my second reply. If a null terminator is missing and we know we're operating on a D array (which has a length), then it could be best to check for a null terminator. If there isn't one it is highly likely that the array contains garbage.
Mar 31 2011
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Actually, this still suffers from the problem when the returned char*
doesn't have a null terminator. It really sucks when C code does that,
and I've just experienced that. There is a solution though:

Since we can detect the length of the D array passed into
`fromStringz`, we can do the job of to!string ourselves and check for
a null terminator. If one isn't found, we return a string of length 0.
Here's an updated version which doesn't suffer from the missing null
terminator problem:

string fromStringz(T)(T value)
{
    static if (isArray!T)
    {
        if (value is null || value.length == 0)
        {
            return "";
        }

        auto nullPos = value.indexOf("\0");

        if (nullPos == -1)
            return "";

        return to!string(value[0..nullPos]);
    }
    else
    {
        return to!string(value);
    }
}
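
For comparison, a minimal sketch of the same idea written against a slice parameter, so that static arrays are passed as `buf[]`. The name `fromBuffer` is hypothetical and chosen only for illustration; returning "" on a missing terminator mirrors the version above:

```d
import std.string : indexOf;

// Hypothetical helper: returns the text before the first null
// terminator, or "" if the buffer contains no terminator at all.
string fromBuffer(const(char)[] buf)
{
    auto nullPos = buf.indexOf('\0');
    if (nullPos == -1)
        return "";                      // treat an unterminated buffer as garbage
    return buf[0 .. nullPos].idup;      // copy only the valid part
}

void main()
{
    char[8] ok  = "abc\0xxxx";          // terminated, filler after the null
    char[4] bad = "abcd";               // no terminator anywhere

    assert(fromBuffer(ok[]) == "abc");
    assert(fromBuffer(bad[]) == "");
    assert(fromBuffer(null) == "");
}
```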
Mar 31 2011
Jesse Phillips <jessekphillips+D gmail.com> writes:
Andrej Mitrovic Wrote:

 Actually, this still suffers from the problem when the returned char*
 doesn't have a null terminator. It really sucks when C code does that,
 and I've just experienced that. There is a solution though:
 
 Since we can detect the length of the D array passed into
 `fromStringz`, we can do the job of to!string ourselves and check for
 a null terminator. If one isn't found, we return a string of length 0.
 Here's an updated version which doesn't suffer from the missing null
 terminator problem:
I do not know the proper action if the string you receive is garbage. Shouldn't it throw an exception since it did not receive a string? This to me seems like a validation issue. If the functions you are calling are expected to return improper data, _you_ must validate what you receive, and that includes running it through UTF validation.
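
A rough sketch of what a throwing, validating variant might look like. The names `findTerminator` and `checkedFromBuffer` are hypothetical; `enforce` and `std.utf.validate` are the only library calls assumed:

```d
import std.exception : enforce, assertThrown;
import std.utf : validate, UTFException;

// Hypothetical helper: index of the first zero code unit, or -1.
ptrdiff_t findTerminator(const(char)[] buf)
{
    foreach (i, c; buf)     // iterates code units, no UTF decoding
        if (c == '\0')
            return i;
    return -1;
}

// Hypothetical helper: throws instead of silently returning "".
string checkedFromBuffer(const(char)[] buf)
{
    auto nullPos = findTerminator(buf);
    enforce(nullPos != -1, "no null terminator found in buffer");

    auto text = buf[0 .. nullPos];
    validate(text);         // throws UTFException on malformed UTF-8
    return text.idup;
}

void main()
{
    char[4] good    = "ok\0x";
    char[2] noTerm  = "ab";
    char[2] badUtf  = [cast(char) 0xFF, '\0'];   // 0xFF is never valid UTF-8

    assert(checkedFromBuffer(good[]) == "ok");
    assertThrown!Exception(checkedFromBuffer(noTerm[]));
    assertThrown!UTFException(checkedFromBuffer(badUtf[]));
}
```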
Mar 31 2011
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Oh I'm not trying to get this into Phobos, I just needed the function
so I wrote it and am sharing it here. Maybe it should throw. For my
purposes I don't need it to throw. :)
Mar 31 2011
Jacob Carlborg <doob me.com> writes:
On 3/31/11 11:18 PM, Andrej Mitrovic wrote:
 Actually, this still suffers from the problem when the returned char*
 doesn't have a null terminator. It really sucks when C code does that,
 and I've just experienced that. There is a solution though:
In those cases, doesn't the function return the length of the filled data or something like that?
 Since we can detect the length of the D array passed into
 `fromStringz`, we can do the job of to!string ourselves and check for
 a null terminator. If one isn't found, we return a string of length 0.
 Here's an updated version which doesn't suffer from the missing null
 terminator problem:

 string fromStringz(T)(T value)
 {
      static if (isArray!T)
      {
          if (value is null || value.length == 0)
          {
              return "";
          }

          auto nullPos = value.indexOf("\0");

          if (nullPos == -1)
              return "";

          return to!string(value[0..nullPos]);
      }
      else
      {
          return to!string(value);
      }
 }
-- /Jacob Carlborg
Apr 01 2011
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 4/1/11, Jacob Carlborg <doob me.com> wrote:
 In those cases, doesn't the function return the length of the filled
 data or something like that?
I know what you mean. I would expect a C function to do just that, but in this case it does not. It's lame, but I have to deal with it.
Apr 01 2011
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Hmm.. now I need a function that converts a wchar* to a wchar[] or
wstring. There doesn't seem to be anything in Phobos for this type of
conversion. Or maybe I haven't looked hard enough?

I don't know whether this is safe since I'm not sure how the null
terminator is represented in utf16, but it does seem to work ok from a
few test cases:

wstring fromWStringz(wchar* value)
{
    if (value is null)
        return "";

    auto oldPos = value;

    size_t nullPos;  // size_t rather than uint, so long strings work on 64-bit
    while (*value++ != '\0')
    {
        nullPos++;
    }

    if (nullPos == 0)
        return "";

    return to!wstring(oldPos[0..nullPos]);
}
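
On the UTF-16 question: the terminator is a single zero code unit, and the two halves of a surrogate pair are always in the range 0xD800 to 0xDFFF, so neither half can be mistaken for it. A minimal sketch along those lines, with the hypothetical name `fromWStringzSketch`:

```d
// Hypothetical helper: scans for the first zero code unit and copies
// everything before it into a new wstring.
wstring fromWStringzSketch(const(wchar)* value)
{
    if (value is null)
        return "";

    size_t len = 0;
    while (value[len] != 0)     // a zero code unit ends the string
        ++len;

    return value[0 .. len].idup;
}

void main()
{
    wstring plain = "h\u00E9llo\0"w;        // non-ASCII, one code unit each
    wstring astral = "\U0001F600\0"w;       // one code point, two code units

    assert(fromWStringzSketch(plain.ptr) == "h\u00E9llo"w);
    assert(fromWStringzSketch(astral.ptr).length == 2);
    assert(fromWStringzSketch(null) == ""w);
}
```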

I thought we would pay more attention to interfacing with C code.
Since D is supposed to work side-by-side with C, we should have more
functions that convert common data types between the two languages.
Apr 15 2011
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Microsoft has some of the most ridiculous functions. This one
(GetEnvironmentStrings) returns a pointer to a block of
null-terminated strings, with no information on the count of strings
returned. Each string ends with a null-terminator, standard stuff. But
only when you find two null terminators in succession you'll know that
you've reached the end of the entire block of strings.

So from some example code I've seen, people usually create a count
variable and increment it for every null terminator in the block until
they find a double null terminator. And then they have to loop all
over again when constructing a list of strings.

Talk about inefficient designs.. There's also a wchar* edition of this
function, I don't want to even touch it. Here's what the example code
looks like:

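    // ANSI build assumed: GetEnvironmentStrings returns a char* block
    char *l_EnvStr = GetEnvironmentStrings();
    char *l_str = l_EnvStr;

    // first pass: count the strings in the block
    int count = 0;
    while (*l_str != 0)
    {
        while (*l_str != 0)
            l_str++;

        l_str++;   // skip this string's terminator
        count++;
    }

    // second pass: walk the block again and print each string
    l_str = l_EnvStr;
    for (int i = 0; i < count; i++)
    {
        printf("%s\n", l_str);
        while (*l_str != '\0')
            l_str++;

        l_str++;
    }

    // free the original pointer, not one that has been advanced
    FreeEnvironmentStrings(l_EnvStr);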
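
On the D side the block can be split in a single pass, since an array can simply grow as strings are found. A minimal sketch, assuming an ANSI (char) block; `splitBlock` is a hypothetical name and the test data only imitates what the Windows call would return:

```d
// Hypothetical helper: splits a block of null-terminated strings that
// is itself terminated by an extra null (i.e. an empty string).
string[] splitBlock(const(char)* block)
{
    string[] result;
    if (block is null)
        return result;

    while (*block != '\0')              // an empty string marks the end
    {
        size_t len = 0;
        while (block[len] != '\0')
            ++len;

        result ~= block[0 .. len].idup; // copy this entry out of the block
        block += len + 1;               // step past the entry's terminator
    }
    return result;
}

void main()
{
    string data = "A=1\0PATH=/bin\0\0";
    assert(splitBlock(data.ptr) == ["A=1", "PATH=/bin"]);
    assert(splitBlock("\0".ptr).length == 0);
    assert(splitBlock(null).length == 0);
}
```

Because every entry is copied with `.idup`, the original block can be released as soon as the function returns.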

I wonder.. in all these years.. have they ever thought about using a
convention in C where the length is embedded as a 32/64bit value at
the pointed location of a pointer, followed by the array contents?

I mean something like the following (I'm pseudocoding here, this is
not valid C code, and it's 7 AM.):

// allocate memory for the length field + character count
char* mystring = malloc(sizeof(size_t) + sizeof(char)*length);
*(cast(size_t*)mystring) = length;  // embed the length

// call a function expecting a char*
printString(mystring);

void printString(char* string)
{
    size_t length = *(cast(size_t*)string);
    (cast(size_t*)string)++;  // skip count to reach first char

    // now print all chars one by one
    for (size_t i = 0; i < length; i++)
    {
        printChar(*string++);
    }
}

Well, they can always use an extra parameter in a function that has
the length, but it seems many people are too lazy to even do that. I
guess C programmers just *love* their nulls. :p
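
For what it's worth, the length-prefix idea can be sketched in runnable D on top of the C allocator. The names `makePrefixed` and `viewPrefixed` are hypothetical, and the layout (a `size_t` count followed directly by the characters) is just the one described above:

```d
import core.stdc.stdlib : malloc, free;
import core.stdc.string : memcpy;

// Hypothetical helper: allocates [size_t length][chars...] in one block.
char* makePrefixed(const(char)[] text)
{
    auto p = cast(char*) malloc(size_t.sizeof + text.length);
    if (p is null)
        return null;

    *cast(size_t*) p = text.length;                 // embed the length
    if (text.length)
        memcpy(p + size_t.sizeof, text.ptr, text.length);
    return p;
}

// Hypothetical helper: reads the embedded length and returns a slice
// over the characters that follow it (no copy, no terminator needed).
const(char)[] viewPrefixed(const(char)* p)
{
    immutable len = *cast(const(size_t)*) p;
    return (p + size_t.sizeof)[0 .. len];
}

void main()
{
    auto p = makePrefixed("a\0b");      // embedded null is not a problem
    assert(p !is null);
    scope (exit) free(p);

    assert(viewPrefixed(p).length == 3);
    assert(viewPrefixed(p) == "a\0b");
}
```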
Apr 15 2011
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Yeah I basically took the idea from the existing D implementation.
Although D's arrays are a struct with a length and a pointer (I think
so).
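
A quick way to check that: a dynamic array (slice) is two words wide and exposes its length and pointer directly. A small sketch:

```d
void main()
{
    char[5] storage = "hello";
    char[] slice = storage[1 .. 4];     // a view into the middle of storage

    // A slice is a length plus a pointer, so it occupies two machine words.
    static assert((char[]).sizeof == 2 * size_t.sizeof);

    assert(slice.length == 3);
    assert(slice.ptr == storage.ptr + 1);
    assert(slice == "ell");
}
```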
Apr 16 2011