
digitalmars.D - Re: Biggest problems w/ D - strings

C. Dunn <cdunn2001 gmail.com> writes:
Kirk McDonald Wrote:

 C. Dunn wrote:
 4) Not enough help for converting between D strings and C char*. 
 There must be conversion functions which work regardless of whether
 the D string is dynamic or not, and regardless of whether the C char*
 is null terminated.  I'm not sure what the answer is, but this has
 led to a large number of runtime bugs for me as a novice.
 
The std.string module has the toStringz and toString functions.
I have a field of n chars stored on disk.  It holds a null-terminated string, padded with zeroes.  It is amazingly difficult to compare such a char[n] with some other char[] (which, by the dictates of D, may or may not be null-terminated).

   template min(T) {
     T min( T a, T b ) {
       if ( a < b ) return a;
       else return b;
     }
   }

   template max(T) {
     T max( T a, T b ) {
       if ( a < b ) return b;
       else return a;
     }
   }

   size_t strnlen(char* s, size_t maxlen){
     for(size_t i=0; i<maxlen; ++i){
       if (!s[i]) return i;
     }
     return maxlen;
   }

   int compare(ConstString lhs, ConstString rhs){
     char* lptr = cast(char*)lhs;
     char* rptr = cast(char*)rhs;
     size_t len_lhs = strnlen(lptr, lhs.length);
     size_t len_rhs = strnlen(rptr, rhs.length);
     int comp = strncmp(lptr, rptr, min!(size_t)(len_lhs, len_rhs));
     if (comp) return comp;
     if (len_lhs < len_rhs) return -1;
     else if (len_lhs > len_rhs) return 1;
     else return 0;
   }
Aug 10 2007
Sean Kelly <sean f4.ca> writes:
C. Dunn wrote:
 Kirk McDonald Wrote:
 
 C. Dunn wrote:
 4) Not enough help for converting between D strings and C char*. 
 There must be conversion functions which work regardless of whether
 the D string is dynamic or not, and regardless of whether the C char*
 is null terminated.  I'm not sure what the answer is, but this has
 led to a large number of runtime bugs for me as a novice.
The std.string module has the toStringz and toString functions.
I have a field of n chars stored on disk. It holds a null-terminated string, padded with zeroes. It is amazingly difficult to compare such a char[n] with some other char[] (which, by the dictates of D, may or may not be null-terminated).
I'm not sure I understand.  Why bother computing string length in the C fashion when D provides a .length property which holds this information?

Sean
Aug 10 2007
BCS <ao pathlink.com> writes:
Reply to Sean,

 C. Dunn wrote:
 
 Kirk McDonald Wrote:
 
 C. Dunn wrote:
 
 4) Not enough help for converting between D strings and C char*.
 There must be conversion functions which work regardless of whether
 the D string is dynamic or not, and regardless of whether the C
 char* is null terminated.  I'm not sure what the answer is, but
 this has led to a large number of runtime bugs for me as a novice.
 
The std.string module has the toStringz and toString functions.
I have a field of n chars stored on disk. It holds a null-terminated string, padded with zeroes. It is amazingly difficult to compare such a char[n] with some other char[] (which, by the dictates of D, may or may not be null-terminated).
I'm not sure I understand. Why bother computing string length in the C fashion when D provides a .length property which holds this information? Sean
He might be using a D char[] as an oversized buffer for a c style string.
Aug 10 2007
C. Dunn <cdunn2001 gmail.com> writes:
BCS Wrote:

 Reply to Sean,
 
 C. Dunn wrote:
 
 I have a field of n chars stored on disk.  It holds a null-terminated
 string, padded with zeroes.  It is amazingly difficult to compare
 such a char[n] with some other char[] (which, by the dictates of D,
 may or may not be null-terminated).
 
I'm not sure I understand. Why bother computing string length in the C fashion when D provides a .length property which holds this information? Sean
He might be using a D char[] as an oversized buffer for a c style string.
Exactly.  This is very common in the database world.  The disk record has a fixed size, so I have a struct which looks like this:

   struct Data{
     int id;
     char[32] name;
     // ...
   };

A C function produces this data.  D can accept the C struct with no problems.  'name' is just a static array.

But processing the name field in D is awkward.  'name.length' is 32, but 'strlen(name)' could be less (or infinity if the string is a full 32 characters sans zeroes, which is why I need strnlen()).
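
For readers on a current D compiler, the bounded scan that strnlen() provides can be sketched on a slice, so that the field's own .length caps the search and no pointer cast is needed. This is only an illustration under assumptions: `fieldText` is a hypothetical helper name, not a Phobos function, and the `Data` layout is copied from the post above.

```d
// Hypothetical helper (not part of Phobos): returns the part of a
// fixed-size field that precedes the first zero byte, or the whole
// field when it contains no zero byte at all.
const(char)[] fieldText(const(char)[] field)
{
    foreach (i, c; field)
    {
        if (c == '\0')
            return field[0 .. i];
    }
    return field; // no terminator: the text fills the entire field
}

// Same layout as the struct described in the post.
struct Data
{
    int id;
    char[32] name;
}
```

Calling `fieldText(d.name[])` on a zero-padded field yields just the text, and on a field holding a full 32 characters it yields all 32 without reading past the end. Note that D initializes char to 0xFF rather than 0, so a field built on the D side has to be zero-filled explicitly before the text is copied in.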
Aug 10 2007
Sean Kelly <sean f4.ca> writes:
C. Dunn wrote:
 BCS Wrote:
 
 Reply to Sean,

 C. Dunn wrote:

 I have a field of n chars stored on disk.  It holds a null-terminated
 string, padded with zeroes.  It is amazingly difficult to compare
 such a char[n] with some other char[] (which, by the dictates of D,
 may or may not be null-terminated).
I'm not sure I understand. Why bother computing string length in the C fashion when D provides a .length property which holds this information? Sean
He might be using a D char[] as an oversized buffer for a c style string.
 Exactly.  This is very common in the database world.  The disk record
 has a fixed size, so I have a struct which looks like this:

    struct Data{
      int id;
      char[32] name;
      // ...
    };

 A C function produces this data.  D can accept the C struct with no
 problems.  'name' is just a static array.

 But processing the name field in D is awkward.  'name.length' is 32,
 but 'strlen(name)' could be less (or infinity if the string is a full
 32 characters sans zeroes, which is why I need strnlen()).
Oh I see.  Well, it isn't much help, but std::string in C++ isn't null-terminated either, so this issue isn't unique to D.  Unfortunately, I think a custom comparator, like the one you've written, is the best choice here.  That or property methods to make Data act more D-like.  The get/set routines could return and accept 'normal' D strings, perform length validation, etc.

Sean
Aug 10 2007
Regan Heath <regan netmail.co.nz> writes:
Sean Kelly wrote:
 C. Dunn wrote:
 BCS Wrote:

 Reply to Sean,

 C. Dunn wrote:

 I have a field of n chars stored on disk.  It holds a null-terminated
 string, padded with zeroes.  It is amazingly difficult to compare
 such a char[n] with some other char[] (which, by the dictates of D,
 may or may not be null-terminated).
I'm not sure I understand. Why bother computing string length in the C fashion when D provides a .length property which holds this information? Sean
He might be using a D char[] as an oversized buffer for a c style string.
 Exactly.  This is very common in the database world.  The disk record
 has a fixed size, so I have a struct which looks like this:

    struct Data{
      int id;
      char[32] name;
      // ...
    };

 A C function produces this data.  D can accept the C struct with no
 problems.  'name' is just a static array.

 But processing the name field in D is awkward.  'name.length' is 32,
 but 'strlen(name)' could be less (or infinity if the string is a full
 32 characters sans zeroes, which is why I need strnlen()).
Oh I see. Well, it isn't much help, but std::string in C++ isn't null-terminated either, so this issue isn't unique to D. Unfortunately, I think a custom comparator, like the one you've written, is the best choice here. That or property methods to make Data act more D-like. The get/set routines could return and accept 'normal' D strings, perform length validation, etc.
Something like this: (borrowing from Derek's solution, which I quite liked BTW)

   import std.string, std.stdio;

   // Return a slice of the leftmost portion of 'x'
   // up to but not including the first 'c'
   string lefts(string x, char c)
   {
       int p;
       p = std.string.find(x,c);
       if (p < 0)
           p = x.length;
       return x[0..p];
   }

   struct Data
   {
       int id;
       char[32] _name;
       string name() { return lefts(_name, '\0'); }
   }

   Data zero;
   Data full;
   Data some;

   static this()
   {
       zero._name[] = '\0';
       full._name[] = 'a';
       some._name[0..10] = 'a';
       some._name[10..$] = '\0';
   }

   void main()
   {
       char[] other;
       other.length = 10;
       other[] = 'a';
       assert(other != zero.name);
       assert(other != full.name);
       assert(other == some.name);
   }

Regan
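
The post above covers the getter half of the get/set idea. A minimal sketch of the setter half, with the length validation Sean mentions, might look like the following on a current D compiler. The overloaded `name` property, the use of `enforce`, and the error message are assumptions made for illustration; nothing in the thread prescribes them.

```d
import std.exception : enforce;

struct Data
{
    int id;
    char[32] _name;

    // Getter: the text before the first zero byte, or the whole
    // field when no zero byte is present.
    @property const(char)[] name() const
    {
        foreach (i, c; _name)
        {
            if (c == '\0')
                return _name[0 .. i];
        }
        return _name[];
    }

    // Setter: rejects values that do not fit, then stores the text
    // zero-padded so the on-disk layout stays C-compatible.
    @property void name(const(char)[] value)
    {
        enforce(value.length <= _name.length,
                "name does not fit in the fixed-size field");
        _name[] = '\0';
        _name[0 .. value.length] = value[];
    }
}
```

With this in place, `d.name = "derek";` fills the field and pads it, and `d.name == "derek"` compares as an ordinary D string. A value of exactly 32 characters is accepted and stored without a terminator, which matches the "full field" case discussed earlier in the thread.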
Aug 11 2007
kenny <funisher gmail.com> writes:
C. Dunn wrote:
 BCS Wrote:
 
 Reply to Sean,

 C. Dunn wrote:

 I have a field of n chars stored on disk.  It holds a null-terminated
 string, padded with zeroes.  It is amazingly difficult to compare
 such a char[n] with some other char[] (which, by the dictates of D,
 may or may not be null-terminated).
I'm not sure I understand. Why bother computing string length in the C fashion when D provides a .length property which holds this information? Sean
He might be using a D char[] as an oversized buffer for a c style string.
 Exactly.  This is very common in the database world.  The disk record
 has a fixed size, so I have a struct which looks like this:

    struct Data{
      int id;
      char[32] name;
      // ...
    };

 A C function produces this data.  D can accept the C struct with no
 problems.  'name' is just a static array.

 But processing the name field in D is awkward.  'name.length' is 32,
 but 'strlen(name)' could be less (or infinity if the string is a full
 32 characters sans zeroes, which is why I need strnlen()).
I use postgre and mysql for lots of things. Postgre is much easier to grab the string length from cause it returns with the tuple. If I remember right, internally, the schema is stored, then each string looks like this:

for varchar <= 255

   struct Firstname {
     ubyte length;
     char[size] data;
   }

for varchar <= 65535

   struct Firstname {
     ushort length;
     char[size] data;
   }

--------------------------

why would you want to zero terminate your strings in a database form. It doesn't make any sense. you trade off 1 byte of savings for up to 255 loops to find zero, or two bytes of savings for up to 65535 loops.

Consider you have two options... you can always null terminate it -- which means that for strings shorter than 256 chars, you don't save anything -- or you could do this to keep the last char:

   uint i;
   for(i = 0; i < 256; i++) {
     if(str[i] == 0) {
       break;
     }
   }
   return i;

but then that has two checks instead of one (i < 256 && str[i] != 0).

In postgre, using libpq, something like what you're saying is very easy...

   int len = PQgetlength(res, row, offset);
   if(len >= 0) {
     char* r = PQgetvalue(res, row, offset);
     char[] rr;
     rr.length = len;
     rr[0 .. len] = r[0 .. len];
   }

I really suggest using string lengths. it will save you tons of processing power. (especially if you are > 65535 chars in length) and also, by storing the length, you also have the added advantage of being able to store binary data in there, because a zero in the string won't terminate the string. Also, you may find out that people can end strings early passing malformed utf-8 sequences and such too.

Every C library that I use, which uses null terminated strings, I quickly convert them to the dark side for the above reasons. walter is very smart making strings that way -- for slicing purposes too :)

Example, imagine a RIGHT(str, 5) function with null terminated strings, then think of it in D:

   (str.length > 5 ? str[length-5 .. length] : str);

ok, enough rambling... I LOVE strings in D :)

Kenny
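
The two length-based idioms in this post (copying a pointer-plus-length result into a D array, and taking the rightmost characters by slicing) can be sketched without a database connection. The names `copyFromC` and `right` are hypothetical, and the sketch assumes a current D compiler, where the end of an array inside a slice expression is written `$` rather than a bare `length`.

```d
// Hypothetical helper: copies `len` bytes from a C buffer into a
// GC-owned D array. Embedded zero bytes are preserved because the
// length, not a terminator, decides where the data ends.
char[] copyFromC(const(char)* p, size_t len)
{
    return p[0 .. len].dup;
}

// Hypothetical helper: the rightmost `n` code units of `s`, or all
// of `s` when it is shorter than `n`.
const(char)[] right(const(char)[] s, size_t n)
{
    return s.length > n ? s[$ - n .. $] : s;
}
```

For example, `right("hello world", 5)` gives `"world"` and `right("hi", 5)` gives `"hi"`. The slice counts UTF-8 code units, so on non-ASCII text it can cut through a multi-byte sequence; that caveat applies to the one-liner in the post as well.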
Aug 11 2007
Derek Parnell <derek psych.ward> writes:
On Fri, 10 Aug 2007 17:49:01 -0400, C. Dunn wrote:

 
 I have a field of n chars stored on disk.  It holds a
 null-terminated string, padded with zeroes.
 It is amazingly difficult to compare such a char[n]
 with some other char[] (which, by the dictates of D,
 may or may not be null-terminated).
You could try this simpler method ...

   import std.string;

   // Return a slice of the leftmost portion of 'x'
   // up to but not including the first 'c'
   string lefts(string x, char c)
   {
       int p;
       p = std.string.find(x,c);
       if (p < 0)
           p = x.length;
       return x[0..p];
   }

   int compare(string lhs, string rhs, char d = '\0')
   {
       return std.string.cmp( lefts(lhs,d), lefts(rhs,d) );
   }

and use it like ...

   char[32] NameA;
   char[56] NameB;
   NameA[] = ' ';
   NameB[] = ' ';

   NameA[0..5] = "derek";
   NameB[0..7] = "parnell";
   result = compare(NameA, NameB);

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Aug 11 2007
Derek Parnell <derek psych.ward> writes:
On Sat, 11 Aug 2007 20:07:51 +1000, Derek Parnell wrote:


Oops! Of course I really meant ... 

and use it like ...

   char[32] NameA;
   char[56] NameB;
   NameA[] = '\0'; 
   NameB[] = '\0';

   NameA[0..5] = "derek";
   NameB[0..7] = "parnell";
   result = compare(NameA, NameB);
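
For anyone trying this on a current D compiler, here is a hedged adaptation of the same lefts/compare idea. It assumes today's Phobos, where the character search in std.string is spelled `indexOf`, and it uses the built-in array comparison operators instead of a library compare function. The parameter types are widened to `const(char)[]` so that both mutable buffers and string literals are accepted; that widening is a choice made for this sketch.

```d
import std.string : indexOf;

// Slice of the leftmost portion of `x`, up to but not including the
// first `c`; the whole of `x` when `c` does not occur.
const(char)[] lefts(const(char)[] x, char c)
{
    auto p = x.indexOf(c);
    return p < 0 ? x : x[0 .. p];
}

// Three-way comparison of the text before the delimiter in each
// buffer: negative, zero, or positive, like strcmp.
int compare(const(char)[] lhs, const(char)[] rhs, char d = '\0')
{
    auto a = lefts(lhs, d);
    auto b = lefts(rhs, d);
    if (a == b)
        return 0;
    return a < b ? -1 : 1;
}
```

With the zero-filled `NameA` and `NameB` buffers from the corrected usage, `compare(NameA[], NameB[])` is negative because "derek" sorts before "parnell", and `compare(NameA[], "derek")` is zero even though the two arguments have different lengths.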


-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Aug 11 2007
"Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Sat, 11 Aug 2007 00:49:01 +0300, C. Dunn <cdunn2001 gmail.com> wrote:

 I have a field of n chars stored on disk.  It holds a null-terminated string,
padded with zeroes.  It is amazingly difficult to compare such a char[n] with
some other char[] (which, by the dictates of D, may or may not be
null-terminated).
This reminds me of the Delphi string problem.  Before Delphi 3 (OSLT), you had to do crazy stuff to get a PChar (char*) out of a string.  Delphi "long" strings are somewhat similar to D's strings - they have a length property, and allow the string to contain zeroes.  Because of that, you couldn't just typecast a string to a PChar, due to lack of a terminating zero.  Borland solved the problem by having strings always have a null terminating byte at their end, thus allowing you to typecast a string directly to a PChar.

I noticed that memory for arrays is always allocated with an extra byte (internal/gc/gc.d, function _d_arraysetlengthT).  I wonder if this is related and can be used for this purpose..?

-- 
Best regards,
 Vladimir
mailto:thecybershadow gmail.com
Aug 13 2007
Regan Heath <regan netmail.co.nz> writes:
Vladimir Panteleev wrote:
 On Sat, 11 Aug 2007 00:49:01 +0300, C. Dunn <cdunn2001 gmail.com>
 wrote:
 
 I have a field of n chars stored on disk.  It holds a
 null-terminated string, padded with zeroes.  It is amazingly
 difficult to compare such a char[n] with some other char[] (which,
 by the dictates of D, may or may not be null-terminated).
This reminds me of the Delphi string problem. Before Delphi 3 (OSLT), you had to do crazy stuff to get a PChar (char*) out of a string. Delphi "long" strings are somewhat similar to D's strings - they have a length property, and allow the string to contain zeroes. Because of that, you couldn't just typecast a string to a PChar, due to lack of a terminating zero. Borland solved the problem by having strings always have a null terminating byte at their end, thus allowing you to typecast a string directly to a PChar. I noticed that memory for arrays are always allocated with an extra byte (internal/gc/gc.d, function _d_arraysetlengthT). I wonder if this is related and can be used for this purpose..?
It could but the problem remains when dealing with slices, eg.

   string foo = "this is a test";
   string bar = foo[5..9];

the byte following the end of the slice 'bar' is ' ' not '\0'.

I believe there was/is a hack in toStringz which checks the byte following the slice and if it's '\0' already, does nothing but return the input string.

Regan
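
The slice problem is easy to see in a small sketch. It assumes a current Phobos, where `std.string.toStringz` returns a pointer to zero-terminated data and `std.string.fromStringz` converts back; `cLength` is a hypothetical helper that reports the length a C function would see. Whether toStringz copies or reuses the input in a given case is an implementation detail this sketch does not rely on.

```d
import core.stdc.string : strlen;
import std.string : fromStringz, toStringz;

// Hypothetical helper: the length a C function would observe after
// the slice has been passed through toStringz.
size_t cLength(const(char)[] s)
{
    return strlen(toStringz(s));
}

// Round trip: D slice -> zero-terminated C string -> D slice.
const(char)[] roundTrip(const(char)[] s)
{
    return fromStringz(toStringz(s));
}
```

For `bar = foo[5..9]`, the slice is "is a" and the byte right after it in memory is the space from `foo`, so handing `bar.ptr` straight to C would run on into the rest of the sentence. `cLength(bar)` is 4 because toStringz supplies the terminator.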
Aug 13 2007
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Vladimir Panteleev wrote:
 On Sat, 11 Aug 2007 00:49:01 +0300, C. Dunn <cdunn2001 gmail.com> wrote:
 
 I have a field of n chars stored on disk.  It holds a null-terminated string,
padded with zeroes.  It is amazingly difficult to compare such a char[n] with
some other char[] (which, by the dictates of D, may or may not be
null-terminated).
This reminds me of the Delphi string problem. Before Delphi 3 (OSLT), you had to do crazy stuff to get a PChar (char*) out of a string. Delphi "long" strings are somewhat similar to D's strings - they have a length property, and allow the string to contain zeroes. Because of that, you couldn't just typecast a string to a PChar, due to lack of a terminating zero. Borland solved the problem by having strings always have a null terminating byte at their end, thus allowing you to typecast a string directly to a PChar. I noticed that memory for arrays are always allocated with an extra byte (internal/gc/gc.d, function _d_arraysetlengthT). I wonder if this is related and can be used for this purpose..?
It's probably to make sure a one-past-the-end pointer is also counted as a reference to a memory block by the garbage collector. Though if it's initialized to 0, it could be used for the purpose you describe as a side-effect.
Aug 13 2007