digitalmars.D - Re: Biggest problems w/ D

digitalmars.D - Re: Biggest problems w/ D - strings

C. Dunn <cdunn2001 gmail.com> writes:

Kirk McDonald Wrote:

 C. Dunn wrote:
 4) Not enough help for converting between D strings and C char*. 
 There must be conversion functions which work regardless of whether
 the D string is dynamic or not, and regardless of whether the C char*
 is null terminated.  I'm not sure what the answer is, but this has
 led to a large number of runtime bugs for me as a novice.
 

 
 The std.string module has the toStringz and toString functions.

I have a field of n chars stored on disk.  It holds a null-terminated string,
padded with zeroes.  It is amazingly difficult to compare such a char[n] with
some other char[] (which, by the dictates of D, may or may not be
null-terminated).


import core.stdc.string : strncmp;

// ConstString is not defined in the post; assumed here to be a const char slice.
alias ConstString = const(char)[];

template min(T)
{
   T min( T a, T b ) { if ( a < b ) return a; else return b; }
}

template max(T)
{
   T max( T a, T b ) { if ( a < b ) return b; else return a; }
}

// Like C's strnlen(): length up to the first NUL, but never more than maxlen.
size_t strnlen(const(char)* s, size_t maxlen){
  for(size_t i=0; i<maxlen; ++i){
    if (!s[i]) return i;
  }
  return maxlen;
}

// Compare two NUL-padded buffers, ignoring everything from the first NUL on.
int compare(ConstString lhs, ConstString rhs){
  const(char)* lptr = lhs.ptr;
  const(char)* rptr = rhs.ptr;
  size_t len_lhs = strnlen(lptr, lhs.length);
  size_t len_rhs = strnlen(rptr, rhs.length);
  int comp = strncmp(lptr, rptr, min!(size_t)(len_lhs, len_rhs));
  if (comp) return comp;
  if (len_lhs < len_rhs) return -1;
  else if (len_lhs > len_rhs) return 1;
  else return 0;
}

Aug 10 2007
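For comparison, here is a minimal sketch of the same comparison in present-day D (not the 2007 Phobos used elsewhere in this thread). It assumes the buffers hold single-byte text padded with zeroes, and the names `trimNul` and `compareFixed` are hypothetical. Instead of measuring with strnlen() and comparing with strncmp(), it slices each buffer at its first NUL and lets D's built-in array comparison handle both content and length.

```d
// Sketch in present-day D; trimNul and compareFixed are hypothetical names,
// not library functions.

/// Part of `buf` before the first NUL, or all of `buf` if it has none
/// (the same bound strnlen() gives).
inout(char)[] trimNul(inout(char)[] buf) pure nothrow @nogc @safe
{
    foreach (i, ch; buf)   // plain code-unit iteration, no UTF decoding
        if (ch == '\0')
            return buf[0 .. i];
    return buf;
}

/// Three-way comparison of two possibly NUL-padded buffers:
/// negative, zero or positive, like strcmp().
int compareFixed(const(char)[] lhs, const(char)[] rhs)
{
    auto a = trimNul(lhs);
    auto b = trimNul(rhs);
    // Built-in array comparison is lexicographic; a proper prefix sorts first.
    return a < b ? -1 : (a > b ? 1 : 0);
}

unittest
{
    char[8] field = '\0';              // zero-padded fixed-size field
    field[0 .. 3] = "abc";
    assert(compareFixed(field[], "abc") == 0);   // padding is ignored
    assert(compareFixed(field[], "abd") < 0);
    assert(compareFixed(field[], "ab") > 0);

    char[3] full = "abc";              // field used completely, no NUL at all
    assert(compareFixed(full[], "abc") == 0);
}
```

The checks live in a `unittest` block, so something like `dmd -unittest -main -run file.d` compiles and runs them without a separate `main`.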

Sean Kelly <sean f4.ca> writes:

C. Dunn wrote:
 I have a field of n chars stored on disk.  It holds a null-terminated string,
padded with zeroes.  It is amazingly difficult to compare such a char[n] with
some other char[] (which, by the dictates of D, may or may not be
null-terminated).

I'm not sure I understand.  Why bother computing string length in the C 
fashion when D provides a .length property which holds this information?


Sean

Aug 10 2007

BCS <ao pathlink.com> writes:

Reply to Sean,

 I'm not sure I understand.  Why bother computing string length in the
 C fashion when D provides a .length property which holds this
 information?
 
 Sean
 

He might be using a D char[] as an oversized buffer for a C-style string.

Aug 10 2007

C. Dunn <cdunn2001 gmail.com> writes:

BCS Wrote:

 He might be using a D char[] as an oversized buffer for a C-style string.

Exactly.  This is very common in the database world.  The disk record has a
fixed size, so I have a struct which looks like this:

struct Data{
  int id;
  char[32] name;
  // ...
};

A C function produces this data.  D can accept the C struct with no problems. 
'name' is just a static array.  But processing the name field in D is awkward. 
'name.length' is 32, but 'strlen(name)' could be less (or run past the end of the field if the
string fills all 32 characters with no terminating zero, which is why I need strnlen()).

Aug 10 2007
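A minimal sketch of the record described above, in present-day D. Only the two fields shown in the post are declared (the elided `// ...` fields are left out), and the helper `nameLength` is a hypothetical name. It shows that `name.length` is fixed by the type, while the meaningful length has to come from a bounded scan, because a name that fills all 32 characters has no terminator at all.

```d
/// Mirrors the C record from the post; only `id` and `name` are modelled.
struct Data
{
    int id;
    char[32] name;
}

// D structs follow the platform's C layout rules, so an int followed by
// char[32] gives the same offsets and size a C compiler would.
static assert(Data.name.offsetof == 4);
static assert(Data.sizeof == 36);

/// Hypothetical strnlen()-style helper: never reads past the 32-byte field.
size_t nameLength(ref const Data d) pure nothrow @nogc @safe
{
    foreach (i, ch; d.name)
        if (ch == '\0')
            return i;
    return d.name.length;
}

unittest
{
    Data alice;                     // note: char fields default to 0xFF in D
    alice.name[] = '\0';
    alice.name[0 .. 5] = "Alice";
    assert(alice.name.length == 32);    // the static length never changes
    assert(nameLength(alice) == 5);

    Data full;
    full.name[] = 'x';              // all 32 chars used, no terminator
    assert(nameLength(full) == 32);
}
```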

Sean Kelly <sean f4.ca> writes:

C. Dunn wrote:
 Exactly.  This is very common in the database world.  The disk record has a
fixed size, so I have a struct which looks like this:
 
 struct Data{
   int id;
   char[32] name;
   // ...
 };
 
 A C function produces this data.  D can accept the C struct with no problems. 
'name' is just a static array.  But processing the name field in D is awkward. 
'name.length' is 32, but 'strlen(name)' could be less (or infinity if the
string is a full 32 characters sans zeroes, which is why I need strnlen()).

Oh I see.  Well, it isn't much help, but std::string in C++ isn't 
null-terminated either, so this issue isn't unique to D.  Unfortunately, 
I think a custom comparator, like the one you've written, is the best 
choice here.  That or property methods to make Data act more D-like. 
The get/set routines could return and accept 'normal' D strings, perform 
length validation, etc.


Sean

Aug 10 2007
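The get/set idea above can be sketched in present-day D roughly as follows. This is an illustration under stated assumptions, not code from the thread: the storage field name `_name` and the choice to reject (rather than truncate) over-long values are hypothetical. The getter returns an ordinary D slice, and the setter validates the length and zero-pads the rest of the field so the C side still sees a padded record.

```d
import std.exception : enforce;

// Hypothetical accessors; the field name _name and the
// reject-if-too-long policy are assumptions.
struct Data
{
    int id;
    char[32] _name;   // raw storage, NUL-padded as in the C record

    /// Getter: an ordinary D slice, cut at the first NUL (if any).
    const(char)[] name() const
    {
        foreach (i, ch; _name)
            if (ch == '\0')
                return _name[0 .. i];
        return _name[];
    }

    /// Setter: validates the length, copies the text and zero-pads the rest.
    void name(const(char)[] value)
    {
        enforce(value.length <= _name.length, "name does not fit in 32 chars");
        _name[0 .. value.length] = value[];
        _name[value.length .. $] = '\0';
    }
}

unittest
{
    Data d;
    d.name = "Alice";                  // assignment syntax calls the setter
    assert(d.name() == "Alice");
    assert(d._name[5] == '\0');        // rest of the field is zero-padded

    char[33] tooLong = 'x';
    bool threw = false;
    try { d.name(tooLong[]); }
    catch (Exception e) { threw = true; }
    assert(threw);
}
```

Truncating silently would be the other obvious policy; rejecting makes the loss of data visible at the point of assignment.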

Regan Heath <regan netmail.co.nz> writes:

Sean Kelly wrote:
 Oh I see.  Well, it isn't much help, but std::string in C++ isn't 
 null-terminated either, so this issue isn't unique to D.  Unfortunately, 
 I think a custom comparator, like the one you've written, is the best 
 choice here.  That or property methods to make Data act more D-like. The 
 get/set routines could return and accept 'normal' D strings, perform 
 length validation, etc.

Something like this: (borrowing from Derek's solution, which I quite 
liked BTW)

import std.string, std.stdio;

// Return a slice of the leftmost portion of 'x'
// up to but not including the first 'c'
string lefts(string x, char c)
{
     int p;
     p = std.string.find(x,c);
     if (p < 0)
         p = x.length;
     return x[0..p];
}

struct Data
{
   int id;
   char[32] _name;
   string name() { return lefts(_name, '\0'); }
}

Data zero;
Data full;
Data some;

static this()
{
	zero._name[] = '\0';
	full._name[] = 'a';
	some._name[0..10] = 'a';
	some._name[10..$] = '\0';
}

void main()
{
	char[] other;
	
	other.length = 10;
	other[] = 'a';
	
	assert(other != zero.name);
	assert(other != full.name);
	assert(other == some.name);
}

Regan

Aug 11 2007

kenny <funisher gmail.com> writes:

C. Dunn wrote:
 Exactly.  This is very common in the database world.  The disk record has a
fixed size, so I have a struct which looks like this:
 
 struct Data{
   int id;
   char[32] name;
   // ...
 };
 
 A C function produces this data.  D can accept the C struct with no problems. 
'name' is just a static array.  But processing the name field in D is awkward. 
'name.length' is 32, but 'strlen(name)' could be less (or infinity if the
string is a full 32 characters sans zeroes, which is why I need strnlen()).

I use PostgreSQL and MySQL for lots of things. With PostgreSQL it is much easier to get the
string length, because it comes back with the tuple. If I remember right, the schema is stored
internally, and then each string looks like this:

// varchar(size) with size <= 255
struct Firstname {
	ubyte length;
	char[size] data;
}

// varchar(size) with size <= 65535
struct Firstname {
	ushort length;
	char[size] data;
}

--------------------------

Why would you want to zero-terminate your strings in a database? It doesn't make any sense.
Compared with a length prefix, you save 1 byte at the cost of up to 255 loop iterations to find
the zero, or 2 bytes at the cost of up to 65535 iterations. Consider the two options: you can
always null-terminate, which means that for strings shorter than 256 chars you don't save
anything, or you can do this so that a full-length string can use the last char too:
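// length of a zero-padded field that may use all 256 chars
size_t fieldLength(ref char[256] str)
{
	size_t i;
	for(i = 0; i < 256; i++) {
		if(str[i] == 0) {
			break;
		}
	}
	return i;
}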

But then that loop has two checks instead of one (i < 256 && str[i] != 0).

In PostgreSQL, using libpq, what you're describing is very easy...

int len = PQgetlength(res, row, offset);
if(len >= 0) {
	char* r = PQgetvalue(res, row, offset);
	char[] rr;
	rr.length = len;
	rr[0 .. len] = r[0 .. len];
}

I really suggest storing string lengths. It will save you tons of processing power, especially
for strings longer than 65535 chars. Storing the length also lets you keep binary data in the
field, because a zero byte no longer terminates the string. You may also find that, with
null-terminated strings, people can end strings early by passing malformed UTF-8 sequences and
such.

Whenever a C library I use hands me null-terminated strings, I quickly convert them to the dark
side (D strings) for the above reasons. Walter was very smart to make strings that way, for
slicing purposes too :) For example, imagine a RIGHT(str, 5) function with null-terminated
strings, then think of it in D:
(str.length > 5 ? str[length-5 .. length] : str);

OK, enough rambling... I LOVE strings in D :)

Kenny

Aug 11 2007
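The length-prefix layout described above can be sketched in present-day D. This is a hypothetical illustration of the idea, not how PostgreSQL or MySQL actually store rows: `encodeShort` and `decodeShort` are made-up names, and the one-byte `ubyte` prefix limits values to 255 bytes. Because the length travels with the data, reading a value needs no scan for a zero, and a zero byte inside the value survives.

```d
import std.exception : enforce;

/// Hypothetical encoder: one length byte followed by the raw bytes.
ubyte[] encodeShort(const(ubyte)[] value)
{
    enforce(value.length <= ubyte.max, "value longer than 255 bytes");
    auto record = new ubyte[1 + value.length];
    record[0] = cast(ubyte) value.length;
    record[1 .. $] = value[];
    return record;
}

/// Hypothetical decoder: slices out the payload without scanning for zeros.
const(ubyte)[] decodeShort(const(ubyte)[] record)
{
    enforce(record.length >= 1, "empty record");
    immutable len = record[0];
    enforce(record.length >= 1 + len, "truncated record");
    return record[1 .. 1 + len];
}

unittest
{
    const(ubyte)[] payload = [0x61, 0x00, 0x62];   // "a\0b": embedded zero
    auto rec = encodeShort(payload);
    assert(rec.length == 4 && rec[0] == 3);
    assert(decodeShort(rec) == payload);           // the zero byte survives
}
```

A `ushort` prefix would extend the same scheme to 65535 bytes at the cost of one more header byte.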

Derek Parnell <derek psych.ward> writes:

On Fri, 10 Aug 2007 17:49:01 -0400, C. Dunn wrote:

 
 I have a field of n chars stored on disk.  It holds a
 null-terminated string, padded with zeroes.
 It is amazingly difficult to compare such a char[n]
 with some other char[] (which, by the dictates of D,
 may or may not be null-terminated).

You could try this simpler method ...

import std.string;
// Return a slice of the leftmost portion of 'x'
// up to but not including the first 'c'
string lefts(string x, char c)
{
    int p;
    p = std.string.find(x,c);
    if (p < 0)
        p = x.length;
    return x[0..p];
}

int compare(string lhs, string rhs, char d = '\0')
{
    return std.string.cmp( lefts(lhs,d), lefts(rhs,d) );
}

and use it like ...

   char[32] NameA;
   char[56] NameB;
   NameA[] = ' '; 
   NameB[] = ' ';

   NameA[0..5] = "derek";
   NameB[0..7] = "parnell";
   int result = compare(NameA, NameB);

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Aug 11 2007

Derek Parnell <derek psych.ward> writes:

On Sat, 11 Aug 2007 20:07:51 +1000, Derek Parnell wrote:


Oops! Of course I really meant ... 

and use it like ...

   char[32] NameA;
   char[56] NameB;
   NameA[] = '\0'; 
   NameB[] = '\0';

   NameA[0..5] = "derek";
   NameB[0..7] = "parnell";
   int result = compare(NameA, NameB);


-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Aug 11 2007

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Sat, 11 Aug 2007 00:49:01 +0300, C. Dunn <cdunn2001 gmail.com> wrote:

 I have a field of n chars stored on disk.  It holds a null-terminated string,
padded with zeroes.  It is amazingly difficult to compare such a char[n] with
some other char[] (which, by the dictates of D, may or may not be
null-terminated).

This reminds me of the Delphi string problem. Before Delphi 3 (or something like that), you had
to do crazy stuff to get a PChar (char*) out of a string. Delphi "long" strings
are somewhat similar to D's strings: they have a length property, and allow
the string to contain zeroes. Because of that, you couldn't just typecast a
string to a PChar, due to the lack of a terminating zero. Borland solved the
problem by having strings always have a null terminating byte at their end,
thus allowing you to typecast a string directly to a PChar.

I noticed that memory for arrays is always allocated with an extra byte
(internal/gc/gc.d, function _d_arraysetlengthT). I wonder if this is related
and could be used for this purpose?

-- 
Best regards,
  Vladimir                          mailto:thecybershadow gmail.com

Aug 13 2007

Regan Heath <regan netmail.co.nz> writes:

Vladimir Panteleev wrote:
 This reminds me of the Delphi string problem. Before Delphi 3 (OSLT),
 you had to do crazy stuff to get a PChar (char*) out of a string.
 Delphi "long" strings are somewhat similar to D's strings - they have
 a length property, and allow the string to contain zeroes. Because of
 that, you couldn't just typecast a string to a PChar, due to lack of
 a terminating zero. Borland solved the problem by having strings
 always have a null terminating byte at their end, thus allowing you
 to typecast a string directly to a PChar.
 
 I noticed that memory for arrays are always allocated with an extra
 byte (internal/gc/gc.d, function _d_arraysetlengthT). I wonder if
 this is related and can be used for this purpose..?

It could, but the problem remains when dealing with slices, e.g.

string foo = "this is a test";
string bar = foo[5..9];

the byte following the end of the slice 'bar' is ' ' not '\0'.

I believe there was/is a hack in toStringz which checks the byte 
following the slice and if it's '\0' already, does nothing but return 
the input string.

Regan

Aug 13 2007
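The slice problem can be checked directly in present-day D with Phobos' `std.string.toStringz` and `std.string.fromStringz`. This small sketch only shows that the byte after a slice belongs to the parent string, so a C-compatible pointer for a slice generally needs a terminated copy; it deliberately does not test whether any particular `toStringz` version skips the copy when a zero happens to follow.

```d
import std.string : fromStringz, toStringz;

unittest
{
    string foo = "this is a test";
    string bar = foo[5 .. 9];

    assert(bar == "is a");
    assert(foo[9] == ' ');      // the byte just past 'bar' is a space, not '\0'

    // toStringz yields a pointer to NUL-terminated data with bar's contents.
    immutable(char)* p = toStringz(bar);
    assert(p[bar.length] == '\0');
    assert(fromStringz(p) == "is a");
}
```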

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Vladimir Panteleev wrote:
 This reminds me of the Delphi string problem. Before Delphi 3 (OSLT), you had
to do crazy stuff to get a PChar (char*) out of a string. Delphi "long" strings
are somewhat similar to D's strings - they have a length property, and allow
the string to contain zeroes. Because of that, you couldn't just typecast a
string to a PChar, due to lack of a terminating zero. Borland solved the
problem by having strings always have a null terminating byte at their end,
thus allowing you to typecast a string directly to a PChar.
 
 I noticed that memory for arrays are always allocated with an extra byte
(internal/gc/gc.d, function _d_arraysetlengthT). I wonder if this is related
and can be used for this purpose..?

It's probably to make sure a one-past-the-end pointer is also counted as 
a reference to a memory block by the garbage collector.
Though if it's initialized to 0, it could be used for the purpose you 
describe as a side-effect.

Aug 13 2007
