digitalmars.D - .init property for char[] type

reply Justin Johansson <procode adam-dott-com.au> writes:
In a templated class (D1.0) along lines ...

class Foo(T) {
//..
  static T bar() { return T.init; }
//..
}

Foo!(int).bar() returns 0 and Foo!(char[]).bar() returns nil.

I'd much prefer (at least for my purposes) that (char[]).init returned an empty
string rather than effectively a null pointer.  Is there a convenient solution
for this, e.g. by specializing just the bar method of class Foo when T is
char[], or by some other means?

Maybe this type of question best be asked on D.learn, but I do wonder if an
empty string is a more reasonable initializer for char[] .. well maybe not .. I
don't know .. I yield to your sensibilities.

Thanks to all.
Sep 22 2009
next sibling parent Jeremie Pelletier <jeremiep gmail.com> writes:
Justin Johansson wrote:
 In a templated class (D1.0) along lines ...
 
 class Foo(T) {
 //..
   static T bar() { return T.init; }
 //..
 }
 
 Foo!(int).bar() returns 0 and Foo!(char[]).bar() returns nil.
 
 I'd much prefer (at least for my purposes) that (char[]).init returned an
empty string rather than effectively a null pointer.  Is there a convenient
solution for this, e.g. by specializing just the bar method of class Foo when T
is char[], or by some other means?
 
 Maybe this type of question best be asked on D.learn, but I do wonder if an
empty string is a more reasonable initializer for char[] .. well maybe not .. I
don't know .. I yield to your sensibilities.
 
 Thanks to all.
 
You could use a custom type, which would solve your .init problem:

typedef string myString = "";

Or you could specialize your bar():

static T bar() {
    static if(isSomeString!T) return "";
    else return T.init;
}

I myself favor a null initializer. Since char[] is a reference type, not a value type, it only makes sense to initialize it to a null reference.
Sep 22 2009
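
For readers on D2, the specialization suggested above can be sketched roughly as follows. This is only an illustration under assumptions: it uses std.traits.isSomeString from D2's Phobos and instantiates with string rather than char[], because in D2 the literal "" does not convert implicitly to a mutable char[]. The thread itself concerns D1.0, where the details differ.

```d
import std.traits : isSomeString;

class Foo(T)
{
    // Returns an empty (non-null) literal for string types,
    // and the ordinary T.init for everything else.
    static T bar()
    {
        static if (isSomeString!T)
            return "";
        else
            return T.init;
    }
}

void main()
{
    assert(Foo!(int).bar() == 0);

    // The empty literal has length 0 but a non-null pointer.
    assert(Foo!(string).bar().length == 0);
    assert(Foo!(string).bar().ptr !is null);
}
```

The branch is chosen at compile time, so the int instantiation never sees the string return statement.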
prev sibling next sibling parent reply Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Tue, Sep 22, 2009 at 8:07 AM, Justin Johansson
<procode adam-dott-com.au> wrote:
 In a templated class (D1.0) along lines ...

 class Foo(T) {
 //..
   static T bar() { return T.init; }
 //..
 }

 Foo!(int).bar() returns 0 and Foo!(char[]).bar() returns nil.

 I'd much prefer (at least for my purposes) that (char[]).init returned an empty string rather than effectively a null pointer.  Is there a convenient solution for this, e.g. by specializing just the bar method of class Foo when T is char[], or by some other means?

 Maybe this type of question best be asked on D.learn, but I do wonder if an empty string is a more reasonable initializer for char[] .. well maybe not .. I don't know .. I yield to your sensibilities.

 Thanks to all.

There's no real difference between an empty string and a null reference. Both have 0 length.
Sep 22 2009
parent reply Justin Johansson <procode adam-dott-com.au> writes:
Jarrett Billingsley Wrote:

 On Tue, Sep 22, 2009 at 8:07 AM, Justin Johansson
 Maybe this type of question best be asked on D.learn, but I do wonder if an
empty string is a more reasonable initializer for char[] .. well maybe not .. I
don't know .. I yield to your sensibilities.
 There's no real difference between an empty string and a null reference. Both have 0 length.

Big difference if you pass char[] variable .ptr to a C function.

static if ( typeid(T) is typeid(char[])) {
}
else {
   init_sequence = new ExactlyOne!(T)( T.init);
}

Tks Jeremie got specialized method working with
Sep 22 2009
parent reply Justin Johansson <procode adam-dott-com.au> writes:
Justin Johansson Wrote:

Scratch that last garbled reply .. finger trouble.

Was going to say that ...


 There's no real difference between an empty string and a null
 reference. Both have 0 length.

Big difference if you pass char[] variable .ptr to a C function.

And thanks Jeremie, got specialized method working with

static if ( typeid(T) is typeid(char[])) {
   // ..
}
else {
   // ..
}

Cheers

Justin Johansson
Sep 22 2009
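
As a side note, a compile-time type test is more commonly written with an is-expression than with typeid, which yields a run-time TypeInfo object. The sketch below assumes D2, and the helper name kind is made up purely for illustration; it is not from the thread.

```d
// Hypothetical helper: reports which static-if branch was compiled in.
string kind(T)()
{
    static if (is(T == char[]))
        return "char[]";
    else
        return "other";
}

void main()
{
    assert(kind!(char[])() == "char[]");
    assert(kind!(int)() == "other");

    // In D2, string is immutable(char)[], a distinct type from char[].
    assert(kind!(string)() == "other");
}
```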
parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
In general, if you pass a string to a C function you should send it
through toStringz first.  If you don't, you're just begging for segfaults.

Justin Johansson wrote:
 There's no real difference between an empty string and a null
 reference. Both have 0 length.
Big difference if you pass char[] variable .ptr to a C function.
Sep 22 2009
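
A minimal sketch of the toStringz advice, assuming D2's std.string.toStringz and the C strlen binding from core.stdc.string (the D1 module names differ):

```d
import core.stdc.string : strlen;
import std.string : toStringz;

void main()
{
    char[] nothing;            // default-initialized: null pointer, length 0
    string text = "hello";

    // toStringz hands C a valid zero-terminated pointer in both cases.
    assert(strlen(toStringz(nothing)) == 0);
    assert(strlen(toStringz(text)) == 5);
}
```

Passing nothing.ptr directly would hand the C function a null pointer instead.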
parent reply Justin Johansson <procode adam-dott-com.au> writes:
Daniel Keep Wrote:

 Big difference if you pass char[] variable .ptr to a C function.
 In general, if you pass a string to a C function you should send it
 through toStringz first.  If you don't, you're just begging for segfaults.

Agreed .. fair enough.

Actually I'm more interested in the semantics for default initialized char[]. Does it have exactly the same semantics as an empty string (in general D or runtime library, Phobos et al. context)?
Sep 22 2009
next sibling parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
Justin Johansson wrote:
 Daniel Keep Wrote:
 
 Big difference if you pass char[] variable .ptr to a C function.
 In general, if you pass a string to a C function you should send it
 through toStringz first.  If you don't, you're just begging for segfaults.
 Agreed .. fair enough.

 Actually I'm more interested in the semantics for default initialized char[]. Does it have exactly the same semantics as an empty string (in general D or runtime library, Phobos et al. context)?

It isn't the same semantics: a null array is {0, null}, while an empty array is {0, &zero} where zero is declared as 'char zero = 0;' since string literals are zero terminated.

Their usage is mostly the same: you can concatenate both of them, append to both of them, and so on, all giving the same results. Where it makes a difference is when you need to enforce an invariant that .ptr is not null. Calling toStringz on either will give the same C string: a pointer to a zero value.

You have to remember that arrays are reference types; they are perfectly valid without referenced data. Think of pointers or objects for example, which are also reference types.

Besides, if you initialize character arrays to "", what do you initialize other arrays to, and other reference types to? It just wouldn't be consistent.
Sep 22 2009
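
The null versus empty distinction described above can be checked with a few assertions. This is a sketch assuming a D2 compiler; it uses string for the literal because "" is immutable in D2.

```d
void main()
{
    char[] a;          // default-initialized: {0, null}
    string b = "";     // empty literal: length 0, non-null pointer

    assert(a is null);
    assert(a.length == 0 && a.ptr is null);

    assert(b !is null);
    assert(b.length == 0 && b.ptr !is null);

    // == compares contents, so both count as the same empty string.
    assert(a == b);

    // Appending behaves the same way on both.
    a ~= 'x';
    b ~= 'x';
    assert(a == "x" && b == "x");
}
```

Only identity tests (is, or looking at .ptr) can tell the two apart.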
parent reply Justin Johansson <procode adam-dott-com.au> writes:
Jeremie Pelletier Wrote:
 Besides, if you initialize character 
 arrays to "", what do you initialize other arrays to, and other 
 reference types to? It just wouldn't be consistent.

Consistency. Since when is that an argument?

Just to be a PITA, pick the inconsistent row in the table below (from spec_D1.00.pdf). The row ordering of the table has been shuffled just to make it a bit more difficult to spot :-)

short.init    0
int.init      0
bool.init     false
byte.init     0
double.init   double.nan
long.init     0L
Sep 22 2009
next sibling parent Jeremie Pelletier <jeremiep gmail.com> writes:
Justin Johansson wrote:
 Jeremie Pelletier Wrote:
 Besides, if you initialize character 
 arrays to "", what do you initialize other arrays to, and other 
 reference types to? It just wouldn't be consistent.

 Consistency. Since when is that an argument?

 Just to be a PITA, pick the inconsistent row in the table below (from spec_D1.00.pdf). The row ordering of the table has been shuffled just to make it a bit more difficult to spot :-)

 short.init    0
 int.init      0
 bool.init     false
 byte.init     0
 double.init   double.nan
 long.init     0L

Obviously the nan floating points, which have annoyed me quite a few times. Every other type in D inits to zeroed memory, with the exception of void initializers.
Sep 22 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Justin Johansson wrote:
 Jeremie Pelletier Wrote:
 Besides, if you initialize character 
 arrays to "", what do you initialize other arrays to, and other 
 reference types to? It just wouldn't be consistent.

 Consistency. Since when is that an argument?

 Just to be a PITA, pick the inconsistent row in the table below (from spec_D1.00.pdf). The row ordering of the table has been shuffled just to make it a bit more difficult to spot :-)

 short.init    0
 int.init      0
 bool.init     false
 byte.init     0
 double.init   double.nan
 long.init     0L

You forgot

char.init     0xFF
wchar.init    0xFFFF
dchar.init    0xFFFFFFFF

Andrei
Sep 22 2009
next sibling parent Jeremie Pelletier <jeremiep gmail.com> writes:
Andrei Alexandrescu wrote:
 Justin Johansson wrote:
 Jeremie Pelletier Wrote:
 Besides, if you initialize character arrays to "", what do you 
 initialize other arrays to, and other reference types to? It just 
 wouldn't be consistent.

 Consistency. Since when is that an argument?

 Just to be a PITA, pick the inconsistent row in the table below (from spec_D1.00.pdf). The row ordering of the table has been shuffled just to make it a bit more difficult to spot :-)

 short.init    0
 int.init      0
 bool.init     false
 byte.init     0
 double.init   double.nan
 long.init     0L

 You forgot

 char.init     0xFF
 wchar.init    0xFFFF
 dchar.init    0xFFFFFFFF

 Andrei

Actually, dchar.init is "\U0000ffff".

Jeremie
Sep 22 2009
prev sibling parent reply Justin Johansson <procode adam-dott-com.au> writes:
Andrei Alexandrescu Wrote:

 Justin Johansson wrote:
 Jeremie Pelletier Wrote:
 Besides, if you initialize character 
 arrays to "", what do you initialize other arrays to, and other 
 reference types to? It just wouldn't be consistent.

 Consistency. Since when is that an argument?

 Just to be a PITA, pick the inconsistent row in the table below (from spec_D1.00.pdf). The row ordering of the table has been shuffled just to make it a bit more difficult to spot :-)

 short.init    0
 int.init      0
 bool.init     false
 byte.init     0
 double.init   double.nan
 long.init     0L

 You forgot

 char.init     0xFF
 wchar.init    0xFFFF
 dchar.init    0xFFFFFFFF

 Andrei

Shhh; don't tell anybody; I left those out of the quiz to weigh in favour of zero bit pattern init values. (This trick, i.e. omitting information, is one I learned from the Ministries of Statistics and (un)Employment.)

Seriously though, I imagine the D design choices to be influenced by the desire to propagate NaN and invalid UTF in their respective cases so as to detect uninitialized data errors.

Hmm, guess one could argue the init issue for eons.

-- Justin
Sep 22 2009
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2009-09-22 18:08:24 -0400, Justin Johansson <procode adam-dott-com.au> said:

 You forgot
 
 char.init 0xFF
 wchar.init 0xFFFF
 dchar.init 0xFFFFFFFF
 
 Andrei

 Shhh; don't tell anybody; I left those out of the quiz to weigh in favour of zero bit pattern init values. (This trick, i.e. omitting information, is one I learned from the Ministries of Statistics and (un)Employment.)

 Seriously though, I imagine the D design choices to be influenced by the desire to propagate NaN and invalid UTF in their respective cases so as to detect uninitialized data errors.

 Hmm, guess one could argue the init issue for eons.

Well, I see this as a problem because I've often relied on default initialization being zero in my algorithms. I was bitten once when my algorithm worked perfectly with char but not with wchar. Turns out that char.init == 0 (contrary to what Andrei wrote) and wchar.init == 0xFFFF.

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Sep 23 2009
parent Jeremie Pelletier <jeremiep gmail.com> writes:
Michel Fortin wrote:
 On 2009-09-22 18:08:24 -0400, Justin Johansson 
 <procode adam-dott-com.au> said:
 
 You forgot

 char.init 0xFF
 wchar.init 0xFFFF
 dchar.init 0xFFFFFFFF

 Andrei

 Shhh; don't tell anybody; I left those out of the quiz to weigh in favour of zero bit pattern init values. (This trick, i.e. omitting information, is one I learned from the Ministries of Statistics and (un)Employment.)

 Seriously though, I imagine the D design choices to be influenced by the desire to propagate NaN and invalid UTF in their respective cases so as to detect uninitialized data errors.

 Hmm, guess one could argue the init issue for eons.

 Well, I see this as a problem because I've often relied on default initialization being zero in my algorithms. I was bitten once when my algorithm worked perfectly with char but not with wchar. Turns out that char.init == 0 (contrary to what Andrei wrote) and wchar.init == 0xFFFF.

pragma(msg, char.init.stringof); outputs '\xff' in D2; wchar and dchar have the same initializer: '\U0000FFFF'.

If you rely on the char initializer being the null character, use char c = 0, or else your char gets initialized to an invalid character, just like floats get initialized to nan. Other types either have null as their invalid value, or do not have an invalid value and use 0.
Sep 23 2009
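
The initializer values argued over in this subthread can be checked directly. The sketch below assumes a D2 compiler and std.math.isNaN, and states the values as D2 defines them:

```d
import std.math : isNaN;

void main()
{
    // Integral and boolean types default to zero / false.
    static assert(short.init == 0);
    static assert(int.init == 0);
    static assert(long.init == 0L);
    static assert(bool.init == false);

    // Character types default to values that are invalid as UTF code units.
    static assert(char.init == 0xFF);
    static assert(wchar.init == 0xFFFF);
    static assert(dchar.init == 0x0000FFFF);

    // Floating-point types default to NaN.
    assert(isNaN(float.init));
    assert(isNaN(double.init));

    // Arrays default to null.
    char[] s;
    assert(s is null);
}
```

Code that needs a zero character has to ask for it explicitly, for example char c = 0;.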
prev sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Justin Johansson wrote:
 Seriously though, I imagine the D design choices to be influenced by
 the desire to propagate NaN and invalid UTF in their respective cases
 so as to detect uninitialized data errors.

That's exactly what drove the design choices. If there was a nan value for integers, D would use that. But there isn't, so 0 is the best we can do.

Andrei and I were talking last night about the purity of software design principles and the reality, and how the reality forces compromise on the purity if you wanted to get anything done.
Sep 24 2009
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 22 Sep 2009 09:53:52 -0400, Justin Johansson  
<procode adam-dott-com.au> wrote:

 Daniel Keep Wrote:

 Big difference if you pass char[] variable .ptr to a C function.
 In general, if you pass a string to a C function you should send it
 through toStringz first.  If you don't, you're just begging for  
 segfaults.
 Agreed .. fair enough.

 Actually I'm more interested in the semantics for default initialized char[]. Does it have exactly the same semantics as an empty string (in general D or runtime library, Phobos et al. context)?

A null string *is* an empty string, but an empty string may not be a null string.

The subtle difference is that the pointer points to null versus some data.

A non-null empty string:

  - May be pointing to heap data, therefore keeping the data from being collected.
  - May reallocate in place on appending (a null string always must allocate new data on append).

It's a difficult concept to get, but an array is really a hybrid type between a reference and a value type.  The array is actually a value type struct with a pointer reference and a length value.  If the length is zero, then the pointer value technically isn't needed, but in subtle cases, it makes a difference.  When you copy the array, the length behaves like a value type (changing the length of one array doesn't affect the other), but the array data is referenced (changing an element of the array *does* affect the other).

I think plans are to make the array a full reference type, and leave slices as these structs (in D2).  This probably will clear up a lot of confusion people have.

I hope this helps...

Oh, and BTW, you can pass string literals to C functions, but *not* char[] variables.  Always pass them through toStringz.  It generally does not take much time/resources to add the zero.

-Steve
Sep 22 2009
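
The hybrid value/reference behaviour described above can be illustrated with a small sketch. It assumes a D2 compiler, and the variable names are arbitrary:

```d
void main()
{
    int[] a = [1, 2, 3];
    int[] b = a;              // copies the (pointer, length) pair, not the elements

    b[0] = 42;                // element data is shared between the copies
    assert(a[0] == 42);

    b.length = 2;             // the length belongs to each copy separately
    assert(a.length == 3);
    assert(b.length == 2);

    int[] empty = a[0 .. 0];  // empty, but still pointing into a's data
    assert(empty.length == 0);
    assert(empty.ptr is a.ptr);
    assert(empty !is null);
}
```

The last three assertions show an empty array that is not a null array: its length is zero, yet it still refers to the original data.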
parent Justin Johansson <procode adam-dott-com.au> writes:
Steven Schveighoffer Wrote:

 A null string *is* an empty string, but an empty string may not be a null  
 string.
 
 The subtle difference is that the pointer points to null versus some data.
 
 A non-null empty string:
 
   - May be pointing to heap data, therefore keeping the data from being  
 collected.
   - May reallocate in place on appending (a null string always must  
 allocate new data on append).
 
 It's a difficult concept to get, but an array is really a hybrid type  
 between a reference and a value type.  The array is actually a value type  
 struct with a pointer reference and a length value.  If the length is  
 zero, then the pointer value technically isn't needed, but in subtle  
 cases, it makes a difference.  When you copy the array, the length behaves  
 like a value type (changing the length of one array doesn't affect the  
 other), but the array data is referenced (changing an element of the array  
 *does* affect the other).
 
 I think plans are to make the array a full reference type, and leave  
 slices as these structs (in D2).  This probably will clear up a lot of  
 confusion people have.
 
 I hope this helps...
 
 Oh, and BTW, you can pass string literals to C functions, but *not* char[]  
 variables.  Always pass them through toStringz.  It generally does not  
 take much time/resources to add the zero.
 
 -Steve

Good write-up Steve; thanks.

Being relatively new to D, but from a strong C++ and assembler background, I did the usual interrogation for interest:

writefln( "(char[]).sizeof=%d", (char[]).sizeof);

8 bytes.

So if you wanted to intern string data to conserve memory, and reference such data with a single 32-bit pointer, sounds like you would have to do this with either a char* or perhaps a pointer to a char[], rather than a full char[] field in your class or struct.

There's less reason to want to intern string data if you still need 8 bytes to reference said data.

Justin
Sep 22 2009
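
The 8-byte figure above comes from a 32-bit build. A platform-independent way to state the same observation, sketched here for D2, is that a dynamic array takes the space of one length plus one pointer:

```d
void main()
{
    // A dynamic array is a length (size_t) plus a pointer.
    static assert((char[]).sizeof == size_t.sizeof + (char*).sizeof);

    // On the targets assumed here, a pointer is the same width as size_t,
    // so the array reference is twice the size of a bare char*.
    static assert((char*).sizeof == size_t.sizeof);
    static assert((char[]).sizeof == 2 * (char*).sizeof);
}
```

On a 64-bit target the same program compiles with (char[]).sizeof equal to 16.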
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 Justin Johansson (I think):
 short.init    0
 int.init        0
 bool.init     false
 byte.init     0
 double.init  double.nan
 long.init     0L
 
 You forgot

 char.init     0xFF
 wchar.init    0xFFFF
 dchar.init    0xFFFFFFFF

One small disadvantage of some of those init values (for non-int variables) is that if you have a large global static array of floats or chars in your program, its memory can be found in the binary, which can become huge (you can avoid that by setting the static array to void, and then I think the LDC compiler or the operating system resets such memory to zero anyway).

To avoid such huge binaries D can keep small static arrays like now, but it can initialize at run-time (before the main) the large static arrays.

Bye,
bearophile
Sep 23 2009