digitalmars.D - .init property for char[] type


Justin Johansson <procode adam-dott-com.au> writes:

In a templated class (D1.0) along lines ...

class Foo(T) {
//..
  static T bar() { return T.init; }
//..
}

Foo!(int).bar() returns 0 and Foo!(char[]).bar() returns nil.

I'd much prefer (at least for my purposes) that (char[]).init returned an empty
string rather than effectively a null pointer.  Is there a convenient solution
for this, e.g. by specializing just the bar method of class Foo when T is
char[], or by some other means?

Maybe this type of question best be asked on D.learn, but I do wonder if an
empty string is a more reasonable initializer for char[] .. well maybe not .. I
don't know .. I yield to your sensibilities.

Thanks to all.

Sep 22 2009

Jeremie Pelletier <jeremiep gmail.com> writes:

Justin Johansson wrote:
 In a templated class (D1.0) along lines ...
 
 class Foo(T) {
 //..
   static T bar() { return T.init; }
 //..
 }
 
 Foo!(int).bar() returns 0 and Foo!(char[]).bar() returns nil.
 
 I'd much prefer (at least for my purposes) that (char[]).init returned an
empty string rather than effectively a null pointer.  Is there a convenient
solution for this, e.g. by specializing just the bar method of class Foo when T
is char[], or by some other means?
 
 Maybe this type of question best be asked on D.learn, but I do wonder if an
empty string is a more reasonable initializer for char[] .. well maybe not .. I
don't know .. I yield to your sensibilities.
 
 Thanks to all.
 

You could use a custom type, which would solve your .init problem:
typedef string myString = "";

Or you could specialize your bar():

static T bar() {
     static if(isSomeString!T)
         return "";
     else
         return T.init;
}

I myself favor a null initializer: since char[] is a reference type, not 
a value type, it only makes sense to initialize it to a null reference.
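
A minimal sketch of the bar() specialization, assuming a current D2 compiler rather than D1. It tests the type with is(T == char[]) and builds the empty array by slicing a one-element allocation, because in D2 the literal "" is immutable and does not convert to char[]. The class name Foo and method bar come from the original post; everything else is illustrative.

```d
class Foo(T)
{
    // Returns T.init for every type except char[], where a
    // zero-length array with a non-null pointer is returned instead.
    static T bar()
    {
        static if (is(T == char[]))
            return (new char[1])[0 .. 0]; // length 0, ptr non-null
        else
            return T.init;
    }
}

void main()
{
    assert(Foo!int.bar() == 0);

    auto s = Foo!(char[]).bar();
    assert(s.length == 0);
    assert(s.ptr !is null);

    // Other array types still get the usual null initializer.
    assert(Foo!(int[]).bar() is null);
}
```

The one-byte allocation is a deliberate trade-off in this sketch: it guarantees a non-null .ptr at the cost of a small heap allocation per call.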

Sep 22 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Tue, Sep 22, 2009 at 8:07 AM, Justin Johansson
<procode adam-dott-com.au> wrote:
 In a templated class (D1.0) along lines ...

 class Foo(T) {
 //..
   static T bar() { return T.init; }
 //..
 }

 Foo!(int).bar() returns 0 and Foo!(char[]).bar() returns nil.

 I'd much prefer (at least for my purposes) that (char[]).init returned an
 empty string rather than effectively a null pointer.  Is there a convenient
 solution for this, e.g. by specializing just the bar method of class Foo when
 T is char[], or by some other means?

 Maybe this type of question best be asked on D.learn, but I do wonder if an
 empty string is a more reasonable initializer for char[] .. well maybe not ..
 I don't know .. I yield to your sensibilities.

 Thanks to all.

There's no real difference between an empty string and a null
reference. Both have 0 length.

Sep 22 2009

Justin Johansson <procode adam-dott-com.au> writes:

Jarrett Billingsley Wrote:

 On Tue, Sep 22, 2009 at 8:07 AM, Justin Johansson
 Maybe this type of question best be asked on D.learn, but I do wonder if an
empty string is a more reasonable initializer for char[] .. well maybe not .. I
don't know .. I yield to your sensibilities.

 There's no real difference between an empty string and a null
 reference. Both have 0 length.

Big difference if you pass char[] variable .ptr to a C function.


static if ( typeid(T) is typeid(char[])) {

		}

		else {
			init_sequence = new ExactlyOne!(T)( T.init);
		}

Tks Jeremie got specialized method working with

Sep 22 2009

Justin Johansson <procode adam-dott-com.au> writes:

Justin Johansson Wrote:

Scratch that last garbled reply .. finger trouble.

Was going to say that ...


 There's no real difference between an empty string and a null
 reference. Both have 0 length.


 
Big difference if you pass char[] variable .ptr to a C function.
 

And thanks Jeremie, got specialized method working with
static if ( typeid(T) is typeid(char[])) {
  // ..
}

else {
  // ..
}

Cheers
Justin Johansson

Sep 22 2009

Daniel Keep <daniel.keep.lists gmail.com> writes:

In general, if you pass a string to a C function you should send it
through toStringz first.  If you don't, you're just begging for segfaults.

Justin Johansson wrote:
 There's no real difference between an empty string and a null
 reference. Both have 0 length.


  
 Big difference if you pass char[] variable .ptr to a C function.

Sep 22 2009

Justin Johansson <procode adam-dott-com.au> writes:

Daniel Keep Wrote:

 Big difference if you pass char[] variable .ptr to a C function.


 In general, if you pass a string to a C function you should send it
 through toStringz first.  If you don't, you're just begging for segfaults.

Agreed .. fair enough.

Actually I'm more interested in the semantics for default initialized char[].
Does it have exactly the same semantics as an empty string (in general D or
runtime library, Phobos et al. context)?

Sep 22 2009

Jeremie Pelletier <jeremiep gmail.com> writes:

Justin Johansson wrote:
 Daniel Keep Wrote:
 
 Big difference if you pass char[] variable .ptr to a C function.


 
 In general, if you pass a string to a C function you should send it
 through toStringz first.  If you don't, you're just begging for segfaults.

 
 Agreed .. fair enough.
 
 Actually I'm more interested in the semantics for default initialized char[].
 Does it have exactly the same semantics as an empty string (in general D or
runtime library, Phobos et al. context)?

It isn't the same semantics:
a null array is {0, null}, while an empty array is {0, &zero} where zero 
is of type 'char zero = 0;' since string literals are zero terminated.

Their usage is mostly the same: you can concatenate both of them, append 
to both of them, and so on, all giving the same results. Where it makes 
a difference is when you need to enforce an invariant that .ptr is not 
null. Calling toStringz on either will give the same C string: a pointer 
to a zero value.

You have to remember that arrays are reference types; they are perfectly 
valid without referenced data. Think of pointers or objects for example, 
which are also reference types. Besides, if you initialize character 
arrays to "", what do you initialize other arrays to, and other 
reference types to? It just wouldn't be consistent.
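
The difference described above can be checked directly. This is a sketch assuming a D2 compiler and Phobos' std.string.toStringz; the helper hasNullPtr is a hypothetical name introduced only for illustration.

```d
import std.string : toStringz;

// Illustrative helper: true when the array's pointer field is null.
bool hasNullPtr(const(char)[] s)
{
    return s.ptr is null;
}

void main()
{
    char[] nullArr;        // default-initialized: length 0, ptr null
    string emptyLit = "";  // literal: length 0, ptr points at a '\0'

    // Same length and equal contents ...
    assert(nullArr.length == 0 && emptyLit.length == 0);
    assert(nullArr == emptyLit);

    // ... but only one of them has a null pointer.
    assert(hasNullPtr(nullArr));
    assert(!hasNullPtr(emptyLit));

    // toStringz yields a pointer to a zero byte in both cases.
    assert(*toStringz(nullArr) == '\0');
    assert(*toStringz(emptyLit) == '\0');

    // Appending behaves the same from the caller's point of view.
    nullArr ~= 'x';
    assert(nullArr == "x");
}
```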

Sep 22 2009

Justin Johansson <procode adam-dott-com.au> writes:

Jeremie Pelletier Wrote:
 Besides, if you initialize character 
 arrays to "", what do you initialize other arrays to, and other 
 reference types to? It just wouldn't be consistent.

Consistency.  Since when is that an argument?

Just to be a PITA, pick the inconsistent row in the table below (from
spec_D1.00.pdf).
The row ordering of the table has been shuffled just to make it a bit more
difficult to spot :-)

short.init    0
int.init        0
bool.init     false
byte.init     0
double.init  double.nan
long.init     0L

Sep 22 2009

Jeremie Pelletier <jeremiep gmail.com> writes:

Justin Johansson wrote:
 Jeremie Pelletier Wrote:
 Besides, if you initialize character 
 arrays to "", what do you initialize other arrays to, and other 
 reference types to? It just wouldn't be consistent.

 
 Consistency.  Since when is that an argument?
 
 Just to be a PITA, pick the inconsistent row in the table below (from
spec_D1.00.pdf).
 The row ordering of the table has been shuffled just to make it a bit more
difficult to spot :-)
 
 short.init    0
 int.init        0
 bool.init     false
 byte.init     0
 double.init  double.nan
 long.init     0L
 

Obviously the NaN floating points, which have annoyed me many times. 
Every other type in D inits to zeroed memory, with the exception 
of void initializers.

Sep 22 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Justin Johansson wrote:
 Jeremie Pelletier Wrote:
 Besides, if you initialize character 
 arrays to "", what do you initialize other arrays to, and other 
 reference types to? It just wouldn't be consistent.

 
 Consistency.  Since when is that an argument?
 
 Just to be a PITA, pick the inconsistent row in the table below (from
spec_D1.00.pdf).
 The row ordering of the table has been shuffled just to make it a bit more
difficult to spot :-)
 
 short.init    0
 int.init        0
 bool.init     false
 byte.init     0
 double.init  double.nan
 long.init     0L
 

You forgot

char.init 0xFF
wchar.init 0xFFFF
dchar.init 0xFFFFFFFF


Andrei

Sep 22 2009

Jeremie Pelletier <jeremiep gmail.com> writes:

Andrei Alexandrescu wrote:
 Justin Johansson wrote:
 Jeremie Pelletier Wrote:
 Besides, if you initialize character arrays to "", what do you 
 initialize other arrays to, and other reference types to? It just 
 wouldn't be consistent.

 Consistency.  Since when is that an argument?

 Just to be a PITA, pick the inconsistent row in the table below (from 
 spec_D1.00.pdf).
 The row ordering of the table has been shuffled just to make it a 
 bit more difficult to spot :-)

 short.init    0
 int.init        0
 bool.init     false
 byte.init     0
 double.init  double.nan
 long.init     0L

 
 You forgot
 
 char.init 0xFF
 wchar.init 0xFFFF
 dchar.init 0xFFFFFFFF
 
 
 Andrei

Actually, dchar.init is "\U0000ffff".

Jeremie

Sep 22 2009

Justin Johansson <procode adam-dott-com.au> writes:

Andrei Alexandrescu Wrote:

 Justin Johansson wrote:
 Jeremie Pelletier Wrote:
 Besides, if you initialize character 
 arrays to "", what do you initialize other arrays to, and other 
 reference types to? It just wouldn't be consistent.

 
 Consistency.  Since when is that an argument?
 
 Just to be a PITA, pick the inconsistent row in the table below (from
spec_D1.00.pdf).
 The row ordering of the table has been shuffled just to make it a bit more
difficult to spot :-)
 
 short.init    0
 int.init        0
 bool.init     false
 byte.init     0
 double.init  double.nan
 long.init     0L
 

 
 You forgot
 
 char.init 0xFF
 wchar.init 0xFFFF
 dchar.init 0xFFFFFFFF
 
 
 Andrei

Shhh; don't tell anybody; I left those out of the quiz to weigh in favour of
zero bit pattern init values.
(This trick, i.e. omitting information, is one I learned from the Ministries of
Statistics and (un)Employment.)

Seriously though, I imagine the D design choices to be influenced by the desire
to propagate NaN and invalid UTF in their respective cases so as to detect
uninitialized data errors.  Hmm, guess one could argue the init issue for eons.

-- Justin

Sep 22 2009

Michel Fortin <michel.fortin michelf.com> writes:

On 2009-09-22 18:08:24 -0400, Justin Johansson <procode adam-dott-com.au> said:

 You forgot
 
 char.init 0xFF
 wchar.init 0xFFFF
 dchar.init 0xFFFFFFFF
 
 Andrei

 
 Shhh; don't tell anybody; I left those out of the quiz to weigh in 
 favour of zero bit pattern init values.
 (This trick, i.e. omitting information, is one I learned from the 
 Ministries of Statistics and (un)Employment.)
 
 Seriously though, I imagine the D design choices to be influenced by 
 the desire to propagate NaN and invalid UTF in their respective cases 
 so as to detect uninitialized data errors.  Hmm, guess one could argue 
 the init issue for eons.

Well, I see this as a problem because I've often relied on default 
initialization being zero in my algorithms. I was bitten once when my 
algorithm worked perfectly with char but not with wchar. Turns out that 
char.init == 0 (contrary to what Andrei wrote) and wchar.init == 0xFFFF.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Sep 23 2009

Jeremie Pelletier <jeremiep gmail.com> writes:

Michel Fortin wrote:
 On 2009-09-22 18:08:24 -0400, Justin Johansson 
 <procode adam-dott-com.au> said:
 
 You forgot

 char.init 0xFF
 wchar.init 0xFFFF
 dchar.init 0xFFFFFFFF

 Andrei

 Shhh; don't tell anybody; I left those out of the quiz to weigh in 
 favour of zero bit pattern init values.
 (This trick, i.e. omitting information, is one I learned from the 
 Ministries of Statistics and (un)Employment.)

 Seriously though, I imagine the D design choices to be influenced by 
 the desire to propagate NaN and invalid UTF in their respective cases 
 so as to detect uninitialized data errors.  Hmm, guess one could argue 
 the init issue for eons.

 
 Well, I see this as a problem because I've often relied on default 
 initialization being zero in my algorithms. I was bitten once when my 
 algorithm worked perfectly with char but not with wchar. Turns out that 
 char.init == 0 (contrary to what Andrei wrote) and wchar.init == 0xFFFF.
 

pragma(msg, char.init.stringof);

outputs '\xff' in D2, wchar and dchar have the same initializer: 
'\U0000FFFF'.

If you rely on the char initializer being the null character, use 
char c = 0; otherwise your char gets initialized to an invalid 
character, just like floats get initialized to NaN. Other types either 
use null as their invalid value, or have no invalid value and use 0.
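
The default initializers discussed in this subthread can be verified at compile time. A small sketch, assuming a D2 compiler and std.math.isNaN; the helper defaultIsNaN is a hypothetical name used only here.

```d
import std.math : isNaN;

// Integral and boolean types default to zero / false.
static assert(int.init == 0);
static assert(long.init == 0L);
static assert(bool.init == false);

// Character types default to values that are invalid as UTF code units.
static assert(char.init == 0xFF);
static assert(wchar.init == 0xFFFF);
static assert(dchar.init == 0x0000FFFF);

// Illustrative helper: true when a floating-point type defaults to NaN.
bool defaultIsNaN(T)()
{
    T value; // default-initialized
    return isNaN(value);
}

void main()
{
    assert(defaultIsNaN!float());
    assert(defaultIsNaN!double());

    char c;           // default-initialized to 0xFF
    assert(c == 0xFF);

    char zeroed = 0;  // explicit zero when the algorithm needs it
    assert(zeroed == '\0');
}
```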

Sep 23 2009

Walter Bright <newshound1 digitalmars.com> writes:

Justin Johansson wrote:
 Seriously though, I imagine the D design choices to be influenced by
 the desire to propagate NaN and invalid UTF in their respective cases
 so as to detect uninitialized data errors.

That's exactly what drove the design choices.

If there was a nan value for integers, D would use that. But there 
isn't, so 0 is the best we can do.

Andrei and I were talking last night about the purity of software design 
principles and the reality, and how the reality forces compromise on the 
purity if you wanted to get anything done.

Sep 24 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 22 Sep 2009 09:53:52 -0400, Justin Johansson  
<procode adam-dott-com.au> wrote:

 Daniel Keep Wrote:

 Big difference if you pass char[] variable .ptr to a C function.


 In general, if you pass a string to a C function you should send it
 through toStringz first.  If you don't, you're just begging for  
 segfaults.

 Agreed .. fair enough.

 Actually I'm more interested in the semantics for default initialized  
 char[].
 Does it have exactly the same semantics as an empty string (in general D  
 or runtime library, Phobos et al. context)?

A null string *is* an empty string, but an empty string may not be a null  
string.

The subtle difference is that the pointer points to null versus some data.

A non-null empty string:

  - May be pointing to heap data, therefore keeping the data from being  
collected.
  - May reallocate in place on appending (a null string always must  
allocate new data on append).

It's a difficult concept to get, but an array is really a hybrid type  
between a reference and a value type.  The array is actually a value type  
struct with a pointer reference and a length value.  If the length is  
zero, then the pointer value technically isn't needed, but in subtle  
cases, it makes a difference.  When you copy the array, the length behaves  
like a value type (changing the length of one array doesn't affect the  
other), but the array data is referenced (changing an element of the array  
*does* affect the other).
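
The hybrid behavior described above can be sketched as follows, assuming a D2 compiler. The helper sharesStorage is a hypothetical name introduced for illustration.

```d
// Illustrative helper: true when two arrays start at the same address.
bool sharesStorage(T)(const(T)[] x, const(T)[] y)
{
    return x.ptr is y.ptr;
}

void main()
{
    int[] a = [1, 2, 3];
    int[] b = a;  // copies the (pointer, length) pair, not the elements

    // Element data is shared: a write through b is visible through a.
    b[0] = 99;
    assert(a[0] == 99);
    assert(sharesStorage(a, b));

    // The length is not shared: shrinking b leaves a untouched.
    b.length = 2;
    assert(b.length == 2);
    assert(a.length == 3);
}
```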

I think plans are to make the array a full reference type, and leave  
slices as these structs (in D2).  This probably will clear up a lot of  
confusion people have.

I hope this helps...

Oh, and BTW, you can pass string literals to C functions, but *not* char[]  
variables.  Always pass them through toStringz.  It generally does not  
take much time/resources to add the zero.
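
A short sketch of that advice, assuming a D2 compiler, Phobos' std.string.toStringz and the C strlen binding from core.stdc.string. The wrapper cLength is a hypothetical name used only for this example.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Illustrative wrapper: measures a D string with C's strlen by
// first producing a zero-terminated copy via toStringz.
size_t cLength(const(char)[] s)
{
    return strlen(toStringz(s));
}

void main()
{
    char[] buf = "hello".dup;

    // A slice is not zero-terminated in memory, so it goes through toStringz.
    assert(cLength(buf[0 .. 4]) == 4);

    // A null array is safe too: toStringz returns a pointer to a zero byte.
    char[] none;
    assert(cLength(none) == 0);

    // String literals are zero-terminated and convert to const(char)* directly.
    assert(strlen("hi") == 2);
}
```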

-Steve

Sep 22 2009

Justin Johansson <procode adam-dott-com.au> writes:

Steven Schveighoffer Wrote:

 A null string *is* an empty string, but an empty string may not be a null  
 string.
 
 The subtle difference is that the pointer points to null versus some data.
 
 A non-null empty string:
 
   - May be pointing to heap data, therefore keeping the data from being  
 collected.
   - May reallocate in place on appending (a null string always must  
 allocate new data on append).
 
 It's a difficult concept to get, but an array is really a hybrid type  
 between a reference and a value type.  The array is actually a value type  
 struct with a pointer reference and a length value.  If the length is  
 zero, then the pointer value technically isn't needed, but in subtle  
 cases, it makes a difference.  When you copy the array, the length behaves  
 like a value type (changing the length of one array doesn't affect the  
 other), but the array data is referenced (changing an element of the array  
 *does* affect the other).
 
 I think plans are to make the array a full reference type, and leave  
 slices as these structs (in D2).  This probably will clear up a lot of  
 confusion people have.
 
 I hope this helps...
 
 Oh, and BTW, you can pass string literals to C functions, but *not* char[]  
 variables.  Always pass them through toStringz.  It generally does not  
 take much time/resources to add the zero.
 
 -Steve

Good write-up Steve; thanks.

Being relatively new to D, but from a strong C++ and assembler background, I
did the usual interrogation for interest:

	writefln( "(char[]).sizeof=%d", (char[]).sizeof);

8 bytes.

So if you wanted to intern string data to conserve memory, and reference such
data with a single 32-bit pointer, sounds like you would have to do this with
either a char* or perhaps a pointer to a char[], rather than a full char[]
field in your class or struct.

There's less reason to want to intern string data if you still need 8 bytes to
reference said data.
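
The 8-byte figure follows from the array being a length plus a pointer on a 32-bit target; on a 64-bit target the same layout gives 16 bytes. A sketch of the check, assuming a D2 compiler (the helper sliceSize is a hypothetical name):

```d
import std.stdio : writefln;

// A dynamic array is a length (size_t) plus a pointer.
static assert((char[]).sizeof == size_t.sizeof + (char*).sizeof);
static assert((char[]).sizeof == 2 * (char*).sizeof);

// Illustrative helper: size in bytes of a T[] reference.
size_t sliceSize(T)()
{
    return (T[]).sizeof;
}

void main()
{
    // The element type does not change the size of the array reference.
    assert(sliceSize!char() == sliceSize!double());

    writefln("(char[]).sizeof=%d", (char[]).sizeof);
    writefln("(char*).sizeof=%d", (char*).sizeof);
}
```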

Justin

Sep 22 2009

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:

 Justin Johansson (I think):
 short.init    0
 int.init        0
 bool.init     false
 byte.init     0
 double.init  double.nan
 long.init     0L
 

 
 You forgot
 
 char.init 0xFF
 wchar.init 0xFFFF
 dchar.init 0xFFFFFFFF

One small disadvantage of some of those init values (for non-integer variables)
is that if you have a large global static array of floats or chars in your
program, its initial contents are stored in the binary, which can become huge
(you can avoid that by void-initializing the static array, and then I think the
LDC compiler or the operating system resets such memory to zero anyway).

To avoid such huge binaries D can keep small static arrays like now, but it can
initialize at run-time (before the main) the large static arrays.

Bye,
bearophile

Sep 23 2009
