digitalmars.D.learn - Checking if a string is null

reply Max Samukha <samukha voliacable.com.removethis> writes:
Using '== null' and 'is null' with strings gives odd results (DMD
1.019):

void main()
{
	char[] s;

	if (s is null) writefln("s is null");
	if (s == null) writefln("s == null");		
}

Output:
s is null
s == null

----

void main()
{
	char[] s = "";

	if (s is null) writefln("s is null");
	if (s == null) writefln("s == null");		
}

Output:
s == null

----

Can anybody explain why s == null is true in the second example?
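
For anyone reproducing this on a current compiler, here is a minimal sketch of the same checks. It assumes D2, where a literal is a `string` rather than a `char[]`, and the helper name `describe` is made up for illustration. It only shows which checks hold; it is not taken from the spec.

```d
import std.stdio;

// Hypothetical helper: lists which of the three checks hold for s.
string describe(string s)
{
    string result;
    if (s is null)     result ~= "[is null]";   // identity: ptr and length both match null
    if (s == null)     result ~= "[== null]";   // equality: contents (here, lengths) match
    if (s.length == 0) result ~= "[length 0]";
    return result;
}

void main()
{
    string a;        // default-initialised: ptr is null, length is 0
    string b = "";   // empty literal: ptr is non-null, length is 0
    writeln("a: ", describe(a));
    writeln("b: ", describe(b));
}
```

If the aim is simply "has no characters", testing `s.length == 0` avoids the question of which kind of null comparison is meant.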
Jul 24 2007
next sibling parent reply Hoenir <mrmocool gmx.de> writes:
Max Samukha schrieb:
 Using '== null' and 'is null' with strings gives odd results (DMD
 1.019):
 
 void main()
 {
 	char[] s;
 
 	if (s is null) writefln("s is null");
 	if (s == null) writefln("s == null");		
 }
 
 Output:
 s is null
 s == null
 
 ----
 
 void main()
 {
 	char[] s = "";
 
 	if (s is null) writefln("s is null");
 	if (s == null) writefln("s == null");		
 }
 
 Output:
 s == null
 
 ----
 
 Can anybody explain why s == null is true in the second example?
 

something like that.
Jul 24 2007
parent Max Samukha <samukha voliacable.com.removethis> writes:
On Wed, 25 Jul 2007 08:32:52 +0200, Hoenir <mrmocool gmx.de> wrote:

Max Samukha schrieb:
 Using '== null' and 'is null' with strings gives odd results (DMD
 1.019):
 
 void main()
 {
 	char[] s;
 
 	if (s is null) writefln("s is null");
 	if (s == null) writefln("s == null");		
 }
 
 Output:
 s is null
 s == null
 
 ----
 
 void main()
 {
 	char[] s = "";
 
 	if (s is null) writefln("s is null");
 	if (s == null) writefln("s == null");		
 }
 
 Output:
 s == null
 
 ----
 
 Can anybody explain why s == null is true in the second example?
 

something like that.

Then, it's unclear what null content means. If it is the same as empty string (ptr != null and length == 0), I remain confused. If it means a null string (ptr == null and length == 0), the second example should output nothing since s.ptr != null.
Jul 25 2007
prev sibling parent reply Regan Heath <regan netmail.co.nz> writes:
Max Samukha wrote:
 Using '== null' and 'is null' with strings gives odd results (DMD
 1.019):
 
 void main()
 {
 	char[] s;
 
 	if (s is null) writefln("s is null");
 	if (s == null) writefln("s == null");		
 }
 
 Output:
 s is null
 s == null
 
 ----
 
 void main()
 {
 	char[] s = "";
 
 	if (s is null) writefln("s is null");
 	if (s == null) writefln("s == null");		
 }
 
 Output:
 s == null
 
 ----
 
 Can anybody explain why s == null is true in the second example?

Not I, it's inconsistent IMO and it gets worse:

import std.stdio;

void main()
{
	foo(null);
	foo("");
}

void foo(string s)
{
	writefln(s.ptr, ", ", s.length);
	if (s is null) writefln("s is null");
	if (s == null) writefln("s == null");
	if (s < null) writefln("s < null");
	if (s > null) writefln("s < null");
	if (s <= null) writefln("s <= null");
	if (s >= null) writefln("s < null");
	writefln("");
}

Output:
0000, 0
s is null
s == null
s <= null
s < null

415080, 0
s == null
s <= null
s < null

So, "" is < and == null!?
and <=,== but not >=!?

This all boils down to the empty vs null string debate where some people want to be able to distinguish between them and some see no point.

I'm in the 'distinguishable' camp. I can see the merit. At the very least it should be consistent!

Regan
Jul 25 2007
next sibling parent Regan Heath <regan netmail.co.nz> writes:
Manfred Nowak wrote:
 Regan Heath wrote
 
 This all boils down to the empty vs null string debate where some
 people want to be able to distinguish between them and some see no
 point. 

I haven't seen such a debate.

There have been several, I did a brief search and came up with:

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=55270
(this one was my fault)
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=25804
http://www.digitalmars.com/d/archives/digitalmars/D/learn/3521.html
http://www.digitalmars.com/d/archives/21782.html
http://www.digitalmars.com/d/archives/digitalmars/D/27123.html
http://www.digitalmars.com/d/archives/16905.html
http://www.digitalmars.com/d/archives/digitalmars/D/bugs/Issue_1314_New_Dupping_an_empty_array_creates_a_null_array_11585.html
http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=D&artnum=17083

Some of those go back a long, long way.
 Does it mean that it is not possible to implement a Kleene Algebra for 
 strings in D because there is no neutral element for the alternative 
 operator?

I have no idea. :) Regan
Jul 25 2007
prev sibling next sibling parent reply Max Samukha <samukha voliacable.com.removethis> writes:
On Wed, 25 Jul 2007 11:12:19 +0100, Regan Heath <regan netmail.co.nz>
wrote:

Max Samukha wrote:
 Using '== null' and 'is null' with strings gives odd results (DMD
 1.019):
 
 void main()
 {
 	char[] s;
 
 	if (s is null) writefln("s is null");
 	if (s == null) writefln("s == null");		
 }
 
 Output:
 s is null
 s == null
 
 ----
 
 void main()
 {
 	char[] s = "";
 
 	if (s is null) writefln("s is null");
 	if (s == null) writefln("s == null");		
 }
 
 Output:
 s == null
 
 ----
 
 Can anybody explain why s == null is true in the second example?

Not I, it's inconsistent IMO and it gets worse:

import std.stdio;

void main()
{
	foo(null);
	foo("");
}

void foo(string s)
{
	writefln(s.ptr, ", ", s.length);
	if (s is null) writefln("s is null");
	if (s == null) writefln("s == null");
	if (s < null) writefln("s < null");
	if (s > null) writefln("s < null");
	if (s <= null) writefln("s <= null");
	if (s >= null) writefln("s < null");
	writefln("");
}

Output:
0000, 0
s is null
s == null
s <= null
s < null

415080, 0
s == null
s <= null
s < null

So, "" is < and == null!?
and <=,== but not >=!?

You didn't update all writefln's :)
This all boils down to the empty vs null string debate where some people 
want to be able to distinguish between them and some see no point.

I'm in the 'distinguishable' camp.  I can see the merit.  At the very 
least it should be consistent!

Regan

Anyway, it feels like an undefined area in the language. Do the specs say anything about how exactly arrays/strings/delegates should compare to null? It seems to be more than comparing the pointer part of the structs.
Jul 25 2007
next sibling parent Ald <aldarri_s yahoo.com> writes:
I believe the manual says that, when comparing, the compiler tries to call the
opEquals() method.  And calling that from null pointer yields undefined
behavior.  You should use _!is null_ construct instead.

Max Samukha Wrote:
Jul 25 2007
prev sibling parent Regan Heath <regan netmail.co.nz> writes:
 So, "" is < and == null!?
 and <=,== but not >=!?

You didn't update all writefln's :)

<hangs head in shame> What can I say, I'm having a bad morning.
 Anyway, it feels like an undefined area in the language. Do the specs
 say anything about how exactly arrays/strings/delegates should compare
 to null? It seems to be more than comparing the pointer part of the
 structs.

Not that I can find. The array page does say:

"Strings can be copied, compared, concatenated, and appended:"
..
"with the obvious semantics."

but not much more on the topic. Under "Array Initialization" we see:

* Pointers are initialized to null.
..
* Dynamic arrays are initialized to having 0 elements.
..

Which does not state that an array will be initialised to "null" but rather to something with 0 elements.

To my mind something with 0 elements is 'empty' as opposed to being 'non-existent', which is typically represented by 'null' or a similar value (like NAN for floats, 0xFF for char, etc).

So, it seems the spec is hinting/saying that arrays cannot be non-existent, only empty (or not empty). And yet in the current implementation there is clearly a difference between 'null' and "" when it comes to arrays.

I'm still firmly in favour of there being 3 distinct states for an array:

* non-existent (null)
* empty ("", length == 0)
* not empty (length > 0)

That said, I'm also firmly in favour of not getting a seg-fault when I have a reference to a non-existent array (we currently have this behaviour and it's perfect).

All I think that needs 'fixing', going back to your initial test case:

char[] s = "";
if (s is null) writefln("s is null");
if (s == null) writefln("s == null");

is that neither of these tests should evaluate 'true'. The fact that the latter does indicates to me that the array compare is first comparing length, seeing they're both 0 and assuming the arrays must be equal. I think instead it should also check the data pointer because in the case of "" the data pointer is non-null. The same is true for a zero length slice i.e. s[0..0], it exists (data pointer is non-null) but is empty (length is zero).

In short, the compare function should recognise the 3 states:

* non-existent (data pointer is null)
* empty (data pointer is non-null, length is zero)
* not empty (length is > zero)

and never make the mistake of calling an array in one state equal to an array in another state.

Regan

p.s. I am cross-posting and setting followup to digitalmars.D as it has become more of a theory/discussion on D than a learning exercise :)

p.p.s Plus, I figure if Manfred cannot recall a discussion on this topic we probably need another one about now.
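
The three states listed in this post can be told apart today by looking at `.ptr` and `.length` directly. The following is only a sketch under that assumption. The names `ArrayState` and `stateOf` are invented for illustration and are not part of the language or Phobos.

```d
import std.stdio;

// Hypothetical names for the three states described in the post.
enum ArrayState { nonExistent, empty, notEmpty }

// Classifies an array by its length first, then by its data pointer.
ArrayState stateOf(T)(T[] a)
{
    if (a.length > 0) return ArrayState.notEmpty;
    return a.ptr is null ? ArrayState.nonExistent : ArrayState.empty;
}

void main()
{
    string missing = null;
    string blank = "";
    string text = "hello";

    writeln(stateOf(missing));
    writeln(stateOf(blank));
    writeln(stateOf(text[0 .. 0]));   // zero-length slice of existing data
    writeln(stateOf(text));
}
```

This leaves the built-in `==` untouched, so existing code that relies on null and "" comparing equal keeps working.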
Jul 25 2007
prev sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Regan Heath wrote:
 Max Samukha wrote:
 Using '== null' and 'is null' with strings gives odd results (DMD
 1.019):

 void main()
 {
     char[] s;

     if (s is null) writefln("s is null");
     if (s == null) writefln("s == null");       
 }

 Output:
 s is null
 s == null

 ----

 void main()
 {
     char[] s = "";

     if (s is null) writefln("s is null");
     if (s == null) writefln("s == null");       
 }

 Output:
 s == null

 ----

 Can anybody explain why s == null is true in the second example?

Not I, it's inconsistent IMO and it gets worse:

import std.stdio;

void main()
{
	foo(null);
	foo("");
}

void foo(string s)
{
	writefln(s.ptr, ", ", s.length);
	if (s is null) writefln("s is null");
	if (s == null) writefln("s == null");
	if (s < null) writefln("s < null");
	if (s > null) writefln("s < null");
	if (s <= null) writefln("s <= null");
	if (s >= null) writefln("s < null");
	writefln("");
}

Output:
0000, 0
s is null
s == null
s <= null
s < null

415080, 0
s == null
s <= null
s < null

So, "" is < and == null!?
and <=,== but not >=!?

As Max said, you forgot to update some writeflns. The output of the corrected version is:
===
0000, 0
s is null
s == null
s <= null
s >= null

805BEF0, 0
s == null
s <= null
s >= null
===
Seems perfectly consistent to me. Anything with an equality comparison (==, <=, >=) is true in both cases, and 'is' is only true when the pointer as well as the length is equal.
 This all boils down to the empty vs null string debate where some people 
 want to be able to distinguish between them and some see no point.
 
 I'm in the 'distinguishable' camp.  I can see the merit.  At the very 
 least it should be consistent!

They *are* distinguishable. That's why above code returns different results for the 'is' comparison...

I for one am perfectly fine with "cast(char[]) null" meaning ".length == 0 && .ptr == null" and with comparisons of arrays using == and friends only inspecting the contents (not location) of the data.

Now, about comparisons: array comparisons basically operate like this:
---
int opEquals(T)(T[] u, T[] v) {              // bah to int return type
    if (u.length != v.length) return false;
    for (size_t i = 0; i < u.length; i++) {
        if (u[i] != v[i]) return false;
    }
    return true;
}

int opCmp(T)(T[] u, T[] v) {
    size_t len = min(u.length, v.length);
    for (size_t i = 0; i < len; i++) {
        if (auto diff = u[i].opCmp(v[i])) {
            return diff;
        }
    }
    return cast(int)u.length - cast(int)v.length;
}
---
(Taken from object.TypeInfo_Array and converted to templates instead of void*s + casting + element TypeInfo.{equals/compare} for readability)

Since both the null string and "" have .length == 0, that means they compare equal using those methods (having no contents to compare and equal length)

This is all perfectly consistent (and even useful) to me...
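
The comparison described above can be tried out as ordinary functions. This is a sketch assuming D2. The names `contentEquals` and `contentCompare` are invented here, and the element comparison uses `<` and `>` rather than a call to `opCmp`, so it only suits built-in element types. It is not the runtime's actual code.

```d
import std.stdio;

// Equality by content: lengths first, then element by element.
bool contentEquals(T)(T[] u, T[] v)
{
    if (u.length != v.length) return false;
    foreach (i; 0 .. u.length)
        if (u[i] != v[i]) return false;
    return true;
}

// Ordering by content: first differing element decides, then length.
int contentCompare(T)(T[] u, T[] v)
{
    immutable len = u.length < v.length ? u.length : v.length;
    foreach (i; 0 .. len)
    {
        if (u[i] < v[i]) return -1;
        if (u[i] > v[i]) return 1;
    }
    if (u.length < v.length) return -1;
    if (u.length > v.length) return 1;
    return 0;
}

void main()
{
    string none = null;
    string blank = "";
    // Neither loop runs for zero-length arrays, so .ptr is never looked at.
    writeln(contentEquals(none, blank), " ", contentCompare(none, blank));
}
```

With both lengths at zero there is nothing left to inspect, which is why a null string and "" come out equal under these rules.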
Jul 25 2007
next sibling parent reply Regan Heath <regan netmail.co.nz> writes:
 I'm in the 'distinguishable' camp.  I can see the merit.  At the very 
 least it should be consistent!

They *are* distinguishable. That's why above code returns different results for the 'is' comparison...

True. I guess what I meant to say was I'm in the '3 distinct states' camp (which may be a camp of 1 for all I know). See my reply to digitalmars.D for a definition of the 3 states.
 I for one am perfectly fine with "cast(char[]) null" meaning ".length == 
 0 && .ptr == null" 

Same here.
 and with comparisons of arrays using == and friends
 only inspecting the contents (not location) of the data.

I don't think an empty string (non-null, length == 0) should compare equal to a non-existent string (null, length == 0). And vice-versa. The only thing that should compare equal to null is null. Likewise an empty array should only compare equal to another empty array. My reasoning for this is consistency, see at end.

Aside: If the location and length are identical you can short-circuit the compare, returning true and ignoring the content, this could save a bit of time on comparisons of large arrays.
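
The short-circuit from the aside might look roughly like this. It is a sketch only, assuming D2, and `equalsWithShortcut` is a made-up name rather than anything in the runtime.

```d
import std.stdio;

// Content equality with an identity shortcut: two arrays with the same
// start address and the same length cover the same memory, so they must be equal.
bool equalsWithShortcut(T)(T[] u, T[] v)
{
    if (u.length != v.length) return false;
    if (u.ptr is v.ptr) return true;          // same location and length: skip the loop
    foreach (i; 0 .. u.length)
        if (u[i] != v[i]) return false;
    return true;
}

void main()
{
    auto big = new int[](1_000_000);
    writeln(equalsWithShortcut(big, big));      // answered without touching the elements
    writeln(equalsWithShortcut(big, big.dup));  // different memory: full element loop
}
```

The shortcut costs one extra pointer comparison and only pays off when an array is compared with itself or with an identical slice.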
 Now, about comparisons: array comparisons basically operate like this:
 ---
 int opEquals(T)(T[] u, T[] v) {              // bah to int return type
     if (u.length != v.length) return false;
     for (size_t i = 0; i < u.length; i++) {
         if (u[i] != v[i]) return false;
     }
     return true;
 }
 
 int opCmp(T)(T[] u, T[] v) {
     size_t len = min(u.length, v.length);
     for (size_t i = 0; i < len; i++) {
         if (auto diff = u[i].opCmp(v[i])) {
             return diff;
         }
     }
     return cast(int)u.length - cast(int)v.length;
 }
 ---
 (Taken from object.TypeInfo_Array and converted to templates instead of 
 void*s + casting + element TypeInfo.{equals/compare} for readability)

Thanks.
 Since both the null string and "" have .length == 0, that means they 
 compare equal using those methods (having no contents to compare and 
 equal length)

This is the bit I don't like.
 This is all perfectly consistent (and even useful) to me...

It's not consistent with other reference types, types which can represent 'non-existent', e.g.

char *p = null;  //non-existent
if (p == null) writefln("p == null");
if (p == "") writefln("p == \"\"");

Output:
p == null

Compare that to:

char[] p = null;
if (p == null) writefln("p == null");
if (p == "") writefln("p == \"\"");

Output:
p == null
p == ""

All that I would like changed is for the compare, in the case of length == 0, to check the data pointers, eg.
 int opEquals(T)(T[] u, T[] v) {
     if (u.length != v.length) return false;
     if (u.length == 0) return (u.ptr == v.ptr);
     for (size_t i = 0; i < u.length; i++) {
         if (u[i] != v[i]) return false;
     }
     return true;
 }

This should mean "" == "" but not "" == null, likewise null == null but not null == "". Regan
Jul 25 2007
next sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Regan Heath wrote:
 I'm in the 'distinguishable' camp.  I can see the merit.  At the very 
 least it should be consistent!

They *are* distinguishable. That's why above code returns different results for the 'is' comparison...

True. I guess what I meant to say was I'm in the '3 distinct states' camp (which may be a camp of 1 for all I know). See my reply to digitalmars.D for a definition of the 3 states.
 I for one am perfectly fine with "cast(char[]) null" meaning ".length 
 == 0 && .ptr == null" 

Same here.
 and with comparisons of arrays using == and friends
 only inspecting the contents (not location) of the data.

I don't think an empty string (non-null, length == 0) should compare equal to a non-existent string (null, length == 0). And vice-versa. The only thing that should compare equal to null is null. Likewise an empty array should only compare equal to another empty array. My reasoning for this is consistency, see at end.

Since null arrays have length 0, they *are* empty arrays :P.
 Aside: If the location and length are identical you can short-circuit 
 the compare, returning true and ignoring the content, this could save a 
 bit of time on comparisons of large arrays.

At least with that last paragraph I can agree ;) Now, about this:
 All that I would like changed is for the compare, in the case of length 
 == 0, to check the data pointers, eg.
 
  > int opEquals(T)(T[] u, T[] v) {
  >     if (u.length != v.length) return false;
       if (u.length == 0) return (u.ptr == v.ptr);
  >     for (size_t i = 0; i < u.length; i++) {
  >         if (u[i] != v[i]) return false;
  >     }
  >     return true;
  > }
 
 This should mean "" == "" but not "" == null, likewise null == null but 
 not null == "".

Let's look at this code:
---
import std.stdio;

void main()
{
    char[][] strings = ["hello world!", "", null];
    foreach (str; strings) {
        auto str2 = str.dup;
        if (str == str2)
            writefln(`"%s" == "%s" (%s, %s)`, str, str2, str.ptr, str2.ptr);
        else
            writefln(`"%s" != "%s" (%s, %s)`, str, str2, str.ptr, str2.ptr);
    }
}
---
The output is currently (on my machine):
=====
"hello world!" == "hello world!" (805BE60, F7CFBFE0)
"" == "" (805BE78, 0000)
"" == "" (0000, 0000)
=====
Your change would change the second line (even if it actually allocated a new empty string like you probably want instead of returning null). How would that be consistent in any way?

(Same goes for other ways to create different-ptr empty strings)

What you might have meant on that extra line might be more like:
---
       if (u.length == 0) return ((u.ptr is null) == (v.ptr is null));
---
which will return true if both .ptr values are null or both are non-null.
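
Put together, the comparison with that extra line could be sketched as below. This assumes D2, and the name `threeStateEquals` is invented for illustration. It shows the proposal under discussion, not how the built-in `==` behaves.

```d
import std.stdio;

// Content equality, except that a zero-length array only equals another
// zero-length array of the same kind (both null, or both non-null).
bool threeStateEquals(T)(T[] u, T[] v)
{
    if (u.length != v.length) return false;
    if (u.length == 0) return (u.ptr is null) == (v.ptr is null);
    foreach (i; 0 .. u.length)
        if (u[i] != v[i]) return false;
    return true;
}

void main()
{
    string text = "hello world!";
    string none = null;

    writeln(threeStateEquals(text[0 .. 0], ""));  // two empty strings at different addresses
    writeln(threeStateEquals(none, ""));          // null against empty
    writeln(threeStateEquals(none, none));        // null against null
}
```

Unlike the `u.ptr == v.ptr` version, two empty non-null strings still compare equal here even when they live at different addresses.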
Jul 25 2007
next sibling parent reply Regan Heath <regan netmail.co.nz> writes:
 The only thing that should compare equal to null is null.  Likewise an 
 empty array should only compare equal to another empty array.

 My reasoning for this is consistency, see at end.

Since null arrays have length 0, they *are* empty arrays :P.

I can't tell in which way you're joking so I'm just going to come out with... The length of something, be it an array, a car, a <insert thing>, is totally independent of whether it exists (though a non-existent item cannot have a length). It either exists or it does not. If it exists, it has a length which may or may not be zero. Something which exists cannot be equal to something which doesn't. Period.
 Aside: If the location and length are identical you can short-circuit 
 the compare, returning true and ignoring the content, this could save 
 a bit of time on comparisons of large arrays.

At least with that last paragraph I can agree ;)

:)
 Your change would change the second line (even if it actually allocated 
 a new empty string like you probably want instead of returning null). 
 How would that be consistent in any way?

Oops, my bad. My suggested code change is totally incorrect. That'll teach me for posting while working on something else at the same time.
 (Same goes for other ways to create different-ptr empty strings)
 
 What you might have meant on that extra line might be more like:
 ---
        if (u.length == 0) return ((u.ptr is null) == (v.ptr is null));
 ---
 which will return true if both .ptr values are null or both are non-null.

Yes, and yes, I want "".dup to allocate a new 1-byte block, point at it, and set length to 0.

Regan
Jul 25 2007
parent reply Don Clugston <dac nospam.com.au> writes:
Regan Heath wrote:
 The only thing that should compare equal to null is null.  Likewise 
 an empty array should only compare equal to another empty array.

 My reasoning for this is consistency, see at end.

Since null arrays have length 0, they *are* empty arrays :P.

I can't tell in which way you're joking so I'm just going to come out with... The length of something, be it an array, a car, a <insert thing>, is totally independent of whether it exists (though a non-existent item cannot have a length). It either exists or it does not. If it exists, it has a length which may or may not be zero. Something which exists cannot be equal to something which doesn't.

I don't think that's really what's happening here. Consider vectors. If a vector has a length of zero, the direction doesn't exist. Take two arbitrary vectors with different directions, a and b. a*0 == b*0, even though the direction of a is completely different to that of b. This is the same model which is being used for arrays; if the .length is zero, the .ptr is irrelevant.
Jul 25 2007
parent Derek Parnell <derek psyc.ward> writes:
On Wed, 25 Jul 2007 22:07:15 +0200, Don Clugston wrote:

 Regan Heath wrote:
 The only thing that should compare equal to null is null.  Likewise 
 an empty array should only compare equal to another empty array.

 My reasoning for this is consistency, see at end.

Since null arrays have length 0, they *are* empty arrays :P.

I can't tell in which way you're joking so I'm just going to come out with... The length of something, be it an array, a car, a <insert thing>, is totally independent of whether it exists (though a non-existent item cannot have a length). It either exists or it does not. If it exists, it has a length which may or may not be zero. Something which exists cannot be equal to something which doesn't.

I don't think that's really what's happening here. Consider vectors. If a vector has a length of zero, the direction doesn't exist. Take two arbitrary vectors with different directions, a and b. a*0 == b*0, even though the direction of a is completely different to that of b. This is the same model which is being used for arrays; if the .length is zero, the .ptr is irrelevant.

But arrays are not vectors. -- Derek Parnell Melbourne, Australia "Down with mediocrity!"
Jul 25 2007
prev sibling next sibling parent Carlos Santander <csantander619 gmail.com> writes:
Frits van Bommel escribió:
 
 Since null arrays have length 0, they *are* empty arrays :P.
 

But empty arrays are not null. You could even argue that null arrays don't have a length, thus they can't be empty. -- Carlos Santander Bernal
Jul 25 2007
prev sibling parent reply Derek Parnell <derek psyc.ward> writes:
On Wed, 25 Jul 2007 19:01:57 +0200, Frits van Bommel wrote:

 Since null arrays have length 0, they *are* empty arrays :P.

Not in my world. I see that null arrays have no length. That is to say, they do not have any length, which is different from saying they have a length and that length is zero.
 All that I would like changed is for the compare, in the case of length 
 == 0, to check the data pointers, eg.
 
  > int opEquals(T)(T[] u, T[] v) {
  >     if (u.length != v.length) return false;
       if (u.length == 0) return (u.ptr == v.ptr);
  >     for (size_t i = 0; i < u.length; i++) {
  >         if (u[i] != v[i]) return false;
  >     }
  >     return true;
  > }
 
 This should mean "" == "" but not "" == null, likewise null == null but 
 not null == "".

Let's look at this code:
---
import std.stdio;

void main()
{
    char[][] strings = ["hello world!", "", null];
    foreach (str; strings) {
        auto str2 = str.dup;
        if (str == str2)
            writefln(`"%s" == "%s" (%s, %s)`, str, str2, str.ptr, str2.ptr);
        else
            writefln(`"%s" != "%s" (%s, %s)`, str, str2, str.ptr, str2.ptr);
    }
}
---
The output is currently (on my machine):
=====
"hello world!" == "hello world!" (805BE60, F7CFBFE0)
"" == "" (805BE78, 0000)
"" == "" (0000, 0000)
=====
Your change would change the second line (even if it actually allocated a new empty string like you probably want instead of returning null). How would that be consistent in any way?

Your example is misleading for at least two reasons:

** The '==' operator compares the contents of the strings. A null string has no content so there is nothing to compare. This should fail but it doesn't in the current D. It should fail in the same manner that a null object reference fails the '==' operator.

** The output is writefln's attempt at giving a string representation of the data presented. It (aka Walter) has decided that the string representation of a null array is an empty string. This does not mean that a null array is an empty string but just that writefln represents it as such.

-- 
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"
Jul 25 2007
next sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Derek Parnell wrote:
 On Wed, 25 Jul 2007 19:01:57 +0200, Frits van Bommel wrote:
 
 Since null arrays have length 0, they *are* empty arrays :P.

Not in my world. I see that null arrays have no length. That is to say, they do not have any length, which is different from saying they have a length and that length is zero.

But the fact of the matter is, 'T[] x = null;' reserves space for the .length and sets it to 0. If you have a suggestion for a different value to put there, by all means make it. Or would you prefer a segfault or diagnostic when accessing (cast(T[])null).length? That'd introduce overhead on every .length access (unless the compiler can statically determine whether an array reference is null).
 All that I would like changed is for the compare, in the case of length 
 == 0, to check the data pointers, eg.

  > int opEquals(T)(T[] u, T[] v) {
  >     if (u.length != v.length) return false;
       if (u.length == 0) return (u.ptr == v.ptr);
  >     for (size_t i = 0; i < u.length; i++) {
  >         if (u[i] != v[i]) return false;
  >     }
  >     return true;
  > }

 This should mean "" == "" but not "" == null, likewise null == null but 
 not null == "".

---
import std.stdio;

void main()
{
    char[][] strings = ["hello world!", "", null];
    foreach (str; strings) {
        auto str2 = str.dup;
        if (str == str2)
            writefln(`"%s" == "%s" (%s, %s)`, str, str2, str.ptr, str2.ptr);
        else
            writefln(`"%s" != "%s" (%s, %s)`, str, str2, str.ptr, str2.ptr);
    }
}
---
The output is currently (on my machine):
=====
"hello world!" == "hello world!" (805BE60, F7CFBFE0)
"" == "" (805BE78, 0000)
"" == "" (0000, 0000)
=====
Your change would change the second line (even if it actually allocated a new empty string like you probably want instead of returning null). How would that be consistent in any way?

Your example is misleading for at least two reasons:

** The '==' operator compares the contents of the strings. A null string has no content so there is nothing to compare. This should fail but it doesn't in the current D. It should fail in the same manner that a null object reference fails the '==' operator.

This wasn't the point of the example. I could have left out the third element and change the .dup in the second line to a different empty string (f.e. a 0-length slice of the first one) and the point would remain the same: the proposed change would break comparison by '==' for empty non-null strings.
 ** The output is writefln's attempt at giving a string representation of the
 data presented. It (aka Walter) has decided that the string representation
 of a null array is an empty string. This does not mean that a null array is
 an empty string but just that writefln represents it as such.

Like I said, the point of the example didn't actually have anything to do with null strings, but rather with a bug in a change Regan proposed to make null strings and non-null empty strings compare unequal, which resulted in non-null empty strings comparing unequal.
Jul 25 2007
next sibling parent Derek Parnell <derek psyc.ward> writes:
On Thu, 26 Jul 2007 07:47:03 +0200, Frits van Bommel wrote:

 Derek Parnell wrote:
 On Wed, 25 Jul 2007 19:01:57 +0200, Frits van Bommel wrote:
 
 Since null arrays have length 0, they *are* empty arrays :P.

Not in my world. I see that null arrays have no length. That is to say, they do not have any length, which is different from saying they have a length and that length is zero.

But the fact of the matter is, 'T[] x = null;' reserves space for the .length and sets it to 0. If you have a suggestion for a different value to put there, by all means make it.

I'm trying not to set in concrete the ABI of variable-length arrays. So even though the current D definition is that a VL array consists of a two-element struct and zero or one block of RAM, conceptually a null array doesn't point to anything and does not have a length. So to me it doesn't matter that D allocates space for the .length and .ptr portions of the null VL array, because it still should not use the .length value. But, because theoretically every possible RAM address could be stored in the .ptr portion, including zero, I concede that in D the .ptr and .length both being zero is needed to indicate a null array, even though this disallows the conceptual empty array beginning at address zero.
 Or would you prefer a segfault or diagnostic when accessing 
 (cast(T[])null).length? That'd introduce overhead on every .length 
 access (unless the compiler can statically determine whether an array 
 reference is null).

Yes I would. However, too many people are relying on this inconsistency so I'll live with that wart in the language.
 All that I would like changed is for the compare, in the case of length 
 == 0, to check the data pointers, eg.

  > int opEquals(T)(T[] u, T[] v) {
  >     if (u.length != v.length) return false;
       if (u.length == 0) return (u.ptr == v.ptr);
  >     for (size_t i = 0; i < u.length; i++) {
  >         if (u[i] != v[i]) return false;
  >     }
  >     return true;
  > }

 This should mean "" == "" but not "" == null, likewise null == null but 
 not null == "".

---
import std.stdio;

void main()
{
    char[][] strings = ["hello world!", "", null];
    foreach (str; strings) {
        auto str2 = str.dup;
        if (str == str2)
            writefln(`"%s" == "%s" (%s, %s)`, str, str2, str.ptr, str2.ptr);
        else
            writefln(`"%s" != "%s" (%s, %s)`, str, str2, str.ptr, str2.ptr);
    }
}
---
The output is currently (on my machine):
=====
"hello world!" == "hello world!" (805BE60, F7CFBFE0)
"" == "" (805BE78, 0000)
"" == "" (0000, 0000)
=====
Your change would change the second line (even if it actually allocated a new empty string like you probably want instead of returning null). How would that be consistent in any way?

Your example is misleading for at least two reasons:

** The '==' operator compares the contents of the strings. A null string has no content so there is nothing to compare. This should fail but it doesn't in the current D. It should fail in the same manner that a null object reference fails the '==' operator.

This wasn't the point of the example.

Sorry for misunderstanding.
 I could have left out the third 
 element and change the .dup in the second line to a different empty 
 string (f.e. a 0-length slice of the first one) and the point would 
 remain the same: the proposed change would break comparison by '==' for 
 empty non-null strings.

I agree with you. Two empty non-null strings should compare as equal because the equality test is against the contents of the array and not the addresses of the array. A null array has no content so one has nothing to compare it with; this is why I think that it is an illegal/meaningless operation. -- Derek Parnell Melbourne, Australia "Down with mediocrity!"
Jul 25 2007
prev sibling next sibling parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Manfred Nowak wrote:
 Frits van Bommel wrote
 
 But the fact of the matter is, 'T[] x = null;' reserves space for
 the .length and sets it to 0. If you have a suggestion for a
 different value to put there, by all means make it.

Suggestion: After `T[] x= null;' `x.length == size_t.max' and `x.ptr == null', i.e. `size_t.max' will no more be a valid length for an array.

Uhu... Why would a slice of the full addressable memory space be a good initialization value?
 This is a hack to avoid some overhead in some places, but may introduce  
 more overhead in other places.

This entire discussion is trying to make today's T[] -- a slice type with value semantics and some provisions for making it behave as an array in some cases -- into a pure array type with a well defined null. You can't do that without breaking its slice semantics. A much better suggestion is Walter's T[new]. Make T[] remain the slice type it is today and make a distinct array type (preferably a by-reference type).
 Note: after `T[] x= null;' `x' holds an untyped array and so `y= x;' 
 should be a legal assignment for every `y' declared as `U[] y;' for 
 some type `U'---duck and run.

So you are proposing adding runtime type errors? :P -- Oskar
Jul 25 2007
next sibling parent Derek Parnell <derek psyc.ward> writes:
On Thu, 26 Jul 2007 08:37:13 +0200, Oskar Linde wrote:

 Manfred Nowak wrote:
 Frits van Bommel wrote
 
 But the fact of the matter is, 'T[] x = null;' reserves space for
 the .length and sets it to 0. If you have a suggestion for a
 different value to put there, by all means make it.

Suggestion: After `T[] x= null;' `x.length == size_t.max' and `x.ptr == null', i.e. `size_t.max' will no more be a valid length for an array.

Uhu... Why would a slice of the full addressable memory space be a good initialization value?

Maybe x.ptr = size_t.max and x.length = size_t.max might be a useful representation of a null array as it is an illegal RAM reference otherwise. But I know, it's too late now and probably too expensive at run-time to implement.
 This is a hack to avoid some overhead in some places, but may introduce  
 more overhead in other places.

This entire discussion is trying to make today's T[] -- a slice type with value semantics and some provisions for making it behave as an array in some cases -- into a pure array type with a well defined null. You can't do that without breaking its slice semantics. A much better suggestion is Walter's T[new]. Make T[] remain the slice type it is today and make a distinct array type (preferably a by-reference type).

You may very well be correct. -- Derek Parnell Melbourne, Australia "Down with mediocrity!"
Jul 25 2007
prev sibling next sibling parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Oskar Linde wrote:
 Manfred Nowak wrote:
 Frits van Bommel wrote

 But the fact of the matter is, 'T[] x = null;' reserves space for
 the .length and sets it to 0. If you have a suggestion for a
 different value to put there, by all means make it.

Suggestion: After `T[] x= null;' `x.length == size_t.max' and `x.ptr == null', i.e. `size_t.max' will no more be a valid length for an array.

Uhu... Why would a slice of the full addressable memory space be a good initialization value?

It's not the *full* addressable memory space for 1-byte types (the last byte of the address space has an address equal to .ptr(0) + .length(size_t.max), which isn't a member of the array) and it's more than the address space for bigger types (though I guess it does indeed cover the entire address space, possibly several times over, due to wraparound on overflow...). </pedantic>
Jul 26 2007
prev sibling parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Oskar Linde wrote:
 Manfred Nowak wrote:
 
 This is a hack to avoid some overhead in some places, but may 
 introduce  more overhead in other places.

This entire discussion is trying to make today's T[] -- a slice type with value semantics and some provisions for making it behave as an array in some cases -- into a pure array type with a well defined null. You can't do that without breaking its slice semantics. A much better suggestion is Walter's T[new]. Make T[] remain the slice type it is today and make a distinct array type (preferably a by-reference type).

Today's T[] is "a slice type with value semantics and some provisions for making it behave as an array in some cases"? Whoa. What do you mean "making it behave as an array in some cases" ? What's the difference between a slice type and an array? And why would having null arrays in D break its slice semantics? -- Bruno Medeiros - MSc in CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 26 2007
prev sibling parent Regan Heath <regan netmail.co.nz> writes:
Frits van Bommel wrote:
 Or would you prefer a segfault or diagnostic when accessing 
 (cast(T[])null).length? 

No, definitely not. This is one of the things I love about arrays, they're both value and reference type. It takes a while to get your head round (if the many discussions on these forums are any indication) but once you have it worked out it's quite powerful. In fact it's the reason slicing can work the way it does.

Further, for those cases where we do not care to differentiate between null and "", checking length == 0 is the perfect solution.

I'm not interested in an array implementation which is 'pure' in any academic sense but rather one which is consistent, in that null arrays do not become empty and vice-versa under any conditions (other than explicitly assigning those values). For example: in the past, setting length to 0 would free the data pointer. The result of which was that a zero length (empty) array became a nonexistent (null) array. And the problem we have now is that calling .dup on an empty array results in a null array. It is cases like these which I want to remove.

The other thing I want is for == to tell me that null and "" are not the same. I suspect very little existing code is relying on the existing behaviour as it will likely be checking length as opposed to comparing to "" or null (note: comparing with ==, not checking identity with "is").

Regan
Jul 26 2007
prev sibling parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Derek Parnell wrote:

 On Wed, 25 Jul 2007 19:01:57 +0200, Frits van Bommel wrote:
 
 Since null arrays have length 0, they *are* empty arrays :P.

Not in my world. I see that null arrays have no length. That is to say, they do not have any length, which is different from saying they have a length and that length is zero.

But that is not how T[] behaves in D. T[]s are of a dual slice/array nature with semantics closer to a slice than an array. That is something Walter's T[new] suggestion has a potential to remedy. There is no difference between a "null" array and a slice starting at memory location null, 0 elements long. In my opinion, it would be quite strange for zero length slices to behave any differently if the starting position happens to be null. There is a very easy way to get the behavior you want BTW: class Array(T) { ... } :)
 All that I would like changed is for the compare, in the case of length 
 == 0, to check the data pointers, eg.

  > int opEquals(T)(T[] u, T[] v) {
  >     if (u.length != v.length) return false;
       if (u.length == 0) return (u.ptr == v.ptr);
  >     for (size_t i = 0; i < u.length; i++) {
  >         if (u[i] != v[i]) return false;
  >     }
  >     return true;
  > }

 This should mean "" == "" but not "" == null, likewise null == null but 
 not null == "".



This would mean that "two arrays are equal if all elements are equal" would no longer hold. (Consider two zero length slices at arbitrary memory location, neither of them null). -- Oskar
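
The parenthesised case can be made concrete with a small sketch. It assumes a current D2 compiler, and `emptySliceAt` is a hypothetical helper introduced only for this illustration.

```d
import std.stdio;

/// Returns the zero-length slice of `data` that starts at index `at`.
int[] emptySliceAt(int[] data, size_t at)
{
    return data[at .. at];
}

void main()
{
    int[] data = [1, 2, 3];

    int[] head = emptySliceAt(data, 0); // empty slice at the start of data
    int[] tail = emptySliceAt(data, 2); // empty slice two elements further in

    writeln(head.length, " ", tail.length); // 0 0
    writeln(head.ptr is tail.ptr);          // false
    writeln(head == tail);                  // true
    writeln(head is tail);                  // false
}
```

Neither slice is null, both are empty, and they sit at different addresses. A comparison that simply checked `u.ptr == v.ptr` for the zero-length case would report them as unequal even though all (zero) elements match.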
Jul 25 2007
parent Regan Heath <regan netmail.co.nz> writes:
Oskar Linde wrote:
 This should mean "" == "" but not "" == null, likewise null == null 
 but not null == "".



This would mean that "two arrays are equal if all elements are equal" would no longer hold.

Not true, the two arrays you mention below would still compare 'true' as their contents are still equal. Ignore the suggested code changes, my one was patently incorrect and the first step is to make it clear what behaviour is desired, something I have obviously not done.
 (Consider two zero length slices at arbitrary
 memory location, neither of them null).

The content of these arrays is equal and would compare so.

The case(s) I want to stop comparing as equal are:
    null == ""
    "" == null

The cases which should continue to compare equal are:
    null == null
    "" == "" (your example above)

No more, no less.

Regan

p.s. I know I said ignore the suggested code changes but it would have to go something like:

if (lhs.length == 0) {
    if (lhs.ptr && rhs.ptr) return true;   // "" == ""
    if (lhs.ptr || rhs.ptr) return false;  // "" == null && null == ""
    return true;                           // null == null
}
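
As a rough sketch only, the behaviour described in this post could be written as a free function instead of a change to the built-in `==`. This assumes a current D2 compiler; `strictEquals` is a hypothetical name, not a Phobos or druntime function.

```d
import std.stdio;

/// Hypothetical helper: ordinary content equality, except that a null
/// slice only equals another null slice and never an empty non-null one.
bool strictEquals(const(char)[] lhs, const(char)[] rhs)
{
    if (lhs.length != rhs.length)
        return false;
    if (lhs.length == 0)
        // Both empty: equal only if both are null or both are non-null.
        return (lhs.ptr is null) == (rhs.ptr is null);
    return lhs == rhs; // non-empty: fall back to element comparison
}

void main()
{
    string empty = "";
    string none = null;

    writeln(strictEquals(empty, empty)); // true
    writeln(strictEquals(none, none));   // true
    writeln(strictEquals(empty, none));  // false
    writeln(strictEquals(none, empty));  // false
    writeln(strictEquals("abc", "abc")); // true
}
```

Two empty slices at different non-null addresses still compare equal here, so the "all elements equal" rule only gives way for the null versus non-null case.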
Jul 26 2007
prev sibling parent reply Derek Parnell <derek psyc.ward> writes:
On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:

 Aside: If the location and length are identical you can short-circuit 
 the compare, returning true and ignoring the content, this could save a 
 bit of time on comparisons of large arrays.

I don't think this is such a good idea. How does one address the array of four bytes at RAM location 4? -- Derek Parnell Melbourne, Australia "Down with mediocrity!"
Jul 25 2007
next sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Derek Parnell wrote:
 On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:
 
 Aside: If the location and length are identical you can short-circuit 
 the compare, returning true and ignoring the content, this could save a 
 bit of time on comparisons of large arrays.

I don't think this is such a good idea. How does one address the array of four bytes at RAM location 4?

I'm pretty sure the only way to obtain such an array would be to have already invoked Undefined Behavior (assuming 4 is an invalid memory location on the platform the program's running on) and as such it doesn't really matter whether or not two array references to it compare equal or not...
Jul 25 2007
parent reply Derek Parnell <derek psyc.ward> writes:
On Thu, 26 Jul 2007 07:50:07 +0200, Frits van Bommel wrote:

 Derek Parnell wrote:
 On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:
 
 Aside: If the location and length are identical you can short-circuit 
 the compare, returning true and ignoring the content, this could save a 
 bit of time on comparisons of large arrays.

I don't think this is such a good idea. How does one address the array of four bytes at RAM location 4?

I'm pretty sure the only way to obtain such an array would be to have already invoked Undefined Behavior (assuming 4 is an invalid memory location on the platform the program's running on) and as such it doesn't really matter whether or not two array references to it compare equal or not...

There is no basis for assuming that any RAM location is not addressable. I know that some operating systems prevent unprivileged programs from accessing certain locations, and that some RAM is hardware-mapped to I/O ports, but in theory, D as a system language should be able to address any RAM location. For example, if D had been implemented for the Amiga system, access to RAM address 4 would be vital, as that location contained the 32-bit address of the list that contains all addresses of the loaded shared libraries, and every program needed to access that location. -- Derek Parnell Melbourne, Australia "Down with mediocrity!"
Jul 25 2007
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Derek Parnell wrote:
 On Thu, 26 Jul 2007 07:50:07 +0200, Frits van Bommel wrote:
 
 Derek Parnell wrote:
 On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:

 Aside: If the location and length are identical you can short-circuit 
 the compare, returning true and ignoring the content, this could save a 
 bit of time on comparisons of large arrays.

I don't think this is such a good idea. How does one address the array of four bytes at RAM location 4?

I'm pretty sure the only way to obtain such an array would be to have already invoked Undefined Behavior (assuming 4 is an invalid memory location on the platform the program's running on) and as such it doesn't really matter whether or not two array references to it compare equal or not...

There is no basis for assuming that any RAM location is not addressable. I know that some operating systems prevent unprivileged programs from accessing certain locations, and that some RAM is hardware-mapped to I/O ports, but in theory, D as a system language should be able to address any RAM location. For example, if D had been implemented for the Amiga system, access to RAM address 4 would be vital, as that location contained the 32-bit address of the list that contains all addresses of the loaded shared libraries, and every program needed to access that location.

I'm sorry, but what would then be the problem with accessing (cast(byte*)4)[0..4] if it's a valid memory location? I thought your question implied it was an invalid memory location, though I'm very aware that's not always the case (which was why I had the parenthesized sentence in there).

By the way, null is a valid address on x86 too, but most operating systems don't map the first page to any memory, so that null pointer dereferences generate page faults (and IIRC Linux treats the last page similarly, for null pointers with negative indices). IIRC DOS didn't do this (and probably couldn't on machines of the time); the interrupt table was located there (which would seem to be a pretty bad idea for a system without memory protection -- a null pointer write could potentially crash the entire system...).

Also, there's no particular reason null has to be cast(whatever)0, that just happens to be a convenient easily-checked-for value...
Jul 26 2007
parent Derek Parnell <derek psyc.ward> writes:
On Thu, 26 Jul 2007 09:28:16 +0200, Frits van Bommel wrote:

 Derek Parnell wrote:
 On Thu, 26 Jul 2007 07:50:07 +0200, Frits van Bommel wrote:
 
 Derek Parnell wrote:
 On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:

 Aside: If the location and length are identical you can short-circuit 
 the compare, returning true and ignoring the content, this could save a 
 bit of time on comparisons of large arrays.





 I'm sorry, but what would then be the problem with accessing 
 (cast(byte*)4)[0..4] if it's a valid memory location?

Duh! I am so stupid! I misread Regan's original post. When he said "If the location and length are identical" I incorrectly read that as "if an array's location and length are identical" and not "if the locations and lengths of two arrays are identical". Sorry (as he sulks off hoping no one notices) ... -- Derek Parnell Melbourne, Australia "Down with mediocrity!"
Jul 26 2007
prev sibling parent Regan Heath <regan netmail.co.nz> writes:
Derek Parnell wrote:
 On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:
 
 Aside: If the location and length are identical you can short-circuit 
 the compare, returning true and ignoring the content, this could save a 
 bit of time on comparisons of large arrays.

I don't think this is such a good idea. How does one address the array of four bytes at RAM location 4?

What I meant was:

if (lhs.length == rhs.length && lhs.ptr == rhs.ptr) return true;

Not:

if (lhs.length == lhs.ptr) return true;

;)

Regan
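
Spelled out as a complete function, the short-circuit might look like the sketch below. It assumes a current D2 compiler and uses int elements; `equalWithShortcut` is a hypothetical name and this is not how the runtime's own comparison is claimed to be written.

```d
import std.stdio;

/// Element-wise comparison that returns early when both slices
/// refer to the same memory with the same length.
bool equalWithShortcut(const(int)[] lhs, const(int)[] rhs)
{
    if (lhs.length != rhs.length)
        return false;
    if (lhs.ptr is rhs.ptr)
        return true; // same location and length: contents must match
    foreach (i, x; lhs)
    {
        if (x != rhs[i])
            return false;
    }
    return true;
}

void main()
{
    int[] a = [1, 2, 3, 4];
    int[] b = a;             // same memory, same length
    int[] c = a.dup;         // different memory, same contents
    int[] d = [1, 2, 3, 5];  // different contents

    writeln(equalWithShortcut(a, b));         // true
    writeln(equalWithShortcut(a, c));         // true
    writeln(equalWithShortcut(a, d));         // false
    writeln(equalWithShortcut(a, a[0 .. 2])); // false
}
```

The sketch sticks to int on purpose: for element types where a value may not equal itself (a floating point NaN, say) the shortcut could give a different answer from a plain element-by-element comparison.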
Jul 26 2007
prev sibling next sibling parent reply Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Frits van Bommel wrote:
 Regan Heath wrote:
 This all boils down to the empty vs null string debate where some 
 people want to be able to distinguish between them and some see no point.

 I'm in the 'distinguishable' camp.  I can see the merit.  At the very 
 least it should be consistent!

They *are* distinguishable. That's why above code returns different results for the 'is' comparison...

The .ptr of empty arrays may be different than the .ptr of null arrays, but they are conceptually the same, and thus not safely distinguishable. Example:
	writefln("" is null); // false
	writefln("".dup is null); // true
"".ptr is not null, but "".dup.ptr is null. Such duplication is correct, because empty arrays are conceptually the same as null arrays, and trying to use .ptr to distinguish them is unsafe, implementation-dependent behavior (aka a program error). -- Bruno Medeiros - MSc in CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 25 2007
next sibling parent reply Regan Heath <regan netmail.co.nz> writes:
Bruno Medeiros wrote:
 Frits van Bommel wrote:
 Regan Heath wrote:
 This all boils down to the empty vs null string debate where some 
 people want to be able to distinguish between them and some see no 
 point.

 I'm in the 'distinguishable' camp.  I can see the merit.  At the very 
 least it should be consistent!

They *are* distinguishable. That's why above code returns different results for the 'is' comparison...

The .ptr of empty arrays may be different than the .ptr of null arrays, but they are conceptually the same, and thus not safely distinguishable. Example:
	writefln("" is null); // false
	writefln("".dup is null); // true
"".ptr is not null, but "".dup.ptr is null. Such duplication is correct, because empty arrays are conceptually the same as null arrays, and trying to use .ptr to distinguish them is unsafe, implementation-dependent behavior (aka a program error).

Ick. IMO "".dup should allocate 1 byte of memory, set it to '\0' and create a reference to it with length of 0. What do you mean by "empty arrays are conceptually the same as null arrays"? To me null arrays (nonexistent) and "" arrays (empty) are conceptually different. null indicates the array does not exist (no set at all), "" indicates it does but contains no items (an empty set). Regan
Jul 25 2007
parent reply Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Regan Heath wrote:
 Bruno Medeiros wrote:
 Frits van Bommel wrote:
 Regan Heath wrote:
 This all boils down to the empty vs null string debate where some 
 people want to be able to distinguish between them and some see no 
 point.

 I'm in the 'distinguishable' camp.  I can see the merit.  At the 
 very least it should be consistent!

They *are* distinguishable. That's why above code returns different results for the 'is' comparison...

The .ptr of empty arrays may be different than the .ptr of null arrays, but they are conceptually the same, and thus not safely distinguishable. Example:
	writefln("" is null); // false
	writefln("".dup is null); // true
"".ptr is not null, but "".dup.ptr is null. Such duplication is correct, because empty arrays are conceptually the same as null arrays, and trying to use .ptr to distinguish them is unsafe, implementation-dependent behavior (aka a program error).

Ick. IMO "".dup should allocate 1 byte of memory, set it to '\0' and create a reference to it with length of 0. What do you mean by "empty arrays are conceptually the same as null arrays"?

I meant that in current D they are semantically the same. (I should have used those words)
 To me null arrays (nonexistent) and "" arrays (empty) are conceptually 
 different.  null indicates the array does not exist (no set at all), "" 
 indicates it does but contains no items (an empty set).
 
 Regan

I know, and I agree, don't you recall the V2 string discussion: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=55388 -- Bruno Medeiros - MSc in CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 25 2007
parent Regan Heath <regan netmail.co.nz> writes:
Bruno Medeiros wrote:
 Regan Heath wrote:
 What do you mean by "empty arrays are conceptually the same as null 
 arrays"?

I meant that in current D they are semantically the same. (I should have used those words)

:)
 To me null arrays (nonexistent) and "" arrays (empty) are 
 conceptually different.  null indicates the array does not exist (no 
 set at all), "" indicates it does but contains no items (an empty set).

 Regan

I know, and I agree, don't you recall the V2 string discussion: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=55388

Yes, I remember it. I just forgot who was involved and what their opinions were. I have a hard enough time keeping track of my own opinion let alone others. Regan
Jul 25 2007
prev sibling parent reply Derek Parnell <derek psyc.ward> writes:
On Wed, 25 Jul 2007 14:31:28 +0100, Bruno Medeiros wrote:

 The .ptr of empty arrays may be different than the .ptr of null arrays, 
 but they are conceptually the same, and thus not safely distinguishable.

No they are not! Conceptually they are different things. However, D sometimes implements them as the same thing.
 Example:
 	writefln("" is null); // false
 	writefln("".dup is null); // true

 "".ptr is not null, but "".dup.ptr is null. Such duplication is correct, 
 because empty arrays are conceptually the same as null arrays, and 
 trying to use .ptr to distinguish them is unsafe, 
 implementation-dependent behavior (aka a program error).

But I believe that the implementation here is wrong. "".dup should create another empty string and not a null string. -- Derek Parnell Melbourne, Australia "Down with mediocrity!"
Jul 25 2007
parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Derek Parnell wrote:
 On Wed, 25 Jul 2007 14:31:28 +0100, Bruno Medeiros wrote:
 
 The .ptr of empty arrays may be different than the .ptr of null arrays, 
 but they are conceptually the same, and thus not safely distinguishable.

No they are not! Conceptually they are different things. However, D sometimes implements them as the same thing.

Check my reply to Regan just above, what I meant to say is that in current D they are semantically the same.
 Example:
 	writefln("" is null); // false
 	writefln("".dup is null); // true

 "".ptr is not null, but "".dup.ptr is null. Such duplication is correct, 
 because empty arrays are conceptually the same as null arrays, and 
 trying to use .ptr to distinguish them is unsafe, 
 implementation-dependent behavior (aka a program error).

But I believe that the implementation here is wrong. "".dup should create another empty string and not a null string.

The implementation is not wrong, it is according to Walter's intention, as you know. If anything, it is Walter's intention that is wrong. ^^' -- Bruno Medeiros - MSc in CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 26 2007
prev sibling parent Derek Parnell <derek psyc.ward> writes:
On Wed, 25 Jul 2007 15:05:25 +0200, Frits van Bommel wrote:

 Since both the null string and "" have .length == 0, that means they 
 compare equal using those methods (having no contents to compare and 
 equal length)
 
 This is all perfectly consistent (and even useful) to me...

However, string x = ""; means that 'x' is not null because it has a pointer and that points to a string with no content. Something that is null has no pointer and therefore the length component is not significant. But of course, in order to represent something that really does have the address of zero we should only consider 'x' to be null when x.ptr and x.length are both zero. In every other case it is not null. -- Derek Parnell Melbourne, Australia "Down with mediocrity!"
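
The rule in the last two sentences can be written down as a one-line check. This is only a sketch assuming a current D2 compiler, and `isNullArray` is a hypothetical name chosen for the illustration.

```d
import std.stdio;

/// Treats a slice as null only when both its pointer and its length
/// are zero; any other combination counts as "not null".
bool isNullArray(const(void)[] a)
{
    return a.ptr is null && a.length == 0;
}

void main()
{
    string x = "";      // has a pointer, zero length
    string y = null;    // no pointer, zero length
    int[] z = [1, 2];   // ordinary non-empty array

    writeln(isNullArray(x)); // false
    writeln(isNullArray(y)); // true
    writeln(isNullArray(z)); // false
}
```

As far as I can tell this is the same test that `a is null` performs on an array, which is why `is` and `==` disagree about "".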
Jul 25 2007