digitalmars.D.learn - Empty Array is Null?

Brian White <bcwhite pobox.com> writes:
char[] array = "".dup;
assert(array !is null);

This will exit because the assert condition is false.

Why is that?

-- Brian
Mar 19 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Brian White" wrote
 char[] array = "".dup;
 assert(array !is null);

 This will exit because the assert condition is false.

 Why is that?

Here is my guess:

The compiler does not allocate a piece of memory for "", and so the array struct for it looks like:

{ ptr = null, length = 0 }

If you dup this, it gives you the same thing (no need to allocate an array of size 0).

Now, here is the weird part. The compiler does some magic with arrays. If you are comparing an array with null, it changes the code to actually just compare the array pointer to null. So, the following code:

array !is null

is translated to:

array.ptr !is null

And this is why the program fails the assert.

The sucky part about all this is that if you have an empty array where the pointer is NOT null, then you get a different result (that array is not considered to be null).

So an array is null ONLY if the pointer is null. An array is empty if the length is 0. If you want to check for an empty array, just check that the length is 0. If you want to make sure that the pointer is null (which implies the length is 0), then check against null.

So code like this looks weird to people who are used to C# or Java:

array = null;
array.length = 5; // you would expect a segfault here

Because array is really a struct with some compiler magic, the variable array itself can never truly be null.

Anyways, hope this helps.

-Steve
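The two practical points in the post above (resizing a null array does not crash, and emptiness is best tested through the length) can be sketched as follows. The helper names resizeFromNull and isEmpty are made up for illustration; they are not part of any library.

```d
/// Assigns null to a slice, then resizes it.
/// (resizeFromNull is a hypothetical helper for this sketch.)
char[] resizeFromNull(size_t n)
{
    char[] array = null;        // { ptr = null, length = 0 }
    assert(array.ptr is null);
    assert(array.length == 0);
    array.length = n;           // no segfault: the runtime allocates n elements
    return array;
}

/// Emptiness check that does not depend on the pointer value.
/// (isEmpty is a hypothetical helper for this sketch.)
bool isEmpty(const(char)[] a)
{
    return a.length == 0;
}

void main()
{
    auto a = resizeFromNull(5);
    assert(a.length == 5 && a.ptr !is null);

    char[] n = null;
    assert(isEmpty(n));             // a null slice is empty
    assert(isEmpty(a[0 .. 0]));     // so is a zero-length slice of a
    assert(a[0 .. 0].ptr !is null); // even though its pointer is not null
}
```

Checking the length treats both kinds of empty array the same way, which is usually what calling code wants.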
Mar 19 2008
Brian White <bcwhite pobox.com> writes:
 Anyways, hope this helps.

It confirms I'm not going insane, and that's always helpful. :-)

-- Brian
Mar 19 2008
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Steven Schveighoffer wrote:
 "Brian White" wrote
 char[] array = "".dup;
 assert(array !is null);

 This will exit because the assert condition is false.

 Why is that?

Here is my guess: The compiler does not allocate a piece of memory for "", and so the array struct for it looks like: { ptr = null, length = 0 }

Sorry, but your guess is wrong:
---
urxae urxae:~/tmp$ cat test.d
import std.stdio;

void main()
{
    writefln("%s", "".ptr);
}
urxae urxae:~/tmp$ dmd -run test.d
805C41C
---
 If you dup this, it gives you the same thing (no need to allocate an array 
 of size 0).

As I mentioned above, the input wasn't null, so it's not "the same thing". Otherwise, this is correct (including the reason given; actually allocating 0 bytes is pretty useless).

The fact that empty_arr.dup returns null has been the topic of some discussion in the newsgroups IIRC, but the fact is that it's equivalent to allocating a zero-byte array on the heap in the most important aspects:

* The returned array has the correct length.
* All elements of the returned array are identical to those of the original array. [1]
* All of the returned array's elements can be freely modified without modifying the original array. [1]
* Changing any of the original elements doesn't change the returned array. [1]
* Appending anything to the returned value doesn't risk changing anything previously allocated (as the GC will allocate a new block of memory when appending to a non-GC-allocated array, which includes null arrays).

On top of all that, it's also very efficient since it doesn't require any allocation (at least, until anything is appended onto it).

The *only* property it doesn't have that 'normal' .dups do have is that normal .dups return unique non-null values. The only ways to even detect that are by 'is'-comparing to null (or a null-valued array) or (implicitly or explicitly) casting it to a boolean. All other behavior is completely consistent.

The discussion on the NGs was, IIRC, between those who considered 'null' to mean "no string" while considering other empty strings as "empty string" and those who just don't see any reason to explicitly distinguish between the two. In the end, I believe, it came down to "Walter is in the latter camp".

[1]: These are trivially true since having no elements that can be read or written means they don't actually require anything for empty arrays.
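The properties listed above can be checked with a small sketch. The function checkDupProperties is a hypothetical helper, not a library call. Whether "".dup hands back a null pointer may depend on the compiler and runtime version, so the sketch deliberately does not assert that.

```d
/// Checks the properties listed above for a copy made with .dup.
/// (checkDupProperties is a hypothetical helper for this sketch.)
bool checkDupProperties(const(char)[] original)
{
    char[] copy = original.dup;

    bool sameLength = copy.length == original.length;
    bool sameContents = copy == original;   // element-wise comparison

    // Appending to the copy must not touch the original.
    auto before = original.idup;
    copy ~= 'x';
    bool originalUntouched = original == before;

    return sameLength && sameContents && originalUntouched;
}

void main()
{
    assert(checkDupProperties(""));      // empty input
    assert(checkDupProperties("abc"));   // non-empty input, same guarantees
}
```

None of these checks looks at .ptr, which is the point of the argument: code that sticks to lengths, contents and appending cannot tell a null result from a freshly allocated empty one.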
 Now, here is the weird part.  The compiler does some magic with arrays.  If 
 you are comparing an array with null, it changes the code to actually just 
 compare the array pointer to null.  So, the following code:
 
 array !is null
 
 is translated to:
 
 array.ptr !is null
 
 And this is why the program fails the assert.

Actually, if you compare an array to null (using 'is') DMD performs an 'or' instruction on the .ptr and .length and tests for the flag that it sets if the result is zero. This is just an optimization; this is equivalent to checking if both .ptr and .length are 0 (though presumably faster, since it's a single instruction that doesn't even implement full comparison).
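A source-level sketch of the equivalence described above: OR-ing the pointer and the length and testing for zero gives the same answer as 'is null'. The helper isNullViaOr is hypothetical, and this says nothing about what any particular compiler version actually emits.

```d
/// Mirrors the described optimization at source level: OR the two fields
/// together and test the result against zero.
/// (isNullViaOr is a hypothetical helper for this sketch.)
bool isNullViaOr(const(char)[] a)
{
    return ((cast(size_t) a.ptr) | a.length) == 0;
}

void main()
{
    char[] n = null;
    char[] buf = new char[4];
    char[] emptySlice = buf[0 .. 0];   // length 0, non-null pointer

    // The OR test agrees with the identity comparison in all three cases.
    assert(isNullViaOr(n) == (n is null));
    assert(isNullViaOr(emptySlice) == (emptySlice is null));
    assert(isNullViaOr(buf) == (buf is null));

    // Only the array with a null pointer AND zero length counts as null.
    assert(isNullViaOr(n) && !isNullViaOr(emptySlice) && !isNullViaOr(buf));
}
```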
 The sucky part about all this is that if you have an empty array where the 
 pointer is NOT null, then you get a different result (that array is not 
 considered to be null)

Actually, 'array == null' should return true for any empty array. Testing arrays with 'is' explicitly requests comparing .ptr and .length directly, not paying any attention to the contents; 'is' checks for identity, '==' for equivalence.
 So an array is null ONLY if the pointer is null.  An array is empty if the 
 length is 0.  If you want to check for an empty array, just check that the 
 length is 0.  If you want to make sure that the pointer is null (which 
 implies the length is 0), then check against null.

Other ways to check for an empty array are 'arr == ""' or 'arr == null' (using '==' instead of 'is')
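The difference between '==' and 'is' can be seen on an empty slice whose pointer is not null. This is a minimal sketch; equalsNull and isNull are hypothetical wrapper names used only to label the two comparisons.

```d
/// Equivalence: true for every empty array, whatever its pointer.
/// (equalsNull is a hypothetical wrapper for this sketch.)
bool equalsNull(const(char)[] a) { return a == null; }

/// Identity: true only when both .ptr and .length match a null array.
/// (isNull is a hypothetical wrapper for this sketch.)
bool isNull(const(char)[] a) { return a is null; }

void main()
{
    char[] n = null;
    char[] buf = new char[3];
    char[] empty = buf[0 .. 0];   // empty, but points into buf

    assert(equalsNull(n) && isNull(n));
    assert(equalsNull(empty) && !isNull(empty));
    assert(empty == "");           // another way to test for emptiness
    assert(!equalsNull(buf));      // a non-empty array is never == null
}
```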
Mar 19 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Frits van Bommel" wrote
 Steven Schveighoffer wrote:
 "Brian White" wrote
 char[] array = "".dup;
 assert(array !is null);

 This will exit because the assert condition is false.

 Why is that?

Here is my guess: The compiler does not allocate a piece of memory for "", and so the array struct for it looks like: { ptr = null, length = 0 }

Sorry, but your guess is wrong:
---
urxae urxae:~/tmp$ cat test.d
import std.stdio;

void main()
{
    writefln("%s", "".ptr);
}
urxae urxae:~/tmp$ dmd -run test.d
805C41C
---

Hm... ok, like I said it was a guess :)
 On top of all that, it's also very efficient since it doesn't require any 
 allocation (at least, until anything is appended onto it).
 The *only* property it doesn't have that 'normal' .dups do have is that 
 normal .dups return unique non-null values. The only ways to even detect 
 that are by 'is'-comparing to null (or a null-valued array) or (implicitly 
 or explicitly) casting it to a boolean. All other behavior is completely 
 consistent.
 The discussion on the NGs was, IIRC, between those who considered 'null' 
 to mean "no string" while considering other empty strings as "empty 
 string" and those who just don't see any reason to explicitly distinguish 
 between the two. In the end, I believe, it came down to "Walter is in the 
 latter camp".

My view is that 'array is null' should not compile, as array is not a pointer type. Having statements like this confuses new coders into thinking an array is a pure pointer or reference type, when in fact it is a struct. This is especially confusing to Java or C# (and probably other) coders who are used to an array being a heap-allocated type. But I seriously doubt my view is going to change anything, like others before me :)
 Actually, if you compare an array to null (using 'is') DMD performs an 
 'or' instruction on the .ptr and .length and tests for the flag that it 
 sets if the result is zero. This is just an optimization; this is 
 equivalent to checking if both .ptr and .length are 0 (though presumably 
 faster, since it's a single instruction that doesn't even implement full 
 comparison).

Huh? Why does it do that? If you have a null pointer, then clearly the length should be 0. An optimization in my mind would be to just replace 'array is null' with 'array.ptr is null'. Is there a good reason to have a null pointer array with a non-zero length?
 The sucky part about all this is that if you have an empty array where 
 the pointer is NOT null, then you get a different result (that array is 
 not considered to be null)

Actually, 'array == null' should return true for any empty array. Testing arrays with 'is' explicitly requests comparing .ptr and .length directly, not paying any attention to the contents; 'is' checks for identity, '==' for equivalence.

I would guess that the newest D compiler would not allow that, since comparing to null is now an error except for using 'x is null'. Of course, this is another guess, since I haven't downloaded the new compiler yet :)

-Steve
Mar 19 2008
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Steven Schveighoffer wrote:
 "Frits van Bommel" wrote
 Actually, if you compare an array to null (using 'is') DMD performs an 
 'or' instruction on the .ptr and .length and tests for the flag that it 
 sets if the result is zero. This is just an optimization; this is 
 equivalent to checking if both .ptr and .length are 0 (though presumably 
 faster, since it's a single instruction that doesn't even implement full 
 comparison).

Huh? Why does it do that? If you have a null pointer, then clearly the length should be 0. An optimization in my mind would be to just replace 'array is null' with 'array.ptr is null'. Is there a good reason to have a null pointer array with a non-zero length?

Indeed, no program should be able to get a non-empty array with .ptr == null. However, it appears the compiler currently doesn't use that as an optimization opportunity. Maybe even only because Walter didn't think of it, or just because it doesn't really save that much and it isn't worth the trouble of checking if one of the values is known to be null at compile time. The 'or' is itself an optimization that only applies when comparing to a 0-length null array, but this optimization may well be implemented completely in the compiler backend, which doesn't know that the length should always be zero if the pointer is null; it may only know that it needs to compare these two numbers against those other two numbers and jump based on the result...
 The sucky part about all this is that if you have an empty array where 
 the pointer is NOT null, then you get a different result (that array is 
 not considered to be null)

Actually, 'array == null' should return true for any empty array. Testing arrays with 'is' explicitly requests comparing .ptr and .length directly, not paying any attention to the contents; 'is' checks for identity, '==' for equivalence.

I would guess that the newest D compiler would not allow that, since comparing to null is now an error except for using 'x is null'. Of course, this is another guess, since I haven't downloaded the new compiler yet :)

I'm pretty sure it's only an error when comparing class instances. It shouldn't be an error to compare pointers or arrays against null. (There's no reason for it to be since they don't use vtables)
Mar 19 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Frits van Bommel" wrote
 Steven Schveighoffer wrote:
 "Frits van Bommel" wrote
 Actually, if you compare an array to null (using 'is') DMD performs an 
 'or' instruction on the .ptr and .length and tests for the flag that it 
 sets if the result is zero. This is just an optimization; this is 
 equivalent to checking if both .ptr and .length are 0 (though presumably 
 faster, since it's a single instruction that doesn't even implement full 
 comparison).

Huh? Why does it do that? If you have a null pointer, then clearly the length should be 0. An optimization in my mind would be to just replace 'array is null' with 'array.ptr is null'. Is there a good reason to have a null pointer array with a non-zero length?

Indeed, no program should be able to get a non-empty array with .ptr == null. However, it appears the compiler currently doesn't use that as an optimization opportunity. Maybe even only because Walter didn't think of it, or just because it doesn't really save that much and it isn't worth the trouble of checking if one of the values is known to be null at compile time. The 'or' is itself an optimization that only applies when comparing to a 0-length null array, but this optimization may well be implemented completely in the compiler backend, which doesn't know that the length should always be zero if the pointer is null; it may only know that it needs to compare these two numbers against those other two numbers and jump based on the result...

Good point. I wonder if comparing any struct to null is equivalent to checking whether all its values are 0...
 The sucky part about all this is that if you have an empty array where 
 the pointer is NOT null, then you get a different result (that array is 
 not considered to be null)

Actually, 'array == null' should return true for any empty array. Testing arrays with 'is' explicitly requests comparing .ptr and .length directly, not paying any attention to the contents; 'is' checks for identity, '==' for equivalence.

I would guess that the newest D compiler would not allow that, since comparing to null is now an error except for using 'x is null'. Of course, this is another guess, since I haven't downloaded the new compiler yet :)

I'm pretty sure it's only an error when comparing class instances. It shouldn't be an error to compare pointers or arrays against null. (There's no reason for it to be since they don't use vtables)

I think you are right. Now that I look at Walter's message, he specifically said that comparing a class to null is invalid... Thanks

-Steve
Mar 19 2008
BCS <BCS pathlink.com> writes:
Steven Schveighoffer wrote:
 Now, here is the weird part.  The compiler does some magic with arrays.  If 
 you are comparing an array with null, it changes the code to actually just 
 compare the array pointer to null.  So, the following code:
 

No, this is the weird part: IIRC this passes.

char* cp = cast(char*)null;
char[] ca = ap[0..15];
assert(ca.ptr == null && ca.length == 15);
Mar 19 2008
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
BCS wrote:
 Steven Schveighoffer wrote:
 Now, here is the weird part.  The compiler does some magic with 
 arrays.  If you are comparing an array with null, it changes the code 
 to actually just compare the array pointer to null.  So, the 
 following code:

No, this is the weird part: IIRC this passes.

char* cp = cast(char*)null;
char[] ca = ap[0..15];
assert(ca.ptr == null && ca.length == 15);

It does with DMD, if you s/ap/cp/, but I'm pretty sure what you're doing is invoking undefined behavior. Or at least it should be, but I can't seem to find any mention of it in the spec...
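With the ap/cp substitution mentioned above applied, the snippet can be sketched as a complete program. The function sliceOfNull is a hypothetical name. The sketch only inspects .ptr and .length and never touches the elements, since the thread leaves open whether constructing such an array is well defined.

```d
/// Builds a slice from a null pointer, as in the quoted snippet (using cp).
/// The result is an invalid array: its elements must never be accessed.
/// (sliceOfNull is a hypothetical helper for this sketch.)
char[] sliceOfNull(size_t n)
{
    char* cp = null;
    return cp[0 .. n];   // pointer slicing is not bounds-checked
}

void main()
{
    char[] ca = sliceOfNull(15);
    assert(ca.ptr is null && ca.length == 15);

    // 'is' also compares .length, so this is not identical to a null array.
    assert(ca !is null);
}
```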
Mar 19 2008
torhu <no spam.invalid> writes:
Frits van Bommel wrote:
 BCS wrote:
 Steven Schveighoffer wrote:
 Now, here is the weird part.  The compiler does some magic with 
 arrays.  If you are comparing an array with null, it changes the code 
 to actually just compare the array pointer to null.  So, the 
 following code:

No, this is the weird part: IIRC this passes.

char* cp = cast(char*)null;
char[] ca = ap[0..15];
assert(ca.ptr == null && ca.length == 15);

It does with DMD, if you s/ap/cp/, but I'm pretty sure what you're doing is invoking undefined behavior. Or at least it should be, but I can't seem to find any mention of it in the spec...

Can't see why that would be undefined. It's pretty clear what it means. Perhaps it should be an error when the compiler detects that you're setting .ptr to null but .length to nonzero. But the compiler can't be expected to detect that in the general case, so it would be of limited usefulness. A bit like disallowing comparing objects to 'null' with ==.
Mar 19 2008
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
torhu wrote:
 Frits van Bommel wrote:
 BCS wrote:
 Steven Schveighoffer wrote:
 Now, here is the weird part.  The compiler does some magic with 
 arrays.  If you are comparing an array with null, it changes the 
 code to actually just compare the array pointer to null.  So, the 
 following code:

No, this is the weird part: IIRC this passes.

char* cp = cast(char*)null;
char[] ca = ap[0..15];
assert(ca.ptr == null && ca.length == 15);

It does with DMD, if you s/ap/cp/, but I'm pretty sure what you're doing is invoking undefined behavior. Or at least it should be, but I can't seem to find any mention of it in the spec...

Can't see why that would be undefined. It's pretty clear what it means.

Yes, it means constructing a completely invalid array. :) Though perhaps it should only be undefined if you ever try to read or write the elements?
 Perhaps it should be an error when the compiler detects that you're 
 setting .ptr to null but .length to nonzero.  But the compiler can't be 
 expected to detect that in the general case, so it would be of limited 
 usefulness.  A bit like disallowing comparing objects to 'null' with ==.

I didn't say it should be detected, only that the compiler should be well within its rights to make your code crash if you do that :P. An error message would also be nice of course, but by no means required. Though as mentioned above, maybe the undefined behavior could be postponed until you actually try to read or write to the array. It's quite similar to dereferencing null pointers: the compiler can refuse to compile code that tries to do it, but most compilers will just generate crashing code...
Mar 19 2008
bearophile <bearophileHUGS lycos.com> writes:
Frits van Bommel:
 It's quite similar to dereferencing null pointers: the compiler can 
 refuse to compile code that tries to do it, but most compilers will just 
 generate crashing code...

This can be acceptable for a little C compiler, like TinyCC, but the D language is supposed to be a safer and less bug-prone language. Otherwise it's just sugared C++ ;-)

Bye,
bearophile
Mar 20 2008