digitalmars.D.learn - null Vs [] return arrays

reply bearophile <bearophileHUGS lycos.com> writes:
I have compiled this little D2 program:


int[] foo() {
    return [];
}
int[] bar() {
    return null;
}
void main() {}



Using DMD 2.052,  dmd -O -release -inline test2.d

This is the asm of the two functions:

_D5test23fooFZAi    comdat
L0:     push    EAX
        mov EAX,offset FLAT:_D11TypeInfo_Ai6__initZ
        push    0
        push    EAX
        call    near ptr __d_arrayliteralT
        mov EDX,EAX
        add ESP,8
        pop ECX
        xor EAX,EAX
        ret

_D5test23barFZAi    comdat
        xor EAX,EAX
        xor EDX,EDX
        ret

Is this expected and desired? Isn't it better to compile the foo() as bar()?

Bye,
bearophile
Mar 27 2011
next sibling parent reply Kagamin <spam here.lot> writes:
bearophile Wrote:

 I have compiled this little D2 program:
 
 
 int[] foo() {
     return [];
 }
 int[] bar() {
     return null;
 }
 void main() {}
 
 
 
 Using DMD 2.052,  dmd -O -release -inline test2.d
 
 This is the asm of the two functions:
 
 _D5test23fooFZAi    comdat
 L0:     push    EAX
         mov EAX,offset FLAT:_D11TypeInfo_Ai6__initZ
         push    0
         push    EAX
         call    near ptr __d_arrayliteralT
         mov EDX,EAX
         add ESP,8
         pop ECX
         xor EAX,EAX
         ret
 
 _D5test23barFZAi    comdat
         xor EAX,EAX
         xor EDX,EDX
         ret
 
 Is this expected and desired? Isn't it better to compile the foo() as bar()?
 
 Bye,
 bearophile
[] is not null, it's an array of 0 elements, what is done exactly. edx points to the allocated array.
Mar 27 2011
parent reply bearophile <bearophileHUGS lycos.com> writes:
Kagamin:

 [] is not null, it's an array of 0 elements, what is done exactly.
 edx points to the allocated array.
I don't understand what you say. I think the caller of foo() and bar() receives the same thing, two empty registers. I think that cast(int[])null and cast(int[])[] are the same thing for D.

void main() {
    assert(cast(int[])null == cast(int[])null);
    auto a1 = cast(int[])null;
    a1 ~= 1;
    auto a2 = 1 ~ cast(int[])null;
}

Bye,
bearophile
Mar 27 2011
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On 2011-03-27 11:42, bearophile wrote:
 Kagamin:
 [] is not null, it's an array of 0 elements, what is done exactly.
 edx points to the allocated array.
I don't understand what you say. I think the caller of foo() and bar() receive the same thing, two empty registers. I think that cast(int[])null and cast(int[])[] are the same thing for D. void main() { assert(cast(int[])null == cast(int[])null); auto a1 = cast(int[])null; a1 ~= 1; auto a2 = 1 ~ cast(int[])null; }
What I would _expect_ the difference between a null array and an empty one to be is that the null one's ptr property would be null, whereas the empty one's wouldn't be. But dmd treats them pretty much the same. empty returns true for both. You can append to both. Appending to the null one is a guaranteed memory reallocation whereas appending to the empty one may not be, but their behavior is almost identical.

How that affects the generated assembly code, I don't know. Particularly if you're compiling with -inline and -O, the compiler can likely make assumptions about null that it can't make about [], since it probably treats [] more generally without worrying about the fact that it happens to be empty as far as optimizations go. That, and there _is_ a semantic difference between null and [] if you're messing with the ptr property, so Walter may think that it's best for null to not be turned into the same thing as [] automatically.

- Jonathan M Davis
Mar 27 2011
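For readers who want to try the behaviour described in the post above, here is a minimal sketch. The helper name isEmpty is made up for this illustration, and the non-null empty array is obtained by slicing an existing array rather than from the [] literal, because what [] itself yields is the point under discussion in this thread.

```d
import std.stdio : writeln;

/// Hypothetical helper: true when the slice has no elements,
/// whatever its pointer happens to be.
bool isEmpty(const(int)[] a)
{
    return a.length == 0;
}

void main()
{
    int[] nullArr = null;                // ptr is null, length is 0
    int[] backing = [1, 2, 3];
    int[] emptySlice = backing[0 .. 0];  // ptr is non-null, length is 0

    // Both count as empty.
    assert(isEmpty(nullArr) && isEmpty(emptySlice));

    // Only the ptr property tells them apart.
    assert(nullArr.ptr is null);
    assert(emptySlice.ptr !is null);

    // == compares length and elements; `is` compares ptr and length.
    assert(nullArr == emptySlice);
    assert(nullArr !is emptySlice);

    // Appending works on both.
    nullArr ~= 1;
    emptySlice ~= 1;
    assert(nullArr == [1] && emptySlice == [1]);

    writeln("ok");
}
```

The sketch only exercises behaviour that does not depend on how the compiler lowers the [] literal.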
parent bearophile <bearophileHUGS lycos.com> writes:
Jonathan M Davis:

 the compiler can likely make 
 assumptions about null that it can't make about [], since it probably treats 
 [] more generally without worrying about the fact that it happens to be empty 
 as far as optimizations go - that and there _is_ a semantic difference between 
 null and [] if you're messing with the ptr property, so Walter may think that 
 it's best for null to not be turned into the same thing as [] automatically.
Thank you for your answer. I have added a low-priority enhancement request. Bye, bearophile
Mar 27 2011
prev sibling parent Kagamin <spam here.lot> writes:
bearophile Wrote:

 Kagamin:
 
 [] is not null, it's an array of 0 elements, what is done exactly.
 edx points to the allocated array.
I don't understand what you say. I think the caller of foo() and bar() receive the same thing, two empty registers. I think that cast(int[])null and cast(int[])[] are the same thing for D.
That's a mistake. Well, if there's no difference for you, you can use either of them. What's the problem?
Mar 28 2011
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 27 Mar 2011 09:37:47 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 I have compiled this little D2 program:


 int[] foo() {
     return [];
 }
 int[] bar() {
     return null;
 }
 void main() {}



 Using DMD 2.052,  dmd -O -release -inline test2.d

 This is the asm of the two functions:

 _D5test23fooFZAi    comdat
 L0:     push    EAX
         mov EAX,offset FLAT:_D11TypeInfo_Ai6__initZ
         push    0
         push    EAX
         call    near ptr __d_arrayliteralT
         mov EDX,EAX
         add ESP,8
         pop ECX
         xor EAX,EAX
         ret

 _D5test23barFZAi    comdat
         xor EAX,EAX
         xor EDX,EDX
         ret

 Is this expected and desired? Isn't it better to compile the foo() as  
 bar()?
Probably. The runtime that allocates an array looks like this (irrelevant parts collapsed):

extern (C) void* _d_arrayliteralT(TypeInfo ti, size_t length, ...)
{
    auto sizeelem = ti.next.tsize(); // array element size
    void* result;
    ...
    if (length == 0 || sizeelem == 0)
        result = null;
    else
    {
        ...
    }
    return result;
}

So essentially, you are getting the same thing, but using [] is slower.

-Steve
Mar 28 2011
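A small sketch for comparing what the two functions from the original post hand back to their caller. It asserts only what holds regardless of how [] is lowered (equal contents, zero length) and prints the ptr of the [] result instead of asserting on it, since that value is an implementation detail that may differ between compiler versions.

```d
import std.stdio : writefln;

int[] foo() { return []; }    // the array-literal version
int[] bar() { return null; }  // the null version

void main()
{
    auto a = foo();
    auto b = bar();

    // Both results are empty and compare equal element-wise.
    assert(a.length == 0 && b.length == 0);
    assert(a == b);

    // The null version is guaranteed to carry a null pointer.
    assert(b.ptr is null);

    // Print the pointers so the [] case can be inspected on your compiler.
    writefln("foo: ptr=%s length=%s", a.ptr, a.length);
    writefln("bar: ptr=%s length=%s", b.ptr, b.length);
}
```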
parent reply bearophile <bearophileHUGS lycps.com> writes:
Steven Schveighoffer:

 So essentially, you are getting the same thing, but using [] is slower.
It seems I was right then, thank you and Kagamin for the answers. Bye, bearophile
Mar 28 2011
parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Mon, 28 Mar 2011 17:54:29 +0100, bearophile <bearophileHUGS lycps.com>  
wrote:
 Steven Schveighoffer:

 So essentially, you are getting the same thing, but using [] is slower.
It seems I was right then, thank you and Kagamin for the answers.
This may be slightly OT but I just wanted to raise the point that conceptually it's nice to be able to express (exists but is empty) and (does not exist). Pointers/references have null as a (does not exist) "value" and this is incredibly useful. Try doing the same thing with 'int' .. it requires you either use int* or pass an additional boolean to indicate existence.. yuck. I'd suggest if someone types '[]' they mean (exists but is empty) and if they type 'null' they mean (does not exist) and they may be relying on the .ptr value to differentiate these cases, which is useful. If you're not interested in the difference, and you need performance, you simply use 'null'. Everybody is happy. :) R
Apr 01 2011
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Regan Heath:

 conceptually it's nice to be able to express (exists but is empty) and  
 (does not exist).
You may want to express that, but for the implementation of the language those two situations are the same, because in the [] literal the ptr is null. So I think it's better for the programmer to not differentiate the two situations, because they are not different. If the programmer tells them apart, he/she is doing something bad in D, creating a false illusion. Bye, bearophile
Apr 01 2011
prev sibling next sibling parent Torarin <torarind gmail.com> writes:
2011/4/1 Regan Heath <regan netmail.co.nz>:
 On Mon, 28 Mar 2011 17:54:29 +0100, bearophile <bearophileHUGS lycps.com>
 wrote:
 Steven Schveighoffer:

 So essentially, you are getting the same thing, but using [] is slower.
It seems I was right then, thank you and Kagamin for the answers.
This may be slightly OT but I just wanted to raise the point that conceptually it's nice to be able to express (exists but is empty) and (does not exist). Pointers/references have null as a (does not exist) "value" and this is incredibly useful. Try doing the same thing with 'int' .. it requires you either use int* or pass an additional boolean to indicate existence.. yuck.

 I'd suggest if someone types '[]' they mean (exists but is empty) and if they type 'null' they mean (does not exist) and they may be relying on the .ptr value to differentiate these cases, which is useful. If you're not interested in the difference, and you need performance, you simply use 'null'. Everybody is happy. :)

 R
For associative arrays it certainly would be nice to be able to do something like

string[string] options = [:];

so that functions can manipulate an empty aa without using ref.

Torarin
Apr 01 2011
prev sibling next sibling parent spir <denis.spir gmail.com> writes:
On 04/01/2011 12:38 PM, Regan Heath wrote:
 On Mon, 28 Mar 2011 17:54:29 +0100, bearophile <bearophileHUGS lycps.com>
wrote:
 Steven Schveighoffer:

 So essentially, you are getting the same thing, but using [] is slower.
It seems I was right then, thank you and Kagamin for the answers.
This may be slightly OT but I just wanted to raise the point that conceptually it's nice to be able to express (exists but is empty) and (does not exist). Pointers/references have null as a (does not exist) "value" and this is incredibly useful. Try doing the same thing with 'int' .. it requires you either use int* or pass an additional boolean to indicate existence.. yuck. I'd suggest if someone types '[]' they mean (exists but is empty) and if they type 'null' they mean (does not exist) and they may be relying on the .ptr value to differentiate these cases, which is useful. If you're not interested in the difference, and you need performance, you simply use 'null'. Everybody is happy. :)
That's the way I understand this distinction. Unfortunately, D does not really allow this, because it semantically treats both the same way (e.g. one can put a new element into a null array).

Denis
-- 
_________________
vita es estrany
spir.wikidot.com
Apr 01 2011
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 01 Apr 2011 06:38:56 -0400, Regan Heath <regan netmail.co.nz>  
wrote:

 On Mon, 28 Mar 2011 17:54:29 +0100, bearophile  
 <bearophileHUGS lycps.com> wrote:
 Steven Schveighoffer:

 So essentially, you are getting the same thing, but using [] is slower.
It seems I was right then, thank you and Kagamin for the answers.
This may be slightly OT but I just wanted to raise the point that conceptually it's nice to be able to express (exists but is empty) and (does not exist). Pointers/references have null as a (does not exist) "value" and this is incredibly useful. Try doing the same thing with 'int' .. it requires you either use int* or pass an additional boolean to indicate existence.. yuck. I'd suggest if someone types '[]' they mean (exists but is empty) and if they type 'null' they mean (does not exist) and they may be relying on the .ptr value to differentiate these cases, which is useful. If you're not interested in the difference, and you need performance, you simply use 'null'. Everybody is happy. :)
The distinction is useful if you have something to reference (e.g. an empty slice that points at the end of a pre-existing non-empty array). But [] is a new array, no point in allocating memory just so the pointer can be non-null. Can you come up with a use case to show why you'd want such a thing?

Your plan would mean that [] is a memory allocation. I'd rather not have the runtime do the lower performing thing unless there is a good reason.

As an alternative, you could use (cast(T *)null)[1..1] if you really needed it (this also would be higher performing, BTW since the runtime array literal function would not be called).

-Steve
Apr 01 2011
parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Fri, 01 Apr 2011 13:38:45 +0100, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Fri, 01 Apr 2011 06:38:56 -0400, Regan Heath <regan netmail.co.nz>  
 wrote:

 On Mon, 28 Mar 2011 17:54:29 +0100, bearophile  
 <bearophileHUGS lycps.com> wrote:
 Steven Schveighoffer:

 So essentially, you are getting the same thing, but using [] is  
 slower.
It seems I was right then, thank you and Kagamin for the answers.
This may be slightly OT but I just wanted to raise the point that conceptually it's nice to be able to express (exists but is empty) and (does not exist). Pointers/references have null as a (does not exist) "value" and this is incredibly useful. Try doing the same thing with 'int' .. it requires you either use int* or pass an additional boolean to indicate existence.. yuck. I'd suggest if someone types '[]' they mean (exists but is empty) and if they type 'null' they mean (does not exist) and they may be relying on the .ptr value to differentiate these cases, which is useful. If you're not interested in the difference, and you need performance, you simply use 'null'. Everybody is happy. :)
The distinction is useful if you have something to reference (e.g. an empty slice that points at the end of a pre-existing non-empty array). But [] is a new array, no point in allocating memory just so the pointer can be non-null. Can you come up with a use case to show why you'd want such a thing?
Ok. Recently I wrote (in C) a function proxy interface. I had to execute a set of functions from one thread, and wanted to 'call' them from potentially many. So, I set up the thread, added events, and a queue, etc and I wrote a proxy function for 'calling' them from the many threads which looks like...

void proxy(int func, ...) {}

So, it accepts a variable list of args, places them in a structure, places that in the queue, and waits on an event for the proxy thread to execute the command and return the result.

Let's say the function I am executing is a database lookup, let's say I have a database field which is a string, let's say it can be NULL (database definition allows NULLs). Now, let's say I want to do these lookups:

1. lookup all objects where the field is NULL
2. lookup all objects where the field is "reganwashere"
3. lookup all objects where the field is "" (empty/non-null)

proxy(LOOKUP, NULL);
proxy(LOOKUP, "reganwashere");

and in the actual lookup function, invoked by proxy, I call:

pFieldValue = va_arg(pArgs, char*);

and I get NULL, and "reganwashere". For the third lookup, in the actual lookup function pFieldValue would be "" (not NULL).

But, in D it seems I cannot do this. In D I would have to pass an additional boolean parameter, or add another level of indirection i.e. pass a string[]*. The same problem exists in C if I want to pass an 'int' or any primitive type, I have to pass it as int*, use a boolean, or invent a 'special' value which means essentially NULL/not-set/ignored.

There are plenty of other use cases, essentially anywhere where you have something that can exist in one of 3 states:

1. NULL (not set)
2. "" (set, to blank)
3. "anything" (set, to anything)

Like.. parsing input from a web page, where a field can:

1. not be present on the page (NULL)
2. be present, but left blank ("")
3. be present, contains "anything" ("anything")

This one came up a lot when I worked with web software, we had to be able to detect whether the user was trying to set something to a blank string, and in some cases we wanted that to remove the setting entirely (null & "" being identical ok) or actually set it to a blank string (null & "" being identical, not ok).

Or... saving settings to a file from user input, where the user selects a setting from a menu, then enters the value and could:

1. not select setting A, therefore save no value (NULL)
2. select the setting A, enter blank string ("")
3. select the setting A, enter the value "anything" ("anything")

Granted (and this was the response 2 years back when this topic came up) I can "work around" the deficiency by using a map/hash/dictionary where I insert key/value pairs, then I can ask it if the key exists. But, this is essentially another level of indirection like an int* or string[]* and is more heavy weight than I might want/need.

Ultimately, and people may disagree here, I don't have a problem with pointers, and this is a really 'nice' feature of using pointers, and it seems D's arrays don't share it, which bothers me.
 Your plan would mean that [] is a memory allocation.  I'd rather not  
 have the runtime do the lower performing thing unless there is a good  
 reason.
I'm not too bothered what syntax gets used, provided it was something that you don't accidentally use when you do not want it, and wasn't too horrible to use as I don't see this as being a very uncommon occurrence (which would warrant/allow ugliness of syntax). "[]" seems logical, as does "new T[]", both are not "null" so the programmer was obviously trying to do something other than pass null.
 As an alternative, you could use (cast(T *)null)[1..1] if you really  
 needed it (this also would be higher performing, BTW since the runtime  
 array literal function would not be called).
That seems to work, but it's hideous syntax for something that is not that uncommon IMO.

To remind myself what D does, and try and find another way to achieve the same thing I wrote a test case:

--------------------
import std.stdio;

char[] foo(int state)
{
    switch(state)
    {
    default:
    case 0:
        return null;
    case 1:
        return [];
    case 2:
        return new char[0];
    case 3:
        return (cast(char *)null)[1..1];
    case 4:
        return cast(char[])"".dup;
    case 5:
        return cast(char[])""[0..0];
    }
}

int main(string[] args)
{
    foreach(int i; 0..6)
    {
        char[] arr = foo(i);
        writefln("foo%d 0x%08x,%d",
            i,
            arr.ptr,
            arr.length);
    }
    return 0;
}

Which outputs:

foo0 0x00000000,0
foo1 0x00000000,0
foo2 0x00000000,0
foo3 0x00000001,0  <- your suggestion
foo4 0x00000000,0
foo5 0x00000000,0

So, your suggestion appears to be the only way to get a non-null empty array in D.

R
Apr 01 2011
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 01 Apr 2011 11:52:47 -0400, Regan Heath <regan netmail.co.nz>  
wrote:

 On Fri, 01 Apr 2011 13:38:45 +0100, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Fri, 01 Apr 2011 06:38:56 -0400, Regan Heath <regan netmail.co.nz>  
 wrote:

 On Mon, 28 Mar 2011 17:54:29 +0100, bearophile  
 <bearophileHUGS lycps.com> wrote:
 Steven Schveighoffer:

 So essentially, you are getting the same thing, but using [] is  
 slower.
It seems I was right then, thank you and Kagamin for the answers.
This may be slightly OT but I just wanted to raise the point that conceptually it's nice to be able to express (exists but is empty) and (does not exist). Pointers/references have null as a (does not exist) "value" and this is incredibly useful. Try doing the same thing with 'int' .. it requires you either use int* or pass an additional boolean to indicate existence.. yuck. I'd suggest if someone types '[]' they mean (exists but is empty) and if they type 'null' they mean (does not exist) and they may be relying on the .ptr value to differentiate these cases, which is useful. If you're not interested in the difference, and you need performance, you simply use 'null'. Everybody is happy. :)
The distinction is useful if you have something to reference (e.g. an empty slice that points at the end of a pre-existing non-empty array). But [] is a new array, no point in allocating memory just so the pointer can be non-null. Can you come up with a use case to show why you'd want such a thing?
Ok. Recently I wrote (in C) a function proxy interface. I had to execute a set of functions from one thread, and wanted to 'call' them from potentially many. So, I set up the thread, added events, and a queue, etc and I wrote a proxy function for 'calling' them from the many threads which looks like... void proxy(int func, ...) {} So, it accepts a variable list of args, places them in a structure, places that in the queue, and waits on an event for the proxy thread to execute the command and return the result. Lets say the function I am executing is a database lookup, lets say I have a database field which is a string, lets say it can be NULL (database definition allows NULLS). Now, lets say I want to do these lookups: 1. lookup all objects where the field is NULL 2. lookup all objects where the field is "reganwashere" 3. lookup all objects where the field is "" (empty/non-null) proxy(LOOKUP, NULL); proxy(LOOKUP, "reganwashere"); and in the actual lookup function, invoked by proxy, I call: pFieldValue = va_arg(pArgs, char*); and I get NULL, and "reganwashere". the actual lookup function pFieldValue would be "" (not NULL). But, in D it seems I cannot do this. In D I would have to pass an additional boolean parameter, or add another level of indirection i.e. pass a string[]*. The same problem exists in C if I want to pass an 'int' or any primitive type, I have to pass it as int*, use a boolean, or invent a 'special' value which means essentially NULL/not-set/ignored.
assert("" !is null); // works on D. Try it.
 Your plan would mean that [] is a memory allocation.  I'd rather not  
 have the runtime do the lower performing thing unless there is a good  
 reason.
I'm not too bothered what syntax gets used, provided it was something that you don't accidently use when you do not want it, and wasn't too horrible to use as I don't see this as being a very uncommon occurance (which would warrant/allow ugliness of syntax). "[]" seems logical, as does "new T[]", both are not "null" so the programmer was obviously trying to do something other than pass null.
It's one thing to want an array with a non-null pointer, but it's another thing entirely to want an array with a non-null pointer which points to a valid heap address. In my opinion, [] means empty array. I don't care what the pointer is, as long as the array is empty. The implementation can put whatever value it wants for the pointer. If it wants to put null, that is fine. null means I want a null pointer. If I had it my way, all array literals would be immutable, and the pointers would point to ROM (even empty ones). We should not be constructing array literals at runtime. But my opinion is still that you should not count on the pointer being anything because it's not specified what it is.
 As an alternative, you could use (cast(T *)null)[1..1] if you really  
 needed it (this also would be higher performing, BTW since the runtime  
 array literal function would not be called).
That seems to work, but it's hideous syntax for something that is not that uncommon IMO.
My opinion is that it is uncommon, but it can be abstracted:

template emptyArray(T)
{
    enum emptyArray = (cast(T*)0)[1..1];
}

rename as desired.
 To remind myself what D does, and try and find another way to achive the  
 same thing I wrote a test case:
 --------------------
 import std.stdio;

 char[] foo(int state)
 {
 	switch(state)
 	{
 	default:
 	case 0:
 		return null;	
 	case 1:
 		return [];	
 	case 2:
 		return new char[0];
 	case 3:
 		return (cast(char *)null)[1..1];
 	case 4:
 		return cast(char[])"".dup;
 	case 5:
 		return cast(char[])""[0..0];
 	}
 }
 int main(string[] args)
 {
 	foreach(int i; 0..6)
 	{
 		char[] arr = foo(i);
 		writefln("foo%d 0x%08x,%d",
 			i,
 			arr.ptr,
 			arr.length);
 	}
 	return 0;
 }

 Which outputs:


 foo0 0x00000000,0
 foo1 0x00000000,0
 foo2 0x00000000,0
 foo3 0x00000001,0  <- your suggestion
 foo4 0x00000000,0
 foo5 0x00000000,0

 So, your suggestion appear to be the only way to get an empty array in D.

 R
This code seems to disagree with your results for case 5 (dmd 2.052):

auto x = cast(char[])""[0..0];
assert(x.ptr != null); // no failure

-Steve
Apr 01 2011
parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Fri, 01 Apr 2011 18:23:28 +0100, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 On Fri, 01 Apr 2011 11:52:47 -0400, Regan Heath <regan netmail.co.nz>  
 wrote:
 But, in D it seems I cannot do this.  In D I would have to pass an  
 additional boolean parameter, or add another level of indirection i.e.  
 pass a string[]*.  The same problem exists in C if I want to pass an  
 'int' or any primitive type, I have to pass it as int*, use a boolean,  
 or invent a 'special' value which means essentially  
 NULL/not-set/ignored.
assert("" !is null); // works on D. Try it.
Yes, but that's because this is a string literal. It's not useful where you're getting your input from somewhere else.. like in the other 2 use cases I mentioned.

However.. I've realised that in those cases, where you have an input array you can slice, e.g.

auto y = "a=123&b=&c=456";
auto z = y[8..8];

I do get an array 'z', with a non-null ptr. So, provided D doesn't lose this information when I pass it around I am fine.

The other use case may be a little more problematic depending on the method used to read the input from the keyboard, IIRC one of the methods returns null for a blank line of input, which I would have to detect and 'fix' using emptyArray if I wanted to pass it to something that cares about the distinction.
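The slicing idea above can be sketched as a small lookup over the same query string. The function name fieldValue and its contract (null for an absent key, an empty non-null slice for a blank value) are assumptions made for this illustration; they are not part of Phobos. It relies on splitter yielding slices of the original string, so a blank value keeps a non-null ptr.

```d
import std.algorithm.iteration : splitter;
import std.string : indexOf;

/// Hypothetical helper: returns null when `key` is absent, an empty but
/// non-null slice of `query` when the key is present with a blank value,
/// and the value itself otherwise.
string fieldValue(string query, string key)
{
    foreach (pair; query.splitter('&'))
    {
        auto eq = pair.indexOf('=');
        if (eq >= 0 && pair[0 .. eq] == key)
            return pair[eq + 1 .. $]; // a slice into `query`, even if empty
    }
    return null; // key not found
}

void main()
{
    string query = "a=123&b=&c=456";

    auto a = fieldValue(query, "a");
    auto b = fieldValue(query, "b");
    auto d = fieldValue(query, "d");

    assert(a == "123");                  // set to a value
    assert(b !is null && b.length == 0); // present but blank
    assert(d is null);                   // not present at all
}
```

Whether the distinction survives depends on every function the slice passes through, which is the concern raised in the post.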
 I'm not too bothered what syntax gets used, provided it was something  
 that you don't accidently use when you do not want it, and wasn't too  
 horrible to use as I don't see this as being a very uncommon occurance  
 (which would warrant/allow ugliness of syntax).  "[]" >> seems logical,  
 as does "new T[]", both are not "null" so the programmer was obviously  
 trying to do something other than pass null.
It's one thing to want an array with a non-null pointer, but it's another thing entirely to want an array with a non-null pointer which points to a valid heap address.
I don't specifically want either of those things. I just want _some way_ to represent 'exists but is empty' and for it to be different to 'does not exist'. Currently D's arrays cannot do that, yet a plain old pointer can.
 In my opinion, [] means empty array.  I don't care what the pointer is,  
 as long as the array is empty.  The implementation can put whatever  
 value it wants for the pointer.  If it wants to put null, that is fine.   
 null means I want a null pointer.
 If I had it my way, all array literals would be immutable, and the  
 pointers would point to ROM (even empty ones).  We should not be  
 constructing array literals at runtime.  But my opinion is still that  
 you should not count on the pointer being anything because
 it's not specified what it is.
Sure, I agree with all that, but I still want some way of representing both states and detecting both states and the problem is that if the language cannot do it at a fundamental level, and requires some weird hack or reliance on string literals then when I use any 3rd party library, or phobos itself it will tell me null and I will have to guess which state it actually means and 'fix' it manually.
 That seems to work, but it's hideous syntax for something that is not  
 that uncommon IMO.
 My opinion is that it is uncommon, but it can be abstracted:

 template emptyArray(T)
 {
     enum emptyArray = (cast(T*)0)[1..1];
 }

 rename as desired.
Not useful if you're getting your input from somewhere else, vs trying to create a new empty array.

That said, if I were to want this I'd use the literal instead as it seems safer, e.g.

template emptyArray(T)
{
    enum emptyArray = cast(T[])""[0..0];
}
 This code seems to disagree with your results for case 5 (dmd 2.052):

      auto x = cast(char[])""[0..0];
      assert(x.ptr != null); // no failure
Nope, my case 5 had a 'dup' which you're missing. If I add a new case returning a literal as you have there I get the same result as you. I was intentionally avoiding the literal because I knew it would be non-null (I believe D null terminates literals) and because I want to be able to detect these states on more than just empty string literals.

R
Apr 05 2011
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 05 Apr 2011 13:24:49 -0400, Regan Heath <regan netmail.co.nz>  
wrote:

 On Fri, 01 Apr 2011 18:23:28 +0100, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:
 assert("" !is null); // works on D.  Try it.
Yes, but that's because this is a string literal. It's not useful where you're getting your input from somewhere else.. like in the other 2 use cases I mentioned.
But that isn't the same as []. Basically, if you have an existing array, and you want to create a non-null empty array out of it, a slice of [0..0] always works. I know you mention it, but I want to draw attention to the original problem, that [] returns a null array. Other cases where you are not using [] or "" are a separate issue. All the cases you have brought up involve strings, for which there is a non-null array returned for "". I still have not yet seen a compelling use case for making [] return non-null.
 The other use case may be a little more problematic depending on the  
 method used to read the input from the keyboard, IIRC one of the methods  
 returns null for a blank line of input, which I would have to detect and  
 'fix' using emptyArray if I wanted to pass it to something that cares  
 about the distinction.
That is up to the implementation of that function. D provides ways to return an empty array that does not have a null pointer.
 It's one thing to want an array with a non-null pointer, but it's  
 another thing entirely to want an array with a non-null pointer which  
 points to a valid heap address.
I don't specifically want either of those things. I just want _some way_ to represent 'exists but is empty' and for it to be different to 'does not exist'. Currently D's arrays cannot do that, yet a plain old pointer can.
Of course they can, you can check for null vs empty using "is null" or ".empty". The issue you may have is that phobos does not always care about preserving this distinction. One example is dup. It is pointless to dup an empty array (even if non-null) by creating a heap allocation, so it just returns null.
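As a rough sketch of the "is null" versus length check mentioned above, the three states from earlier in the thread can be told apart like this. The enum FieldState and the function classify are hypothetical names introduced only for the example.

```d
/// Hypothetical three-state result, mirroring the
/// NULL / "" / "anything" cases discussed in the thread.
enum FieldState { missing, blank, set }

/// Classifies a slice by checking `is null` first, then the length.
FieldState classify(const(char)[] s)
{
    if (s is null)
        return FieldState.missing; // null ptr and zero length
    if (s.length == 0)
        return FieldState.blank;   // non-null ptr, zero length
    return FieldState.set;
}

void main()
{
    string input = "a=123&b=&c=456";

    assert(classify(null) == FieldState.missing);
    assert(classify(input[8 .. 8]) == FieldState.blank); // empty slice of input
    assert(classify(input[2 .. 5]) == FieldState.set);   // "123"
    assert(classify("") == FieldState.blank);            // "" literal is non-null
}
```

This only works while every function in between preserves the pointer, which, as the post says, is not something library code generally promises.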
 In my opinion, [] means empty array.  I don't care what the pointer is,  
 as long as the array is empty.  The implementation can put whatever  
 value it wants for the pointer.  If it wants to put null, that is  
 fine.  null means I want a null pointer.
 If I had it my way, all array literals would be immutable, and the  
 pointers would point to ROM (even empty ones).  We should not be  
 constructing array literals at runtime.  But my opinion is still that  
 you should not count on the pointer being anything because
 it's not specified what it is.
Sure, I agree with all that, but I still want some way of representing both states and detecting both states and the problem is that if the language cannot do it at a fundamental level, and requires some weird hack or reliance on string literals then when I use any 3rd party library, or phobos itself it will tell me null and I will have to guess which state it actually means and 'fix' it manually.
The array has the ability to store whether it's null and empty or just empty, you are just expecting every function to care about that distinction, which most don't.
 That seems to work, but it's hideous syntax for something that is not  
 that uncommon IMO.
My opinion is that it is uncommon, but it can be abstracted: template emptyArray(T) { enum emptyArray = (cast(T*)0)[1..1]; } rename as desired.
Not useful if you're getting your input from somewhere else, vs trying to create a new empty array.
Again, not relevant. Getting an empty-but-not-null array from a non-null-non-empty array is trivial. This whole thread is about [].
 That said, if I were to want this I'd use the literal instead as it  
 seems safer, eg.

 template emptyArray(T)
 {
     enum emptyArray = cast(T[])""[0..0];
 }
Either way should be safe. Nothing should use data outside the array bounds.
 This code seems to disagree with your results for case 5 (dmd 2.052):

      auto x = cast(char[])""[0..0];
      assert(x.ptr != null); // no failure
Nope, my case 5 had a 'dup' which you're missing. If I add a new case returning a literal as you have there I get the same result as you. I was intentionally avoiding the literal because I knew it would be non-null (I believe D null terminates literals) and because I want to be able to detect these states on more than just empty string literals.
Quoting from your message previously (with added comment):

    case 4:
        return cast(char[])"".dup;
    case 5:
        return cast(char[])""[0..0]; // note lack of .dup
    }

-Steve
Apr 05 2011
parent "Regan Heath" <regan netmail.co.nz> writes:
On Tue, 05 Apr 2011 18:46:06 +0100, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 On Tue, 05 Apr 2011 13:24:49 -0400, Regan Heath <regan netmail.co.nz>  
 wrote:

 On Fri, 01 Apr 2011 18:23:28 +0100, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:
 assert("" !is null); // works on D.  Try it.
Yes, but that's because this is a string literal. It's not useful where you're getting your input from somewhere else.. like in the other 2 use cases I mentioned.
But that isn't the same as []. Basically, if you have an existing array, and you want to create a non-null empty array out of it, a slice of [0..0] always works. I know you mention it, but I want to draw attention to the original problem, that [] returns a null array. Other cases where you are not using [] or "" are a separate issue. All the cases you have brought up involve strings, for which there is a non-null array returned for "". I still have not yet seen a compelling use case for making [] return non-null.
Ahh.. I see, I really should have renamed the thread title. I'm not, and never was, arguing for [] (specifically) returning non-null. Sorry.
 Quoting from your message previously (with added comment):


 	case 4:
 		return cast(char[])"".dup;
 	case 5:
 		return cast(char[])""[0..0]; // note lack of .dup
 	}
Drat, not sure what happened there. My source had the 'dup' when I went back to it. Sorry.
Apr 07 2011
prev sibling parent Kagamin <spam here.lot> writes:
bearophile Wrote:

 Regan Heath:
 
 conceptually it's nice to be able to express (exists but is empty) and  
 (does not exist).
You may want to express that, but for the implementation of the language those two situations are the same, because in the [] literal the ptr is null. So I think it's better for the programmer to not differentiate the two situations, because they are not different. If the programmer tells them apart, he/she is doing something bad in D, creating a false illusion.
It's bad, when the language is driven by the implementation of a "reference" compiler by the copyright holder. This way compiler bugs and tricks become a language standard. See the story of VP7 codec.
Apr 07 2011