
digitalmars.D - [Suggestion] Make if(array) illegal.

AJG <AJG_member pathlink.com> writes:
Hi,

This is a suggestion based on a thread from a couple of weeks ago. What about
making if (array) illegal in D? I think it brings ambiguity and a high potential
for errors to the language. The main two uses for this construct can already be
done with a slightly more explicit syntax:

if (array.ptr == null) // Check for a kind of "non-existence."
if (array.length == 0) // Check for explicit emptiness.

On the other hand, one is not sure what if (array) by itself is supposed to
mean, since it's _not_ like C. In C, if (array), where array is typically a
pointer, means simply != NULL. The problem in D is that the array ptr is tricky
and IMHO it's best not to interface with it directly.
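The two explicit checks above can be sketched in modern D2 (not the 2005 D1 dialect used in this thread). This is a minimal sketch: the helper names `isNonExistent` and `isEmpty` are hypothetical, and it assumes that a string literal such as `""` carries a non-null data pointer, so a null array and an empty one differ only in `.ptr`.

```d
import std.stdio : writeln;

// Hypothetical helper: true when the array has no data pointer at all.
bool isNonExistent(T)(T[] a)
{
    return a.ptr is null;
}

// Hypothetical helper: true when the array holds no elements.
bool isEmpty(T)(T[] a)
{
    return a.length == 0;
}

void main()
{
    char[] never = null;   // no data pointer, no elements
    string blank = "";     // string literal: non-null pointer, no elements

    assert(isNonExistent(never) && isEmpty(never));
    assert(!isNonExistent(blank) && isEmpty(blank));
    writeln("null and empty differ only in .ptr");
}
```

Spelling the intent out this way is exactly what AJG is asking for: each helper tests one field, so a reader never has to guess which one is meant.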

I think it would be wise to remove this ambiguity. I propose two options:
1) Make if (array) equal _always_ to if (array.length).
2) Simply make it illegal.

What do you guys think? Walter?

Thanks,
--AJG.
Jul 19 2005
Derek Parnell <derek psych.ward> writes:
On Wed, 20 Jul 2005 02:15:58 +0000 (UTC), AJG wrote:

 Hi,
 
 This is a suggestion based on a thread from a couple of weeks ago. What about
 making if (array) illegal in D? I think it brings ambiguity and a high potential
 for errors to the language. The main two uses for this construct can already be
 done with a slightly more explicit syntax:
 
 if (array.ptr == null) // Check for a kind of "non-existence."
 if (array.length == 0) // Check for explicit emptiness.
 
 On the other hand, one is not sure what if (array) by itself is supposed to
 mean, since it's _not_ like C. In C, if (array), where array is typically a
 pointer, means simply != NULL. The problem in D is that the array ptr is tricky
 and IMHO it's best not to interface with it directly.
 
 I think it would be wise to remove this ambiguity. I propose two options:
 1) Make if (array) equal _always_ to if (array.length).
 2) Simply make it illegal.
 
 What do you guys think?

Both these suggestions wouldn't affect me.

-- 
Derek
Melbourne, Australia
20/07/2005 12:26:36 PM
Jul 19 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Wed, 20 Jul 2005 12:27:23 +1000, Derek Parnell <derek psych.ward> wrote:
 On Wed, 20 Jul 2005 02:15:58 +0000 (UTC), AJG wrote:

 Hi,

 This is a suggestion based on a thread from a couple of weeks ago. What about
 making if (array) illegal in D? I think it brings ambiguity and a high potential
 for errors to the language. The main two uses for this construct can already be
 done with a slightly more explicit syntax:

 if (array.ptr == null) // Check for a kind of "non-existence."
 if (array.length == 0) // Check for explicit emptiness.

 On the other hand, one is not sure what if (array) by itself is supposed to
 mean, since it's _not_ like C. In C, if (array), where array is typically a
 pointer, means simply != NULL. The problem in D is that the array ptr is tricky
 and IMHO it's best not to interface with it directly.

 I think it would be wise to remove this ambiguity. I propose two options:
 1) Make if (array) equal _always_ to if (array.length).
 2) Simply make it illegal.
 What do you guys think?

Both these suggestions wouldn't affect me.

:) Because you're explicit all the time.

Regan
Jul 19 2005
AJG <AJG_member pathlink.com> writes:
 I think it would be wise to remove this ambiguity. I propose two options:
 1) Make if (array) equal _always_ to if (array.length).
 2) Simply make it illegal.
 
 What do you guys think?

Both these suggestions wouldn't affect me.

I'll take that to mean "yes" if you don't mind. ;)
-- 
Derek
Melbourne, Australia
20/07/2005 12:26:36 PM

Jul 19 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Wed, 20 Jul 2005 02:15:58 +0000 (UTC), AJG <AJG_member pathlink.com>  
wrote:
 This is a suggestion based on a thread from a couple of weeks ago. What about
 making if (array) illegal in D? I think it brings ambiguity and a high potential
 for errors to the language. The main two uses for this construct can already be
 done with a slightly more explicit syntax:

 if (array.ptr == null) // Check for a kind of "non-existence."
 if (array.length == 0) // Check for explicit emptiness.

 On the other hand, one is not sure what if (array) by itself is supposed to
 mean, since it's _not_ like C. In C, if (array), where array is typically a
 pointer, means simply != NULL. The problem in D is that the array ptr is tricky
 and IMHO it's best not to interface with it directly.

 I think it would be wise to remove this ambiguity. I propose two options:
 1) Make if (array) equal _always_ to if (array.length).
 2) Simply make it illegal.

 What do you guys think? Walter?

I prefer the current behaviour (for all the reasons I mentioned in the previous thread): digitalmars.D/25804

"if (array)" is the same as "if (array.ptr)", which acts just like it does in C, comparing it to 0/null. Essentially the "if" statement is checking the non-zero state of the variable itself. In the case of value types it compares the value to 0. In the case of pointers and references it compares them to null. In the case of an array, which (as explained in the link above) is a mixed (pseudo) value/reference type, it compares the data pointer to null.

The reason this is the correct behaviour is that a null array has a null data pointer, but an empty array, i.e. an existing set containing no elements, may have a non-null data pointer. In both cases they have a 0 length property.

Of course we could change this; we could remove the case where an array contains no items but has a non-null data pointer. This IMO would remove a useful distinction: the "existing set containing no items" would be unrepresentable with a single array variable. IMO that would be a bad move; the current situation(*) is good.

(*) There remains the problem where setting the length of an array sets the data pointer to null. This can change an "existing set with no elements" into a "non-existent set".

Regan
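A minimal sketch, assuming a modern D2 compiler, of the distinction drawn above: slicing off zero elements yields an "existing set containing no items" that keeps a data pointer, while a null array has none. Note that `is null` compares both the pointer and the length, whereas `==` compares contents, so both arrays compare equal to `null` with `==`. The variable names are illustrative only.

```d
import std.stdio : writeln;

void main()
{
    int[] nothing = null;            // "non-existent set": ptr is null
    int[] data = [10, 20, 30];
    int[] noneLeft = data[3 .. 3];   // "existing set with no elements"

    // Both report zero length...
    assert(nothing.length == 0 && noneLeft.length == 0);

    // ...but only the slice keeps a data pointer (just past data's last element).
    assert(nothing.ptr is null);
    assert(noneLeft.ptr !is null);

    // `is null` compares the whole (ptr, length) pair, so it tells them apart.
    assert(nothing is null);
    assert(noneLeft !is null);

    // `==` compares contents, so both compare equal to null.
    assert(nothing == null && noneLeft == null);
    writeln("empty slice keeps its pointer");
}
```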
Jul 19 2005
AJG <AJG_member pathlink.com> writes:
Hi Regan,

Of course we could change this, we could remove the case where an array  
contains no items but has a non-null data pointer. This IMO would remove a  
useful distinction, the "existing set containing no items" would be  
un-representable with a single array variable. IMO that would be a bad  
move, the current situation(*) is good.

I understand, and I agree with your opinion. Losing this possible representation would not be good. Moreover, it is unnecessary. But my idea is not about that at all. I don't want to change the way the arrays themselves work.

As we know, all current representations will still be available via array.ptr. That's fine with me. I'll never use array.ptr, but if people need it, then it's all good. Just like regular pointers: I don't use them in D at all, but they are still useful to people.

The only thing I propose is to remove ambiguity in one kind of construct. If we take a look at the semantics of if (array), you will see what I mean when I said it's different than in C. In C, when you do

# if (array) { free(array); array = NULL; }

You are literally testing whether it is pointing to an array or not. If it is, delete it and null the pointer. It's very semantic, and very clear.

In D, on the other hand, the concept of "pointing to an array" is gone. The reference is always there. It is never null. So when you do if (array) you are saying "if this reference's pointer contains any data." That's a fine query to make, but not via if (array). To me, at least, that is not immediately clear. Asking if (array) to me means "does this array exist?" In D, the answer is always yes. Technically the array reference is always there. Which is why, as a sort of hack, array.ptr is tested instead. That's why the semantics of it are lost (or worse, mixed).

Therefore, it introduces ambiguity, which is what I want to prevent. If the meaning of an expression is not immediately clear and intuitive, I think people are going to misuse it. I can already see new programmers using that expression to test for emptiness. That would be fine in C. In C, Empty == NotExistent. But not in D. Thus, my idea is to either make it so that it works semantically like C, or at least remove the construct to avoid those potential errors.

The worst part is that these errors would be shifty bugs to catch. Why? Because if (array) _sometimes_ works for length, and sometimes it doesn't. That's just no good in my book. Anyway, remember that if (array.ptr == null) will always be there.
(*) there remains the problem where setting the length of an array sets the data pointer to null. This can change an "existing set with no elements" into a "non-existent set".

This is exactly the sort of thing I meant when I said the array.ptr is tricky. Static/Const arrays also don't help. In general, I don't think the pointer should be messed around with unless (a) you know what you are doing and (b) it's necessary (i.e. when the "bare metal" is needed).

Cheers,
--AJG.
Jul 19 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Wed, 20 Jul 2005 04:45:21 +0000 (UTC), AJG <AJG_member pathlink.com>  
wrote:
 Of course we could change this, we could remove the case where an array
 contains no items but has a non-null data pointer. This IMO would  
 remove a
 useful distinction, the "existing set containing no items" would be
 un-representable with a single array variable. IMO that would be a bad
 move, the current situation(*) is good.

I understand, and I agree with your opinion. Losing this possible representation would not be good. Moreover, it is unnecessary. But my idea is not about that at all. I don't want to change the way the arrays themselves work.

Excellent.
 As we know, all current representations will still be available via  
 array.ptr. That's fine with me. I'll never use array.ptr, but if people  
 need it, then it's all good. Just like regular pointers. I don't use  
 them in D at all, but they are still useful to people.

Ok.
 The only thing I propose is to remove ambiguity in one kind of construct.

First you must convince Walter it's ambiguous, which is what you're trying to do :) I don't think it is.
 If we take a look at the semantics of if (array), you will see what I  
 mean when I said it's different than in C. In C, when you do

 # if (array) { free(array); array = NULL; }

 You are literally testing whether it is pointing to an array or not. If  
 it is, delete it and null the pointer. It's very semantic, and very  
 clear.

Sure. In your example above you represent an array with a pointer.
 In D, on the other hand, the concept of "pointing to an array" is gone.

Yes and no; slicing could be described as pointing to an array. Sure, it doesn't use a pointer per se, but when you slice you create a value type containing a pointer and length, and the pointer points somewhere in the original array.
 The reference is always there. It is never null.

Yes, and I can see the angle you're taking here, but I think the current behaviour is consistent. Take for example these statements:

A. The expression "if (x)" compares the variable x with null or 0.
B. Given "char[] p = null;" then "if (p)" should be FALSE.
C. Given "char[] p = "";" then "if (p)" should be TRUE.

Do you agree with those statements? If not, which ones, and why?

If you change "if (x)" for arrays to compare the length property instead of the data pointer, then A no longer holds for arrays and C is invalidated (p has length 0, so "if (p)" becomes FALSE); only B survives. If you do that then arrays no longer behave like references, pointers, or basic types, i.e. int, float, etc.
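Statements A, B and C can be written as assertions. This sketch assumes a modern D2 compiler, which still accepts an array in boolean context; statement C is written with `string` because D2 no longer lets `""` initialise a mutable `char[]`. The helper `truthy` is hypothetical and simply reports what `if (x)` decides.

```d
import std.stdio : writeln;

// Hypothetical helper: reports what `if (x)` decides for any value.
bool truthy(T)(T x)
{
    if (x)
        return true;
    return false;
}

void main()
{
    // A. For value, pointer and reference types, `if (x)` tests against 0/null.
    assert(!truthy(0) && truthy(1));
    int* p = null;
    assert(!truthy(p));
    Object o = new Object;
    assert(truthy(o));

    // B. A null array tests false.
    char[] nullArr = null;
    assert(!truthy(nullArr));

    // C. An empty but allocated array (a string literal) tests true.
    string emptyLit = "";
    assert(truthy(emptyLit));

    writeln("statements A, B and C hold");
}
```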
 So when you do if (array) you are saying "if this reference's pointer  
 contains any data."

No, you're saying "is 'array' null or 0". Given:
- an array reference can never be null (as you say)
- in all situations when an array reference would be null, the data pointer is null
therefore, to compare the reference to null you simply compare the data pointer to null.
 That's a fine query to make, but not via if (array). To me, at least,  
 that is not immediately clear.

It may not be immediately clear, but I believe it's consistent and unambiguous.
 Asking if (array) to me means "does this array exist?" In D, the answer  
 is always yes.

It depends what you're referring to as the "array". Yes, a struct always exists. Yes, the reference always refers to it. But, the data pointer is the key element that either exists (not null, allocated) or does not (null). Just as in your C example above.
 Technically the array reference is always there. Which is why as a sort  
 of hack, array.ptr is tested instead.

IMO it's not a hack, it's the correct behaviour (reasoning above) "to compare the reference to null you simply compare the data pointer to null."
 That's why the semantics of it are lost (or worse, mixed).

They're only lost where length is reassigned to 0, this is a bug IMO. Otherwise they're fine AFAICS.
 Therefore, it introduces ambiguity, which is what I want to prevent. If  
 the meaning of an expression is not immediately clear and intuitive, I  
 think people are going to misuse it.

Sure. I understand the point you're trying to make. I just don't agree with any of your reasoning (yet).
 I can already see new programmers using that expression to test for  
 emptiness.

Why? It's an established fact that to check the length of an array you use the length property, it's comparable to just about any other container class in just about any language you care to name. (if not 'length' then 'size' or 'elements' or some other property/member)
 That would be fine in C. In C, Empty == NotExistent.

Not true.

   char *empty = "";
   char *nonexistent = NULL;
 But not in D. Thus, my idea is to either make it so that it works  
 semantically like C, or at least remove the construct to avoid those  
 potential errors.

 The worst part is that these errors would be shifty bugs to catch. Why?
 Because if (array) _sometimes_ works for length, and sometimes it  
 doesn't.
 That's just no good in my book.

"if (array)" never checks length, it *always* compares 'array' (the variable) with null or 0, nothing more, nothing less. You wouldn't expect "if (x)" to call the length member of a class 'x' would you? Why would you expect it to do so for arrays? Regan
Jul 20 2005
Dejan Lekic <leka entropy.tmok.com> writes:
Mr Heath, I agree with You on this. 

-- 
...........
Dejan Lekic
  http://dejan.lekic.org
  
Jul 20 2005
Derek Parnell <derek psych.ward> writes:
On Wed, 20 Jul 2005 11:21:55 +0200, Dejan Lekic wrote:

 Mr Heath, I agree with You on this.

I don't. Does ...

   if (array) ...

test for an empty array or a non-existent array? I can't tell from the syntax. It is thus ambiguous.

   if (array.ptr == null)   -- test for non-existence
   if (array.length == 0)   -- test for emptiness
   if (array)               -- test for which?

-- 
Derek Parnell
Melbourne, Australia
20/07/2005 7:47:04 PM
Jul 20 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Wed, 20 Jul 2005 19:49:19 +1000, Derek Parnell <derek psych.ward> wrote:
 On Wed, 20 Jul 2005 11:21:55 +0200, Dejan Lekic wrote:

 Mr Heath, I agree with You on this.

I don't. Does ... if (array) ... test for an empty array or a non-existent array?

It does what it always does, for every type in D: it tests whether 'array' is null or 0. A null array is a non-existent array, thus it tests for a non-existent array.
 I can't tell from the
 syntax. It is thus ambiguous.

Granted, it's not 'explicit'. However, the behaviour is well defined. The only 'catch' in this case is that an array cannot be null. However, when an array would be null its data pointer is null, therefore testing the data pointer _is_ testing the array.

Regan
Jul 20 2005
Derek Parnell <derek psych.ward> writes:
On Wed, 20 Jul 2005 22:42:22 +1200, Regan Heath wrote:

 On Wed, 20 Jul 2005 19:49:19 +1000, Derek Parnell <derek psych.ward> wrote:
 On Wed, 20 Jul 2005 11:21:55 +0200, Dejan Lekic wrote:

 Mr Heath, I agree with You on this.

I don't. Does ... if (array) ... test for an empty array or a non-existent array?

It does what it always does, for every type in D: it tests whether 'array' is null or 0. A null array is a non-existent array, thus it tests for a non-existent array.

I think I'm not understanding this. I thought that

   char[] array;

defined an eight-byte structure in RAM in which the first 4 bytes are the current length of the array (if it is allocated) and the second 4 bytes are the address of the array data. Initially all eight bytes are zero.

Thus when I see "if (array)" I think it is converted into machine language instructions that test the second 4 bytes against zero. In other words ...

   if (array)

is essentially the same as

   if (array.ptr == 0)

and

   if (*(cast(int*)((&array)+4)) == 0)

I'm only guessing at this, because I haven't seen it written down this *explicitly* ;-)
 I can't tell from the
 syntax. It is thus ambiguous.

Granted, it's not 'explicit'. However, the behaviour is well defined.

Where is that behavior defined? I can't see it in the documentation.
 The only 'catch' in this case is that an array cannot be null. 

Of course not. It's an 8-byte structure. All 8 bytes can be zero though.
 However,  
 when an array would be null its data pointer is null, therefore testing
 the data pointer _is_ testing the array.

Huh? You just said that 'array cannot be null' so how does that reconcile with 'when an array would be null'?

But back to what I was saying ...

   if (array)

is ambiguous because *JUST BY LOOKING AT THE CODE* one cannot tell if it is testing the first 4-byte field or the second 4-byte field in 'array'. Its behaviour may be precisely defined, but I haven't seen that yet. Oh, and there is a difference in semantics to testing array.ptr and array.length.

-- 
Derek Parnell
Melbourne, Australia
20/07/2005 10:35:31 PM
Jul 20 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Wed, 20 Jul 2005 22:49:04 +1000, Derek Parnell <derek psych.ward> wrote:
 Does ...

   if (array) ...

 test for an empty array or a non-existent array?

It does what it always does, for every type in D: it tests whether 'array' is null or 0. A null array is a non-existent array, thus it tests for a non-existent array.

I think I'm not understanding this. I thought that char[] array; defined an eight-byte structure in RAM in which the first 4-bytes is the current length of the array (if it is allocated) and the second 4-bytes is the address of the array data. Initially all eight bytes are zero.

I'd say: it defines a variable 'array' which is a reference to a struct/class like you've described.
 Thus when I see "if (array)" I think it is converted into machine  
 language instructions that tests the second 4-bytes against zero.

Because you're thinking of 'array' as a struct. It's not, it's a reference. Thus, "if (array)" compares that reference to null.

I'd guess the reason you think of it as a struct is because, like a struct, it cannot be null. That is the only similarity it has to a struct; all the rest of its behaviour is that of a reference. Because it's a reference you can set it to null, because it's a reference you can say "if(array)", because it's a reference you can say "if(array is null)", because it's a reference it behaves like any other reference, except for the fact that it cannot be null.

It is logically consistent with all other types in D (barring structs), e.g.

A. The expression "if (x)" compares the variable x with null or 0.
B. Given "char[] p = null;" then "if (p)" should be FALSE.
C. Given "char[] p = "";" then "if (p)" should be TRUE.

These are all correct and true for all types in D barring structs (replace null and "" with 0 and 1 for value types).
 I can't tell from the
 syntax. It is thus ambiguous.

Granted, it's not 'explicit'. However, the behaviour is well defined.

Where is that behavior defined? I can't see it in the documentation.

I was referring to the behaviour of "if (x)". Most people know, or quickly learn this behaviour.
 The only 'catch' in this case is that an array cannot be null.

Of course not. It's an 8-byte structure.

No, it's not. Or rather, we have to decide what exactly we're talking about here. Above, you defined a variable 'array'. It is a reference. It refers to an object. The object contains some data and has a length property. The array reference, like any other can be set to 'null'. However the implementation is such that it is defined never to be null. Yet, statements in the form "if (array is null)" and "if (array)" still behave like the reference is null. (thus they are consistent, see A,B,C above) They behave in that way because they check the data pointer, the data pointer is the part of the object that mirrors the state the reference would have, were it not prohibited from being null. In essence the data pointer _is_ the array, the rest is implementation around it.
 All 8 bytes can be zero though.

Just like a normal struct. However an array reference is not itself a struct, it's a reference to a object (struct/class) with a length property.
 However,
 when an array would be null its data pointer is null, therefore testing
 the data pointer _is_ testing the array.

Huh? You just said that 'array cannot be null' so how does that reconcile with 'when an array would be null'?

The data pointer mirrors the state the reference would have, were it not for the implementation ensuring the reference is never null. Essentially the data pointer _is_ the array, the rest is implementation.
 But back to what I was saying ...

   if (array)

 is ambiguous because *JUST BY LOOKING AT THE CODE* one cannot tell if it  
 is testing the first 4-byte field or the second 4-byte field in 'array'.

So? This is no different to any other variable type, try an int for example.
 Its behaviour may be precisely defined, but I haven't seen that yet.

Its behaviour is to test the variable 'array' against null or 0.
 Oh, and there is a difference in semantics to testing array.ptr and
 array.length.

Of course. Which is why changing "if(array)" to test the length breaks logical consistency and is just plain wrong IMO.

Regan
Jul 20 2005
Derek Parnell <derek psych.ward> writes:
On Thu, 21 Jul 2005 10:39:54 +1200, Regan Heath wrote:

 On Wed, 20 Jul 2005 22:49:04 +1000, Derek Parnell <derek psych.ward> wrote:
 Does ...

   if (array) ...

 test for an empty array or a non-existent array?

It does what it always does, for every type in D: it tests whether 'array' is null or 0. A null array is a non-existent array, thus it tests for a non-existent array.

I think I'm not understanding this. I thought that char[] array; defined an eight-byte structure in RAM in which the first 4-bytes is the current length of the array (if it is allocated) and the second 4-bytes is the address of the array data. Initially all eight bytes are zero.

I'd say: it defines a variable 'array' which is a reference to a struct/class like you've described.

Actually that turns out not to be the case. If it was, then 'array' would be represented by a 4-byte value which contained the address of the 8-byte struct, {uint len, void* ptr}. However, if you look at the generated machine code you can see that the 8-byte struct _is_ the 'array'. In other words, 'array' is not a reference to a struct/class.

Here is what I found. I compiled this D code ...

   void main()
   {
       char[] array;
       if (array.ptr == null)
       {
           array.length = 2;
       }
       if (array.length == 3)
       {
           array.length = 4;
       }
       if (array)
       {
           array.length = 5;
       }
   }

And this is the generated machine code ...

   assume CS:__Dmain
   L0:     enter   8,0
           push    EBX
           mov     dword ptr -8[EBP],0
           mov     dword ptr -4[EBP],0
           cmp     dword ptr -4[EBP],0
           jne     L29
           lea     EAX,-8[EBP]
           push    EAX
           push    1
           push    2
           call    near ptr __d_arraysetlength
           add     ESP,0Ch
   L29:    cmp     dword ptr -8[EBP],3
           jne     L3F
           lea     ECX,-8[EBP]
           push    ECX
           push    1
           push    4
           call    near ptr __d_arraysetlength
           add     ESP,0Ch
   L3F:    mov     EDX,-4[EBP]
           or      EDX,-8[EBP]
           je      L57
           lea     EBX,-8[EBP]
           push    EBX
           push    1
           push    5
           call    near ptr __d_arraysetlength
           add     ESP,0Ch
   L57:    pop     EBX
           leave
           ret
   __Dmain ends

As you can see, the 8-byte struct is reserved in the local stack and references to array.ptr and array.length are direct accesses of the stack space and not dereferenced via a pointer.

Furthermore, 'if (array)' is equivalent to ...

   if (array.len == 0 || array.ptr == null)

which I think is slightly slower than testing either .length or .ptr

[snip]
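The layout read off the assembly above (length word first, then the data pointer, stored by value) can be sketched in modern D2. `ArrayRepr` and `View` are hypothetical names, and reading one union member after writing the other is done purely for illustration. On a 64-bit target each word is 8 bytes rather than the 4 bytes in the 2005 listing.

```d
import std.stdio : writeln;

// Hypothetical mirror of the layout Derek describes: length first, then the
// data pointer. Assumes the D ABI layout of a dynamic array.
struct ArrayRepr
{
    size_t length;
    void* ptr;
}

// Reinterpret the two words of an array slot; for illustration only.
union View
{
    char[] array;
    ArrayRepr repr;
}

void main()
{
    static assert((char[]).sizeof == ArrayRepr.sizeof); // two machine words

    View v;                                           // first member default-initialised
    assert(v.repr.length == 0 && v.repr.ptr is null); // all words zero

    v.array = "hi".dup;
    assert(v.repr.length == v.array.length);
    assert(v.repr.ptr == cast(void*) v.array.ptr);
    writeln("array slot = { length, ptr }, stored by value");
}
```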
 Of course. Which is why changing "if(array)" to test the length breaks  
 logical consistency and is just plain wrong IMO.

I'm not asking for its behavior to be changed, just documented.

-- 
Derek
Melbourne, Australia
21/07/2005 9:35:42 AM
Jul 20 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Thu, 21 Jul 2005 10:02:49 +1000, Derek Parnell <derek psych.ward> wrote:
 I thought that

    char[] array;

 defined an eight-byte structure in RAM in which the first 4-bytes is the
 current length of the array (if it is allocated) and the second 4-bytes
 is the address of the array data. Initially all eight bytes are zero.

I'd say: it defines a variable 'array' which is a reference to a struct/class like you've described.

Actually that turns out not to be the case. If it was, then 'array' would be represented by a 4-byte value which contained the address of the 8-byte struct, {uint len, void* ptr}. However, if you look at the generated machine code you can see that the 8-byte struct _is_ the 'array'. In other words, 'array' is not a reference to a struct/class.

[snip: D test program and generated machine code]

As you can see, the 8-byte struct is reserved in the local stack and references to array.ptr and array.length are direct accesses of the stack space and not dereferenced via a pointer.

I'll have to take your word for it; my assembler knowledge is nonexistent. I'd call this an "optimisation", and a good one at that.

This does not refute the fact that the 'array' variable _behaves_ as a reference type, i.e. is passed by reference, can have null assigned to it, can be used in "if (array is null)", can be assigned to another reference, and so on. Further, it's described in the docs as an "array reference". So despite the _implementation_ of it, it _behaves_ as a reference type(*).

(*) The only exception, the only thing in which it behaves like a struct, is the fact that it cannot be null.
 Furthermore, 'if (array)' is
 equivalent to ...

   if (array.len == 0 || array.ptr == null)

Don't you mean:

   if (array.len != 0 || array.ptr != null)

? Does the assembler above show this?

This:

   if (array.len != 0 || array.ptr != null)

is in fact identical in effect/meaning to:

   if (array.ptr != null)

because length cannot be anything other than 0 when the data pointer is null. In other words, this is impossible:

   if (array.ptr == null && array.length != 0) { //impossible }

note that:

   if (array.ptr != null && array.length == 0) { //not impossible }
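The claim that the two-word test reduces to a pointer test can be checked over a few ordinary arrays. A minimal sketch in modern D2; `lengthOrPtr` and `ptrOnly` are hypothetical helpers mirroring, respectively, the OR of both words in Derek's assembly and the pointer-only test proposed here.

```d
import std.stdio : writeln;

// Hypothetical helper: the test the assembly performs (OR of both words).
bool lengthOrPtr(T)(T[] a)
{
    return a.length != 0 || a.ptr !is null;
}

// Hypothetical helper: the simpler test on the data pointer alone.
bool ptrOnly(T)(T[] a)
{
    return a.ptr !is null;
}

void main()
{
    int[] backing = [1, 2, 3];
    int[][] samples = [null, backing, backing[0 .. 0], backing[3 .. 3], backing[1 .. 2]];

    foreach (a; samples)
    {
        // A null ptr with non-zero length does not arise from ordinary array
        // operations, so the two tests agree on every sample.
        assert(lengthOrPtr(a) == ptrOnly(a));
    }
    writeln("both tests agree");
}
```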
 Of course. Which is why changing "if(array)" to test the length breaks
 logical consistency and is just plain wrong IMO.

I'm not asking for its behavior to be changed, just documented.

Sure. I can appreciate the desire to have things set down explicitly for reference.

Regan
Jul 20 2005
Derek Parnell <derek psych.ward> writes:
On Thu, 21 Jul 2005 12:16:08 +1200, Regan Heath wrote:


[snip]
 
 This does not refute the fact that the 'array' variable _behaves_ as a  
 reference type, ...

 i.e is passed by reference,

Well ... not always. If the function parameter is an 'in' type, then the 8-byte struct is passed to the function and not a reference to it. If the parameter is either 'out' or 'inout' then the address of the 8-byte struct is passed to the function.
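The by-value versus by-address passing described above can be sketched in modern D2, where `ref` plays the role of the 2005 `inout` (today's `in` means something different, so the by-value case below uses an unqualified parameter). The function names are hypothetical.

```d
import std.stdio : writeln;

// Unqualified parameter: the {length, ptr} pair is copied into the callee.
void growCopy(int[] a)
{
    a.length += 1;      // changes only the local copy of the pair
}

// `ref` (the modern counterpart of 2005's `inout`): the caller's pair is passed by address.
void growRef(ref int[] a)
{
    a.length += 1;      // changes the caller's array
}

// Writing through a copied pair still reaches the shared elements.
void zeroFirst(int[] a)
{
    a[0] = 0;
}

void main()
{
    int[] xs = [5, 6];
    growCopy(xs);
    assert(xs.length == 2);
    growRef(xs);
    assert(xs.length == 3);
    zeroFirst(xs);
    assert(xs[0] == 0);
    writeln("pair copied by value, data shared");
}
```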
 can have null assigned to it,  

This just sets the 8-bytes to zero.
 can be used in "if (array is null)", 

This is identical to 'if (array)' according to the generated machine code.
can be assigned to another reference,  

This just copies the source struct 8 bytes to the target struct's 8 bytes.
 and so on. Further, it's described in the docs as an "array reference". So  
 despite the _implementation_ of it, it _behaves_ as a reference type(*).
 
 (*)The only exception, the only thing in which it behaves like a struct is  
 the fact that it cannot be null.

Often there seems to be a confusion between the 'array' reference and the reference to the data that 'array' owns.

-- 
Derek
Melbourne, Australia
21/07/2005 10:23:03 AM
Jul 20 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Thu, 21 Jul 2005 10:45:40 +1000, Derek Parnell <derek psych.ward> wrote:
 This does not refute the fact that the 'array' variable _behaves_ as a
 reference type, ...

 i.e is passed by reference,

Well ... not always. If the function parameter is an 'in' type, then the 8-byte struct is passed to the function and not a reference to it. If the parameter is either 'out' or 'inout' then the address of the 8-byte struct is passed to the function.

Cool. Optimisations.
 can have null assigned to it,

This just sets the 8-bytes to zero.

Like opAssign for a normal struct could do.
 can be used in "if (array is null)",

This is identical to 'if (array)' according to the generated machine code.

Cool.
 can be assigned to another reference,

This just copies the source struct 8 bytes to the target struct's 8 bytes.

And/or creates a new one (i.e. if slicing)
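A minimal sketch in modern D2 of the assignment and slicing behaviour discussed here: assigning an array copies only its two words, a slice builds a new pair that points into the same data, and `.dup` is what copies the elements themselves. Variable names are illustrative.

```d
import std.stdio : writeln;

void main()
{
    int[] a = [1, 2, 3, 4];

    int[] b = a;            // copies the {length, ptr} pair; no element copy
    assert(b.ptr is a.ptr && b.length == a.length);

    int[] mid = a[1 .. 3];  // a new pair pointing into the same data
    assert(mid.ptr is a.ptr + 1 && mid.length == 2);

    mid[0] = 99;            // writes through to the shared elements
    assert(a[1] == 99 && b[1] == 99);

    int[] own = a.dup;      // .dup is what actually copies the elements
    own[0] = -1;
    assert(a[0] == 1);
    writeln("assignment and slicing share data");
}
```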
 and so on. Further, it's described in the docs as an "array reference". So
 despite the _implementation_ of it, it _behaves_ as a reference type(*).

 (*)The only exception, the only thing in which it behaves like a struct  
 is the fact that it cannot be null.

Often there seems to be a confusion between the 'array' reference and the reference to the data that 'array' owns.

Right. Thanks, this thread has been enlightening. I believe this statement accurately describes arrays:

"Array references _behave_ like references but are _implemented_ as stack-based structs."

In other words, treat it like a reference, as that is what it's pretending to be. At the same time you get the performance of a stack-based struct. This is yet more evidence as to why arrays are great.

In short, I still believe "if(array)" is doing its job correctly (in effect, if not exactly; see changes below). I don't believe people will commonly expect this statement to check the length of an array, nor do I think it should be illegal.

I believe Walter has tried to remove the distinction between a non-existent array and an empty one (going on the results you've shown here) but has failed in some areas, thankfully, because I still believe it is a useful distinction. In fact I'd say he's got the implementation of arrays pretty much perfect. I would make the following changes:

- change "if(array)" and "if(array is null)" to check the data pointer only (it's pointless checking length).
- fix "array.length = 0;" so that it doesn't set the data pointer to null.

Regan
Jul 20 2005
Derek Parnell <derek psych.ward> writes:
On Thu, 21 Jul 2005 12:16:08 +1200, Regan Heath wrote:

 On Thu, 21 Jul 2005 10:02:49 +1000, Derek Parnell <derek psych.ward> wrote:

[snip]
 
 Furthermore, 'if (array)' is
 equivalent to ...

   if (array.len == 0 || array.ptr == null)

Don't you mean: if (array.len != 0 || array.ptr != null) ?

 Does the assembler above show this?

Yes.

  mov EDX,-4[EBP]   ; Put the ptr into DX register
  or  EDX,-8[EBP]   ; OR the DX register with the length
  je  L57           ; jump if the result is zero

--
Derek
Melbourne, Australia
21/07/2005 10:50:30 AM
Jul 20 2005
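Derek's three instructions OR the data pointer with the length and branch on zero, so 'if (array)' is false only when both are zero. A C sketch of that test, using a hypothetical 'IntArray' struct for the (length, ptr) pair and assuming, as on the x86 target in the thread, that a null pointer converts to the integer 0:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (length, ptr) model of a D dynamic array. */
typedef struct {
    size_t length;
    int   *ptr;
} IntArray;

/* Mirrors "mov EDX,ptr / or EDX,length / je": the jump to the false branch
   is taken only when ptr and length are both zero.
   Assumes a null pointer converts to the integer 0. */
bool array_truthy_or(IntArray a) {
    return ((uintptr_t)a.ptr | (uintptr_t)a.length) != 0;
}

/* The same condition written the way Derek restates it. */
bool array_truthy_logical(IntArray a) {
    return a.length != 0 || a.ptr != NULL;
}

/* Sanity check that the two formulations agree for a given array. */
void check_equivalent(IntArray a) {
    assert(array_truthy_or(a) == array_truthy_logical(a));
}
```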
prev sibling parent reply AJG <AJG_member pathlink.com> writes:
Hi,

It does what it always does for every type in D: it tests whether 'array' is null or 0. A null array is a non-existent array, thus it tests for a non-existent array.

That's not exactly true. As you mentioned yourself, .length = 0 makes the pointer null, yet isn't the array "existent"? This kind of implementation defect should not be exposed in the language.
 I can't tell from the syntax. It is thus ambiguous.

Granted, it's not 'explicit'. However, the behaviour is well defined. The only 'catch' in this case is that an array cannot be null. However, when an array would be null its data pointer is null,

Isn't this a contradiction?
therefore testing the data pointer _is_ testing the array.

That's where I beg to differ. That's the source of ambiguity. To _you_ it may seem like "testing the data pointer _is_ testing the array," but that's most certainly not the only interpretation, and in fact I think it's a misleading one. Testing the array ptr is _just_ that: testing a pointer, some random block of memory that just happens to be used by your array. It is unsemantic and unclear. I am certain it will be misused by both the C camp and new programmers. This behaviour is not even documented anywhere.

The problem, once again, is that in D "testing the array" doesn't mean anything outright, because the array is always there. Technically, if (array) should _always_ return true. Therefore, I think it would be much more consistent to use the .length property rather than .ptr for this implicit test, or to ban the implicit test.

Why is .length better?
1) It is much more semantic. It means in D what it would have meant in C.
2) It is a simple test for numerical emptiness. Nothing more, nothing less. No memory involved. No philosophical questions about null/empty needed.
3) It is not prone to weird memory incongruences (e.g. an empty existent array) or changes in the technical details of the implementation.
4) It is consistent: it works exactly the same with normal arrays, dynamic arrays, static arrays, associative arrays, and even raw pointers (which map directly to C's behaviour).

I think there is another non-ambiguous option now (C):
A) Make if (array) equal to if (array.length).
B) Make if (array) illegal.
C) Make if (array) always return true, since the array is always there.

I prefer A first, then B, then C as a last resort.

Thanks for listening,
--AJG.
Jul 20 2005
parent "Regan Heath" <regan netwin.co.nz> writes:
On Wed, 20 Jul 2005 14:29:13 +0000 (UTC), AJG <AJG_member pathlink.com>  
wrote:
 It does what it always does, for every type in D, it tests whether 'array' is null or 0.
 A null array is a non-existent array, thus it tests for a non-existent array.

That's not exactly true. As you mentioned yourself, .length = 0 makes the pointer null, yet isn't the array "existent"?

Not anymore, that is why this is a BUG.
 This kind of implementation defect should not be exposed in the language.

It is a BUG.
 I can't tell from the syntax. It is thus ambiguous.

Granted, it's not 'explicit'. However, the behaviour is well defined. The only 'catch' in this case is that an array cannot be null. However, when an array would be null its data pointer is null,

Isn't this a contradiction?

No. We have 2 facts:
1. Array _references_ are never null.
2. Null arrays have null data pointers.

To be clear, a "null array" is an array to which you have assigned null, or to which nothing has ever been assigned. It represents "non-existent".
 therefore testing the data pointer _is_ testing the array.

That's where I beg to differ. That's the source of ambiguity. To _you_ it may seem like "testing the data pointer _is_ testing the array," but that's most certainly not the only interpretation, and in fact I think it's a misleading one. Testing the array ptr is _just_ that, testing a pointer, some random block of memory that just happens to be used by your array. It is unsemantic and unclear.

It's identical to the C code you posted, which you said was semantic and clear. The data pointer is the part of the array struct that mirrors the value the array reference would have, were it not for the additional safety feature arrays have, i.e. the reference can never be null. Therefore, in my opinion, the data pointer _is_ the array.
 The problem once again is that in D, "testing the array" doesn't mean anything outright because the array is always there.

You're confusing implementation with concept. Walter has chosen for the implementation to ensure the array _reference_ is never null; yet it's still possible to assign 'null' to one, in order to represent a 'null array', and when you do so it sets the data pointer to null.

If you ignore what you know about how an array works internally, and just look at it as another reference like any other, then its current behaviour is perfectly consistent with all other types. You can treat an array like any other class with a "length" member/property. The added bonus with arrays is that:
- they can be created on the fly, implicitly;
- you can never have a null reference to one.

Would you expect "if (x)" to call a member function of a class x?
 Technically if (array) should _always_ return true.

No, technically it should not, for if it did:
A. The expression "if (x)" compares the variable x with null or 0.
B. Given "char[] p = null;" then "if (p)" should be FALSE.

Then statement B would be incorrect, as "if (p)" would return TRUE, and this would be inconsistent with other types in D.
 Therefore, I think it would be much more consistent

Less consistent, because then you would break this logic:
A. The expression "if (x)" compares the variable x with null or 0.
B. Given "char[] p = null;" then "if (p)" should be FALSE.
C. Given "char[] p = "";" then "if (p)" should be TRUE.

All 3 statements are correct and true for all pointer/reference types, and are also correct and true for value types (except structs) if you replace the null and "" with appropriate values, e.g. 0 and 1.

In short:
- if you set an array to null, "if (array)" will be FALSE;
- if you set an array to anything else, "if (array)" will be TRUE;
- if you change "if (array)" to test length, you break that logic.

You'll also note that the statement "if (array is null)" is true for arrays to which you have assigned null. In short, although the array reference is not itself null, it pretends to be in situations where it would be, were it not for the implementation ensuring it cannot be (for crash-safety reasons).
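Statements A to C can be checked against both candidate rules in a small C model. This is a sketch, not D itself: 'CharArray', 'truthy_current', 'truthy_length' and 'from_literal' are hypothetical names; 'truthy_current' encodes the pointer-or-length test discussed in this thread, and 'truthy_length' encodes AJG's option A.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical (length, ptr) model of a D char[] array. */
typedef struct {
    size_t length;   /* characters, excluding any trailing '\0' */
    const char *ptr; /* data pointer; NULL for a "null array" */
} CharArray;

/* Current rule: FALSE only for a null array (no data, no length). */
bool truthy_current(CharArray a) {
    return a.length != 0 || a.ptr != NULL;
}

/* AJG's option A: "if (array)" means "if (array.length)". */
bool truthy_length(CharArray a) {
    return a.length != 0;
}

/* Models a string literal: it always has storage (at least the '\0'),
   so ptr is non-NULL even for "", while length excludes the terminator. */
CharArray from_literal(const char *lit) {
    assert(lit != NULL);
    CharArray a = { strlen(lit), lit };
    return a;
}
```

Under the current rule both B and C hold; under option A the empty-but-existing string also tests FALSE, which is the loss of consistency Regan describes.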
 Why is .length better?
 1) It is much more semantic. It means in D what it would have meant in C.
 2) It is a simple test for numerical emptiness. Nothing more, nothing less. No memory involved. No philosophical questions about null/empty needed.
 3) It is not prone to weird memory incongruences (e.g. an empty existent array) or changes in the technical details of the implementation.
 4) It is consistent: It works exactly the same with normal arrays, dynamic arrays, static arrays, associative arrays, and even raw pointers (which map directly to C's behaviour).

Why is .length wrong?
1. It makes the behaviour of "if (x)" inconsistent with other types.
2. It makes arrays inconsistent: "if (x)" no longer returns FALSE for an array to which you have assigned null.

In short, it breaks the logical consistency of types.
 I think there is another non-ambiguous option now (C):
 A) Make if (array) equal to if (array.length)
 B) Make if (array) illegal.
 C) Make if (array) always return true, since the array is always there.

 I prefer A first, then B, then C as a last resort.

I prefer the current situation. The options above all break consistency.

Regan
Jul 20 2005
prev sibling next sibling parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:1k0mwc3gtmj73.inn5n1oiajb5$.dlg 40tude.net...
 On Wed, 20 Jul 2005 11:21:55 +0200, Dejan Lekic wrote:

 Mr Heath, I agree with You on this.

I don't. Does ...

  if (array) ...

test for an empty array or a non-existent array? I can't tell from the syntax. It is thus ambiguous.

  if (array.ptr == null) -- test for a non-existence.

  if (array.length == 0) -- test for emptiness

  if (array) -- test for which?

I can sympathize with the argument that it should be illegal to implicitly test 'array', but presumably we'd want to keep implicit conversion to the ptr in calls like

  void foo(char* p);
  foo(array);

That would mean 'array' is implicitly converted to ptr in some places but not everywhere, and that seems like a slippery slope. It might be easier to just live with the current behavior. For example, dlint can flag implicit array conditions. Then again, we already have 'if (x = y)' illegal, so there is precedent for filtering conditions: the good old 'value does not give boolean result' error.
Jul 20 2005
parent reply AJG <AJG_member pathlink.com> writes:
Hi,

I can sympathize with the argument that it should be illegal to implicitly 
test 'array' but presumably we'd want to keep implicit conversion to the ptr 
in calls like
  void foo(char* p);
  foo(array);
That would mean 'array' is implicitly converted to ptr in some places but 
not everywhere and that seems like a slippery slope.

I agree that this is something to think about. Of course, there is a fundamental difference here. foo (char *) expects a pointer. if (array) expects a bool (well, int, technically; another D annoyance). This is a clear distinction to me, one that prevents the slippery slope.
It might be easier to just live with the current behavior.

That's just laziness speaking ;).
Then again we already have 'if (x = y)' illegal so there is precedent for 
filtering conditions - the good-old 'value does not give boolean result' 
error.

Yes! That's exactly what I was thinking. D even has its cake and eats it too, because (x = y) is still legal with an additional explicit == true/false; this is great. It allows you to do it, yet prevents the common missing-= mistake. This is analogous to if (array). The pointer check can still be done via array.ptr, but D would error out when using the ambiguous form. So there is definitely precedent, and it's a good precedent.

Cheers,
--AJG.
Jul 20 2005
parent "Ben Hinkle" <ben.hinkle gmail.com> writes:
It might be easier to just live with the current behavior.

That's just laziness speaking ;).

Maybe "easier" isn't the right word :-) The last time this topic came up, one suggestion was to encourage explicit .length or .ptr conditions but to keep the current implicit conversions. For example, the C++ string vs D string page http://www.digitalmars.com/d/cppstrings.html was changed to test for empty as:

  if (!array.length) ...

It's in the section "Checking For Empty Strings". It used to just be "if (!array)", I think.
Then again we already have 'if (x = y)' illegal so there is precedent for
filtering conditions - the good-old 'value does not give boolean result'
error.

Yes! That's exactly what I was thinking. D even has its cake and eats it too, because (x = y) is still legal with an additional explicit == true/false; this is great. It allows you to do it, yet prevents the common missing-= mistake. This is analogous to if (array). The pointer check can still be done via array.ptr, but D would error out when using the ambiguous form. So there is definitely precedent, and it's a good precedent.

In fact, now that I think about the 'if (!array)' code: if we made 'if (array)' illegal, we'd also need a special check for 'if (!array)'. That's at least two more special cases for conditions.
Jul 21 2005
prev sibling next sibling parent reply Ilya Minkov <minkov cs.tum.edu> writes:
My vote is against.

Derek Parnell wrote:
 Does ...
 
   if (array) ...
 
 test for an empty array or a non-existent array? I can't tell from the
 syntax. It is thus ambiguous.
 
   if (array.ptr == null) -- test for a non-existence.
 
   if (array.length == 0) -- test for emptiness
 
   if (array) -- test for which?

Making a difference between an empty array and a nonexistent one is flaky, if not directly ambiguous, thus D does not do it, as far as I can remember the statement of Walter. Thus if(array) is not ambiguous. And at all, arrays have somewhat pointer-like semantics in D, so it should stay, among other reasons. One of the reasons is that it seems familiar to C programmers and makes the foreach..else syntax suggestion from AJG very unnecessary.

-eye
Jul 20 2005
next sibling parent reply AJG <AJG_member pathlink.com> writes:
Hi,

Making difference between an empty array and a nonexistent one is flaky, 
if not directly ambiguous, thus D does not do it, as far as i can 
remember the statement of Walter. Thus if(array) is not ambiguous.

Hm... not only does this distinction exist, it is in fact _very_ much available in D. That's exactly the point Regan has made in some past replies. I'm indifferent towards this distinction, but Regan seems fond of it. Please look at my examples further below.
And at all, arrays have somewhat pointer-like semantics in D.

No, they do not, IMHO. This is one of the points I've tried to make. Arrays have completely different semantics in D compared to C. In D arrays are first-class objects. They are handled via references, which can't be nulled, they keep their own length, etc. I think this is a good thing. Very different from C.
One of the reasons is that it seems 
familiar to C programmers.

Indeed. It seems familiar, and people will misuse it because of that. But then the boogieman comes and gets them in the form of a weird bug.

Examples of the incongruence (empty _but_ existent array):

# int[0] emptyArray;
# if (emptyArray) writef("See, I'm empty, yet I exist!");
// The statement will print.

// Let's try it again:
# int[] emptyArray = new int[0];
# if (emptyArray) writef("I'm still empty, but non-existant.");
// The statement will *not* print.

// Think about strings:
# string emptyString = "";
# if (emptyString) writef("Empty, yet I exist");
// The statement will *not* print.

Is that last test not a reasonable thing to do? It seems pretty harmless. You want to test for an empty string, an empty array. But you still get true.

But what about this:
# string emptyString = null;
# if (emptyString) writef("Empty, but now I don't exist");
// The statement will print.

Would you say the behaviour I showed above is consistent? You don't find it a tad, say, ambiguous? You don't think people will be confused? I certainly was.
makes the foreach..else syntax suggestion from AJG very unnecessary.

Huh? I don't see how the two things are related. You may have a valid point, but I fail to see the connection.

Cheers,
--AJG.
Jul 20 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Thu, 21 Jul 2005 00:04:56 +0000 (UTC), AJG <AJG_member pathlink.com>  
wrote:
 Making difference between an empty array and a nonexistent one is flaky,
 if not directly ambiguous, thus D does not do it, as far as i can
 remember the statement of Walter. Thus if(array) is not ambiguous.

Hm... not only does this distinction exist, it is in fact _very_ much available in D. That's exactly the point Regan has made in some past replies. I'm indifferent towards this distinction, but Regan seems fond of it. Please look at my examples further below.

It's true.
 And at all, arrays have somewhat pointer-like semantics in D.

No, they do not, IMHO. This is one of the points I've tried to make. Arrays have completely different semantics in D compared to C. In D arrays are first-class objects. They are handled via references, which can't be nulled, they keep their own length, etc. I think this is a good thing. Very different from C.

The point I'm trying to make is that in D an array can be nulled, and it has meaning, e.g.

char[] p = null;

You're confusing the _implementation_ of arrays with the _behaviour_ of arrays. The above array _reference_ behaves just like any other reference that has been nulled(*), e.g.

if (p is null) { /* true */ }

(*) The exception being that the _implementation_ protects you by ensuring the reference always refers to a valid object. The object's data pointer then mirrors the actual state of the array. In addition, several optimisations go on in the background, removing the actual reference (as Derek has shown in another post), which makes sense as it's never null, and thus not required for the _implementation_.
 One of the reasons is that it seems
 familiar to C programmers.

Indeed. It seems familiar, and people will misuse it because of that.

How? When you write "if(x)" you're asking: is 'x' null or 0? D's answer is perfectly correct in all cases(*).

(*) except for the _BUG_ where you can write:

char[] p = "";
p.length = 0;
if (p) { /* false, length = 0 resets the data pointer to null */ }
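The _BUG_ can be modelled with two hypothetical setters for the length: one with the reported behaviour (the pointer is reset when the length reaches 0) and one that leaves the pointer alone, as Regan proposes elsewhere in the thread. A C sketch, assuming D's array is modelled as a (length, ptr) struct; all names are hypothetical and only shrinking is handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical (length, ptr) model of a D char[] array. */
typedef struct {
    size_t length;
    const char *ptr; /* NULL for a "null array" */
} CharArray;

/* Pointer-or-length test, as discussed in the thread. */
bool truthy(CharArray a) {
    return a.length != 0 || a.ptr != NULL;
}

/* Reported behaviour: setting the length to 0 also resets the data
   pointer, so the array silently becomes a null array. */
void set_length_reported(CharArray *a, size_t n) {
    assert(a != NULL && n <= a->length);
    a->length = n;
    if (n == 0)
        a->ptr = NULL;
}

/* Proposed behaviour: keep the pointer, so the array still "exists". */
void set_length_proposed(CharArray *a, size_t n) {
    assert(a != NULL && n <= a->length);
    a->length = n;
}
```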
 But then the boogieman comes and gets them in the form of a weird bug.

 Examples of the incongruence (empty _but_ existant array):

 # int[0] emptyArray;
 # if (emptyArray) writef("See, I'm empty, yet I exist!");
 // The statement will print.

This is a static array. Its data pointer can never be null, thus it always exists. (Nothing incongruous here)
 // Let's try it again:
 # int[] emptyArray = new int[0];
 # if (emptyArray) writef("I'm still empty, but non-existant.");
 // The statement will *not* print.

Here you have not allocated any memory, thus nothing exists. (Nothing incongruous here)
 // Think about strings:
 # string emptyString = "";
 # if (emptyString) writef("Empty, yet I exist");
 // The statement will *not* print.

Wrong, this statement will print (try it). The reason it prints is that memory _is_ allocated because string constants are C compatible i.e. contain a null terminator. If this was not the case then this would act as the previous example. (Nothing incongruous here)
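The C half of this explanation is easy to check: the literal "" occupies one byte (its '\0' terminator), so a pointer to it is never NULL even though the text length is 0. The sketch below only demonstrates C; that D's "" literal is laid out the same way is taken from the thread, not verified here. 'empty_literal' and 'text_length' are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The C literal "" is a 1-element char array holding only '\0'. */
static const char empty_literal[] = "";

/* Number of characters before the terminator (0 for ""). */
size_t text_length(const char *s) {
    assert(s != NULL); /* a literal always has storage, so this holds for "" */
    return strlen(s);
}
```

So "" is the "exists but contains nothing" case: non-null data, zero length.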
 Is that last test not a reasonable thing to do? It seems pretty harmless. You want to test for an empty string, an empty array. But you still get true.

You're asking the wrong questions. The statement "if(x)" asks whether x is null or 0; it does not ask "is this string longer than 0 characters" or "does this array contain more than 0 elements". The correct question is:

if (x.length > 0) {}

Just like almost any other container class you care to name/try.
 But what about this:
 # string emptyString = null;
 # if (emptyString) writef("Empty, but now I don't exist");
 // The statement will print.

Wrong, it will not print. The array is null, nothing exists. (Nothing incongruous here)
 Would you say the behaviour I showed above is consistent?

Yes.
 You don't find it a tad, say, ambiguous?

No.
 You don't think people will be confused? I certainly was.

That's because you're asking the wrong questions, and you didn't check your answers.
 makes the foreach..else syntax suggestion from AJG very unnecessary.

Huh? I don't see how the two things are related. You may have a valid point, but I fail to see the connection.

I'm not sure either. I suspect he's referring to foreach being usable on a null array equally well, i.e. you don't have to check whether it's a null array; it will iterate 0 times for both a null array and an empty array.

Regan
Jul 20 2005
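The foreach point can be sketched in C: a loop bounded by the length alone runs zero times whether the data pointer is NULL or not, so no separate null check is needed. 'IntArray' and 'array_sum' are hypothetical names for this model, and the loop is only a rough analogue of D's foreach.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical (length, ptr) model of a D int[] array. */
typedef struct {
    size_t length;
    const int *ptr; /* NULL for a "null array" */
} IntArray;

/* Rough analogue of "foreach (x; a) sum += x;": the loop is bounded by
   length alone, so ptr is never dereferenced when length is 0. */
long array_sum(IntArray a) {
    assert(a.length == 0 || a.ptr != NULL); /* non-empty implies data */
    long sum = 0;
    for (size_t i = 0; i < a.length; i++)
        sum += a.ptr[i];
    return sum;
}
```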
parent reply AJG <AJG_member pathlink.com> writes:
Hi Regan,

In article <opst8meeo123k2f5 nrage.netwin.co.nz>, Regan Heath says...
On Thu, 21 Jul 2005 00:04:56 +0000 (UTC), AJG <AJG_member pathlink.com>  
wrote:
 Making difference between an empty array and a nonexistent one is flaky,
 if not directly ambiguous, thus D does not do it, as far as i can
 remember the statement of Walter. Thus if(array) is not ambiguous.

Hm... not only does this distinction exist, it is in fact _very_ much available in D. That's exactly the point Regan has made in some past replies. I'm indifferent towards this distinction, but Regan seems fond of it. Please look at my examples further below.

It's true.

Praise the lord, agreement. ;)
 And at all, arrays have somewhat pointer-like semantics in D.

No, they do not, IMHO. This is one of the points I've tried to make. Arrays have completely different semantics in D compared to C. In D arrays are first-class objects. They are handled via references, which can't be nulled, they keep their own length, etc. I think this is a good thing. Very different from C.

The point I'm trying to make is that in D an array can be nulled, and it has meaning, eg. char[] p = null; you're confusing the _implementation_ of arrays with the _behaviour_ of arrays, the above array _reference_ behaves just like any other reference that has been nulled(*) eg.

I'm well aware of the implementation vs. the behaviour. It just so happens the two are married when it comes to the compiler. In fact, in the resulting executable, they are indistinguishable. Confusion arises as a result.
 One of the reasons is that it seems
 familiar to C programmers.

Indeed. It seems familiar, and people will misuse it because of that.

How? When you write "if(x)" you're asking is 'x' null or 0. D's answer is perfectly correct in all cases(*).

And except for static arrays. Oh, and strings, which must be compatible with C. Since strings are a fairly important piece of the puzzle, I'd say this is problematic.
(*) except for the _BUG_ where you can write:

char[] p = "";
p.length = 0;
if (p) { /* false, length = 0 resets the data pointer to null */ }

Has Walter actually acknowledged this to be a bug? This seems more like what you mentioned, a desire to make the distinction (empty/exist) disappear. If that's the case, then why would you say it's a bug? If anything, it could only get worse.
 # int[0] emptyArray;
 # if (emptyArray) writef("See, I'm empty, yet I exist!");
 // The statement will print.

This is a static array. Its data pointer can never be null, thus it always exists. (Nothing incongruous here)

My friend, that's the very definition of an incongruence. It means static arrays do not follow the same principles as other kinds (just like strings).

# int[0] empty;              // Not null.
# int[ ] empty = new int[0]; // Yes null.

I even went ahead and _assigned_ an empty array (int[0]) to the reference, and yet it remains _non_-existent. How do you explain that? You can't have a dynamic array that is empty and non-existent, but you _can_ have a static one? (Or at least, not via the initializer?)

Let's analyze this carefully, and you will definitely see an incongruence:

# int[] A = null;
# int[] B = new int[0];

if (A) // this is false.
if (B) // this is false.

Since false == false, then A == B, and therefore null == int[0]. The very distinction you are so fond of is gone! So in this case empty == non-existent, but all over the place it isn't? _That's_ an incongruence.
 // Let's try it again:
 # int[] emptyArray = new int[0];
 # if (emptyArray) writef("I'm still empty, but non-existant.");
 // The statement will *not* print.

Here you have not allocated any memory, thus nothing exists. (Nothing incongruous here)

Oh, so then it's purely about memory? How very semantic. Never mind the fact that int[0] means an empty array. The distinction is lost, as shown above. IMHO there's no way around this one.
 // Think about strings:
 # string emptyString = "";
 # if (emptyString) writef("Empty, yet I exist");
 // The statement will *not* print.

Wrong, this statement will print (try it). The reason it prints is that memory _is_ allocated because string constants are C compatible i.e. contain a null terminator. If this was not the case then this would act as the previous example.

"If this was not the case". That's fine, but it happens to _be_ the case. Therefore the docs should state: "There is an incongruence when it comes to string literals. Because we want them to be compatible with C, it means an empty string is not really empty. In other words, what should have been an empty array is really not. Careful, folks!"
 But what about this:
 # string emptyString = null;
 # if (emptyString) writef("Empty, but now I don't exist");
 // The statement will print.

Wrong, it will not print. The array is null, nothing exists. (Nothing incongruous here)
 Would you say the behaviour I showed above is consistent?


If you agree with the previous statements, you'll concur that the behaviour is not consistent. It calls for exceptions to be made and explained. Once more gratuitously: static vs. dynamic, and string literals, and the .length "bug," and the dynamic initializer problem.
 You don't find it a tad, say, ambiguous?


If you at least agree it's inconsistent, then we are getting somewhere. The ambiguity results in not knowing when which is going to happen. Since there is no documentation on this, the problem is only aggravated.
 You don't think people will be confused? I certainly was.

That's because you're asking the wrong questions, and you didn't check your answers.

I did check my answers, and now I know. I made the mistake, and by _chance_ one case didn't work early on, so I started looking under the hood. But how many people will go to their graves with bugs like that still coded? How many bugs like that exist as we speak? Remember, for _most_ cases, it will not show up.

Tell me this, do you agree with this statement:
People (mistakenly) may use if (array) to test for the emptiness of an array.

What about this:
Moreover, this test will work most of the time.

And finally:
The remaining times, they are bugs.

My proposal aims to prevent those bugs.
 makes the foreach..else syntax suggestion from AJG very unnecessary.

Huh? I don't see how the two things are related. You may have a valid point, but I fail to see the connection.

I'm not sure either. I suspect he's referring to foreach being usable on a null array equally well, i.e. you dont have to check whether it's a null array, it will iterate 0 times for both a null array and an emtpy array.

If this is true, Ilya, that was never the intention of my suggestion. I know that foreach is "safe" even with "null" arrays. The suggestion is a way to deal with the no-items case elegantly without using a separate if statement every single time. As a matter of fact, no-items happens quite a bit IMHO.

Thanks for reading,
--AJG.
Jul 20 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Thu, 21 Jul 2005 02:18:27 +0000 (UTC), AJG <AJG_member pathlink.com>  
wrote:
 Hm... not only does this distinction exist, it is in fact _very_ much available in D. That's exactly the point Regan has made in some past replies. I'm indifferent towards this distinction, but Regan seems fond of it. Please look at my examples further below.

It's true.

Praise the lord, agreement. ;)

We're both men of "distinction" ;)
 you're confusing the _implementation_ of arrays with the _behaviour_ of
 arrays, the above array _referece_ behaves just like any other reference
 that has been nulled(*) eg.

I'm well aware of the implementation vs. the behaviour. It just so happens the two are married when it comes to the compiler. In fact, in the resulting executable, they are indistinguishable. Confusion arises as a result.

Sorry, I don't see your point. The compiler isn't confused, neither am I. Arrays are references, treat them as such and there is no confusion.
 One of the reasons is that it seems
 familiar to C programmers.

Indeed. It seems familiar, and people will misuse it because of that.

How? When you write "if(x)" you're asking is 'x' null or 0. D's answer is perfectly correct in all cases(*).

And except for static arrays.

No, this is no exception to the rule. Yes, static arrays are different to dynamic ones, no surprises there. Yes, static arrays cannot have a null data pointer, no, it makes no difference to the behaviour of "if(x)", nor should it. static arrays are the same as dynamic ones that _exist_, this makes perfect sense as static arrays always exist.
 Oh, and strings, which must be compatible with C.

Again, there is no exception to the rule here. "bob" is a static string, it cannot be null. "" is a static string, it cannot be null. Yes, the last example has no items, i.e. has a 0 length, but it still _exists_. If Walter decided to remove the trailing null and make it incompatible with C then it could be optimised away, i.e. the compiler could decide "" was meaningless and so could remove it, making it non existant. In that case it wouldn't exist. Otherwise it does. As long as it exists it has a non-null data pointer. The length is meaningless when talking about existance.
 (*) except for the _BUG_ where you can write:

 char[] p = "";
 p.length = 0;
 if (p) { /* false, length = 0 resets the data pointer to null */ }

Has Walter actually acknowledged this to be a bug?

In short, no. But then he isn't known for his verbosity on many matters. He just percolates, and out pops a new compiler, possibly with the changes we talk about.
 This seems more like what you mentioned, a desire to make the distinction (empty/exist) disappear.

I believe that was the original intent.
 If that's the case, then why would you say it's a bug?

In this case my impression is that the real intent was to remove the seg-v problems associated with null strings, remove the need to check for null all the time, etc. That has been achieved; what is great is that at the same time we can preserve the distinction if we so choose (it takes so very little to do this from the current state).
 If anything, it could only get worse.

Oh ye of little faith!
 # int[0] emptyArray;
 # if (emptyArray) writef("See, I'm empty, yet I exist!");
 // The statement will print.

This is a static array. Its data pointer can never be null, thus it always exists. (Nothing incongruous here)

My friend, that's the very definition of an incongruence.

Whose definition? http://dictionary.reference.com/search?q=incongruous The closest/best definition for this situation appears to be: "Not in keeping with what is correct, proper, or logical; inappropriate: incongruous behavior"
 It means static arrays do not follow the same principles as other kinds  
 (just like strings).

What "principles" are you referring to?
 # int[0] empty;              // Not null.
 # int[ ] empty = new int[0]; // Yes null.

 I even went ahead and _assigned_ an empty array (int[0]) to the  
 reference, and yet it remains _non_ existant. How do you explain that?  
 You can't have a dynamic array that is empty and non-existant, but you  
 _can_ have a static one? (or at least, not via the initializer?)

Aha! This is a new (good) example. I agree this example shows "incongruous behaviour". I would suggest that "int[0] s;" be an error, as it's pretty meaningless... except template programmers would likely be a little annoyed with that. I would suggest that "int[0] s;" have a null data pointer (as the dynamic one does)... but I believe they're implemented in such a way that there is no such data pointer. There seems to be no simple solution to this problem; perhaps Walter has an idea. I'll post to the bugs NG.
 Let's analyze this carefully, and you will definitely see an  
 incongruence:

 # int[] A = null;
 # int[] B = new int[0];

 if (A) // this is false.
 if (B) // this is false.

 Since false == false, then A == B, and therefore null == int[0]. The very
 distinction you are so fond of is gone!

Not true. I suspect "new int[0]" allocates no memory, therefore it _is_ null. This is different to C/C++ which can and do allocate a zero-length item in the heap. This could be a solution to the problem above, if "new int[0]" allocated a zero length item on the heap it would be consistent with the static array case.
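The suggestion here (a zero-length allocation that still yields a non-null data pointer) can be sketched in C. Plain malloc(0) is not used because its result is implementation-defined (it may return NULL), so this sketch hands out the address of a static sentinel for n == 0 instead. 'new_int_array', 'free_int_array', 'truthy' and the sentinel are hypothetical names; this is not how DMD's runtime is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical (length, ptr) model of a D int[] array. */
typedef struct {
    size_t length;
    int   *ptr; /* NULL for a "null array" */
} IntArray;

/* Stand-in for a zero-length heap object: any non-NULL address works,
   because a length-0 array never dereferences it. */
static int zero_length_sentinel;

/* Sketch of "new int[n]" under the suggestion: n == 0 still yields a
   non-NULL data pointer, so the array is empty but exists.
   Returns a null array if the allocation fails. */
IntArray new_int_array(size_t n) {
    IntArray a = { 0, NULL };
    if (n == 0) {
        a.ptr = &zero_length_sentinel;
        return a;
    }
    a.ptr = calloc(n, sizeof *a.ptr); /* zero-initialised, like D */
    if (a.ptr != NULL)
        a.length = n;
    return a;
}

/* Releases the storage (never the sentinel) and leaves a null array. */
void free_int_array(IntArray *a) {
    assert(a != NULL);
    if (a->ptr != &zero_length_sentinel)
        free(a->ptr); /* free(NULL) is a no-op */
    a->length = 0;
    a->ptr = NULL;
}

/* Pointer-or-length test, as discussed in the thread. */
bool truthy(IntArray a) {
    return a.length != 0 || a.ptr != NULL;
}
```

With this rule "new int[0]" tests TRUE, like the static int[0] case, while an array set to null still tests FALSE.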
 // Let's try it again:
 # int[] emptyArray = new int[0];
 # if (emptyArray) writef("I'm still empty, but nonexistent.");
 // The statement will *not* print.

Here you have not allocated any memory, thus nothing exists. (Nothing incongruous here)

Oh, so then it's purely about memory?

In essence, yes. If no memory is allocated it doesn't exist. Exactly like your own C example earlier.
 How very semantic. Never mind the fact that int[0] means an empty array.

"new int[0]" means allocate an array of 0 ints: 0 * int.sizeof == 0. In other words, allocate 0 bytes. I suspect a shortcut is being taken where no allocation is done when you ask for 0 bytes. I think perhaps it should allocate a zero-length item on the heap instead.
 The distinction is lost, as shown above. IMHO there's no way around this  
 one.

Sure, there is one problem: the static array vs dynamic array example. Let's hope Walter agrees and has/likes the solution.
 // Think about strings:
 # string emptyString = "";
 # if (emptyString) writef("Empty, yet I exist");
 // The statement will *not* print.

Wrong, this statement will print (try it). The reason it prints is that memory _is_ allocated because string constants are C compatible i.e. contain a null terminator. If this was not the case then this would act as the previous example.
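A minimal sketch of the point above, written in present-day D2 rather than the 2005 dialect used in this thread (so literals are typed `string`, not `char[]`). The helper `describe` is a hypothetical name; it tests `.ptr` and `.length` separately instead of relying on `if (s)`:

```d
// Classifies a slice by testing .ptr and .length separately, rather than
// relying on how `if (s)` converts an array to bool.
string describe(const(char)[] s)
{
    if (s.ptr is null)
        return "nonexistent";       // no data pointer at all
    return s.length == 0 ? "empty"  // data pointer set, zero elements
                         : "non-empty";
}
```

Under these assumptions `describe("")` gives "empty" (a literal always has storage, since it is zero-terminated) and `describe(null)` gives "nonexistent". Note that `"" == null` still holds, because `==` compares contents; only `is`, which also compares `.ptr`, tells the two apart.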

"If this was not the case". That's fine, but it happens to _be_ the case. Therefore the docs should state: "There is an incongruence when it comes to string literals. Because we want them to be compatible with C, it means an empty string is not really empty.

It depends how you want to look at it. When I type "" I'm saying: here exists a string containing nothing. In other words, it _exists_ but contains _nothing_; that's the very definition of a non-null data pointer with a length of 0.
 In other words, what should have been an empty array is really not.  
 Careful, folks!"

It _is_ empty; its length is 0. The trailing \0 is effectively outside the length of the array: it exists just past the end.
 But what about this:
 # string emptyString = null;
 # if (emptyString) writef("Empty, but now I don't exist");
 // The statement will print.

Wrong, it will not print. The array is null, nothing exists. (Nothing incongruous here)
 Would you say the behaviour I showed above is consistent?


If you agree with the previous statements, you'll concur that the behaviour is not consistent. It calls for exceptions to be made and explained.

As I said above, there are no exceptions in the rule for "if(x)". It simply and always checks the variable 'x' against null or 0. Nothing more, nothing less. You do however need to understand what other statements like the "new int[0]" do, in order to understand how they relate to "if(x)". That doesn't mean there is anything wrong with "if(x)".
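The rule as stated, "compare x to null or 0", can be written out per kind of type. This hypothetical D2 helper (`asCondition` is an illustrative name) encodes the thread's description, testing `.ptr` for arrays; it deliberately does not call the compiler's own array-to-bool conversion, whose exact behaviour should be checked against the D specification for the compiler in use:

```d
import std.traits : isDynamicArray, isPointer;

// Spells out "compare x to null or 0" for each kind of type, following
// the description in this thread (arrays: test the data pointer).
bool asCondition(T)(T x)
{
    static if (isDynamicArray!T)
        return x.ptr !is null;   // array: is there a data pointer?
    else static if (is(T == class) || isPointer!T)
        return x !is null;       // class reference or pointer: compare with null
    else
        return x != 0;           // numeric: compare with 0
}
```

Written this way, the sketch treats `""` as true and `null` as false for strings, the same split the posts above describe for `if (x)`.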
 Once more gratuitously: static vs. dynamic, and string literals, and the  
 .length "bug," and the dynamic initializer problem.

Summary: I agree there is a problem with static vs dynamic above. I don't agree that there is anything wrong with the behaviour of "if(x)".
 You don't find it a tad, say, ambiguous?


If you at least agree it's inconsistent, then we are getting somewhere.

The static vs dynamic example above shows inconsistency.
 The ambiguity results in not knowing when which is going to happen.

Specifically with statements like "new int[0]" and "int[0] a" and what exactly _they_ do.
 You don't think people will be confused? I certainly was.

That's because you're asking the wrong questions, and you didn't check your answers.

I did check my answers, and now I know.

Yeah, I didn't see your post correcting it till after I wrote this.
 I made the mistake, and by _chance_ one case didn't work early on, so I  
 started looking under the hood. But how many people will go to their  
 graves with bugs like that still coded? How many bugs like that exist as  
 we speak? Remember, for _most_ cases, it will not show up.

 Tell me this, do you agree with this statement:
 People (mistakenly) may use if (array) to test for the emptiness of an
 array.

No. My reasoning:

1. Most container classes use a length or size member for this. I haven't seen a single container class/object/thing in any language that lets you check the length or size of an object using "if(x)".

2. The statement "if(x)" is well known to mean "check x against null or 0". If you assume an array is a struct, you're writing something meaningless. If you assume an array is a reference, you're comparing the reference to null or 0. I cannot see how you would ever think it would silently call the length member of x.
 What about this:
 Moreover, this test will work most of the time.

Sure. Most of the time you'll have an array with items, thus the data pointer will be non-null.
 And finally:
 The remaining times, they are bugs.

Yes, assuming you wrote "if(x)" and meant to check for length > 0: in the case of a non-null data pointer and a length of 0, it would execute the code you had written for arrays with a length greater than 0.
 My proposal aims to prevent those bugs.

Sure, only you want to do it in such a way as to break existing code relying on "if(x)". You want to introduce inconsistent behaviour (making arrays behave differently to all other types in D). And lastly, the bugs you're referring to are, IMO, unlikely to occur. Essentially you have to generate a zero-length, non-null array. The 3 ways I know of doing this are:

    char[0] p;                                 //1
    char[] p = "";                             //2
    char[] tmp = "abc"; char[] p = tmp[0..0];  //3

You'd have to (incorrectly) attempt to check the length of an array with "if(p)", and the outcome would have to be wrong in a subtle way for this to be a serious problem; a blatant bug is easy to find, and you quickly learn not to use "if(p)" to check for length. In most cases I can imagine, the non-null zero-length array causes no problems, because, as Ilya mentioned, things like "foreach" treat them the same. This is part of the "treat them the same" that was Walter's initial goal, and it is achieved mostly by array references never being null.

In short, I like it how it is, I can't see a significant problem, and I totally dislike your suggested solution. But, like you say, thanks for listening to my point of view; it's been fun. (I think we've exhausted our ideas and I don't think we're agreeing.)

Regan.
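Way //3 and the foreach point can be checked with a short D2 sketch. The helper names `emptySliceOf` and `iterations` are hypothetical; note that in D2 way //2 would have to use `string` instead of `char[]`, because literals are immutable:

```d
// Hypothetical helper for way //3: a zero-length slice of existing data
// keeps the original .ptr, so it is "empty" but not "nonexistent".
string emptySliceOf(string s)
{
    return s[0 .. 0];
}

// Counts loop iterations. foreach runs .length times, so a null array and
// a non-null empty array both give zero iterations.
size_t iterations(const(char)[] a)
{
    size_t n;
    foreach (ch; a)
        ++n;
    return n;
}
```

For example, with `string tmp = "abc";`, `emptySliceOf(tmp)` has length 0 and the same `.ptr` as `tmp`, yet `iterations` returns 0 for it exactly as it does for `null`.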
Jul 20 2005
AJG <AJG_member pathlink.com> writes:
Hi,

 Please look at my examples further below.

It's true.

Praise the lord, agreement. ;)

We're both men of "distinction" ;)

Hehehe. I'll be requiring your testimony in court one day.
In short, I like it how it is, I can't see a significant problem, and I  
totally dislike your suggested solution. But, like you say thanks for  
listening to my point of view, it's been fun. (I think we've exhausted our  
ideas and I don't think we're agreeing)

Yes, I suppose we can agree to disagree. A couple of last things I'd like to clarify, though:

My idea is not necessarily to make if (array) check length automatically. That is just one of the three options I mentioned. My general suggestion is to improve, clarify, and document the behaviour of the construct, because I find it dangerous and prone to the subtle bugs I mentioned. You agreed that the bugs can at least happen. It'd be great to know how common they are; alas, that wouldn't be easy to find out.

However, in all honesty, bugs arising from using assignment as a boolean (if (x = y)) haven't happened to me very much. Maybe once or twice (in years). Yet the construct was made partially illegal, requiring a more explicit version. That's fine with me: it helps prevent those subtle (if seldom) bugs.

In addition, IIRC, nowhere on the D site proper is there a mention of what the correct behaviour is supposed to be. I have a feeling Walter left this construct a little unfinished with regard to arrays. Maybe he's working on the empty/null distinction and will revise it afterwards. Anyway, as I've said, the lack of documentation doesn't help.

And finally: could you give me a concrete example of a useful application of if (array) to test for the array pointer's nullness? Say, in a complete function? I simply don't think dealing with ptrs (or checking them) should be necessary in D except for C compatibility. But perhaps you have a really good use for this construct that I haven't considered.

Thanks,
--AJG.
Jul 21 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Thu, 21 Jul 2005 13:35:36 +0000 (UTC), AJG <AJG_member pathlink.com>  
wrote:
 And finally: Could you give me a concrete example of a useful  
 application of if (array) to test for the array pointer's nullness? Say,  
 in a complete function? I simply don't think dealing with ptrs (or  
 checking them) should be necessary in D except for C-compat. But perhaps  
 you have a really good use for this construct that I haven't considered.

Template programming is an example of where we rely on the logical consistency of types to achieve generic things; see:

    import std.stdio;

    class A {
        char[] toString() { return "A"; }
    }

    template doWrite(Type) {
        void doWrite(Type p) {
            if (p) writef(p);
        }
    }

    alias doWrite!(A) doWriteA;
    alias doWrite!(char[]) doWriteC;

    void main() {
        char[] a = "this is an ";
        doWriteC(null);
        doWriteC(a);
        doWriteA(null);
        doWriteA(new A());
    }

Essentially anywhere you expect consistent behaviour of references (string or otherwise) and want to test that the reference is not null, i.e. non-existent.

Regan
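A present-day D2 variant of the same idea might look like the sketch below. It is an adaptation, not the original code: D2 requires `override` and a `string` return for `toString`, and the hypothetical `render` uses an explicit `p !is null`, which compiles for class references and dynamic arrays alike (for arrays, `is null` holds only when `.ptr` is null and `.length` is 0):

```d
import std.conv : to;

class A
{
    override string toString() { return "A"; }
}

// One template body for class references and arrays: returns the text of
// p when it exists, or "" when it is null.
string render(T)(T p)
{
    return p !is null ? to!string(p) : "";
}
```

Returning a string instead of writing to stdout is only a convenience for checking the results; swapping in `write(p)` would mirror the original `doWrite`.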
Jul 21 2005
Derek Parnell <derek psych.ward> writes:
On Wed, 20 Jul 2005 23:42:49 +0200, Ilya Minkov wrote:

 My vote is against.
 
 Derek Parnell schrieb:
 Does ...
 
   if (array) ...
 
 test for an empty array or a non-existent array? I can't tell from the
 syntax. It is thus ambiguous.
 
   if (array.ptr == null) -- test for a non-existence.
 
   if (array.length == 0) -- test for emptiness
 
   if (array) -- test for which?

Making a difference between an empty array and a nonexistent one is flaky, if not directly ambiguous, thus D does not do it, as far as I can remember Walter's statement. Thus if(array) is not ambiguous.

Maybe in your world, but not in mine.

I have a glass of water. The glass exists and it is not empty. I drink the water. The glass exists and it is empty. I smash the glass. The glass does not exist, and it is neither full nor empty because it doesn't exist.

To repeat: Existence and Emptiness are not the same concept.

And as I've just discovered, 'if (array)' tests both the .ptr and the .length properties of the array variable.

-- 
Derek
Melbourne, Australia
21/07/2005 10:10:34 AM
Jul 20 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Thu, 21 Jul 2005 10:17:02 +1000, Derek Parnell <derek psych.ward> wrote:
 On Wed, 20 Jul 2005 23:42:49 +0200, Ilya Minkov wrote:

 My vote is against.

 Derek Parnell schrieb:
 Does ...

   if (array) ...

 test for an empty array or a non-existent array? I can't tell from the
 syntax. It is thus ambiguous.

   if (array.ptr == null) -- test for a non-existence.

   if (array.length == 0) -- test for emptiness

   if (array) -- test for which?

Making a difference between an empty array and a nonexistent one is flaky, if not directly ambiguous, thus D does not do it, as far as I can remember Walter's statement. Thus if(array) is not ambiguous.

Maybe in your world, but not in mine. I have a glass of water. The glass exists and it is not empty. I drink the water. The glass exists and it is empty. I smash the glass. The glass does not exist and it is neither full nor empty because it doesn't exist. To repeat: Existence and Emptiness are not the same concept.

You know I agree. ;)
 And as I've just discovered, 'if (array)' tests both the .ptr and the
 .length properties of the array variable.

Which is pointless, because when the array pointer is null the length cannot be anything but 0.

Regan
Jul 20 2005
Ilya Minkov <minkov cs.tum.edu> writes:
Derek Parnell schrieb:
Making a difference between an empty array and a nonexistent one is flaky,
if not directly ambiguous, thus D does not do it, as far as I can
remember Walter's statement. Thus if(array) is not ambiguous.

Maybe in your world, but not in mine.

[...]
 To repeat: Existence and Emptiness are not the same concept.

The matter of discussion is not your or my view of the real world, nor some other programming language's realm. The matter is how arrays are implemented, or should be implemented, in D. Considering that D relies heavily on garbage collection for arrays anyway, the construct of an empty but existent array is unnecessary. I believe that making this distinction, between empty and nonexistent arrays, just provides the possibility for another misconception and bug. If anyone sees a real technical necessity to be able to distinguish between the empty and the nonexistent array, they are invited to show it here.

-eye
Jul 22 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Fri, 22 Jul 2005 15:00:51 +0200, Ilya Minkov <minkov cs.tum.edu> wrote:
 Derek Parnell schrieb:
 Making a difference between an empty array and a nonexistent one is
 flaky, if not directly ambiguous, thus D does not do it, as far as I
 can remember Walter's statement. Thus if(array) is not ambiguous.


[...]
 To repeat: Existence and Emptiness are not the same concept.

The matter of discussion is not your or my view of the real world, nor some other programming languages' realm. The matter is how arrays are implemented, or should be implemented in D.

Sure, however D exists in the real world. Programmers solve real world problems. IMO arrays should be implemented in D in a manner that best allows us to do that.
 Considering that D relies on garbage collection heavily with arrays
 anyway, the construct of an empty, but existent array is unnecessary.

I don't see your point. The concepts of existence, non-existence, empty and not-empty still exist with garbage collection as much as with any other memory management scheme. Garbage collection does not obviate the need to express non-existence, exists but empty, and exists and not empty.
 I believe that making this distinction, between empty and non-existent  
 arrays, just provides the possibility for another misconception and bug.

You're correct in one respect: having the ability to express more, i.e. non-existence, exists but empty, exists and not empty, adds complexity, increasing the chance that someone will mistakenly use one when they mean the other. However, as a concrete example, a very common bug in C/C++ is dereferencing a null pointer (a pointer is a good example of a type which can represent non-existence, exists but empty, exists and not empty). Arrays in D do not share this problem: the array reference cannot be null. At the same time, the current array implementation retains the expressiveness that allows you to represent non-existence, exists but empty, exists and not empty.

My point is that D's arrays have the expressiveness without the complexity; you can ignore the non-existence case unless you want/need to consider it.
 If anyone sees a real technical necessity to be able to distinguish
 between the empty and the nonexistent one, they are invited to show it here.

I'm not sure there is a "necessity", as in most cases you could probably "work around" the restriction (if it was added to D). Here is an example where the expressiveness of representing non-existence, exists but empty, exists and not empty is useful. This comment was posted to the DMDScript NG recently:

<quote>
For example, might it not be useful to return 'null' on EOF, thus allowing this sort of construct:

    var line = readln();

    while (line != null)
    {
         ...
         line = readln();
    }
</quote>

Of course you could implement this in another way, removing the need for the ability to represent non-existence. You would have to if your type couldn't represent non-existence; that is the price you pay for simplicity. The price paid for the current arrays' expressiveness is very little, IMO.

Regan
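The quoted DMDScript idea can be sketched in D2 with a hypothetical in-memory line source (`LineSource` and `countLines` are illustrative names, not library APIs). A blank line comes back as an empty but non-null slice, while the end of input is signalled with `null`, so the caller can tell the two apart with `is null`:

```d
// Hypothetical in-memory line source (not a library API). A blank line is
// returned as an empty slice whose .ptr is non-null; end of input is
// signalled with a null slice, in the spirit of the quoted readln() idea.
struct LineSource
{
    string text; // remaining, unread input

    string nextLine()
    {
        if (text.length == 0)
            return null;                  // nothing left: "nonexistent"
        foreach (i, c; text)
        {
            if (c == '\n')
            {
                auto line = text[0 .. i]; // may be empty, but .ptr is set
                text = text[i + 1 .. $];
                return line;
            }
        }
        auto last = text;                 // final line without a '\n'
        text = text[$ .. $];
        return last;
    }
}

// Counts lines using a `!is null` loop condition, like the quoted loop.
size_t countLines(string input)
{
    auto src = LineSource(input);
    size_t n;
    for (auto line = src.nextLine(); line !is null; line = src.nextLine())
        ++n;
    return n;
}
```

With this sketch, `countLines("a\n\nb")` counts three lines, including the blank one, because only the final `null` ends the loop.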
Jul 23 2005
Ilya Minkov <minkov cs.tum.edu> writes:
Regan Heath schrieb:
 Considering that D relies on garbage collection heavily with arrays
 anyway, the construct of an empty, but existent array is unnecessary.

I don't see your point. The concepts of existence, non-existence, empty and not-empty still exist with garbage collection as much as with any other memory management scheme. Garbage collection does not obviate the need to express non-existence, exists but empty, and exists and not empty.

In C it was extremely important, and one had to keep an eye on uniqueness. At every allocation, one had to think about how to "anchor" this value and where to free it, and not forget to implement freeing. Naturally, C++ automated this process somewhat. In C, non-existence versus emptiness was sometimes very important.
 I believe that making this distinction, between empty and 
 non-existent  arrays, just provides the possibility for another 
 misconception and bug.

You're correct in one respect: having the ability to express more, i.e. non-existence, exists but empty, exists and not empty, adds complexity, increasing the chance that someone will mistakenly use one when they mean the other. However, as a concrete example, a very common bug in C/C++ is dereferencing a null pointer (a pointer is a good example of a type which can represent non-existence, exists but empty, exists and not empty).

There is a problem with "exists but empty". What does malloc do when you request 0 bytes? As far as I can remember, the standard allows 2 options: the implementation can return NULL, or it could return a tiny region of memory - still not "nothing". What will it contain? My bet would be "uninitialized space". This is garbage which was in the memory before it was allocated, and might be zero, or might be anything else.

So, in C there is no other way than to embed the information on non-existence into your data structure. In the case of strings, this is a string having a '\0' character at the very beginning.

One could suggest preallocating one data structure, stored globally as "the empty singleton", and, when one wants to distinguish, doing a pointer comparison, similarly to the null handling. However, in C this might make it hard to find a memory management solution (as we in fact deal not with a special inaccessible address in memory, but with a living object), while in D the solution is, apart from special cases, simply to copy and forget, and let the GC do the dirty work.
 Arrays in D do not share this problem, the array reference cannot be 
 null.  At the same time, the current array implementation retains the  
 expressiveness that allows you to represent non-existance, exists but  
 empty, exists and not empty.

What do you mean by can't be null?
 If anyone sees a real technical necessity to be able to distinguish
 between the empty and the nonexistent one, they are invited to show it here.

I'm not sure there is a "necessity", as in most cases you could probably "work around" the restriction (if it was added to D). Here is an example where the expressiveness of representing non-existence, exists but empty, exists and not empty is useful.

Necessity is a fuzzy value which is probably best distinguished by the heaviness of the workaround.
 This comment was posted to the DMDScript NG recently:
 
 <quote>
 For example, might it not be useful to return 'null' on EOF, thus allowing
 this sort of construct:
 
     var line = readln();
 
     while (line != null)
     {
          ...
          line = readln();
     }
 </quote>

As above, I think a preallocated EOL line would do, as long as array comparison (done on pointer and length) is a simple operation.
 Of course you could implement this in another way, removing the need
 for the ability to represent non-existence. You would have to if your
 type couldn't represent non-existence; that is the price you pay for
 simplicity. The current price paid for the current array's
 expressiveness is very little IMO.

Ok, given we still have the ability to manipulate the pointer and the length separately, how should array conversion to a boolean condition be defined then? Should it query the pointer, the length, or some combination of both? If the length is zero, one obviously cannot iterate over it. If the pointer is null, the length should invariably be zero?

-eye
Jul 23 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Sat, 23 Jul 2005 22:13:24 +0200, Ilya Minkov <minkov cs.tum.edu> wrote:
 Regan Heath schrieb:
 Considering that D relies on garbage collection heavily with arrays
 anyway, the construct of an empty, but existent array is unnecessary.

empty and not-empty still exist with garbage collection as much as with any other memory management scheme. Garbage collection does not obviate the need to express non-existence, exists but empty, and exists and not empty.

In C it was extremely important, and one had to keep an eye on uniqueness. At every allocation, one had to think about how to "anchor" this value and where to free it, and not forget to implement freeing. Naturally, C++ automated this process somewhat. In C, non-existence versus emptiness was sometimes very important.

Sure, memory management makes things complicated. But uniqueness has nothing to do with non-existence. The fact that non-existence is typically represented by null is the same regardless of the memory management model.
 I believe that making this distinction, between empty and  
 non-existent  arrays, just provides the possibility for another  
 misconception and bug.

i.e. non-existence, exists but empty, exists and not empty, adds complexity, increasing the chance that someone will mistakenly use one when they mean the other. However, as a concrete example, a very common bug in C/C++ is dereferencing a null pointer (a pointer is a good example of a type which can represent non-existence, exists but empty, exists and not empty).

There is a problem with "exists but empty". What does malloc do when you request 0 bytes?

It allocates a zero-length item on the heap (I checked this recently).
 As far as I can remember, the standard allows 2 options: the
 implementation can return NULL, or it could return a tiny region of  
 memory - still not "nothing". What will it contain? My bet would be  
 "uninitialized space". This is garbage which was in the memory before it  
 was allocated, and might be zero, or might be anything else.

 So, in C there is no other way than to embed the information on the
 non-existence into your data structure.

No, you simply use null. A nonexistent string in C is a null pointer. An empty string in C is a non-null pointer whose first character is \0. The same applies to any other object: a null pointer indicates non-existence, and emptiness is represented in whatever fashion makes sense for the object, e.g. a length property set to 0.
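The C convention described here can be mirrored with D's C-style pointers. A small hypothetical sketch (`cStringState` is an illustrative name); it relies on D string literals being zero-terminated, so `"".ptr` points at a single '\0':

```d
// C's convention written with C-style pointers: a null pointer is a string
// that does not exist; a pointer to '\0' is an existing, empty string.
string cStringState(const(char)* s)
{
    if (s is null)
        return "nonexistent";
    return *s == '\0' ? "empty" : "non-empty";
}
```

Here the "empty" case needs no extra marker in the data: the null pointer alone carries non-existence, and the terminator alone carries emptiness.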
 In the case of strings, this is a string having '\0' character at the  
 very beginning.

No, that is an "empty" string, not a "nonexistent" one.
 One could suggest to preallocate one data structure which will be stored  
 globally as "the empty singleton", and when one wants to distinguish, do  
 a pointer comparison, similarly to the null handling. However, in C it  
 might be bad for finding a memory management solution (as we in fact  
 deal not with a special inaccessible address in memory, but a living
 object), while in D the solution is, apart from special cases, simply to  
 copy and forget, and make the GC do the dirty work.

None of this is necessary.
 Arrays in D do not share this problem, the array reference cannot be  
 null.  At the same time, the current array implementation retains the   
 expressiveness that allows you to represent non-existence, exists but
 empty, exists and not empty.

What do you mean by can't be null?

    char[] p = null;
    if (p.length == 0) {
        // does not crash; p itself is never 'null'
    }
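What "p itself is never null" buys in practice can be sketched in D2; the helper name `appendGreeting` is purely illustrative. A null `char[]` is still a (ptr, length) value, so it can be queried and appended to without a special case, unlike a null class reference, whose members cannot be used:

```d
// Hypothetical helper: reading .length of a null slice needs no
// dereference, and appending to it allocates fresh storage.
char[] appendGreeting(char[] p)
{
    immutable before = p.length;   // 0 for a null slice, no crash
    p ~= "hi";                     // works even when p.ptr is null
    assert(p.length == before + 2);
    return p;
}
```

Because slices are passed by value, the caller's `null` slice stays `null`; only the returned slice points at the newly allocated data.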
 If anyone sees a real technical necessity to be able to distinguish
 between the empty and the nonexistent one, they are invited to show it here.

probably "work around" the restriction (if it was added to D). Here is an example where the expressiveness of representing non-existence, exists but empty, exists and not empty is useful.

Necessity is a fuzzy value which is probably best distinguished by the heaviness of the workaround.

Exactly. However, the other thing to consider is the price paid for it: if that price is smaller than the cost of the workaround (as I believe it is in this case), then it is a point in its favour. You then factor in all the other issues, complexity of implementation, etc.
 This comment was posted to the DMDScript NG recently:
  <quote>
 For example, might it not be useful to return 'null' on EOF, thus  
 allowing
 this sort of construct:
      var line = readln();
      while (line != null)
     {
          ...
          line = readln();
     }
 </quote>

As above, I think a preallocated EOL line would do, as long as array comparison (done on pointer and length) is a simple operation.
 Of course you could implement this in another way, removing the need
 for the ability to represent non-existence. You would have to if your
 type couldn't represent non-existence; that is the price you pay for
 simplicity. The current price paid for the current array's
 expressiveness is very little IMO.

Ok, given we still have the ability to manipulate the pointer and the length separately, how should array conversion to boolean condition be defined then?

The same way it works for every other type in D: the statement "if(x)" means "compare x to null or 0". In the case of a reference, it compares the reference to null. The confusion arises in this case because arrays in D cannot be null, and because arrays are in fact implemented as stack-based structs in the background. This makes arrays appear to be structs and not references; however, currently in all (I believe) situations they behave as references. I believe this was done on purpose. As I've noted, in all cases where an array reference would be null, i.e. char[] p = null;, it isn't; instead the data pointer p.ptr is null. So, in order for them to behave as references, it's logically consistent for "if(p)" to check the data ptr vs null. Change that and you need to code special cases for arrays vs other reference types, e.g.

    template doWrite(Type) { void doWrite(Type p) {
       if (p) writefln(p);
    } }

    class C {
       char[] toString() { return "C"; }
    }

    char[] p = "test";
    C c = new C();

    doWrite!(char[])(p);
    doWrite!(char[])(c);
 Should it query the pointer, the length, or some combination of both?

The ptr, for reasons given above. Checking both is a waste of time as when the pointer is null the length must be 0 (as you say below).
 If length is zero, one obviously cannot iterate over it.

Correct. One cannot iterate over an empty or a nonexistent array.
 If pointer is null, the length should be invariably zero?

Indeed. It is currently.

Regan
Jul 23 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Sun, 24 Jul 2005 11:55:35 +1200, Regan Heath <regan netwin.co.nz> wrote:
 So, in order for them to behave as references it's logically consistent  
 for "if(p)" to check the data ptr vs null. Change that and you need to  
 code special cases for arrays vs other reference types, eg.

 template doWrite(Type) { void doWrite(Type p) {
    if (p) writefln(p);
 } }

 class C {
    char[] toString() { return "C"; }
 }

 char[] p = "test";
 C c = new C();

 doWrite!(char[])(p);

TYPO:
 doWrite!(char[])(c);

Should be: doWrite!(C)(c);

Regan
Jul 23 2005
Holger <Holger_member pathlink.com> writes:
In article <opsud4qxii23k2f5 nrage.netwin.co.nz>, Regan Heath says...
On Sat, 23 Jul 2005 22:13:24 +0200, Ilya Minkov <minkov cs.tum.edu> wrote:
 So, in C there is no other way than to embed the information on the  
 non-existance into your data structure.

No, you simply use null. A nonexistent string in C is a null pointer. An empty string in C is a non-null pointer whose first character is \0. The same applies to any other object: a null pointer indicates non-existence, and emptiness is represented in whatever fashion makes sense for the object, e.g. a length property set to 0.
 In the case of strings, this is a string having '\0' character at the  
 very beginning.

No, that is an "empty" string, not a "nonexistent" one.

Hi Regan,

you're of course spot-on. It's not the first time that someone expressed misguided perceptions of "not existent" vs "empty" in the C language here. I really wonder how often we'll need to discuss such basic C-isms on the D NG? People had better learn their stuff before making such bold statements.

Cheers,
Holger
Jul 23 2005
Ilya Minkov <minkov cs.tum.edu> writes:
Holger schrieb:
 Hi Regan, you're of course spot-on.
 It's not the first time that someone expressed misguided perceptions of "not
 existent" vs "empty" in the C language here. I really wonder how often we'll
 need to discuss such basic C-isms on the D NG? People had better learn their
 stuff before making such bold statements.

Misconceptions? That was a typo, and Regan could, if he cared to read through the beginnings of the preceding newsgroup, know me, know that, although he pretty much popped up shortly before I disappeared. And I really wonder why anyone would have to listen to someone who leaves neither his complete name nor a real e-mail address, who can just drop a bomb and disappear. You are free to google my name, it is not that common, and form your own picture.

-i.
Jul 24 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Sun, 24 Jul 2005 20:21:30 +0200, Ilya Minkov <minkov cs.tum.edu> wrote:
 Holger schrieb:
 Hi Regan, you're of course spot-on.
 It's not the first time that someone expressed misguided perceptions of "not
 existent" vs "empty" in the C language here. I really wonder how often we'll
 need to discuss such basic C-isms on the D NG? People had better learn their
 stuff before making such bold statements.

Misconceptions? That was a typo and Regan could, if he cared to read through the beginnings of the preceding newsgroup, know me

I do, in fact, "know you" well enough to have thought, as I typed my reply, that you must have simply made a mistake. Nor did I make the above statements, so I really have no idea why you'd react in such a way towards *me*?
 , know that, although he pretty much popped up shortly before I
 disappeared. And I really wonder why anyone would have to listen to
 someone who leaves neither his complete name nor a real e-mail address,
 who can just drop a bomb and disappear.

Are you referring to me? I'm using my complete name and my real email address.
 You are free to google for my name, it is not that common, and make  
 yourself a picture.

You can google mine as well; 7 of the top 10 results are me.

Regan
Jul 24 2005
Holger <Holger_member pathlink.com> writes:
In article <opsufqj2rn23k2f5 nrage.netwin.co.nz>, Regan Heath says...
On Sun, 24 Jul 2005 20:21:30 +0200, Ilya Minkov <minkov cs.tum.edu> wrote:
 Holger schrieb:
 Hi Regan, you're of course spot-on.
 It's not the first time that someone expressed misguided perceptions of "not
 existent" vs "empty" in the C language here. I really wonder how often we'll
 need to discuss such basic C-isms on the D NG? People had better learn their
 stuff before making such bold statements.

Misconceptions? That was a typo and Regan could, if he cared to read through the beginnings of the preceding newsgroup, know me

I do, in fact, "know you" well enough to have thought, as I typed my reply, that you must have simply made a mistake. Nor did I make the above statements, so I really have no idea why you'd react in such a way towards *me*?
 , know that, although he pretty much popped up shortly before I
 disappeared. And I really wonder why anyone would have to listen to
 someone who leaves neither his complete name nor a real e-mail address,
 who can just drop a bomb and disappear.

Are you referring to me? I'm using my complete name and my real email address.
 You are free to google for my name, it is not that common, and make  
 yourself a picture.

You can google mine as well 7 of the top 10 are me. Regan

Regan, calm down please. It's me, Holger, who is the hooligan here! Again, I apologize for my tone.

Cheers,
Holger
Jul 24 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Sun, 24 Jul 2005 21:37:09 +0000 (UTC), Holger  
<Holger_member pathlink.com> wrote:
 In article <opsufqj2rn23k2f5 nrage.netwin.co.nz>, Regan Heath says...
 On Sun, 24 Jul 2005 20:21:30 +0200, Ilya Minkov <minkov cs.tum.edu>  
 wrote:
 Holger wrote:
 Hi Regan, you're of course spot-on.
 It's not the first time that someone expressed misguided perceptions
 of "not existent" vs "empty" in the C language here. I really wonder
 how often we'll need to discuss such basic C-isms on the D NG? People
 had better learn their stuff before making such bold statements.

Misconceptions? That was a typo, and Regan could, if he cared to read through the beginnings of the preceding newsgroup, know me

I do, in fact, "know you" well enough to have thought as I typed my reply that you must have simply made a mistake. Nor did I make the above statements, so I really have no idea why you'd react in such a way towards *me*?
 , know that, although he pretty much popped up shortly before I
 disappeared. And I really wonder why anyone would have to listen to
 someone who leaves neither his complete name nor a real e-mail address,
 who can just drop a bomb and disappear.

Are you referring to me? I'm using my complete name and my real email address.
 You are free to google my name, it is not that common, and form your
 own impression.

You can google mine as well; 7 of the top 10 results are me.

Regan

Regan, calm down please. It's me, Holger, who is the hooligan here!

You are right. ;) I am calm; I did not intend for my comments above to sound angry.
 Again, I apologize for my tone.

We're all adults here; your reply shows as much (no condescension meant/implied).

Regan
Jul 24 2005
Holger <Holger_member pathlink.com> writes:
In article <42E3DC2A.1040406 cs.tum.edu>, Ilya Minkov says...
Holger wrote:
 Hi Regan, you're of course spot-on.
 It's not the first time that someone expressed misguided perceptions of "not
 existent" vs "empty" in the C language here. I really wonder how often we'll
 need to discuss such basic C-isms on the D NG? People had better learn their
 stuff before making such bold statements.

Misconceptions? That was a typo, and Regan could, if he cared to read through the beginnings of the preceding newsgroup, know me, know that, although he pretty much popped up shortly before I disappeared. And I really wonder why anyone would have to listen to someone who leaves neither his complete name nor a real e-mail address, who can just drop a bomb and disappear. You are free to google my name, it is not that common, and form your own impression.

-i.

Good answer, Ilya; you hit the mark. However, my philippic wasn't specifically addressed to you. It's just that this particular misconception has popped up quite a few times in the past and I felt annoyed. Anyway, I apologize for being caustic. I didn't mean to question your personal abilities. Sorry...

Still, Holger
Jul 24 2005
Ilya Minkov <minkov cs.tum.edu> writes:
Regan Heath wrote:
 Sure, memory management makes things complicated. But, uniqueness has
 nothing to do with non-existence. The fact that non-existence is
 typically represented by null is the same regardless of memory
 management model.

And the emptiness?
 There is a problem with "exists but empty". What does malloc do when
 you request 0 bytes?

Allocates a zero length item on the heap. (I checked this recently).

How? If this were so, it would break the promise that malloc never returns the same address unless this memory was returned by free. So, my bet would be that it returns a byte or a word of memory, which basically doesn't matter since you may not dereference it anyway. Does it segfault or throw when you access the first byte? Which implementation did you check? It is still possible that other implementations do it differently. I just checked DMC and Cygwin-GCC, and both returned at least 8 bytes.
 So, in C there is no other way than to embed the information on the
 non-existence into your data structure.


ARRGH. I obviously meant the emptiness; I was probably very tired when I wrote the message. And you, you should know that I have posts in the digitalmars newsgroups dating as early as two and a half years back, and if you ever looked at them you would know that you don't have to explain such stuff to me, but that I just mixed up the words. The current newsgroup is just too much of a bunch of people neither able nor willing to take time for each other and the subject, which was why I basically left. I think I was quite right about it, and I don't think I'd like to show up here ever again.
 No, you simply use null. A non-existent string in C is a null pointer.
 An empty string in C is a non-null pointer which contains a \0 as the
 first character. The same applies to any other object. A null pointer
 indicates non-existence, and emptiness is represented in whatever
 fashion makes sense for the object, i.e. a length property set to 0.

Blah. True.
 In the case of strings, this is a string having '\0' character at the  
 very beginning.

No, that is an "empty" string, not a "non-existent" one.

I know, dammit.
 None of this is necessary.

Now, would you care to explain *how* exactly you want to distinguish the empty something by pointer? You either (a) embed the information into the target, or (b) make an empty singleton and distinguish it by pointer comparison. Please note that we're not talking about the D arrays here; it's about your statement: "a pointer is a good example of a type which can represent non-existence, exists but empty, exists and not empty", and I'm waiting for you to either see your mistake or show me exactly how.
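Ilya's two options can be sketched in C. This is an illustrative sketch, not code from the thread; `struct buffer`, `EMPTY_BUFFER`, `buffer_is_empty` and `is_the_empty_singleton` are hypothetical names. Option (a) stores a length in the object itself; option (b) reserves one shared object for "empty" and recognizes it by address. In both cases a null pointer remains free to mean "non-existent".

```c
#include <assert.h>
#include <stddef.h>

/* (a) Embed the information in the target: the object carries its length. */
struct buffer {
    size_t length;
    const char *data;
};

static int buffer_is_empty(const struct buffer *b)
{
    return b->length == 0;   /* caller must already know b is not NULL */
}

/* (b) One shared "empty" object, recognized by pointer comparison. */
static const struct buffer EMPTY_BUFFER = { 0, "" };

static int is_the_empty_singleton(const struct buffer *b)
{
    return b == &EMPTY_BUFFER;
}
```

With (b), callers compare addresses rather than inspecting contents, so every "empty" value must be that one object.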
 What do you mean by can't be null?

char[] p = null;
if (p.length == 0) {
  //does not crash, p itself is never 'null'
}

OK, if you like to see it that way. I think I would call p "null" if I cannot dereference any element out of it. It is just like a null pointer, which you cannot dereference and which you can recognize by looking at the representation of the pointer, just as you can by looking at the fields of an array slice. Whether you get a specialized exception or a general memory protection fault if you try to dereference it nonetheless is an implementation detail. BTW, to add more to this similarity, you can catch the exception resulting from dereferencing a null pointer, just as you can catch the one resulting from dereferencing an element from the null array.
 Ok, given we still have the ability to manipulate the pointer and the  
 length separately, how should array conversion to boolean condition 
 be  defined then?

The same way it works for every other type in D, the statement "if(x)" means "compare x to null or 0". In the case of a reference it compares the reference to null.

 The confusion arises in this case because arrays in D cannot be null, 
 and  because arrays are in fact implemented as stack based structs in 
 the  background. This makes arrays appear to be a struct and not a 
 reference,  however currently in all (I believe) situations they behave 
 as references.  I believe this was done on purpose.

I still cannot quite grasp the statement "arrays cannot be null". :) Yes, your writing looks very confused to me. :) It is quite correct to think of an array (or, perhaps more correctly, a slice) as a value struct. However, I don't see how it would "behave as a reference" as opposed to the pointer... An array references a bunch of objects, which you don't access directly, but you "dereference" a certain element of an array by using operator[], much as operator* is used to dereference a single pointer. Though I'm not that sure whether the distinction between pointers and references needs to be maintained. It is only of a syntactical nature, but so many sorts of pointers in D do some sort of syntactical forwarding to their target - the function pointer in the same way as in C, but also a struct pointer forwards the dot '.' operator.
 As I've noted in all cases where an array reference would be null, i.e.
 
 char[] p = null;
 
 it isn't, but instead the data pointer p.ptr is null.

...
 So, in order for them to behave as references it's logically consistent  
 for "if(p)" to check the data ptr vs null. Change that and you need to  
 code special cases for arrays vs other reference types, eg.
 
 template doWrite(Type) {
   void doWrite(Type p) {
     if (p) writefln(p);
   }
 }

 
 class C {
   char[] toString() { return "C"; }
 }
 
 char[] p = "test";
 C c = new C();
 
 doWrite!(char[])(p);

This just works. I don't see any *specific* problem with templates if an array is defined to always be null when it is empty, only the *general* tiny loss of freedom we just discussed. There is another thing that comes to my mind: when you do a lot of slicing and the slices get nulled-out as soon as you cannot reference any element through them, it can make the garbage collector reclaim the memory sooner.

-eye
Jul 24 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Sun, 24 Jul 2005 22:09:26 +0200, Ilya Minkov <minkov cs.tum.edu> wrote:
 Regan Heath wrote:
 Sure, memory management makes things complicated. But, uniqueness has
 nothing to do with non-existence. The fact that non-existence is
 typically represented by null is the same regardless of memory
 management model.

And the emptiness?
 There is a problem with "exists but empty". What does malloc do when
 you request 0 bytes?


How?

I have no idea, I am just repeating what the MSDN documentation says ;) I wrote some C to test it, and malloc(0) and new char[0] both return non-null. That is all I tried.
 If this were so, it would break the promise that malloc never returns the
 same address unless this memory was returned by free. So, my bet would be
 that it returns a byte or a word of memory, which basically doesn't  
 matter since you may not dereference it anyway. Does it segfault or  
 throw when you access the first byte?

Not sure. I didn't check it. I suspect your guesses are correct.
 Which implementation did you check? It is still possible that other  
 implementations do it differently. I just checked DMC and Cygwin-GCC,  
 and both returned at least 8 bytes.

I tested only the M$ compiler that comes with Visual Studio 6.0. I read only the MSDN documentation. It appears all implementations tested so far (mine and yours above) return something on a malloc of 0, rather than nothing.
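For reference, the C standard leaves this case implementation-defined: `malloc(0)` may return a null pointer, or a non-null pointer that must not be used to access an object. The sketch below (the function names are hypothetical) only reports which choice the local C library makes, and checks the one property that follows from the C99 wording "as if the size were some nonzero value": two live zero-size allocations that both succeed are distinct. It never dereferences the result.

```c
#include <assert.h>
#include <stdlib.h>

/* Reports whether this C library's malloc(0) returns a non-null pointer.
   The result is never dereferenced. */
static int malloc_zero_returns_non_null(void)
{
    void *p = malloc(0);
    int non_null = (p != NULL);
    free(p);                 /* free(NULL) is a no-op, so this is always safe */
    return non_null;
}

/* While both allocations are live, two non-null results must differ,
   because each behaves like an allocation of some nonzero size. */
static int zero_size_blocks_are_distinct(void)
{
    void *a = malloc(0);
    void *b = malloc(0);
    int distinct = (a == NULL || b == NULL || a != b);
    free(a);
    free(b);
    return distinct;
}
```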
 So, in C there is no other way than to embed the information on the
 non-existence into your data structure.


ARRGH. I obviously meant the emptiness; I was probably very tired when I wrote the message.

No problem. I half suspected as much.
 And you, you should know that I have posts in the digitalmars newsgroups
 dating as early as two and a half years back, and if you ever looked at
 them you would know that you don't have to explain such stuff to me, but
 that I just mixed up the words.

I could not be certain it was a mistake. I replied not to correct *you* but to correct the *statements*, so that another reader would not take them as truth. My reply was as much to you as to the newsgroup as a whole, so while you may not need the explanations, others might. My comments were never intended as personal criticism.
 The current newsgroup is just too much of a bunch of people neither able
 nor willing to take time for each other and the subject, which was why I
 basically left. I think I was quite right about it, and I don't think I'd
 like to show up here ever again.

I'm sorry you feel that way. I'm here because I like to discuss these sorts of things in my spare time. (can anyone say 'geek'). Regardless of the quality(*) of the replies I get they all influence and refine _my_ opinion, giving me a clearer idea of what I'm talking about and all the issues involved. In short, it's fun and it's self improvement. (*) I'll leave that to the reader to define.
 None of this is necessary.

Now, would you care to explain *how* exactly you want to distinguish the empty something by pointer?

I wasn't suggesting you could. I thought you were confusing "empty" and "non-existent" at this stage and framed my reply with that in mind; essentially, I thought you were suggesting you needed a singleton to represent non-existent, not empty.
 You either (a) embed the information into the target, or (b) make an
 empty singleton and distinguish it by pointer comparison.

You're correct.
 Please note that we're not talking about the D arrays here; it's about
 your statement: "a pointer is a good example of a type which can
 represent non-existence, exists but empty, exists and not empty", and
 I'm waiting for you to either see your mistake or show me exactly how.

I see the point you're making, and you are correct.

char *a = NULL;
char *b = "";
char *c = "test";

a is non-existent.
b exists but is empty.
c exists and is not empty.

It is not technically the pointer representing all 3; it only really represents exists, or not. Whether it is empty or not is represented by the data. All I meant by my statement is that with a pointer you can do all 3. Compare that to 'int', which can only do non-existence by using one of its _values_ for non-existence (this is limiting, and sort of illogical IMO - using a value to represent non-existence).
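Regan's three cases can be written out as a small C function. This is an illustration, not code from the thread; `enum str_state` and `classify` are hypothetical names. Only the first check looks at the pointer; emptiness is read from the data it points to, which matches his point that the pointer itself only says "exists, or not".

```c
#include <assert.h>
#include <stddef.h>

enum str_state { STR_NONEXISTENT, STR_EMPTY, STR_NONEMPTY };

static enum str_state classify(const char *s)
{
    if (s == NULL)
        return STR_NONEXISTENT;  /* the pointer says "nothing here" */
    if (s[0] == '\0')
        return STR_EMPTY;        /* the data says "nothing in here" */
    return STR_NONEMPTY;
}
```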
 What do you mean by can't be null?

if (p.length == 0) {
  //does not crash, p itself is never 'null'
}

OK, if you like to see it that way. I think I would call p "null" if I cannot dereference any element out of it.

I would call it "null" if it was equal to "null", e.g.

if (p is null) {}
 It is just like a null pointer, which you cannot dereference and which
 you can recognize by looking at the representation of the pointer,
 just as you can by looking at the fields of an array slice.

There is never a problem de-referencing an array reference itself, e.g.

char[] p = null;
if (p.length) {
  //never crashes, so by your definition 'p' is never null, right?
}

There _is_ a problem referencing data outside an array, but I don't think that's the same thing, e.g.

char[] p = "one";
p[4] = 'a'; //crash (array bounds error)

This is dereferencing the data pointer (which doesn't crash) by an offset larger than the array (which is what crashes).
 Whether you get a specialized exception or a general memory protection
 fault if you try to dereference it nonetheless is an implementation detail.

True.
 BTW, to add more to this similarity, you can catch the exception  
 resulting from dereferencing a null pointer, just as you can catch the  
 one resulting from dereferencing an element from the null array.

True.
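The difference Regan draws between reading the array reference and indexing past its end can be modeled in C with a two-field struct. This is a hypothetical sketch (`struct slice`, `slice_length` and `slice_set` are invented names), not how D implements bounds checks; it only shows that inspecting the header is always safe, even for a null slice, while element access must be checked against `length`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C model of a D array reference: a (length, ptr) header. */
struct slice {
    size_t length;
    char *ptr;
};

/* Reading the header never touches the elements, so this is safe even
   for the null slice { 0, NULL }. */
static size_t slice_length(struct slice s)
{
    return s.length;
}

/* Element access is where bounds matter: an out-of-range index is rejected
   (returns -1) instead of writing past the end. D, with bounds checking
   enabled, reports an array bounds error at this point instead. */
static int slice_set(struct slice s, size_t i, char value)
{
    if (i >= s.length)
        return -1;
    s.ptr[i] = value;
    return 0;
}
```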
 Ok, given we still have the ability to manipulate the pointer and the   
 length separately, how should array conversion to boolean condition  
 be  defined then?

"if(x)" means "compare x to null or 0". In the case of a reference it compares the reference to null.

 The confusion arises in this case because arrays in D cannot be null,  
 and  because arrays are in fact implemented as stack based structs in  
 the  background. This makes arrays appear to be a struct and not a  
 reference,  however currently in all (I believe) situations they behave  
 as references.  I believe this was done on purpose.

I still cannot quite grasp the statement "arrays cannot be null". :)

See above for my meaning. The array reference itself is never "null" (and can always be dereferenced).
 Yes, your writing looks very confused to me. :) It is quite correct to
 think of an array (or, perhaps more correctly, a slice) as a value struct.

I disagree. I believe the intention is that we think of them as references. The documentation calls them references; their behaviour is consistent with other references. I believe Walter made them behave as references on purpose. They are, however, implemented as a stack-based struct. Derek has shown that.
 However, I don't see how it would "behave as a reference" as opposed to
 the pointer... An array references a bunch of objects

I'd say an array 'contains' a bunch of objects, but that's beside the point. I'm referring to the array variable itself, e.g.

char[] p;

'p' behaves like a reference, just as in

class C {}
C c;

'c' behaves like a reference.
 , which you don't access directly, but you "dereference" a certain
 element of an array by using operator[], much as operator* is
 used to dereference a single pointer.

You access it directly when you use [] or any other property or member of it. You de-reference the array reference in order to use these properties. In the case of [] you dereference the data pointer in the array reference by an offset.
 Though I'm not that sure whether the distinction between pointers and
 references needs to be maintained. It is only of a syntactical nature,
 but so many sorts of pointers in D do some sort of syntactical
 forwarding to their target - the function pointer in the same way as in
 C, but also a struct pointer forwards the dot '.' operator.

Indeed.. when I think about pointers and references I find myself thinking of them as the "same thing". A reference is perhaps just a specific type of pointer. I mean, a reference is a pointer to an object, but a pointer is more general, you can have a pointer to a pointer to a ... to an object. You can create pointers to any other type.
 As I've noted in all cases where an array reference would be null, i.e.
  char[] p = null;
  it isn't, but instead the data pointer p.ptr is null.

...
 So, in order for them to behave as references it's logically  
 consistent  for "if(p)" to check the data ptr vs null. Change that and  
 you need to  code special cases for arrays vs other reference types, eg.
 template doWrite(Type) {
   void doWrite(Type p) {
     if (p) writefln(p);
   }
 }

  class C {
   char[] toString() { return "C"; }
 }
  char[] p = "test";
 C c = new C();
  doWrite!(char[])(p);

This just works. I don't see any *specific* problem with templates if an array is defined to always be null when it is empty, only the *general* tiny loss of freedom we just discussed.

"General tiny loss of freedom"? You mean the ability to express non-existence?
 There is another thing that comes to my mind: when you do a lot of
 slicing and the slices get nulled-out as soon as you cannot reference
 any element through them, it can make the garbage collector reclaim the
 memory sooner.

I don't see how this is relevant, sorry.

Regan
Jul 24 2005
James McComb <ned jamesmccomb.id.au> writes:
Regan Heath wrote:

 What do you mean by can't be null?

char[] p = null;
if (p.length == 0) {
  //does not crash, p itself is never 'null'
}

Okay... I obviously don't get D strings, because this seems wildly counter-intuitive to me. Sure, if p CANNOT be null, then the line

char[] p = null; // Surely this means: set p to null

should fail to compile or throw an exception or something? If D strings truly couldn't be null I would expect something like:

// Declare p
// p is initialized to empty (not null - it can never be null)
char[] p;

// Test for empty
// A test for null makes no sense - p can never be null
if (!p) {}

What am I missing?

James McComb
Jul 24 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Mon, 25 Jul 2005 08:55:09 +1000, James McComb <ned jamesmccomb.id.au>  
wrote:
 Regan Heath wrote:

 What do you mean by can't be null?

if (p.length == 0) {
  //does not crash, p itself is never 'null'
}

Okay... I obviously don't get D strings, because this seems wildly counter-intuitive to me. Sure, if p CANNOT be null, then the line

char[] p = null; // Surely this means: set p to null

should fail to compile or throw an exception or something?

I understand where you're going with this. The important fact here is that though an array reference itself cannot be null, it still behaves like any other reference set to null(*), e.g.

char[] p = null;
if (p is null) {
  //true
}

(*) with the exception that you can de-reference an array that has been set to null, e.g.

char[] p = null;
if (p.length == 0) {
  //does not crash
}

Try that with another reference type, e.g. a class with a length member. Try that with a struct (value type), and you'll find you cannot assign null.

The problem is perhaps trying to think of an array as "strictly" a reference type or "strictly" a value type. It's not; it's a mix of the two, an attempt by Walter (successful IMO) to get the best performance possible: passed by reference, stored on the stack, nice syntax, etc.

I believe you should think of arrays as references with special abilities, like overloaded = and [] operators, for example. If we had the ability to define an opEquals for classes I believe we could emulate arrays with classes. (I admit, I haven't tried, nor considered it thoroughly.)

That said, you could perhaps equally argue that you should think of them as structs which are passed by reference, can be set to null, can be compared to null, ... I think of them as references; it makes more sense to me.
 If D strings truly couldn't be null I would expect something like:

 // Declare p
 // p is initialized to empty (not null - it can never be null)
 char[] p;

 // Test for empty
 // A test for null makes no sense - p can never be null
 if (!p) {}

The distinction is perhaps fine: "strings" can be non-existent; non-existent is typically represented with null; "array references" cannot be null. Yet, arrays can represent non-existent. A contradiction? No, those statements are not directly contradictory because:
 - null is not the only way to represent non-existence (beside the point in this case)
 - the object (arrays) need not be null in all situations.

In our case point 2 is the relevant one. Arrays are null only when they need to be in order to behave like other reference types. Arrays are not null when you try to dereference them.
 What am I missing?

Maybe nothing, maybe something, who am I to say. I can only offer my opinion above.

Regan
Jul 24 2005
James McComb <ned jamesmccomb.id.au> writes:
Regan Heath wrote:

 ... the important fact here is 
 that  though an array reference itself cannot be null it still behaves 
 like any  other reference set to null ...
 with the exception that you can de-reference an array that has been  
 set to null ...

Thanks for the examples. I understand what you mean, now. Considering that these lines are valid:

char[] p = null;
if (p is null) {
  //true
}

...it confused me that you say p cannot be null. I know what you mean now, though. :)

James McComb
Jul 24 2005
Derek Parnell <derek psych.ward> writes:
On Mon, 25 Jul 2005 11:33:58 +1000, James McComb wrote:

 Regan Heath wrote:
 
 ... the important fact here is 
 that  though an array reference itself cannot be null it still behaves 
 like any  other reference set to null ...
 with the exception that you can de-reference an array that has been  
 set to null ...

Thanks for the examples. I understand what you mean, now. Considering that these lines are valid:

char[] p = null;
if (p is null) {
  //true
}

...it confused me that you say p cannot be null. I know what you mean now, though. :)

These lines are equivalent to ...

char[] p;
p.ptr = null;
p.length = 0;
if (p.ptr == null or p.length == 0) {
  //true
}

I know Regan talks about an 'array reference' as an abstract concept, but I just find it easier to think of it in terms of how it is actually implemented, namely a two-field struct. I keep thinking of a 'reference' as the address of a pointer - but that's just me ;-)

For example, '&p' returns the address of the array reference (struct) and you can manipulate it directly (at your peril, of course).

-- 
Derek
Melbourne, Australia
25/07/2005 12:56:21 PM
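Derek's two-field view makes the ambiguity in `if (array)` easy to see. The C sketch below is a model, not D itself; `struct slice`, `truthy_by_ptr` and `truthy_by_length` are hypothetical names. `truthy_by_ptr` follows the behaviour described in this thread (test the data pointer), and `truthy_by_length` follows AJG's option 1 (test the length). The two disagree for an empty slice whose data pointer is not null.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical C model of the two-field struct behind a D array. */
struct slice {
    size_t length;
    const char *ptr;
};

/* "if (array)" as described in this thread: test the data pointer. */
static bool truthy_by_ptr(struct slice a)
{
    return a.ptr != NULL;
}

/* AJG's option 1: make "if (array)" mean "if (array.length)". */
static bool truthy_by_length(struct slice a)
{
    return a.length != 0;
}
```

For `{ 0, NULL }` and `{ 4, "test" }` both readings agree; only the empty-but-allocated case `{ 0, "" }` tells them apart, which is exactly where the proposal in this thread bites.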
Jul 24 2005
AJG <AJG_member pathlink.com> writes:
Hi,

In article <dc1689$h1q$1 digitaldaemon.com>, James McComb says...
Regan Heath wrote:

 What do you mean by can't be null?

char[] p = null;
if (p.length == 0) {
  //does not crash, p itself is never 'null'
}

Okay... I obviously don't get D strings, because this seems wildly counter-intuitive to me. Sure, if p CANNOT be null, then the line

char[] p = null; // Surely this means: set p to null

should fail to compile or throw an exception or something?

You are right, it is a little counter-intuitive. When you say array = null you are sort of talking about the array's pointer, not the reference itself. I suspect one of two things happens internally:

1) The " = null" is simply ignored by the compiler (for efficiency). So char[] p = null; becomes: char[] p;
2) A *new* reference (null pointer, 0-length) is created, in the style of function parameters, and assigned. In other words, it does whatever it does when you pass "null" to a function expecting an array.
If D strings truly couldn't be null I would expect something like:

// Declare p
// p is initialized to empty (not null - it can never be null)
char[] p;

// Test for empty
// A test for null makes no sense - p can never be null
if (!p) {}

What am I missing?

That's one of the original points of this thread. I wanted D to outlaw that test precisely because of the confusion. It didn't make sense to me either. You can think of "if (!p)" as "if (!p.ptr)". It is _not_ a test for emptiness. Therefore, I'm against this automatic conversion, for what it's worth.

Cheers,
--AJG.
Jul 24 2005
Ilya Minkov <minkov cs.tum.edu> writes:
AJG wrote:
 You are right, it is a little counter-intuitive. When you say array =
  null you are sort of talking about the array's pointer, not the 
 reference itself. I suspect one of two things happens internally:

 2) A *new* reference (null pointer, 0-length) is created, in the 
 style of function parameters, and assigned. In other words, it does 
 whatever it does when you pass "null" to a function expecting an 
 array.

You cannot say "new" because the array (slice) does not have reference semantics by itself, and does not get allocated. It is stored inline - that is, wherever you mention it - on the stack, within a class, etc., as a *value*. The statement "char[] p = null" declares an array slice on the stack (the LHS), and at the same time assigns null to its pointer field and 0 to its length field.

The slice has reference semantics with respect to its referred elements, i.e. you can manipulate the elements directly and the changes propagate outside. However, it does not have reference semantics with respect to its fields:

void manipulate(char[] blah) {
  assert(blah.length > 1);
  blah[0] = 'x'; //This change propagates
  blah.length = blah.length-1; //Above change does not propagate
  blah[0] = 'y'; //This change propagates
  blah~="z"; //Above statement decouples completely
  //Now no changes will ever propagate,
  //whatever you do to the poor array!
}

I happen to like the strange semantics, but this is in strong contrast to a string& or vector<blah>& in C++ or the Java array.
 1) The " = null" is simply ignored by the compiler (for efficiency). 
 So char[] p = null; becomes: char[] p;

I don't see any special provision for efficiency, just that an array slice has a default value tuple (null,0), and the same gets assigned to it by an assignment of null.

-eye
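Ilya's `manipulate()` example can be mirrored in C by passing a two-field struct by value: the header is copied, the elements are shared. This is a hypothetical model (`struct slice` and this C `manipulate` are illustrations, not D's implementation), and it leaves out the final `~=` step, since appending depends on D's runtime allocator and is not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C model of a D slice: a (length, ptr) header. */
struct slice {
    size_t length;
    char *ptr;
};

/* The struct is passed by value, so the callee gets its own copy of the
   header, while ptr still points at the caller's elements. */
static void manipulate(struct slice blah)
{
    assert(blah.length > 1);
    blah.ptr[0] = 'x';             /* element write: visible to the caller */
    blah.length = blah.length - 1; /* header write: local copy only */
    blah.ptr[0] = 'y';             /* element write: visible to the caller */
}
```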
Jul 27 2005
AJG <AJG_member pathlink.com> writes:
In article <dc86pk$2rta$1 digitaldaemon.com>, Ilya Minkov says...
AJG wrote:
 You are right, it is a little counter-intuitive. When you say array =
  null you are sort of talking about the array's pointer, not the 
 reference itself. I suspect one of two things happens internally:

 2) A *new* reference (null pointer, 0-length) is created, in the 
 style of function parameters, and assigned. In other words, it does 
 whatever it does when you pass "null" to a function expecting an 
 array.

You cannot say "new" because the array (slice) does not have reference semantics by itself, and does not get allocated. It is stored inline - that is, wherever you mention it - on the stack, within a class, etc., as a *value*.

I meant at compile-time, not in the "new int[5]" sense.
The slice has a reference semantics with respect to its referred
elements, i.e. you can manipulate the elements directly and the changes 
propagate outside. However, it does not have a reference semantics in 
respect to its fields:

I had been wondering how to define reference semantics for a while, and I think your definition is spot-on.
I happen to like the strange semantics, but this is in strong contrast
to a string& or vector<blah>& in C++ or the Java array.

I don't like the strange array field semantics. I ran into a particularly nasty length problem a couple of days ago, and life would have been much simpler if fields worked just like contents. My solution was to use an array-reference pointer. E.g.

# alias char[] string;
# string original; // = whatever.
# string* trueReference = &original;
# trueReference.length = 5; // This does work properly.
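AJG's workaround, taking a pointer to the array reference, works for the same reason in a C model: the callee writes to the caller's header rather than to a copy. This is a hypothetical sketch (`struct slice`, `shrink_by_pointer` and `shrink_by_value` are invented names). It only models shrinking, because growing a D array may reallocate, which this model does not attempt.

```c
#include <assert.h>
#include <stddef.h>

struct slice {
    size_t length;
    char *ptr;
};

/* Receives the caller's header by address: the new length is seen by the caller. */
static void shrink_by_pointer(struct slice *s, size_t new_length)
{
    if (new_length < s->length)
        s->length = new_length;
}

/* Receives a copy of the header: the change is lost when the function returns. */
static void shrink_by_value(struct slice s, size_t new_length)
{
    if (new_length < s.length)
        s.length = new_length;
}
```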
 1) The " = null" is simply ignored by the compiler (for efficiency). 
 So char[] p = null; becomes: char[] p;

I don't see any special provision for efficiency, just that an array slice has a default value tuple (null,0), and the same gets assigned to it by an assignment of null.

What I meant is that char[] p; is already initialized to null. Doing an additional = null is redundant, and thus eliminating it would be a small efficiency gain (again, to the compiler, not the runtime).

----

While we are on this topic, is this legal? (and safe?):

# void Foo (string s) {
#     assert(s.length == 3);
#     s[0] = 'F';
#     s[1] = 'o';
#     s[2] = 'o';
# }
#
# void Bar () {
#     string s = "Bar"; // Static data?
#     writefln(s);
#     Foo(s);
#     writefln(s);
# }

I think it should be, for consistency and simplicity, but I don't know about the safety of it. I don't have DMD handy to test it, either.

Thanks,
--AJG.
Jul 27 2005
Derek Parnell <derek psych.ward> writes:
On Fri, 22 Jul 2005 15:00:51 +0200, Ilya Minkov wrote:


[snip]
 
 I believe that making this distinction, between empty and non-existent 
 arrays, just provides the possibility for another misconception and bug.

When I started seriously coding with D, I was making mistakes in my code because I assumed that D would make this distinction.
 If someone sees a real technical need to distinguish between the empty
 and the non-existing one, they are invited to show it here.

One reasonable use for a non-existent string is to represent the fact that a default value has not been supplied. As every possible string value, including an empty string, could be the default value, I needed a way to state that a string has no default yet.

-- 
Derek Parnell
Melbourne, Australia
24/07/2005 12:07:59 AM
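Derek's use case maps directly onto the pointer distinction in C: a null pointer can mean "no default supplied yet", leaving every string, including the empty one, available as an actual default. A hypothetical sketch; `struct option`, `has_default` and `value_or` are invented names.

```c
#include <assert.h>
#include <stddef.h>

/* NULL means "no default supplied yet"; "" is a perfectly valid default. */
struct option {
    const char *name;
    const char *default_value;
};

static int has_default(const struct option *opt)
{
    return opt->default_value != NULL;
}

/* Returns the default if one was supplied, otherwise the fallback. */
static const char *value_or(const struct option *opt, const char *fallback)
{
    return has_default(opt) ? opt->default_value : fallback;
}
```

If emptiness and non-existence were merged, an empty default would be indistinguishable from "not supplied", and `value_or` would wrongly return the fallback for it.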
Jul 23 2005
AJG <AJG_member pathlink.com> writes:
Sorry, I got the two last examples backwards. The comments should read
"the statement will print"
and then
"the statement will *not* print"

Not the other way around. The point remains the same, though.
Thanks,
--AJG.
Jul 20 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Thu, 21 Jul 2005 00:17:51 +0000 (UTC), AJG <AJG_member pathlink.com>  
wrote:
 Sorry, I got the two last examples backwards. The comments should read
 "the statement will print"
 and then
 "the statement will *not* print"

 Not the other way around. The point remains the same, though.

Sorry, I replied before seeing this post. My reply remains the same, minus correcting your mistakes.

Regan
Jul 20 2005
Charles Hixson <charleshixsn earthlink.net> writes:
Derek Parnell wrote:
 On Wed, 20 Jul 2005 11:21:55 +0200, Dejan Lekic wrote:
 
 Mr Heath, I agree with You on this.

I don't. Does

  if (array) ...

test for an empty array or a non-existent array? I can't tell from the syntax. It is thus ambiguous.

  if (array.ptr == null)  -- test for non-existence
  if (array.length == 0)  -- test for emptiness
  if (array)              -- test for which?

If array might be null, can you be certain that it's proper to dereference it? E.g. array.length would seem to presume that array wasn't null. (Actually, so would array.ptr... but perhaps that's just me.)
Jul 20 2005
parent "Regan Heath" <regan netwin.co.nz> writes:
On Wed, 20 Jul 2005 15:38:04 -0700, Charles Hixson  
<charleshixsn earthlink.net> wrote:
 Derek Parnell wrote:
 On Wed, 20 Jul 2005 11:21:55 +0200, Dejan Lekic wrote:

 Mr Heath, I agree with You on this.

Does ... if (array) ... test for an empty array or a non-existent array? I can't tell from the syntax. It is thus ambiguous. if (array.ptr == null) -- test for a non-existence. if (array.length == 0) -- test for emptiness if (array) -- test for which?

If array might be null, can you be certain that it's proper to dereference it, e.g. array.length would seem to presume that array wasn't null. (Actually, so would array.ptr...but perhaps that's just me.)

D guarantees an array reference is never null.

Is an array reference _the_ array? No, just like an object reference is not _the_ object (which is why you can have x references to the same object).

A null array, "char[] p = null;", has a null data pointer. So, to check for a null array you check the data pointer.

An empty array, "char[] p = "";", has a non-null data pointer but a 0 length. So, to check for an empty array you check the length.

See my other posts for reasoning as to why "if(array)" checks the null array case.

Regan
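The null/empty distinction described above can be modelled with a toy (length, ptr) pair. This is a Python sketch, not D's actual implementation. `DArray`, its methods, and the `0x1000` stand-in address are all hypothetical. `__bool__` encodes the reading of `if (array)` given in this thread, namely a test of the data pointer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DArray:
    """Toy model of a D dynamic array as a (length, ptr) pair.

    ptr is None for a null array; any int stands in for the address
    of allocated storage (the number itself is arbitrary).
    """
    length: int
    ptr: Optional[int]

    def is_null(self) -> bool:    # like: array.ptr == null
        return self.ptr is None

    def is_empty(self) -> bool:   # like: array.length == 0
        return self.length == 0

    def __bool__(self) -> bool:   # if (array) read as a data-pointer test
        return self.ptr is not None


null_array = DArray(length=0, ptr=None)     # char[] p = null;
empty_array = DArray(length=0, ptr=0x1000)  # char[] p = "";

assert null_array.is_empty() and empty_array.is_empty()
assert null_array.is_null() and not empty_array.is_null()
print(bool(null_array), bool(empty_array))  # False True
```

Both arrays report a length of 0. Only the pointer test tells them apart.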
Jul 20 2005
prev sibling parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Regan Heath" <regan netwin.co.nz> wrote in message 
news:opst6x8cje23k2f5 nrage.netwin.co.nz...
 On Wed, 20 Jul 2005 02:15:58 +0000 (UTC), AJG <AJG_member pathlink.com> 
 wrote:
 This is a suggestion based on a thread from a couple of weeks ago. What about
 making if (array) illegal in D? I think it brings ambiguity and a high potential
 for errors to the language. The main two uses for this construct can already be
 done with a slightly more explicit syntax:

 if (array.ptr == null) // Check for a kind of "non-existance."
 if (array.length == 0) // Check for explicit emptiness.

 On the other hand, one is not sure what if (array) by itself is supposed to
 mean, since it's _not_ like C. In C, if (array), where array is typically a
 pointer, means simply != NULL. The problem in D is that the array ptr is tricky
 and IMHO it's best not to interface with it directly.

 I think it would be wise to remove this ambiguity. I propose two options:
 1) Make if (array) equal _always_ to if (array.length).
 2) Simply make it illegal.

 What do you guys think? Walter?

I prefer the current behaviour (for all the reasons I mentioned in the previous thread): digitalmars.D/25804

"if (array)" is the same as "if (array.ptr)", which acts just like it does in C, comparing it to 0/null.

Essentially the "if" statement is checking the not-zero state of the variable itself. In the case of value types it compares the value to 0. In the case of pointers and references it compares them to null. In the case of an array, which (as explained in the link above) is a mix/pseudo value/reference type, it compares the data pointer to null.

The reason this is the correct behaviour is that a null array has a null data pointer, but an empty array, i.e. an existing set containing no elements, may have a non-null data pointer. In both cases they have a 0 length property.

Of course we could change this: we could remove the case where an array contains no items but has a non-null data pointer. This IMO would remove a useful distinction; the "existing set containing no items" would be unrepresentable with a single array variable. IMO that would be a bad move; the current situation(*) is good.

(*) there remains the problem where setting the length of an array to 0 sets the data pointer to null. This can change an "existing set with no elements" into a "non-existent set".

Regan

I was poking around the Qt documentation and interestingly enough QString has a concept of null and empty. Here's what they say, though: "For historical reasons, QString distinguishes between a null string and an empty string. [snip] We recommend that you always use isEmpty() and avoid isNull()."

The exact doc is
http://doc.trolltech.com/4.0/qstring.html#distinction-between-null-and-empty-strings
Jul 21 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Thu, 21 Jul 2005 22:31:37 -0400, Ben Hinkle <ben.hinkle gmail.com>
wrote:

I was poking around the Qt documentation and interestingly enough QString has a concept of null and empty. Here's what they say, though: "For historical reasons, QString distinguishes between a null string and an empty string. [snip] We recommend that you always use isEmpty() and avoid isNull()." The exact doc is http://doc.trolltech.com/4.0/qstring.html#distinction-between-null-and-empty-strings

That's not too surprising. A lot of people have never seen the need for the distinction, and it certainly can make life "simpler". However, I don't believe you can argue that it doesn't exist, at least logically. That is why you get situations like this (stolen from a post to the DMDScript group):

<quote>
For example, might it not be useful to return 'null' on EOF, thus allowing this sort of construct:

  var line = readln();
  while (line != null) {
    ...
    line = readln();
  }
</quote>

which is an example where there is a desire to distinguish between existence and emptiness.

Sure, you can remove the distinction, lessen the expressiveness of arrays and force everyone to "work around" the deficiency in other ways. It's possible; it can make life simpler for the general case and more complicated for the rest.

I think arrays in D are nearly perfect(*). They allow you to ignore the distinction in the general case (thus life is pretty easy already), yet you can tell the difference if you require it.

(*) there are only 2 problems with them IMO:
1. "length = 0;" resets the data pointer to null, changing empty into non-existent.
2. "int[0] a;" and "int[] a = new int[0];" produce different results when you'd expect the same thing.

Regan
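The quoted readln loop only works if end-of-input and a blank line come back as different values. Below is a minimal Python sketch of that idea. It assumes a hypothetical `readln` that returns `None` at EOF (playing the role of a null array) and `""` for a blank line. `collect` is also a hypothetical helper.

```python
import io
from typing import List, Optional, TextIO


def readln(stream: TextIO) -> Optional[str]:
    """Hypothetical readln: None at end of input, otherwise the line
    without its trailing newline (so a blank line comes back as "")."""
    raw = stream.readline()
    if raw == "":                # io's readline() returns "" only at EOF
        return None
    return raw.rstrip("\n")


def collect(stream: TextIO) -> List[str]:
    lines = []
    line = readln(stream)
    while line is not None:      # the loop from the quoted post
        lines.append(line)
        line = readln(stream)
    return lines


print(collect(io.StringIO("first\n\nthird\n")))  # ['first', '', 'third']
```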
Jul 21 2005
parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Regan Heath" <regan netwin.co.nz> wrote in message 
news:opsuaqfmcv23k2f5 nrage.netwin.co.nz...
Sure, I agree special values can be useful and null is an easy special value to use. Note the same behavior can be obtained by returning a singleton empty string just for eof, if desired. The singleton approach could arguably make the code more readable, too, since the reader wouldn't have to know that a null line meant eof. For example

  char[] line = din.readLine();
  while (line !is din.eofLine()) { ... line = din.readLine(); }

where eofLine can return null or, if the stream author wishes, some other unique empty string.
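Ben's singleton alternative can be sketched in Python. The sketch assumes a hypothetical `EOF_LINE` sentinel in place of `din.eofLine()` and a hypothetical `read_line` reader. Python interns the plain empty string, so the sketch uses a `str` subclass to get an empty string that is a distinct object. The loop then compares by identity, roughly as D's `!is` does, and a genuinely blank line still gets through.

```python
class _EofLine(str):
    """str subclass, so the sentinel is an empty string that is a distinct object."""


EOF_LINE = _EofLine("")  # hypothetical stand-in for din.eofLine()


def read_line(lines, index):
    """Hypothetical reader: returns EOF_LINE once the input is exhausted."""
    return lines[index] if index < len(lines) else EOF_LINE


source = ["alpha", "", "gamma"]
seen = []
i = 0
line = read_line(source, i)
while line is not EOF_LINE:  # identity test, not equality
    seen.append(line)
    i += 1
    line = read_line(source, i)

print(seen)  # ['alpha', '', 'gamma']
```

An equality test (`line != EOF_LINE`) would stop at the blank line, because the sentinel compares equal to `""`. That is why the identity test matters.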
Jul 22 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Fri, 22 Jul 2005 09:06:48 -0400, Ben Hinkle <ben.hinkle gmail.com>  
wrote:

Sure, I agree special values can be useful and null is an easy special value to use.

Indeed, null and NAN have a lot in common. They indicate non-existence, or uninitialised. Think how much trouble we have coding with 'int' and other 'value' types that cannot indicate non-existence, especially with container classes and the like. std.boxer wouldn't exist if int could indicate non-existence.
 Note the same behavior can be obtained with returning a singleton
 empty just for eof, if desired. The singleton approach could arguably  
 make the code more readable, too, since the reader wouldn't have to know that
 null line meant eof. For example
  char[] line = din.readLine();
  while (line !is din.eofLine()) { ... line = din.readLine(); }
 where eofLine can return null or if the stream author wishes it can  
 return some other unique empty string.

That code is more descriptive, sure. However, null is more generic in application. You can use it 'everywhere' and everywhere it is used it can have the same meaning. This means no 'special case' code is required (like that shown above).

Regan
Jul 23 2005
parent reply AJG <AJG_member pathlink.com> writes:
Hi,

 Sure, I agree special values can be useful and null is an easy special  
 value to use.

Indeed, null and NAN have a lot in common. They indicate non-existance, or un-initialised. Think how much trouble we have coding with 'int' and other 'value' types that cannot indicate non-existance? esp with container classes and the like. std.boxer wouldn't exist if int could indicate non-existance.

Yes! That is exactly right. The problem with using array.ptr as null for existence checks is that it's not orthogonal at all. It only works with arrays. It might also work with classes (not sure). What about primitives? No, it's back to an additional boolean or somesuch. That's why I think it's a crappy solution, and that's exactly the source of the if (array) dilemma in the first place.

C# 2.0 will "solve" this problem with the concept of nullable types. Even ints will be nullable. I'm not sure how this is going to work (haven't tried it), but at least it's orthogonal. It works everywhere. Whereas array.ptr is shaky, buggy, likely to change and IMHO unsemantic.

If we at least see that this is a problem, and that there is a need for a more complete feature, maybe we can work towards a better solution.
 Note the same behavior can be obtained with returning a singleton
 empty just for eof, if desired. The singleton approach could arguably  
 make the code more readable, too, since the reader wouldn't have to know that
 null line meant eof. For example
  char[] line = din.readLine();
  while (line !is din.eofLine()) { ... line = din.readLine(); }
 where eofLine can return null or if the stream author wishes it can  
 return some other unique empty string.

That code is more descriptive, sure. However, null is more generic in application. You can use it 'everywhere' and everywhere it is used it can have the same meaning. This means no 'special case' code is required (like that shown above).

That's not true. 'Everywhere' would mean complete orthogonality, and as we know, this trick only works with certain types. But I agree with the premise, that nullness is a great (and easy) special value that makes life simpler. Thus a good solution should be built into the language.

Cheers,
--AJG.
Jul 23 2005
next sibling parent reply Dave <Dave_member pathlink.com> writes:
In article <dbv45d$1ju8$1 digitaldaemon.com>, AJG says...
C# 2.0 will "solve" this problem with the concept of nullable types. Even ints
will be nullable. I'm not sure how this is going to work (haven't tried it), 

Interesting stuff.. I looked into this a bit and apparently the underlying implementation is done through System.Nullable<T>:

  System.Nullable<int> j;
  int? k;

The 'T?' form is shorthand.

As you can imagine, there appears to be quite a bit of overhead involved as the nullable types aren't native. But there's nothing stopping you from retrieving, say, a DB value into a nullable type, checking if it's null and then assigning it to a native variable if it's not. But assigning it requires accessing a property (int k = j.Value;) or a cast (int k = (int)j;).

I like the idea, but given that you will still always have to check if a nullable variable is not null before using it or even assigning it to another (non-nullable) variable, I'm having trouble imagining how much more productive / readable it's going to make coding for most chores where "nullable native types" would be useful. For example, for database applications, I can still see a need to write a library of wrapper functions to assign a column to native data types, or if the table was represented by a class, to check for null each time a column was retrieved in order to assign the value to a native type. Either way it seems like it will require about the same amount of code to write most applications, but with added complexity to the language.
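The HasValue/Value behaviour Dave describes can be approximated as a value paired with a presence flag. This Python class is a rough analogue of `System.Nullable<T>`, not a port. The names `Nullable`, `of`, `value`, `has_value` and `get_value_or_default` are chosen for illustration. Raising `ValueError` stands in for the `InvalidOperationException` that C#'s `.Value` throws when there is no value.

```python
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class Nullable(Generic[T]):
    """Rough analogue of C#'s System.Nullable<T>: a value plus a presence flag."""

    def __init__(self, value: Optional[T] = None, has_value: bool = False) -> None:
        self._value = value
        self.has_value = has_value

    @classmethod
    def of(cls, value: T) -> "Nullable[T]":
        return cls(value, True)

    @property
    def value(self) -> T:
        # C#'s .Value throws InvalidOperationException here; ValueError stands in.
        if not self.has_value:
            raise ValueError("Nullable object must have a value.")
        return self._value

    def get_value_or_default(self, default: T) -> T:
        return self._value if self.has_value else default


j = Nullable.of(0)    # present, and zero is a perfectly good value
k = Nullable()        # absent, like a NULL column
print(j.has_value, j.value, k.has_value, k.get_value_or_default(-1))  # True 0 False -1
```

The extra flag is also where the overhead comes from: every read goes through a check before the raw value is available.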
Jul 23 2005
parent reply AJG <AJG_member pathlink.com> writes:
Hi,

In article <dbv8fr$1nnm$1 digitaldaemon.com>, Dave says...
In article <dbv45d$1ju8$1 digitaldaemon.com>, AJG says...
C# 2.0 will "solve" this problem with the concept of nullable types. Even ints
will be nullable. I'm not sure how this is going to work (haven't tried it), 

Interesting stuff.. I looked into this a bit and apparently the underlying implementation is done through System.Nullable<T>. System.Nullable<int> j; int? k; The 'T?' form is shorthand.

Interesting. I didn't know that. I was actually kinda hoping they found a magic "native" way, but I guess not.
As you can imagine, there appears to be quite a bit of overhead involved as the
nullable types aren't native.

Yes, I agree. Though I shouldn't speculate without having even tested for performance.
But there's nothing stopping you from retrieving,
say, a DB value into a nullable type, checking if it's null and then assigning
it to a native variable if it's not. But assigning it requires accessing a
property (int k = j.Value;) or a cast (int k = (int)j;).

This looks a little cumbersome. It will remain cumbersome without language support, IMHO.
I like the idea, but given that you will still always have to check if a
nullable variable is not null before using it or even assigning it to another
(non-nullable) variable, I'm having trouble imagining how much more productive /
readable it's going to make coding for most chores where "nullable native
types" would be useful.

I disagree here. I think the non-existent concept is a good one. It's useful in arrays (and possibly classes), and I think the usefulness extends across primitives as well.
For example, for database applications, I can still see a need
to write a library of wrapper functions to assign a column to native data
types, or if the table was represented by a class, to check for null each time a
column was retrieved in order to assign the value to a native type.

For instance:

# void someDataFunc(nullableInt dbValue) {
#     // Handle the special case:
#     if (!dbValue) throw new Exception("Value must be specified.");
#
#     // These can now all be valid values:
#     if (dbValue < 0) { /* Case 1 */ }
#     else if (dbValue == 0) { /* Case 2 */ }
#     else if (dbValue > 0) { /* Case 3 */ }
#     else assert(false);
# }
Either way it seems like it will require about the same amount of code to write
most applications, but with added complexity to the language.

Some (most?) of the complexity is already there. Arrays and Classes both are already capable of existing vs. being empty. This would merely extend the feature for orthogonality. I think it would be fairly useful. Just my 2 cents.

--AJG.
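AJG's `someDataFunc` can be sketched with Python's `Optional[int]` standing in for the hypothetical `nullableInt`. The sketch tests `is None` rather than using a plain truth test. With `if not db_value`, a legitimate 0 would be rejected as missing, which is exactly the null-versus-zero ambiguity this thread is about. The function name and return strings are illustrative.

```python
from typing import Optional


def some_data_func(db_value: Optional[int]) -> str:
    # Handle the special case: the column held no value at all.
    if db_value is None:
        raise ValueError("Value must be specified.")
    # Every integer, including 0, is now a valid value.
    if db_value < 0:
        return "case 1"
    elif db_value == 0:
        return "case 2"
    else:
        return "case 3"


print(some_data_func(0))  # case 2
```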
Jul 23 2005
parent reply Dave <Dave_member pathlink.com> writes:
In article <dbvajm$1p2i$1 digitaldaemon.com>, AJG says...
Hi,

In article <dbv8fr$1nnm$1 digitaldaemon.com>, Dave says...
In article <dbv45d$1ju8$1 digitaldaemon.com>, AJG says...
C# 2.0 will "solve" this problem with the concept of nullable types. Even ints
will be nullable. I'm not sure how this is going to work (haven't tried it), 

Interesting stuff.. I looked into this a bit and apparently the underlying implementation is done through System.Nullable<T>. System.Nullable<int> j; int? k; The 'T?' form is shorthand.

Interesting. I didn't know that. I was actually kinda hoping they found a magic "native" way, but I guess not.
As you can imagine, there appears to be quite a bit of overhead involved as the
nullable types aren't native.

Yes, I agree. Though I shouldn't speculate without having even tested for performance.

I did run a quick test w/ a simple loop using a few assignment operators and the penalty was on the order of 4-5x. Even if that is actually indicative of what you could expect with 'real world' code, I'm not implying that C# nullable types aren't useful just because of a performance penalty. Who knows, if built into D 'natively', maybe all/most of that penalty could be (practically) optimized away or maybe there's an internal implementation possible that wouldn't cause any 'penalty'?
Jul 24 2005
parent "Regan Heath" <regan netwin.co.nz> writes:
On Sun, 24 Jul 2005 14:38:05 +0000 (UTC), Dave <Dave_member pathlink.com>  
wrote:
I did run a quick test w/ a simple loop using a few assignment operators and the penalty was on the order of 4-5x. Even if that is actually indicative of what you could expect with 'real world' code, I'm not implying that C# nullable types aren't useful just because of a performance penalty. Who knows, if built into D 'natively', maybe all/most of that penalty could be (practically) optimized away or maybe there's an internal implementation possible that wouldn't cause any 'penalty'?

D's arrays are exactly this. They are a nullable type which is implemented as a stack-based value type. They are fast and efficient. Walter could do the same thing with boxing, i.e. build boxing into the language and implement it using a stack-based value type.

Regan
Jul 24 2005
prev sibling parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Sun, 24 Jul 2005 04:07:09 +0000 (UTC), AJG <AJG_member pathlink.com>  
wrote:
 Sure, I agree special values can be useful and null is an easy special
 value to use.

Indeed, null and NAN have a lot in common. They indicate non-existance, or un-initialised. Think how much trouble we have coding with 'int' and other 'value' types that cannot indicate non-existance? esp with container classes and the like. std.boxer wouldn't exist if int could indicate non-existance.

Yes! That is exactly right. The problem with using array.ptr as null for existance checks is that it's not orthogonal at all. It only works with arrays.

No, the key point you seem to be missing is:

"if(x)" compares 'x' to null or 0.

It is _not_ intended to test for existence; that is _not_ its purpose. The "if(x)" rule is true *for all types*, even primitives (with the exception of a struct, because it is user defined and cannot be compared to null or 0).

If the variable 'x' is a reference type it compares the reference to null. Arrays are references, so it compares the array reference to null. Array references cannot be null. When an array reference would be null, the array.ptr is null. Therefore to compare an array to null, you compare the array.ptr to null. This behaviour is _required_ to make arrays orthogonal with other references.

This behaviour is completely orthogonal *for all types* and this can be proven by example:

  class C {}
  char[] p = null;
  C c = null;
  int i = 0;

  if (c) {
    //not true
  }
  if (p) {
    //not true
  }
  if (i) {
    //not true
  }
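Regan's rule can be modelled as a single predicate over a few kinds of value. This is a Python toy, not D. `DArray` and `d_if` are hypothetical names, and `None` stands for a null class reference. An array is modelled as a (length, ptr) pair whose truth value depends only on the data pointer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DArray:
    length: int
    ptr: Optional[int]  # None models a null data pointer


def d_if(x) -> bool:
    """Toy model of the rule 'if (x) compares x to null or 0'."""
    if isinstance(x, DArray):
        return x.ptr is not None  # arrays: compare the data pointer to null
    if x is None:
        return False              # null class reference or pointer
    return x != 0                 # numbers vs 0; a non-null object is never == 0


c = None                          # C c = null;
p = DArray(length=0, ptr=None)    # char[] p = null;
i = 0                             # int i = 0;
print(d_if(c), d_if(p), d_if(i))  # False False False
```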
 It might also work with classes (not sure).

Yes, see above.
 What about primitives? No, it's back to an additional boolean or  
 somesuch. That's why I think it's a crappy solution, and that's exactly  
 the source of the if (array) dilemma in the first place.

The ability to express non-existence has nothing to do with the "if(x)" statement. The "if(x)" statement's purpose is not specifically to test for non-existence. I repeat:

"if(x)" compares 'x' to null or 0.

That's it. Yes, you can use it to test for non-existence with reference and pointer types. That, however, is not its purpose.
 C# 2.0 will "solve" this problem with the concept of nullable types.

And we have std.boxer.
 Even ints will be nullable. I'm not sure how this is going to work  
 (haven't tried it), but at least it's orthogonal. It works everywhere.

Likely it's going to work like std.boxer except automagically.
 Whereas array.ptr is shaky, buggy, likely to change and IMHO unsemantic.  
 If we at least see that this is a problem, and that there is a need for  
 a more complete feature, maybe we can work towards a better solution.

IMO there is no problem with "if(x)". Not being able to represent non-existence is a tradeoff when using value types. std.boxer is the solution to those tradeoffs, that or using pointers.
 Note the same behavior can be obtained with returning a singleton
 empty just for eof, if desired. The singleton approach could arguably
 make the code more readable, too, since the reader wouldn't have to know that
 null line meant eof. For example
  char[] line = din.readLine();
  while (line !is din.eofLine()) { ... line = din.readLine(); }
 where eofLine can return null or if the stream author wishes it can
 return some other unique empty string.

That code is more descriptive, sure. However, null is more generic in application. You can use it 'everywhere' and everywhere it is used it can have the same meaning. This means no 'special case' code is required (like that shown above).

That's not true. 'Everywhere' would mean complete orthogonality, and as we know, this trick only works with certain types.

You are correct. I meant only to refer to reference and pointer types above, which can express non-existence. However, I repeat (because this is an important fact): the purpose of "if(x)" is not to test for non-existence; its purpose is to compare 'x' to null or 0. Nothing more, nothing less.
 But I agree with the premise, that nullness is a great (and easy)  
 special value that makes life simpler.

In this we agree. :)
 Thus a good solution should be built into the language.

IMO a good solution _is_ built into the language. Arrays are a good solution to the problem posed by types which can represent non-existence, that problem being that the added expressiveness comes with greater risk of accidental use. Arrays cannot be null, yet can represent non-existence; they're a great solution.

Unfortunately they're not the solution to the "non-existence of value types" problem, which currently has 2 solutions:
- std.boxer.
- pointers.

Both these solutions involve references/pointers that can be null, so they suffer from the risk involved in using null, unlike arrays.

Regan
Jul 24 2005
parent "Regan Heath" <regan netwin.co.nz> writes:
On Sun, 24 Jul 2005 20:47:43 +1200, Regan Heath <regan netwin.co.nz> wrote:
 Thus a good solution should be built into the language.

IMO a good solution _is_ built into the language. Arrays, are a good solution to the problem posed by types which can represent non-existance, that problem being that the added expressiveness comes with greater risk of accidental use. Arrays cannot be null, yet can represent non-existance, they're a great solution.

Re-reading this, I don't think I was clear enough. What I meant here is that arrays themselves are types which can represent non-existence; they're done in a clever way which enables the expressiveness without the cost. I didn't mean to imply that you could store a value type in an array and represent non-existence in some way (except for the whole array).

Regan
Jul 24 2005