digitalmars.D - [Suggestion] Make if(array) illegal.


AJG <AJG_member pathlink.com> writes:

Hi,

This is a suggestion based on a thread from a couple of weeks ago. What about
making if (array) illegal in D? I think it brings ambiguity and a high potential
for errors to the language. The main two uses for this construct can already be
done with a slightly more explicit syntax:

if (array.ptr == null) // Check for a kind of "non-existence."
if (array.length == 0) // Check for explicit emptiness.
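
A minimal sketch of how the two checks differ, assuming a current D compiler. The helper names `isNonExistent` and `isEmpty` are hypothetical and exist only for this illustration:

```d
// Hypothetical helpers, named here only for illustration.
bool isNonExistent(const(char)[] a) { return a.ptr is null; }
bool isEmpty(const(char)[] a) { return a.length == 0; }

void main()
{
    char[] none = null;   // no data pointer, length 0
    string empty = "";    // a literal: length 0, but a non-null data pointer

    // Both are empty, but only the first is "non-existent".
    assert(isNonExistent(none) && isEmpty(none));
    assert(!isNonExistent(empty) && isEmpty(empty));
}
```

The two arrays are indistinguishable by length alone; only the data pointer tells them apart.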

On the other hand, one is not sure what if (array) by itself is supposed to
mean, since it's _not_ like C. In C, if (array), where array is typically a
pointer, means simply != NULL. The problem in D is that the array ptr is tricky
and IMHO it's best not to interface with it directly.

I think it would be wise to remove this ambiguity. I propose two options:
1) Make if (array) equal _always_ to if (array.length).
2) Simply make it illegal.

What do you guys think? Walter?

Thanks,
--AJG.

Jul 19 2005

Derek Parnell <derek psych.ward> writes:

On Wed, 20 Jul 2005 02:15:58 +0000 (UTC), AJG wrote:

 Hi,
 
 This is a suggestion based on a thread from a couple of weeks ago. What about
 making if (array) illegal in D? I think it brings ambiguity and a high potential
 for errors to the language. The main two uses for this construct can already be
 done with a slightly more explicit syntax:
 
 if (array.ptr == null) // Check for a kind of "non-existence."
 if (array.length == 0) // Check for explicit emptiness.
 
 On the other hand, one is not sure what if (array) by itself is supposed to
 mean, since it's _not_ like C. In C, if (array), where array is typically a
 pointer, means simply != NULL. The problem in D is that the array ptr is tricky
 and IMHO it's best not to interface with it directly.
 
 I think it would be wise to remove this ambiguity. I propose two options:
 1) Make if (array) equal _always_ to if (array.length).
 2) Simply make it illegal.
 
 What do you guys think?

Both these suggestions wouldn't affect me.

-- 
Derek
Melbourne, Australia
20/07/2005 12:26:36 PM

Jul 19 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Wed, 20 Jul 2005 12:27:23 +1000, Derek Parnell <derek psych.ward> wrote:
 On Wed, 20 Jul 2005 02:15:58 +0000 (UTC), AJG wrote:

 Hi,

 This is a suggestion based on a thread from a couple of weeks ago. What about
 making if (array) illegal in D? I think it brings ambiguity and a high potential
 for errors to the language. The main two uses for this construct can already be
 done with a slightly more explicit syntax:

 if (array.ptr == null) // Check for a kind of "non-existence."
 if (array.length == 0) // Check for explicit emptiness.

 On the other hand, one is not sure what if (array) by itself is supposed to
 mean, since it's _not_ like C. In C, if (array), where array is typically a
 pointer, means simply != NULL. The problem in D is that the array ptr is tricky
 and IMHO it's best not to interface with it directly.

 I think it would be wise to remove this ambiguity. I propose two options:
 1) Make if (array) equal _always_ to if (array.length).
 2) Simply make it illegal.

 What do you guys think?

 Both these suggestions wouldn't affect me.

:) because you're explicit all the time.

Regan

Jul 19 2005

AJG <AJG_member pathlink.com> writes:

 I think it would be wise to remove this ambiguity. I propose two options:
 1) Make if (array) equal _always_ to if (array.length).
 2) Simply make it illegal.
 
 What do you guys think?

 Both these suggestions wouldn't affect me.

I'll take that to mean "yes" if you don't mind. ;)


Jul 19 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Wed, 20 Jul 2005 02:15:58 +0000 (UTC), AJG <AJG_member pathlink.com> wrote:
 This is a suggestion based on a thread from a couple of weeks ago. What about
 making if (array) illegal in D? I think it brings ambiguity and a high potential
 for errors to the language. The main two uses for this construct can already be
 done with a slightly more explicit syntax:

 if (array.ptr == null) // Check for a kind of "non-existence."
 if (array.length == 0) // Check for explicit emptiness.

 On the other hand, one is not sure what if (array) by itself is supposed to
 mean, since it's _not_ like C. In C, if (array), where array is typically a
 pointer, means simply != NULL. The problem in D is that the array ptr is tricky
 and IMHO it's best not to interface with it directly.

 I think it would be wise to remove this ambiguity. I propose two options:
 1) Make if (array) equal _always_ to if (array.length).
 2) Simply make it illegal.

 What do you guys think? Walter?

I prefer the current behaviour (for all the reasons I mentioned in the  
previous thread):
   http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/25804

"if (array)" is the same as "if (array.ptr)" which acts just like it does  
in C, comparing it to 0/null.

Essentially the "if" statement is checking the not zero state of the  
variable itself. In the case of value types it compares the value to 0. In  
the case of pointers and references it compares them to null.

In the case of an array, which (as explained in link above) is a  
mix/pseudo value/reference type, it compares the data pointer to null.

The reason this is the correct behaviour is that a null array has a null  
data pointer, but, an empty array i.e. an existing set containing no  
elements may have a non-null data pointer. In both cases they have a 0  
length property.

Of course we could change this, we could remove the case where an array  
contains no items but has a non-null data pointer. This IMO would remove a  
useful distinction, the "existing set containing no items" would be  
un-representable with a single array variable. IMO that would be a bad  
move, the current situation(*) is good.

(*) there remains the problem where setting the length of an array to 0 sets
the data pointer to null. This can change an "existing set with no
elements" into a "non-existent set".

Regan

Jul 19 2005

AJG <AJG_member pathlink.com> writes:

Hi Regan,

 Of course we could change this, we could remove the case where an array
 contains no items but has a non-null data pointer. This IMO would remove a
 useful distinction, the "existing set containing no items" would be
 un-representable with a single array variable. IMO that would be a bad
 move, the current situation(*) is good.

I understand, and I agree with your opinion. Losing this possible
representation would not be good. Moreover, it is unnecessary. But my idea is
not about that at all. I don't want to change the way the arrays themselves
work.

As we know, all current representations will still be available via array.ptr.
That's fine with me. I'll never use array.ptr, but if people need it, then it's
all good. Just like regular pointers. I don't use them in D at all, but they are
still useful to people.

The only thing I propose is to remove ambiguity in one kind of construct. If we
take a look at the semantics of if (array), you will see what I mean when I said
it's different than in C. In C, when you do



You are literally testing whether it is pointing to an array or not. If it is,
delete it and null the pointer. It's very semantic, and very clear.
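
A sketch of the C-style idiom described here, written in D against the C runtime so that it matches the other code in the thread. The function name `releaseIfPresent` and the 16-byte allocation are assumptions made for illustration:

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical helper: frees the buffer if there is one and returns the
// nulled pointer, as a C programmer would do by hand.
char* releaseIfPresent(char* array)
{
    if (array)          // non-null: the pointer refers to an allocation
    {
        free(array);
        array = null;
    }
    return array;
}

void main()
{
    // A C-style "array" is just a pointer to heap memory.
    char* array = cast(char*) malloc(16);
    array = releaseIfPresent(array);
    assert(array is null);
}
```

Here the pointer itself is the only handle on the array, so testing it and testing for existence are the same thing.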

In D, on the other hand, the concept of "pointing to an array" is gone. The
reference is always there. It is never null. So when you do if (array) you are
saying "if this reference's pointer contains any data." That's a fine query to
make, but not via if (array). To me, at least, that is not immediately clear.

Asking if (array) to me means "does this array exist?" In D, the answer is
always yes. Technically the array reference is always there. Which is why as a
sort of hack, array.ptr is tested instead. That's why the semantics of it are
lost (or worse, mixed).

Therefore, it introduces ambiguity, which is what I want to prevent. If the
meaning of an expression is not immediately clear and intuitive, I think people
are going to misuse it. I can already see new programmers using that expression
to test for emptiness.

That would be fine in C. In C, Empty == NotExistent. But not in D. Thus, my idea
is to either make it so that it works semantically like C, or at least remove
the construct to avoid those potential errors.

The worst part is that these errors would be shifty bugs to catch. Why?
Because if (array) _sometimes_ works for length, and sometimes it doesn't.
That's just no good in my book. 

Anyway, remember that if (array.ptr == null) will always be there.

 (*) there remains the problem where setting the length of an array to 0 sets
 the data pointer to null. This can change an "existing set with no
 elements" into a "non-existent set".

This is exactly the sort of thing I meant when I said the array.ptr is tricky.
Static/Const arrays also don't help. In general, I don't think the pointer
should be messed around with unless (a) you know what you are doing and (b) it's
necessary (i.e. when the "bare metal" is needed).

Cheers,
--AJG.

Jul 19 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Wed, 20 Jul 2005 04:45:21 +0000 (UTC), AJG <AJG_member pathlink.com>  
wrote:
 Of course we could change this, we could remove the case where an array
 contains no items but has a non-null data pointer. This IMO would  
 remove a
 useful distinction, the "existing set containing no items" would be
 un-representable with a single array variable. IMO that would be a bad
 move, the current situation(*) is good.

 I understand, and I agree with your opinion. Losing this possible
 representation would not be good. Moreover, it is unnecessary. But my
 idea is not about that at all. I don't want to change the way the arrays
 themselves work.

Excellent.

 As we know, all current representations will still be available via  
 array.ptr. That's fine with me. I'll never use array.ptr, but if people  
 need it, then it's all good. Just like regular pointers. I don't use  
 them in D at all, but they are still useful to people.

Ok.

 The only thing I propose is to remove ambiguity in one kind of construct.

First you must convince Walter it's ambiguous, which is what you're trying
to do :)
I don't think it is.

 If we take a look at the semantics of if (array), you will see what I  
 mean when I said it's different than in C. In C, when you do



 You are literally testing whether it is pointing to an array or not. If  
 it is, delete it and null the pointer. It's very semantic, and very  
 clear.

Sure. In your example above you represent an array with a pointer.

 In D, on the other hand, the concept of "pointing to an array" is gone.

Yes and no, slicing could be described as pointing to an array. Sure, it
doesn't use a pointer per se, but when you slice you create a value type
containing a pointer and length, and the pointer points somewhere in the
original array.
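
A small sketch of that point, assuming a current D compiler. The helper `pointsInto` is hypothetical and only checks that a slice's data lies inside another array's data:

```d
// Hypothetical helper: true when `slice` refers to memory inside `whole`.
bool pointsInto(const(char)[] slice, const(char)[] whole)
{
    return slice.ptr >= whole.ptr
        && slice.ptr + slice.length <= whole.ptr + whole.length;
}

void main()
{
    char[] original = "hello world".dup;   // mutable copy of the literal
    char[] slice = original[6 .. 11];      // no copy: a (pointer, length) pair

    assert(slice.ptr == original.ptr + 6); // points into the original data
    assert(slice.length == 5);
    assert(pointsInto(slice, original));

    slice[0] = 'W';                        // writes through to the original
    assert(original == "hello World");
}
```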

 The reference is always there. It is never null.

Yes, and I can see the angle you're taking here, but I think the current  
behaviour is consistent, take for example these statements:

A. The expression "if (x)" compares the variable x with null or 0.
B. Given "char[] p = null;" then "if (p)" should be FALSE.
C. Given "char[] p = "";" then "if (p)" should be TRUE.

Do you agree with those statements? If not, which ones, and why?

If you change "if (x)" for arrays to compare the length property instead  
of the data pointer then you invalidate all but the last statement C.

If you do that then arrays no longer behave like references, pointers, or  
basic types i.e. int, float, etc.
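
Statements B and C can be tried out roughly as follows. This is a sketch that assumes a compiler which still accepts the implicit array test; `truthy` is a hypothetical helper, and `string` stands in for `char[] p = "";` because current compilers do not let a literal initialise a mutable `char[]`:

```d
// Hypothetical helper wrapping the implicit test under discussion.
bool truthy(const(char)[] p)
{
    if (p)
        return true;
    return false;
}

void main()
{
    char[] nothing = null;
    string empty = "";

    assert(!truthy(nothing));   // statement B: null array tests false
    assert(truthy(empty));      // statement C: empty literal tests true

    // A length-based test could not tell the two apart.
    assert(nothing.length == 0 && empty.length == 0);
}
```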

 So when you do if (array) you are saying "if this reference's pointer  
 contains any data."

No, you're saying "is 'array' null or 0".

Given:
  - an array reference can never be null (as you say)
  - in all situations when an array reference would be null the data  
pointer is null

therefore, to compare the reference to null you simply compare the data  
pointer to null.

 That's a fine query to make, but not via if (array). To me, at least,  
 that is not immediately clear.

It may not be immediately clear, but I believe it's consistent and  
unambiguous.

 Asking if (array) to me means "does this array exist?" In D, the answer  
 is always yes.

It depends what you're referring to as the "array". Yes, a struct always  
exists. Yes, the reference always refers to it. But, the data pointer is  
the key element that either exists (not null, allocated) or does not  
(null). Just as in your C example above.

 Technically the array reference is always there. Which is why as a sort  
 of hack, array.ptr is tested instead.

IMO it's not a hack, it's the correct behaviour (reasoning above) "to  
compare the reference to null you simply compare the data pointer to null."

 That's why the semantics of it are lost (or worse, mixed).

They're only lost where length is reassigned to 0, this is a bug IMO.  
Otherwise they're fine AFAICS.

 Therefore, it introduces ambiguity, which is what I want to prevent. If  
 the meaning of an expression is not immediately clear and intuitive, I  
 think people are going to misuse it.

Sure. I understand the point you're trying to make. I just don't agree  
with any of your reasoning (yet).

 I can already see new programmers using that expression to test for  
 emptiness.

Why? It's an established fact that to check the length of an array you use  
the length property, it's comparable to just about any other container  
class in just about any language you care to name. (if not 'length' then  
'size' or 'elements' or some other property/member)

 That would be fine in C. In C, Empty == NotExistent.

Not true.

char *empty = "";
char *nonexistent = NULL;
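
The same distinction can be sketched with C-style pointers in D. The helper names `exists` and `isEmptyString` are hypothetical:

```d
// Hypothetical helpers over C-style, zero-terminated strings.
bool exists(const(char)* s) { return s !is null; }
bool isEmptyString(const(char)* s) { return s !is null && *s == '\0'; }

void main()
{
    const(char)* empty = "".ptr;       // points at a terminating '\0'
    const(char)* nonExistent = null;   // points at nothing

    assert(exists(empty) && isEmptyString(empty));
    assert(!exists(nonExistent) && !isEmptyString(nonExistent));
}
```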

 But not in D. Thus, my idea is to either make it so that it works  
 semantically like C, or at least remove the construct to avoid those  
 potential errors.

 The worst part is that these errors would be shifty bugs to catch. Why?
 Because if (array) _sometimes_ works for length, and sometimes it  
 doesn't.
 That's just no good in my book.

"if (array)" never checks length, it *always* compares 'array' (the  
variable) with null or 0, nothing more, nothing less. You wouldn't expect  
"if (x)" to call the length member of a class 'x' would you? Why would you  
expect it to do so for arrays?

Regan

Jul 20 2005

Dejan Lekic <leka entropy.tmok.com> writes:

Mr Heath, I agree with You on this. 

-- 
...........
Dejan Lekic
  http://dejan.lekic.org

Jul 20 2005

Derek Parnell <derek psych.ward> writes:

On Wed, 20 Jul 2005 11:21:55 +0200, Dejan Lekic wrote:

 Mr Heath, I agree with You on this.

I don't.

Does ...

  if (array) ...

test for an empty array or a non-existent array? I can't tell from the
syntax. It is thus ambiguous.

  if (array.ptr == null) -- test for a non-existence.

  if (array.length == 0) -- test for emptiness

  if (array) -- test for which?


-- 
Derek Parnell
Melbourne, Australia
20/07/2005 7:47:04 PM

Jul 20 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Wed, 20 Jul 2005 19:49:19 +1000, Derek Parnell <derek psych.ward> wrote:
 On Wed, 20 Jul 2005 11:21:55 +0200, Dejan Lekic wrote:

 Mr Heath, I agree with You on this.

 I don't.

 Does ...

   if (array) ...

 test for an empty array or a non-existent array?

It does what it always does for every type in D: it tests whether 'array'
is null or 0.
A null array is a non-existent array, thus it tests for a non-existent
array.

 I can't tell from the
 syntax. It is thus ambiguous.

Granted, it's not 'explicit'. However, the behaviour is well defined.

The only 'catch' in this case is that an array cannot be null. However,
when an array would be null its data pointer is null, therefore testing
the data pointer _is_ testing the array.

Regan

Jul 20 2005

Derek Parnell <derek psych.ward> writes:

On Wed, 20 Jul 2005 22:42:22 +1200, Regan Heath wrote:

 On Wed, 20 Jul 2005 19:49:19 +1000, Derek Parnell <derek psych.ward> wrote:
 On Wed, 20 Jul 2005 11:21:55 +0200, Dejan Lekic wrote:

 Mr Heath, I agree with You on this.

 I don't.

 Does ...

   if (array) ...

 test for an empty array or a non-existent array?

 
 It does what it always does for every type in D: it tests whether 'array'
 is null or 0.
 A null array is a non-existent array, thus it tests for a non-existent
 array.

I think I'm not understanding this.

I thought that 

   char[] array;

defined an eight-byte structure in RAM in which the first 4-bytes is the
current length of the array (if it is allocated) and the second 4-bytes is
the address of the array data. Initially all eight bytes are zero.

Thus when I see "if (array)" I think it is converted into machine language
instructions that tests the second 4-bytes against zero. In other words ...

  if (array)

is essentially the same as 

  if (array.ptr != null)

and 

  if (*(cast(int*)(cast(byte*)&array + 4)) != 0)

I'm only guessing at this, because I haven't seen it written down this
*explicitly* ;-)
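
The layout being guessed at can be checked directly. A sketch, assuming a current D compiler and the length-then-pointer layout given in the D ABI documentation; `lengthThenPointer` is a hypothetical helper:

```d
// True when the array variable is laid out as a length word followed by
// a pointer word, checked by reinterpreting the variable's own storage.
bool lengthThenPointer(ref char[] array)
{
    auto words = cast(size_t*) &array;
    return words[0] == array.length
        && words[1] == cast(size_t) array.ptr;
}

void main()
{
    // Two machine words: 8 bytes on a 32-bit target, 16 on a 64-bit one.
    static assert((char[]).sizeof == size_t.sizeof + (void*).sizeof);

    char[] array;                      // default-initialised: all zero
    assert(array.length == 0 && array.ptr is null);
    assert(lengthThenPointer(array));

    array = "abc".dup;
    assert(lengthThenPointer(array));
}
```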

 I can't tell from the
 syntax. It is thus ambiguous.

 
 Granted, it's not 'explicit'. However, the behaviour is well defined.

Where is that behavior defined? I can't see it in the documentation.
 
 The only 'catch' in this case is that an array cannot be null. 

Of course not. It's an 8-byte structure. All 8 bytes can be zero though.

 However,  
 when an array would be null its data pointer is null, therefore testing  
 the data pointer _is_ testing the array.

Huh? You just said that 'array cannot be null' so how does that reconcile
with 'when an array would be null'? 

But back to what I was saying ...

  if (array) 

is ambiguous because *JUST BY LOOKING AT THE CODE* one cannot tell if it is
testing the first 4-byte field or the second 4-byte field in 'array'. Its
behaviour may be precisely defined, but I haven't seen that yet. 

Oh, and there is a difference in semantics to testing array.ptr and
array.length.


-- 
Derek Parnell
Melbourne, Australia
20/07/2005 10:35:31 PM

Jul 20 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Wed, 20 Jul 2005 22:49:04 +1000, Derek Parnell <derek psych.ward> wrote:
 Does ...

   if (array) ...

 test for an empty array or a non-existent array?

 It does what it always does, for every type in D, it tests whether  
 'array' is null or 0.
 A null array is a non-existent array, thus it tests for a non-existent
 array.

 I think I'm not understanding this.

 I thought that

    char[] array;

 defined an eight-byte structure in RAM in which the first 4-bytes is the
 current length of the array (if it is allocated) and the second 4-bytes  
 is the address of the array data. Initially all eight bytes are zero.

I'd say: it defines a variable 'array' which is a reference to a  
struct/class like you've described.

 Thus when I see "if (array)" I think it is converted into machine  
 language instructions that tests the second 4-bytes against zero.

Because you're thinking of 'array' as a struct. It's not, it's a  
reference. Thus, "if (array)" compares that reference to null.

I'd guess the reason you think of it as a struct is because like a struct
it cannot be null. That is the only similarity it has to a struct, all the
rest of its behaviour is that of a reference.

Because it's a reference you can set it to null, because it's a reference  
you can say "if(array)", because it's a reference you can say "if(array is  
null)", because it's a reference it behaves like any other reference,  
except for the fact that it cannot be null.

It is logically consistent with all other types in D (barring structs), eg.

A. The expression "if (x)" compares the variable x with null or 0.
B. Given "char[] p = null;" then "if (p)" should be FALSE.
C. Given "char[] p = "";" then "if (p)" should be TRUE.

These are all correct and true for all types in D barring structs.  
(replace null and "" with 0 and 1 for value types).

 I can't tell from the
 syntax. It is thus ambiguous.

 Granted, it's not 'explicit'. However, the behaviour is well defined.

 Where is that behavior defined? I can't see it in the documentation.

I was referring to the behaviour of "if (x)". Most people know, or quickly  
learn this behaviour.

 The only 'catch' in this case is that an array cannot be null.

 Of course not. It's an 8-byte structure.

No, it's not. Or rather, we have to decide what exactly we're talking  
about here.

Above, you defined a variable 'array'. It is a reference. It refers to an  
object. The object contains some data and has a length property.

The array reference, like any other can be set to 'null'. However the  
implementation is such that it is defined never to be null. Yet,  
statements in the form "if (array is null)" and "if (array)" still behave  
like the reference is null. (thus they are consistent, see A,B,C above)

They behave in that way because they check the data pointer, the data  
pointer is the part of the object that mirrors the state the reference  
would have, were it not prohibited from being null. In essence the data  
pointer _is_ the array, the rest is implementation around it.

 All 8 bytes can be zero though.

Just like a normal struct. However an array reference is not itself a
struct, it's a reference to an object (struct/class) with a length property.

 However,
 when an array would be null its data pointer is null, therefore testing
 the data pointer _is_ testing the array.

 Huh? You just said that 'array cannot be null' so how does that reconcile
 with 'when an array would be null'?

The data pointer mirrors the state the reference would have, were it not  
for the implementation ensuring the reference is never null. Essentially  
the data pointer _is_ the array, the rest is implementation.

 But back to what I was saying ...

   if (array)

 is ambiguous because *JUST BY LOOKING AT THE CODE* one cannot tell if it  
 is testing the first 4-byte field or the second 4-byte field in 'array'.

So? This is no different to any other variable type, try an int for  
example.

 Its behaviour may be precisely defined, but I haven't seen that yet.

Its behaviour is to test the variable 'array' against null or 0.

 Oh, and there is a difference in semantics to testing array.ptr and
 array.length.

Of course. Which is why changing "if(array)" to test the length breaks  
logical consistency and is just plain wrong IMO.

Regan

Jul 20 2005

Derek Parnell <derek psych.ward> writes:

On Thu, 21 Jul 2005 10:39:54 +1200, Regan Heath wrote:

 On Wed, 20 Jul 2005 22:49:04 +1000, Derek Parnell <derek psych.ward> wrote:
 Does ...

   if (array) ...

 test for an empty array or a non-existent array?

 It does what it always does, for every type in D, it tests whether  
 'array' is null or 0.
 A null array is a non-existent array, thus it tests for a non-existent
 array.

 I think I'm not understanding this.

 I thought that

    char[] array;

 defined an eight-byte structure in RAM in which the first 4-bytes is the
 current length of the array (if it is allocated) and the second 4-bytes  
 is the address of the array data. Initially all eight bytes are zero.

 
 I'd say: it defines a variable 'array' which is a reference to a  
 struct/class like you've described.

Actually that turns out not to be the case. If it was, then 'array' would
be represented by a 4-byte value which contained the address of the 8-byte
struct, {uint len, void* ptr}. However, if you look at the generated
machine code you can see that the 8-byte struct _is_ the 'array'. In other
words, 'array' is not a reference to a struct/class. 

Here is what I found. I compiled this D code ...

  void main()
  {
    char[] array;
    if (array.ptr == null)
    {
        array.length = 2;
    }
    if (array.length == 3)
    {
        array.length = 4;
    }
    if (array)
    {
        array.length = 5;
    }
  }

And this is the generated machine code ...

        assume  CS:__Dmain
  L0:           enter   8,0
                push    EBX
                mov     dword ptr -8[EBP],0
                mov     dword ptr -4[EBP],0
                cmp     dword ptr -4[EBP],0
                jne     L29
                lea     EAX,-8[EBP]
                push    EAX
                push    1
                push    2
                call    near ptr __d_arraysetlength
                add     ESP,0Ch
  L29:          cmp     dword ptr -8[EBP],3
                jne     L3F
                lea     ECX,-8[EBP]
                push    ECX
                push    1
                push    4
                call    near ptr __d_arraysetlength
                add     ESP,0Ch
  L3F:          mov     EDX,-4[EBP]
                or      EDX,-8[EBP]
                je      L57
                lea     EBX,-8[EBP]
                push    EBX
                push    1
                push    5
                call    near ptr __d_arraysetlength
                add     ESP,0Ch
  L57:          pop     EBX
                leave
                ret
  __Dmain ends

As you can see, the 8-byte struct is reserved in the local stack and
references to array.ptr and array.length are direct accesses of the stack
space and not dereferenced via a pointer. Furthermore, 'if (array)' is
equivalent to ...

  if (array.len == 0 || array.ptr == null)

which I think is slightly slower than testing either .length or .ptr
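
One way to read the `L3F` sequence in the listing: `or` combines the pointer word with the length word, and `je L57` skips the body when the result is zero, so the body runs when either word is non-zero. A sketch under that reading, with `impliedTest` as a hypothetical name:

```d
// Mirrors the mov / or / je sequence: OR the two words together and
// treat any non-zero result as "true".
bool impliedTest(size_t length, const(void)* ptr)
{
    return (length | cast(size_t) ptr) != 0;
}

void main()
{
    assert(!impliedTest(0, null));     // both words zero: body skipped

    int dummy;
    assert(impliedTest(0, &dummy));    // non-null pointer, zero length
    assert(impliedTest(3, &dummy));    // non-null pointer, non-zero length
}
```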


[snip]
 
 Of course. Which is why changing "if(array)" to test the length breaks  
 logical consistency and is just plain wrong IMO.

I'm not asking for its behavior to be changed, just documented.

-- 
Derek
Melbourne, Australia
21/07/2005 9:35:42 AM

Jul 20 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Thu, 21 Jul 2005 10:02:49 +1000, Derek Parnell <derek psych.ward> wrote:
 I thought that

    char[] array;

 defined an eight-byte structure in RAM in which the first 4-bytes is the
 current length of the array (if it is allocated) and the second 4-bytes
 is the address of the array data. Initially all eight bytes are zero.

 I'd say: it defines a variable 'array' which is a reference to a
 struct/class like you've described.

 Actually that turns out not to be the case. If it was, then 'array' would
 be represented by a 4-byte value which contained the address of the  
 8-byte
 struct, {uint len, void* ptr}. However, if you look at the generated
 machine code you can see that the 8-byte struct _is_ the 'array'. In  
 other
 words, 'array' is not a reference to a struct/class.

 Here is what I found. I compiled this D code ...

   void main()
   {
     char[] array;
     if (array.ptr == null)
     {
         array.length = 2;
     }
     if (array.length == 3)
     {
         array.length = 4;
     }
     if (array)
     {
         array.length = 5;
     }
   }

 And this is the generated machine code ...

         assume  CS:__Dmain
   L0:           enter   8,0
                 push    EBX
                 mov     dword ptr -8[EBP],0
                 mov     dword ptr -4[EBP],0
                 cmp     dword ptr -4[EBP],0
                 jne     L29
                 lea     EAX,-8[EBP]
                 push    EAX
                 push    1
                 push    2
                 call    near ptr __d_arraysetlength
                 add     ESP,0Ch
   L29:          cmp     dword ptr -8[EBP],3
                 jne     L3F
                 lea     ECX,-8[EBP]
                 push    ECX
                 push    1
                 push    4
                 call    near ptr __d_arraysetlength
                 add     ESP,0Ch
   L3F:          mov     EDX,-4[EBP]
                 or      EDX,-8[EBP]
                 je      L57
                 lea     EBX,-8[EBP]
                 push    EBX
                 push    1
                 push    5
                 call    near ptr __d_arraysetlength
                 add     ESP,0Ch
   L57:          pop     EBX
                 leave
                 ret
   __Dmain ends

 As you can see, the 8-byte struct is reserved in the local stack and
 references to array.ptr and array.length are direct accesses of the stack
 space and not dereferenced via a pointer.

I'll have to take your word for it; my assembler knowledge is nonexistent.

I'd call this an "optimisation", and a good one at that.

This does not refute the fact that the 'array' variable _behaves_ as a  
reference type, i.e. is passed by reference, can have null assigned to it,
can be used in "if (array is null)", can be assigned to another reference,  
and so on. Further, it's described in the docs as an "array reference". So  
despite the _implementation_ of it, it _behaves_ as a reference type(*).

(*)The only exception, the only thing in which it behaves like a struct is  
the fact that it cannot be null.

 Furthermore, 'if (array)' is
 equivalent to ...

   if (array.len == 0 || array.ptr == null)

Don't you mean:

if (array.len != 0 || array.ptr != null)

?
Does the assembler above show this?

This:
   if (array.len != 0 || array.ptr != null)

is in fact identical in effect/meaning to:
   if (array.ptr != null)

because length cannot be anything other than 0 when the data pointer is  
null, in other words this is impossible:
   if (array.ptr == null && array.len != 0) { /* impossible */ }

note that:
   if (array.ptr != null && array.len == 0) { /* not impossible */ }

 Of course. Which is why changing "if(array)" to test the length breaks
 logical consistency and is just plain wrong IMO.

 I'm not asking for its behavior to be changed, just documented.

Sure. I can appreciate the desire to have things set down explicitly for  
reference.

Regan

Jul 20 2005

Derek Parnell <derek psych.ward> writes:

On Thu, 21 Jul 2005 12:16:08 +1200, Regan Heath wrote:


[snip]
 
 This does not refute the fact that the 'array' variable _behaves_ as a  
 reference type, ...

 i.e is passed by reference,

Well ... not always. If the function parameter is an 'in' type, then the
8-byte struct is passed to the function and not a reference to it. If the
parameter is either 'out' or 'inout' then the address of the 8-byte struct
is passed to the function.
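
A minimal sketch of that difference, assuming a current D compiler, where 'ref' plays the role that 'inout' played in 2005. The function names are hypothetical. A by-value parameter receives its own copy of the {length, ptr} pair, so resizing it is invisible to the caller, while writes through the copied pointer still reach the shared data.

```d
// By value: the callee gets a copy of the {length, ptr} pair.
void growByValue(char[] a) { a.length = 5; }

// By reference: the callee works on the caller's pair.
void growByRef(ref char[] a) { a.length = 5; }

// By value, but writing through the copied pointer reaches the shared data.
void pokeByValue(char[] a) { a[0] = 'x'; }

void main()
{
    char[] x = new char[2];
    x[0] = 'a';

    growByValue(x);
    assert(x.length == 2);   // caller's length is untouched

    pokeByValue(x);
    assert(x[0] == 'x');     // element data is shared

    growByRef(x);
    assert(x.length == 5);   // caller's pair was updated
    assert(x[0] == 'x');     // existing contents survive the resize
}
```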

 can have null assigned to it,  

This just sets the 8-bytes to zero.

 can be used in "if (array is null)", 

This is identical to 'if (array)' according to the generated machine code.

can be assigned to another reference,  

This just copies the source struct 8 bytes to the target struct's 8 bytes.

 and so on. Further, it's described in the docs as an "array reference". So  
 despite the _implementation_ of it, it _behaves_ as a reference type(*).
 
 (*)The only exception, the only thing in which it behaves like a struct is  
 the fact that it cannot be null.

Often there seems to be a confusion between the 'array' reference and the
reference to the data that 'array' owns.
 
-- 
Derek
Melbourne, Australia
21/07/2005 10:23:03 AM

Jul 20 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Thu, 21 Jul 2005 10:45:40 +1000, Derek Parnell <derek psych.ward> wrote:
 This does not refute the fact that the 'array' variable _behaves_ as a
 reference type, ...

 i.e is passed by reference,

 Well ... not always. If the function parameter is an 'in' type, then the
 8-byte struct is passed to the function and not a reference to it. If the
 parameter is either 'out' or 'inout' then the address of the 8-byte  
 struct is passed to the function.

Cool. Optimisations.

 can have null assigned to it,

 This just sets the 8-bytes to zero.

Like opAssign for a normal struct could do.

 can be used in "if (array is null)",

 This is identical to 'if (array)' according to the generated machine  
 code.

Cool.

 can be assigned to another reference,

 This just copies the source struct 8 bytes to the target struct's 8  
 bytes.

And/or creates a new one (i.e. if slicing)

 and so on. Further, it's described in the docs as an "array reference".  
 So
 despite the _implementation_ of it, it _behaves_ as a reference type(*).

 (*)The only exception, the only thing in which it behaves like a struct  
 is the fact that it cannot be null.

 Often there seems to be a confusion between the 'array' reference and the
 reference to the data that 'array' owns.

Right. Thanks, this thread has been enlightening. I believe this statement  
accurately describes arrays.

"Array references _behave_ like references but are _implemented_ as stack  
based structs."

In other words treat it like a reference as that is what it's pretending  
to be. At the same time you get the performance of a stack based struct.  
This is yet more evidence as to why arrays are great.

In short, I still believe "if(array)" is doing its job correctly (in
effect, if not exactly - see changes below). I don't believe people will
commonly expect this statement to check the length of an array, nor do I  
think it should be illegal.

I believe Walter has tried to remove the distinction between a  
non-existent array and an empty one (going on the results you've shown
here) but has failed in some areas, thankfully, because I still believe it  
is a useful distinction.

In fact I'd say he's got the implementation of arrays pretty much perfect,  
I would make the following changes:

  - change "if(array)" and "if(array is null)" to check the data pointer  
only (it's pointless checking length).
  - fix array.length = 0; so that it doesn't set the data pointer to null.

Regan

Jul 20 2005

Derek Parnell <derek psych.ward> writes:

On Thu, 21 Jul 2005 12:16:08 +1200, Regan Heath wrote:

 On Thu, 21 Jul 2005 10:02:49 +1000, Derek Parnell <derek psych.ward> wrote:

[snip]
 
 Furthermore, 'if (array)' is
 equivalent to ...

   if (array.len == 0 || array.ptr == null)

 
 Don't you mean:
 
 if (array.len != 0 || array.ptr != null)
 
 ?

Oops. Yes I got that wrong. Your code is right.
 
 Does the assembler above show this?

Yes.
     mov     EDX,-4[EBP] ; Put the ptr into the EDX register
     or      EDX,-8[EBP] ; OR the EDX register with the length
     je      L57         ; jump if the result is zero
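
Written back out in D, those three instructions amount to the sketch below. truthyLikeAsm is a hypothetical helper that mirrors the mov/or/je sequence from the 2005 listing; it is not a library function, and it makes no claim about what a current compiler generates for 'if (array)'.

```d
// Hypothetical helper mirroring the mov/or/je sequence shown above.
bool truthyLikeAsm(const(char)[] array)
{
    // OR the pointer bits with the length; zero only when both are zero.
    return (cast(size_t) array.ptr | array.length) != 0;
}

void main()
{
    char[] none;                        // ptr null, length 0
    assert(!truthyLikeAsm(none));

    char[] three = new char[3];         // ptr set, length 3
    assert(truthyLikeAsm(three));

    char[] emptySlice = three[0 .. 0];  // ptr set, length 0
    assert(truthyLikeAsm(emptySlice));
}
```

The third case is the one the thread keeps returning to: an empty array that still has a data pointer passes this test.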


-- 
Derek
Melbourne, Australia
21/07/2005 10:50:30 AM

Jul 20 2005

AJG <AJG_member pathlink.com> writes:

Hi,

It does what it always does, for every type in D, it tests whether 'array'  
is null or 0.
A null array is a non-existent array, thus it tests for a non-existent
array.

That's not exactly true. As you mentioned yourself, .length = 0 makes the
pointer null, yet isn't the array "existent"? This kind of implementation defect
should not be exposed in the language.

 I can't tell from the
 syntax. It is thus ambiguous.

Granted, it's not 'explicit'. However, the behaviour is well defined.

The only 'catch' in this case is that an array cannot be null. However,  
when an array would be null its data pointer is null,

Isn't this a contradiction?

therefore testing the data pointer _is_ testing the array.

That's where I beg to differ. That's the source of ambiguity. To _you_ it may
seem like "testing the data pointer _is_ testing the array," but that's most
certainly not the only interpretation, and in fact I think it's a misleading
one.

Testing the array ptr is _just_ that, testing a pointer, some random block of
memory that just happens to be used by your array. It is unsemantic and unclear.
I am certain it will be misused by both the C camp and new programmers. This
behaviour is not even documented anywhere.

The problem once again is that in D, "testing the array" doesn't mean anything
outright because the array is always there. Technically if (array) should
_always_ return true. Therefore, I think it would be much more consistent to use
the .length property rather than .ptr for this implicit test, or ban the
implicit test.

Why is .length better?
1) It is much more semantic. It means in D what it would have meant in C.
2) It is a simple test for numerical emptiness. Nothing more, nothing less. No
memory involved. No philosophical questions about null/empty needed.
3) It is not prone to weird memory incongruences (e.g. an empty existent array)
or changes in the technical details of the implementation.
4) It is consistent: It works exactly the same with normal arrays, dynamic
arrays, static arrays, associative arrays, and even raw pointers (which map
directly to C's behaviour).

I think there is another non-ambiguous option now (C):
A) Make if (array) equal to if (array.length)
B) Make if (array) illegal.
C) Make if (array) always return true, since the array is always there.

I prefer A first, then B, then C as a last resort.
Thanks for listening.
--AJG.

Jul 20 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Wed, 20 Jul 2005 14:29:13 +0000 (UTC), AJG <AJG_member pathlink.com>  
wrote:
 It does what it always does, for every type in D, it tests whether  
 'array'
 is null or 0.
 A null array is a non-existent array, thus it tests for a non-existent
 array.

 That's not exactly true. As you mentioned yourself, .length = 0 makes the
 pointer null, yet isn't the array "existent"?

Not anymore, that is why this is a BUG.

 This kind of implementation defect
 should not be exposed in the language.

It is a BUG.

 I can't tell from the
 syntax. It is thus ambiguous.

 Granted, it's not 'explicit'. However, the behaviour is well defined.

 The only 'catch' in this case is that an array cannot be null. However,
 when an array would be null its data pointer is null,

 Isn't this a contradiction?

No. We have 2 facts:

1. array _references_ are never null.
2. null arrays have null data pointers.

To be clear a "null array" is an array to which you have assigned null, or  
to which nothing has ever been assigned. It represents "non-existent".

 therefore testing the data pointer _is_ testing the array.

 That's where I beg to differ. That's the source of ambiguity. To _you_  
 it may seem like "testing the data pointer _is_ testing the array," but  
 that's most certainly not the only interpretation, and in fact I think  
 it's a misleading one.
 Testing the array ptr is _just_ that, testing a pointer, some random  
 block of memory that just happens to be used by your array. It is  
 unsemantic and unclear.

It's identical to the C code you posted which you said was semantic and  
clear.

The data pointer is the part of the array struct that mirrors the value  
the array reference would have were it not for the additional safety  
features they have i.e. can never be null.

Therefore in my opinion the data pointer _is_ the array.

 The problem once again is that in D, "testing the array" doesn't mean  
 anything outright because the array is always there.

You're confusing implementation with concept. Walter has chosen for the  
implementation to ensure the array _reference_ is never null, yet, it's  
still possible to assign 'null' to one, in order to represent a 'null  
array', when you do so it sets the data pointer to null.

If you ignore what you know about how an array works internally and just  
look at it from the point of view that it is another reference like any  
other then its current behaviour is perfectly consistent with all other
types. You can treat an array like any other class with a "length"  
member/property.

The added bonus with arrays is that:
  - they can be created on the fly implicitly.
  - you can never have a null reference to one.

Would you expect "if (x)" to call a member function of a class x?

 Technically if (array) should _always_ return true.

No, technically they should not, for if they did:

A. The expression "if (x)" compares the variable x with null or 0.
B. Given "char[] p = null;" then "if (p)" should be FALSE.

Then statement B would be incorrect, as "if (p)" would return TRUE and  
this would be inconsistent with other types in D.

 Therefore, I think it would be much more consistent

Less consistent, because then you would break this logic:

A. The expression "if (x)" compares the variable x with null or 0.
B. Given "char[] p = null;" then "if (p)" should be FALSE.
C. Given "char[] p = "";" then "if (p)" should be TRUE.

All 3 statements are correct and true for all pointer/reference types, and  
are also all correct and true for value types, except structs, if you  
replace the null and "" with appropriate values eg. 0 and 1

In short, if you set an array to null "if (array)" will be FALSE.
if you set an array to anything else "if (array)" will be TRUE.
if you change "if (array)" to test length you break that logic.

You'll also note that the statement "if (array is null)" is true for  
arrays to which you have assigned null, in short: although the array  
reference is not itself null it pretends to be in situations where it  
would be, were it not for the implementation ensuring it cannot be (for  
crash safety reasons).

 Why is .length better?
 1) It is much more semantic. It means in D what it would have meant in C.
 2) It is a simple test for numerical emptiness. Nothing more, nothing  
 less. No memory involved. No philosophical questions about null/empty  
 needed.
 3) It is not prone to weird memory incongruences (e.g. an empty existent
 array) or changes in the technical details of the implementation.
 4) It is consistent: It works exactly the same with normal arrays,  
 dynamic arrays, static arrays, associative arrays, and even raw pointers  
 (which map directly to C's behaviour).

Why is .length wrong?

1. It makes the behaviour of "if (x)" inconsistent with other types.
2. It makes arrays inconsistent, "if (x)" no longer returns FALSE for an  
array to which you have assigned null.

In short it breaks the logical consistency of types.

 I think there is another non-ambiguous option now (C):
 A) Make if (array) equal to if (array.length)
 B) Make if (array) illegal.
 C) Make if (array) always return true, since the array is always there.

 I prefer A first, then B, then C as a last resort.

I prefer the current situation. The options above all break consistency.

Regan

Jul 20 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

"Derek Parnell" <derek psych.ward> wrote in message 
news:1k0mwc3gtmj73.inn5n1oiajb5$.dlg 40tude.net...
 On Wed, 20 Jul 2005 11:21:55 +0200, Dejan Lekic wrote:

 Mr Heath, I agree with You on this.

 I don't.

 Does ...

  if (array) ...

 test for an empty array or a non-existent array? I can't tell from the
 syntax. It is thus ambiguous.

  if (array.ptr == null) -- test for a non-existence.

  if (array.length == 0) -- test for emptiness

  if (array) -- test for which?

I can sympathize with the argument that it should be illegal to implicitly 
test 'array' but presumably we'd want to keep implicit conversion to the ptr 
in calls like
  void foo(char* p);
  foo(array);
That would mean 'array' is implicitly converted to ptr in some places but 
not everywhere and that seems like a slippery slope. It might be easier to 
just live with the current behavior. For example dlint can flag implicit 
array conditions.
Then again we already have 'if (x = y)' illegal so there is precedent for
filtering conditions - the good-old 'value does not give boolean result' 
error.

Jul 20 2005

AJG <AJG_member pathlink.com> writes:

Hi,

I can sympathize with the argument that it should be illegal to implicitly 
test 'array' but presumably we'd want to keep implicit conversion to the ptr 
in calls like
  void foo(char* p);
  foo(array);
That would mean 'array' is implicitly converted to ptr in some places but 
not everywhere and that seems like a slippery slope.

I agree that this is something to think about. Of course, there is a fundamental
difference here. foo (char *) expects a pointer. if (array) expects a bool
(well, int, technically; another D annoyance). This is a clear distinction to
me, one that prevents the slippery slope.

It might be easier to just live with the current behavior.

That's just laziness speaking ;).

Then again we already have 'if (x = y)' illegal so there is precedent for
filtering conditions - the good-old 'value does not give boolean result' 
error.

Yes! That's exactly what I was thinking. D even has its cake and eats it,
because (x = y) is still legal with an additional explicit == true/false; this is
great. It allows you to do it yet prevents the common missing = mistake.

This is analogous to if (array). The pointer check can still be done via
array.ptr, but D would error out when using the ambiguous form. So there is
definitely precedent, and it's a good precedent.

Cheers,
--AJG.

Jul 20 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

It might be easier to just live with the current behavior.

 That's just laziness speaking ;).

Maybe "easier" isn't the right word :-)
The last time this topic came up one suggestion was to encourage explicit 
.length or .ptr conditions but to keep the current implicit conversions. For 
example the C++string vs D string page 
http://www.digitalmars.com/d/cppstrings.html was changed to test for empty 
as:
 if (!array.length) ...
It's in the section "Checking For Empty Strings". It used to just be "if 
(!array)", I think.
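
A minimal sketch of what "explicit .length or .ptr conditions" look like in practice. The helper names isEmpty and hasStorage are hypothetical, chosen here only to name the two questions; the point is that each test says which question it is asking, so nothing depends on how the implicit form is defined.

```d
// Hypothetical helpers naming the two distinct questions.
bool isEmpty(const(char)[] s)    { return s.length == 0; }
bool hasStorage(const(char)[] s) { return s.ptr !is null; }

void main()
{
    char[] never;                     // never assigned
    char[] buf = new char[4];         // four elements
    char[] emptyView = buf[0 .. 0];   // zero elements, but points into buf

    assert(isEmpty(never)     && !hasStorage(never));
    assert(isEmpty(emptyView) &&  hasStorage(emptyView));
    assert(!isEmpty(buf)      &&  hasStorage(buf));
}
```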


Then again we already have 'if (x = y)' illegal so there is precedent for
filtering conditions - the good-old 'value does not give boolean result'
error.

 Yes! That's exactly what I was thinking. D even has its cake and eats it,
 because (x = y) is still legal with an additional explicit == true/false;
 this is
 great. It allows you to do it yet prevents the common missing = mistake.

 This is analogous to if (array). The pointer check can still be done via
 array.ptr, but D would error out when using the ambiguous form. So there 
 is
 definitely precedent, and it's a good precedent.

In fact now that I think about the 'if (!array)' code if we made 'if 
(array)' illegal we'd also need a special check for 'if (!array)'. That's at 
least two more special cases for conditions.

Jul 21 2005

Ilya Minkov <minkov cs.tum.edu> writes:

My vote is against.

Derek Parnell schrieb:
 Does ...
 
   if (array) ...
 
 test for an empty array or a non-existent array? I can't tell from the
 syntax. It is thus ambiguous.
 
   if (array.ptr == null) -- test for a non-existence.
 
   if (array.length == 0) -- test for emptiness
 
   if (array) -- test for which?

Making difference between an empty array and a nonexistent one is flaky, 
if not directly ambiguous, thus D does not do it, as far as i can 
remember the statement of Walter. Thus if(array) is not ambiguous.

And at all, arrays have somewhat pointer-like semantics in D, so it 
should stay, among other reasons. One of the reasons is that it seems 
familiar to C programmers and makes the foreach..else syntax suggestion 
from AJG very unnecessary.

-eye

Jul 20 2005

AJG <AJG_member pathlink.com> writes:

Hi,

Making difference between an empty array and a nonexistent one is flaky, 
if not directly ambiguous, thus D does not do it, as far as i can 
remember the statement of Walter. Thus if(array) is not ambiguous.

Hm... not only does this distinction exist, it is in fact _very_ much available
in D. That's exactly the point Regan has made in some past replies. I'm
indifferent towards this distinction, but Regan seems fond of it. Please look at
my examples further below.

And at all, arrays have somewhat pointer-like semantics in D.

No, they do not, IMHO. This is one of the points I've tried to make. Arrays have
completely different semantics in D compared to C. In D arrays are first-class
objects. They are handled via references, which can't be nulled, they keep their
own length, etc. I think this is a good thing. Very different from C.

One of the reasons is that it seems 
familiar to C programmers.

Indeed. It seems familiar, and people will misuse it because of that. But then
the boogieman comes and gets them in the form of a weird bug.

Examples of the incongruence (empty _but_ existent array):



// The statement will print.

// Let's try it again:


// The statement will *not* print.

// Think about strings:


// The statement will *not* print.

Is that last test not a reasonable thing to do? It seems pretty harmless. You
want to test for an empty string, an empty array. But you still get true.

But what about this:


// The statement will print.

Would you say the behaviour I showed above is consistent?
You don't find it a tad, say, ambiguous?
You don't think people will be confused? I certainly was.

makes the foreach..else syntax suggestion from AJG very unnecessary.

Huh? I don't see how the two things are related. You may have a valid point, but
I fail to see the connection.

Cheers,
--AJG.

Jul 20 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Thu, 21 Jul 2005 00:04:56 +0000 (UTC), AJG <AJG_member pathlink.com>  
wrote:
 Making difference between an empty array and a nonexistent one is flaky,
 if not directly ambiguous, thus D does not do it, as far as i can
 remember the statement of Walter. Thus if(array) is not ambiguous.

 Hm... not only does this distinction exist, it is in fact _very_ much  
 available
 in D. That's exactly the point Regan has made in some past replies. I'm
 indifferent towards this distinction, but Regan seems fond of it. Please  
 look at
 my examples further below.

It's true.

 And at all, arrays have somewhat pointer-like semantics in D.

 No, they do not, IMHO. This is one of the points I've tried to make.
 Arrays have
 completely different semantics in D compared to C. In D arrays are  
 first-class
 objects. They are handled via references, which can't be nulled, they  
 keep their
 own length, etc. I think this is a good thing. Very different from C.

The point I'm trying to make is that in D an array can be nulled, and it  
has meaning, eg.

char[] p = null;

you're confusing the _implementation_ of arrays with the _behaviour_ of  
arrays, the above array _reference_ behaves just like any other reference
that has been nulled(*) eg.

if (p is null) { /* true */ }

(*) the exception being that the _implementation_ protects you by ensuring  
the reference always refers to a valid object. The object's data pointer
then mirrors the actual state of the array. In addition several  
optimisations go on in the background, removing the actual reference (as  
Derek has shown in another post) which makes sense as it's not ever null,  
thus not required for the _implementation_.

 One of the reasons is that it seems
 familiar to C programmers.

 Indeed. It seems familiar, and people will misuse it because of that.

How? When you write "if(x)" you're asking is 'x' null or 0. D's answer is  
perfectly correct in all cases(*).

(*) except for the _BUG_ where you can write:

char[] p = "";
p.length = 0;
if (p) { /* false, length = 0 resets the data pointer to null */ }

 But then
 the boogieman comes and gets them in the form of a weird bug.

 Examples of the incongruence (empty _but_ existent array):



 // The statement will print.

This is a static array. Its data pointer can never be null, thus it
always exists.
(Nothing incongruous here)

 // Let's try it again:


 // The statement will *not* print.

Here you have not allocated any memory, thus nothing exists.
(Nothing incongruous here)

 // Think about strings:


 // The statement will *not* print.

Wrong, this statement will print (try it).

The reason it prints is that memory _is_ allocated because string  
constants are C compatible i.e. contain a null terminator. If this was not  
the case then this would act as the previous example.
(Nothing incongruous here)
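
A minimal sketch of the string-literal case, assuming a current D compiler (where literals are typed as immutable 'string' rather than char[]). The helper name hasStorage is hypothetical. It checks only the two explicit properties and deliberately avoids the implicit 'if (s)' form under discussion.

```d
// Hypothetical helper: does the array have a non-null data pointer?
bool hasStorage(const(char)[] s) { return s.ptr !is null; }

void main()
{
    string empty = "";            // empty literal: no characters...
    assert(empty.length == 0);
    assert(hasStorage(empty));    // ...but it still points at its terminator
    assert(*empty.ptr == '\0');   // safe only because this is a literal

    char[] nothing = null;        // null array: no characters and no storage
    assert(nothing.length == 0);
    assert(!hasStorage(nothing));
}
```

Both arrays have length 0, so a .length test treats them alike; only a .ptr test tells them apart.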

 Is that last test not a reasonable thing to do? It seems pretty  
 harmless. You want to test for an empty string, an empty array. But you  
 still get true.

You're asking the wrong questions. The statement "if(x)" asks is x null or  
0, it does not ask "is this string longer than 0 characters" or "does this  
array contain more than 0 elements". The correct question is:

if (x.length > 0) {}

Just like most any other container class you care to name/try.

 But what about this:


 // The statement will print.

Wrong, it will not print. The array is null, nothing exists.
(Nothing incongruous here)

 Would you say the behaviour I showed above is consistent?

Yes.

 You don't find it a tad, say, ambiguous?

No.

 You don't think people will be confused? I certainly was.

That's because you're asking the wrong questions, and you didn't check  
your answers.

 makes the foreach..else syntax suggestion from AJG very unnecessary.

 Huh? I don't see how the two things are related. You may have a valid  
 point, but I fail to see the connection.

I'm not sure either. I suspect he's referring to foreach being usable on a  
null array equally well, i.e. you don't have to check whether it's a null
array, it will iterate 0 times for both a null array and an empty array.
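
A minimal sketch of that point, with a hypothetical countIterations helper: foreach needs no guard, because a null array and an empty slice both produce zero iterations.

```d
// Hypothetical helper: count how many times foreach runs its body.
size_t countIterations(const(int)[] items)
{
    size_t n = 0;
    foreach (item; items)
        ++n;
    return n;
}

void main()
{
    int[] nullArray = null;
    int[] backing = [1, 2, 3];
    int[] emptySlice = backing[0 .. 0];

    assert(countIterations(nullArray) == 0);   // null array: body never runs
    assert(countIterations(emptySlice) == 0);  // empty slice: body never runs
    assert(countIterations(backing) == 3);
}
```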

Regan

Jul 20 2005

AJG <AJG_member pathlink.com> writes:

Hi Regan,

In article <opst8meeo123k2f5 nrage.netwin.co.nz>, Regan Heath says...
On Thu, 21 Jul 2005 00:04:56 +0000 (UTC), AJG <AJG_member pathlink.com>  
wrote:
 Making difference between an empty array and a nonexistent one is flaky,
 if not directly ambiguous, thus D does not do it, as far as i can
 remember the statement of Walter. Thus if(array) is not ambiguous.

 Hm... not only does this distinction exist, it is in fact _very_ much  
 available
 in D. That's exactly the point Regan has made in some past replies. I'm
 indifferent towards this distinction, but Regan seems fond of it. Please  
 look at
 my examples further below.

It's true.

Praise the lord, agreement. ;)

 And at all, arrays have somewhat pointer-like semantics in D.

 No, they do not, IMHO. This is one of the points I've tried to make.
 Arrays have
 completely different semantics in D compared to C. In D arrays are  
 first-class
 objects. They are handled via references, which can't be nulled, they  
 keep their
 own length, etc. I think this is a good thing. Very different from C.

The point I'm trying to make is that in D an array can be nulled, and it  
has meaning, eg.

char[] p = null;

you're confusing the _implementation_ of arrays with the _behaviour_ of  
arrays, the above array _reference_ behaves just like any other reference
that has been nulled(*) eg.

I'm well aware of the implementation vs. the behaviour. It just so happens the
two are married when it comes to the compiler. In fact, in the resulting
executable, they are indistinguishable. Confusion arises as a result.

 One of the reasons is that it seems
 familiar to C programmers.

 Indeed. It seems familiar, and people will misuse it because of that.

How? When you write "if(x)" you're asking is 'x' null or 0. D's answer is  
perfectly correct in all cases(*).

And except for static arrays. Oh, and strings, which must be compatible with C.
Since strings are a fairly important piece of the puzzle, I'd say this is
problematic.

(*) except for the _BUG_ where you can write:

char[] p = "";
p.length = 0;
if (p) { /* false, length = 0 resets the data pointer to null */ }

Has Walter actually acknowledged this to be a bug? This seems more like what you
mentioned, a desire to make the distinction (empty/exist) disappear. If that's
the case, then why would you say it's a bug? If anything, it could only get
worse.



 // The statement will print.

This is a static array. Its data pointer can never be null, thus it
always exists.
(Nothing incongruous here)

My friend, that's the very definition of an incongruence. It means static arrays
do not follow the same principles as other kinds (just like strings).



I even went ahead and _assigned_ an empty array (int[0]) to the reference, and
yet it remains _non_-existent. How do you explain that? You can't have a dynamic
array that is empty and non-existant, but you _can_ have a static one? (or at
least, not via the initializer?)

Let's analyze this carefully, and you will definitely see an incongruence:




if (A) // this is false.
if (B) // this is false.

Since false == false, then A == B, and therefore null == int[0]. The very
distinction you are so fond of is gone! So in this case empty == non-existent,
but all over the place it isn't? _That's_ an incongruence.

 // Let's try it again:


 // The statement will *not* print.

Here you have not allocated any memory, thus nothing exists.
(Nothing incongruous here)

Oh, so then it's purely about memory? How very semantic. Never mind the fact that
int[0] means an empty array. The distinction is lost, as shown above. IMHO
there's no way around this one.

 // Think about strings:


 // The statement will *not* print.

Wrong, this statement will print (try it).

The reason it prints is that memory _is_ allocated because string  
constants are C compatible i.e. contain a null terminator. If this was not  
the case then this would act as the previous example.

"If this was not the case". That's fine, but it happens to _be_ the case.
Therefore the docs should state: "There is an incongruence when it comes to
string literals. Because we want them to be compatible with C, it means an empty
string is not really empty. In other words, what should have been an empty array
is really not. Careful, folks!" 

 But what about this:


 // The statement will print.

Wrong, it will not print. The array is null, nothing exists.
(Nothing incongruous here)

 Would you say the behaviour I showed above is consistent?


If you agree with the previous statements, you'll concur that the behaviour is
not consistent. It calls for exceptions to be made and explained. Once more
gratuitously: static vs. dynamic, and string literals, and the .length "bug,"
and the dynamic initializer problem.

 You don't find it a tad, say, ambiguous?


If you at least agree it's inconsistent, then we are getting somewhere. The
ambiguity results in not knowing when which is going to happen. Since there is
no documentation on this, the problem is only aggravated.

 You don't think people will be confused? I certainly was.

That's because you're asking the wrong questions, and you didn't check  
your answers.

I did check my answers, and now I know. I made the mistake, and by _chance_ one
case didn't work early on, so I started looking under the hood. But how many
people will go to their graves with bugs like that still coded? How many bugs
like that exist as we speak? Remember, for _most_ cases, it will not show up.

Tell me this, do you agree with this statement:
People (mistakenly) may use if (array) to test for the emptiness of an array.
What about this:
Moreover, this test will work most of the time.
And finally:
The remaining times, they are bugs.

My proposal aims to prevent those bugs. 

 makes the foreach..else syntax suggestion from AJG very unnecessary.

 Huh? I don't see how the two things are related. You may have a valid  
 point, but I fail to see the connection.

I'm not sure either. I suspect he's referring to foreach being usable on a  
null array equally well, i.e. you don't have to check whether it's a null  
array; it will iterate 0 times for both a null array and an empty array.

If this is true, Ilya, that was never the intention of my suggestion. I know
that foreach is "safe" even with "null" arrays. The suggestion is a way to deal
with the no-items case elegantly without using a separate if statement every
single time. As a matter of fact, no-items happens quite a bit IMHO.

Thanks for reading,
--AJG.

Jul 20 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Thu, 21 Jul 2005 02:18:27 +0000 (UTC), AJG <AJG_member pathlink.com>  
wrote:
 Hm... not only does this distinction exist, it is in fact _very_ much
 available
 in D. That's exactly the point Regan has made in some past replies. I'm
 indifferent towards this distinction, but Regan seems fond of it.  
 Please
 look at
 my examples further below.

 It's true.

 Praise the lord, agreement. ;)

We're both men of "distinction" ;)

 you're confusing the _implementation_ of arrays with the _behaviour_ of
 arrays, the above array _reference_ behaves just like any other reference
 that has been nulled(*) eg.

 I'm well aware of the implementation vs. the behaviour. It just so  
 happens the two are married when it comes to the compiler. In fact, in  
 the resulting
 executable, they are indistinguishable. Confusion arises as a result.

Sorry, I don't see your point. The compiler isn't confused, neither am I.  
Arrays are references, treat them as such and there is no confusion.

 One of the reasons is that it seems
 familiar to C programmers.

 Indeed. It seems familiar, and people will misuse it because of that.

 How? When you write "if(x)" you're asking is 'x' null or 0. D's answer  
 is perfectly correct in all cases(*).

 And except for static arrays.

No, this is no exception to the rule.

Yes, static arrays are different to dynamic ones, no surprises there. Yes,  
static arrays cannot have a null data pointer, no, it makes no difference  
to the behaviour of "if(x)", nor should it.

static arrays are the same as dynamic ones that _exist_, this makes  
perfect sense as static arrays always exist.

 Oh, and strings, which must be compatible with C.

Again, there is no exception to the rule here.
"bob" is a static string, it cannot be null.
"" is a static string, it cannot be null.

Yes, the last example has no items, i.e. has a 0 length, but it still  
_exists_.

If Walter decided to remove the trailing null and make it incompatible  
with C then it could be optimised away, i.e. the compiler could decide ""  
was meaningless and so could remove it, making it non-existent. In that  
case it wouldn't exist. Otherwise it does. As long as it exists it has a  
non-null data pointer. The length is meaningless when talking about  
existence.

 (*) except for the _BUG_ where you can write:

 char[] p = "";
 p.length = 0;
 if (p) { //false, length = 0 resets the data pointer to null }

 Has Walter actually acknowledged this to be a bug?

In short, no. But then he isn't known for his verbosity on many matters.  
He just percolates and out pops a new compiler, possibly with the changes we  
talk about.

 This seems more like what you mentioned, a desire to make the  
 distinction (empty/exist) disappear.

I believe that was the original intent.

 If that's the case, then why would you say it's a bug?

In this case my impression is that the real intent was to remove the seg-v  
problems associated with null strings, remove the need to check for null  
all the time, etc. That has been achieved, what is great is that at the  
same time we can preserve the distinction if we so choose (it takes so very  
little to do this, from the current state).

 If anything, it could only get worse.

Oh ye of little faith!



 // The statement will print.

 This is a static array. Its data pointer can never be null, thus it
 always exists.
 (Nothing incongruous here)

 My friend, that's the very definition of an incongruence.

Whose definition?
   http://dictionary.reference.com/search?q=incongruous

The closest/best definition for this situation appears to be:
   "Not in keeping with what is correct, proper, or logical; inappropriate:  
incongruous behavior"

 It means static arrays do not follow the same principles as other kinds  
 (just like strings).

What "principles" are you referring to?




 I even went ahead and _assigned_ an empty array (int[0]) to the  
 reference, and yet it remains _non_-existent. How do you explain that?  
 You can't have a dynamic array that is empty and non-existent, but you  
 _can_ have a static one? (or at least, not via the initializer?)

Aha! This is a new (good) example. I agree this example shows  
"incongruous behaviour".

I would suggest that "int[0] s;" be an error, as it's pretty meaningless.  
Except template programmers would likely be a little annoyed with that.

Alternatively, I would suggest that "int[0] s;" have a null data pointer (as  
the dynamic one does). But I believe they're implemented in such a way that there is  
no such data pointer.

There seems to be no simple solution to this problem, perhaps Walter has  
an idea. I'll post to the bugs NG.

 Let's analyze this carefully, and you will definitely see an  
 incongruence:




 if (A) // this is false.
 if (B) // this is false.

 Since false == false, then A == B, and therefore null == int[0]. The very
 distinction you are so fond of is gone!

Not true.

I suspect "new int[0]" allocates no memory, therefore it _is_ null.
This is different to C/C++ which can and do allocate a zero-length item in  
the heap.

This could be a solution to the problem above, if "new int[0]" allocated a  
zero length item on the heap it would be consistent with the static array  
case.

 // Let's try it again:


 // The statement will *not* print.

 Here you have not allocated any memory, thus nothing exists.
 (Nothing incongruous here)

 Oh, so then it's purely about memory?

In essence, yes. If no memory is allocated it doesn't exist. Exactly like  
your own C example earlier.

 How very semantic. Nevermind the fact that int[0] means an empty array.

"new int[0]" means allocate an array of 0 ints. 0 * int.sizeof == 0. In  
other words allocate 0 bytes. I suspect a shortcut is being done where it  
does no allocation when you ask for 0 bytes. I think perhaps it should  
allocate a zero-length item on the heap instead.

 The distinction is lost, as shown above. IMHO there's no way around this  
 one.

Sure, there is 1 problem in the static array vs dynamic array example.
Let's hope Walter agrees and has/likes the solution.

 // Think about strings:


 // The statement will *not* print.

 Wrong, this statement will print (try it).

 The reason it prints is that memory _is_ allocated because string
 constants are C compatible i.e. contain a null terminator. If this was  
 not the case then this would act as the previous example.

 "If this was not the case". That's fine, but it happens to _be_ the case.
 Therefore the docs should state: "There is an incongruence when it comes  
 to string literals. Because we want them to be compatible with C, it  
 means an empty string is not really empty.

It depends how you want to look at it. When I type "" I'm saying here  
exists a string containing nothing. In other words, it _exists_ but  
contains _nothing_; it's the very definition of a non-null data pointer  
with a 0 length.

 In other words, what should have been an empty array is really not.  
 Careful, folks!"

It _is_ empty, its length is 0. The trailing \0 is effectively outside  
the length of the array, it exists past the end.

 But what about this:


 // The statement will print.

 Wrong, it will not print. The array is null, nothing exists.
 (Nothing incongruous here)

 Would you say the behaviour I showed above is consistent?


 If you agree with the previous statements, you'll concur that the  
 behaviour is not consistent. It calls for exceptions to be made and  
 explained.

As I said above, there are no exceptions in the rule for "if(x)". It  
simply and always checks the variable 'x' against null or 0. Nothing more,  
nothing less. You do however need to understand what other statements like  
the "new int[0]" do, in order to understand how they relate to "if(x)".  
That doesn't mean there is anything wrong with "if(x)".

 Once more gratuitously: static vs. dynamic, and string literals, and the  
 .length "bug," and the dynamic initializer problem.

Summary:
I agree there is a problem with static vs dynamic above.
I don't agree that there is anything wrong with the behaviour of "if(x)".

 You don't find it a tad, say, ambiguous?


 If you at least agree it's inconsistent, then we are getting somewhere.

The static vs dynamic example above shows inconsistency.

 The ambiguity results in not knowing when which is going to happen.

Specifically with statements like "new int[0]" and "int[0] a" and what  
exactly _they_ do.

 You don't think people will be confused? I certainly was.

 That's because you're asking the wrong questions, and you didn't check
 your answers.

 I did check my answers, and now I know.

Yeah, I didn't see your post correcting it till after I wrote this.

 I made the mistake, and by _chance_ one case didn't work early on, so I  
 started looking under the hood. But how many people will go to their  
 graves with bugs like that still coded? How many bugs like that exist as  
 we speak? Remember, for _most_ cases, it will not show up.

 Tell me this, do you agree with this statement:
 People (mistakenly) may use if (array) to test for the emptiness of an  
 array.

No. My reasoning:

1. Most container classes use a length or size member for this. I haven't  
seen a single container class/object/thing in any language that lets you  
check the length or size of an object using "if(x)".

2. The statement "if(x)" is well known to mean check x vs null or 0. If you  
assume an array is a struct you're writing something meaningless. If you  
assume an array is a reference you're comparing the reference to null or  
0. I cannot see how you would ever think it would silently call the  
length member of x.

 What about this:
 Moreover, this test will work most of the time.

Sure. Most of the time you'll have an array with items, thus the data  
pointer will be non-null.

 And finally:
 The remaining times, they are bugs.

Yes. Assuming: you wrote "if(x)" and meant to check for length>0 then in  
the case of a non-null data pointer and a 0 length it would execute the  
code you had written for arrays with a length greater than 0.

 My proposal aims to prevent those bugs.

Sure, only you want to do it in such a way as to break existing code  
relying on "if(x)". You want to introduce inconsistent behaviour (making  
arrays behave differently to all other types in D). And lastly the bugs  
you're referring to are, IMO, unlikely to occur.

Essentially you have to generate a zero length non-null array. The 3 ways  
I know of doing this are:

char[0] p;            //1
char[] p = "";        //2

char[] tmp = "abc";
char[] p = tmp[0..0]; //3

You'd have to (incorrectly) attempt to compare the length of an array with  
"if(p)" and the outcome would have to be wrong in a subtle way for this to  
be a serious problem. A blatant bug is easy to find, and you quickly learn  
not to use "if(p)" to check for length.
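
The cases listed above share a non-null data pointer with a zero length, which is exactly where a pointer test and a length test give different answers. A minimal sketch, assuming a current D compiler (where string literals are typed `string` rather than `char[]`) and spelling out both tests explicitly rather than relying on "if(p)"; the helper names `exists` and `hasItems` are made up for illustration, and the `char[0]` case is left out:

```d
/// True when the array has a non-null data pointer ("exists").
bool exists(const(char)[] a) { return a.ptr !is null; }

/// True when the array holds at least one element.
bool hasItems(const(char)[] a) { return a.length > 0; }

unittest
{
    string literal = "";          // case 2: empty literal, still backed by a '\0'
    string tmp = "abc";
    string slice = tmp[0 .. 0];   // case 3: zero-length slice of existing data
    string nothing = null;        // no data pointer at all

    // Zero-length but non-null: the two tests disagree.
    assert(exists(literal) && !hasItems(literal));
    assert(exists(slice) && !hasItems(slice));

    // Null array: both tests agree.
    assert(!exists(nothing) && !hasItems(nothing));

    // Non-empty array: both tests agree.
    assert(exists(tmp) && hasItems(tmp));
}
```

Compiled with `-unittest`, the assertions only disagree for the zero-length, non-null arrays, which is the narrow window in which a pointer test used as a length test would misfire.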

In most cases I can imagine, the non-null zero length array causes no  
problems, because as Ilya mentioned things like "foreach" treat them the  
same. This is part of the "treat them the same" that was Walter's initial  
goal and is achieved mostly by array references never being null.

In short, I like it how it is, I can't see a significant problem, and I  
totally dislike your suggested solution. But, like you say thanks for  
listening to my point of view, it's been fun. (I think we've exhausted our  
ideas and I don't think we're agreeing)

Regan.

Jul 20 2005

AJG <AJG_member pathlink.com> writes:

Hi,

 Please
 look at
 my examples further below.

 It's true.

 Praise the lord, agreement. ;)

We're both men of "distinction" ;)

Hehehe. I'll be requiring your testimony in court one day.

In short, I like it how it is, I can't see a significant problem, and I  
totally dislike your suggested solution. But, like you say thanks for  
listening to my point of view, it's been fun. (I think we've exhausted our  
ideas and I don't think we're agreeing)

Yes, I suppose we can agree to disagree.

One last couple of things I'd like to clarify, though: My idea is not
necessarily to make if (array) check length automatically. This is just one of
the three I mentioned. My general suggestion is to improve/clarify and document
the behaviour of the construct because I find it dangerous and leading to the
subtle bugs I mentioned.

You agreed that the bugs can at least happen. It'd be great to know how commonly
they occur; alas, that wouldn't be easy to measure. However, in all honesty, bugs
arising from using assignment as a boolean (if (x = y)) haven't happened to me
very much. Maybe once or twice (in years). Yet the construct was made partially
illegal, requiring a more explicit version. That's fine with me. It helps
prevent those subtle (if seldom) bugs.

In addition, IIRC, nowhere on the D site proper is there a mention of what the
correct behaviour is supposed to be. I have a feeling Walter left this construct
a little unfinished with regards to arrays. Maybe he's working on the empty/null
distinction thing and then he will revise it. Anyway, as I've said the lack of
documentation doesn't help.

And finally: Could you give me a concrete example of a useful application of if
(array) to test for the array pointer's nullness? Say, in a complete function? I
simply don't think dealing with ptrs (or checking them) should be necessary in D
except for C-compat. But perhaps you have a really good use for this construct
that I haven't considered.

Thanks,
--AJG.

Jul 21 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Thu, 21 Jul 2005 13:35:36 +0000 (UTC), AJG <AJG_member pathlink.com>  
wrote:
 And finally: Could you give me a concrete example of a useful  
 application of if (array) to test for the array pointer's nullness? Say,  
 in a complete function? I simply don't think dealing with ptrs (or  
 checking them) should be necessary in D except for C-compat. But perhaps  
 you have a really good use for this construct that I haven't considered.

Template programming is an example of where we rely on the logical  
consistency of types to achieve generic things, see:

import std.stdio;

class A
{
	char[] toString()
	{
		return "A";
	}
}

template doWrite(Type)
{
	void doWrite(Type p)
	{
		if (p) writef(p);
	}
}

alias doWrite!(A) doWriteA;
alias doWrite!(char[]) doWriteC;

void main()
{
	char[] a = "this is an ";
	
	doWriteC(null);
	doWriteC(a);
	doWriteA(null);
	doWriteA(new A());
}

Essentially anywhere you expect consistent behaviour of references (string  
or otherwise) and want to test the reference is not null, i.e.  
non-existent.

Regan

Jul 21 2005

Derek Parnell <derek psych.ward> writes:

On Wed, 20 Jul 2005 23:42:49 +0200, Ilya Minkov wrote:

 My vote is against.
 
 Derek Parnell wrote:
 Does ...
 
   if (array) ...
 
 test for an empty array or a non-existent array? I can't tell from the
 syntax. It is thus ambiguous.
 
   if (array.ptr == null) -- test for a non-existence.
 
   if (array.length == 0) -- test for emptiness
 
   if (array) -- test for which?

 
 Making difference between an empty array and a nonexistent one is flaky, 
 if not directly ambiguous, thus D does not do it, as far as i can 
 remember the statement of Walter. Thus if(array) is not ambiguous.

Maybe in your world, but not in mine.

I have a glass of water. The glass exists and it is not empty. I drink the
water. The glass exists and it is empty. I smash the glass. The glass does
not exist and it is neither full nor empty because it doesn't exist.

To repeat: Existence and Emptiness are not the same concept.
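
The three states of the glass can be told apart with the explicit `.ptr` and `.length` tests quoted above. A rough sketch only, assuming a current D compiler; the function name `describe` and its return strings are invented for illustration:

```d
/// Names the three states using explicit tests instead of `if (array)`.
string describe(const(int)[] a)
{
    if (a.ptr is null)
        return "does not exist";     // the smashed glass
    if (a.length == 0)
        return "exists, empty";      // the drained glass
    return "exists, not empty";      // the full glass
}

unittest
{
    int[] full = [1, 2, 3];
    int[] drained = full[0 .. 0];    // same storage, zero elements
    int[] smashed = null;

    assert(describe(full) == "exists, not empty");
    assert(describe(drained) == "exists, empty");
    assert(describe(smashed) == "does not exist");
}
```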

And as I've just discovered, 'if (array)' tests both the .ptr and the
.length properties of the array variable. 

-- 
Derek
Melbourne, Australia
21/07/2005 10:10:34 AM

Jul 20 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Thu, 21 Jul 2005 10:17:02 +1000, Derek Parnell <derek psych.ward> wrote:
 On Wed, 20 Jul 2005 23:42:49 +0200, Ilya Minkov wrote:

 My vote is against.

 Derek Parnell wrote:
 Does ...

   if (array) ...

 test for an empty array or a non-existent array? I can't tell from the
 syntax. It is thus ambiguous.

   if (array.ptr == null) -- test for a non-existence.

   if (array.length == 0) -- test for emptiness

   if (array) -- test for which?

 Making difference between an empty array and a nonexistent one is flaky,
 if not directly ambiguous, thus D does not do it, as far as i can
 remember the statement of Walter. Thus if(array) is not ambiguous.

 Maybe in your world, but not in mine.

 I have a glass of water. The glass exists and it is not empty. I drink  
 the
 water. The glass exists and it is empty. I smash the glass. The glass  
 does
 not exist and it is neither full nor empty because it doesn't exist.

 To repeat: Existence and Emptiness are not the same concept.

You know I agree. ;)

 And as I've just discovered, 'if (array)' tests both the .ptr and the
 .length properties of the array variable.

Which is pointless because when the array pointer is null the length  
cannot be anything but 0.

Regan

Jul 20 2005

Ilya Minkov <minkov cs.tum.edu> writes:

Derek Parnell wrote:
Making difference between an empty array and a nonexistent one is flaky, 
if not directly ambiguous, thus D does not do it, as far as i can 
remember the statement of Walter. Thus if(array) is not ambiguous.

 
 Maybe in your world, but not in mine.

[...]

 To repeat: Existence and Emptiness are not the same concept.

The matter of discussion is not your or my view of the real world, nor 
some other programming languages' realm. The matter is how arrays are 
implemented, or should be implemented in D. Considering that D relies 
heavily on garbage collection with arrays anyway, the construct of an empty 
but existent array is unnecessary.

I believe that making this distinction, between empty and non-existent 
arrays, just provides the possibility for another misconception and bug.

Anyone who sees a real technical necessity to be able to distinguish 
between the empty and the non-existing one is invited to show it here.

-eye

Jul 22 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Fri, 22 Jul 2005 15:00:51 +0200, Ilya Minkov <minkov cs.tum.edu> wrote:
 Derek Parnell wrote:
 Making difference between an empty array and a nonexistent one is  
 flaky, if not directly ambiguous, thus D does not do it, as far as i  
 can remember the statement of Walter. Thus if(array) is not ambiguous.

  Maybe in your world, but not in mine.

 [...]

 To repeat: Existence and Emptiness are not the same concept.

 The matter of discussion is not your or my view of the real world, nor  
 some other programming languages' realm. The matter is how arrays are  
 implemented, or should be implemented in D.

Sure, however D exists in the real world. Programmers solve real world  
problems. IMO arrays should be implemented in D in a manner that best  
allows us to do that.

 Considering that D relies heavily on garbage collection with arrays  
 anyway, the construct of an empty but existent array is unnecessary.

I don't see your point. The concept of existence, non-existence, empty,  
not-empty still exists with garbage collection as much as any other memory  
management scheme. Garbage collection does not obviate the need to express  
non-existence, exists but empty, exists and not empty.

 I believe that making this distinction, between empty and non-existent  
 arrays, just provides the possibility for another misconception and bug.

You're correct in one respect: having the ability to express more, i.e.  
non-existence, exists but empty, exists and not empty, adds complexity,  
increasing the chance that someone will mistakenly use one when they mean  
the other.

However, as a concrete example a very common bug in C/C++ is referencing a  
null pointer (a pointer is a good example of a type which can represent  
non-existence, exists but empty, exists and not empty).

Arrays in D do not share this problem, the array reference cannot be null.  
At the same time, the current array implementation retains the  
expressiveness that allows you to represent non-existence, exists but  
empty, exists and not empty.

My point is that D's arrays have the expressiveness without the  
complexity, you can ignore the non-existence case unless you want/need to  
consider it.

 If someone sees real technical necessity to be able to distinguish  
 between the empty and the non-existing one, is invited to show it here.

I'm not sure there is a "necessity" as in most cases you could probably  
"work around" the restriction (if it was added to D). Here is an example  
where the expressiveness of representing non-existence, exists but empty,  
exists and not empty is useful.

This comment was posted to the DMDScript NG recently:

<quote>
For example, might it not be useful to return 'null' on EOF, thus allowing
this sort of construct:

     var line = readln();

     while (line != null)
     {
          ...
          line = readln();
     }
</quote>

Of course you could implement this in another way, removing the need for  
the ability to represent non-existence. You would have to if your type  
couldn't represent non-existence, that is the price you pay for  
simplicity. The current price paid for the current array's expressiveness  
is very little IMO.
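
The quoted DMDScript idea carries over to D arrays fairly directly. A minimal sketch, assuming a current D compiler; `LineSource` and `readLine` are hypothetical names used only for illustration, not an existing library API. A null return stands for end of input, while a blank line comes back as a non-null, zero-length string:

```d
/// Hypothetical in-memory line source.
struct LineSource
{
    string[] lines;   // the input, one entry per line
    size_t next;      // index of the next line to hand out

    /// Returns the next line, or null once the input is exhausted.
    string readLine()
    {
        if (next >= lines.length)
            return null;          // non-existent: nothing left to read
        return lines[next++];     // may be "" for a blank line
    }
}

unittest
{
    auto src = LineSource(["first", "", "last"]);

    size_t count = 0;
    // `!is null` is false only for the null array, so the blank
    // line in the middle does not end the loop early.
    for (string line = src.readLine(); line !is null; line = src.readLine())
        ++count;

    assert(count == 3);
}
```

Had the loop tested `line.length` instead, it would have stopped at the blank line, which is the case the null/empty distinction is meant to cover.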

Regan

Jul 23 2005

Ilya Minkov <minkov cs.tum.edu> writes:

Regan Heath wrote:
 Considering that D relies heavily on garbage collection with arrays  
 anyway, the construct of an empty but existent array is unnecessary.

 
 I don't see your point. The concept of existence, non-existence, empty,  
 not-empty still exists with garbage collection as much as any other 
 memory management scheme. Garbage collection does not obviate the need 
 to express non-existence, exists but empty, exists and not empty.

In C it was extremely important, and one had to keep one's eye on 
uniqueness. At every allocation, one was to think about how to "anchor" 
and where to free this value, and not forget to implement freeing. 
Naturally, C++ automated this process somewhat. In C, non-existence 
versus emptiness was sometimes very important.

 I believe that making this distinction, between empty and 
 non-existent  arrays, just provides the possibility for another 
 misconception and bug.

 
 You're correct in one respect: having the ability to express more, i.e.  
 non-existence, exists but empty, exists and not empty, adds complexity,  
 increasing the chance that someone will mistakenly use one when they 
 mean the other.
 
 However, as a concrete example a very common bug in C/C++ is referencing 
 a null pointer (a pointer is a good example of a type which can 
 represent non-existence, exists but empty, exists and not empty).

There is a problem with "exists but empty". What does malloc do when you 
request 0 bytes? As far as i can remember, the standard allows 2 
options: the implementation can return NULL, or it could return a tiny 
region of memory - still not "nothing". What will it contain? My bet 
would be "uninitialized space". This is garbage which was in the memory 
before it was allocated, and might be zero, or might be anything else.

So, in C there is no other way than to embed the information on the 
non-existence into your data structure. In the case of strings, this is 
a string having a '\0' character at the very beginning.

One could suggest to preallocate one data structure which will be stored 
globally as "the empty singleton", and when one wants to distinguish, do 
a pointer comparison, similarly to the null handling. However, in C it 
might be bad for finding a memory management solution (as we in fact 
deal not with a special inaccessible address in memory, but a living 
object), while in D the solution is, apart from special cases, simply to 
copy and forget, and make the GC do the dirty work.

 Arrays in D do not share this problem, the array reference cannot be 
 null. At the same time, the current array implementation retains the  
 expressiveness that allows you to represent non-existence, exists but  
 empty, exists and not empty.

What do you mean by can't be null?

 If someone sees real technical necessity to be able to distinguish  
 between the empty and the non-existing one, is invited to show it here.

 
 I'm not sure there is a "necessity" as in most cases you could probably  
 "work around" the restriction (if it was added to D). Here is an 
 example where the expressiveness of representing non-existence, exists 
 but empty, exists and not empty is useful.

Necessity is a fuzzy value which is probably best distinguished by the 
heaviness of the workaround.

 This comment was posted to the DMDScript NG recently:
 
 <quote>
 For example, might it not be useful to return 'null' on EOF, thus allowing
 this sort of construct:
 
     var line = readln();
 
     while (line != null)
     {
          ...
          line = readln();
     }
 </quote>

As above, i think preallocated EOL line would do, as long as array 
comparison (done on pointer and length) is a simple operation.

 Of course you could implement this in another way, removing the need 
 for the ability to represent non-existence. You would have to if your 
 type couldn't represent non-existence, that is the price you pay for  
 simplicity. The current price paid for the current array's 
 expressiveness is very little IMO.

Ok, given we still have the ability to manipulate the pointer and the 
length separately, how should array conversion to boolean condition be 
defined then? Should it query the pointer, the length, or some 
combination of both? If length is zero, one obviously cannot iterate 
over it. If pointer is null, the length should be invariably zero?

-eye

Jul 23 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Sat, 23 Jul 2005 22:13:24 +0200, Ilya Minkov <minkov cs.tum.edu> wrote:
 Regan Heath wrote:
 Considering that D relies heavily on garbage collection with arrays   
 anyway, the construct of an empty but existent array is unnecessary.

  I don't see your point. The concept of existence, non-existence,  
 empty, not-empty still exists with garbage collection as much as any  
 other memory management scheme. Garbage collection does not obviate the  
 need to express non-existence, exists but empty, exists and not empty.

 In C it was extremely important, and one had to keep one's eye on  
 uniqueness. At every allocation, one was to think about how to "anchor"  
 and where to free this value, and not forget to implement freeing.  
 Naturally, C++ automated this process somewhat. In C, non-existence  
 versus emptiness was sometimes very important.

Sure, memory management makes things complicated. But, uniqueness has  
nothing to do with non-existence. The fact that non-existence is typically  
represented by null is the same regardless of memory management model.

 I believe that making this distinction, between empty and  
 non-existent  arrays, just provides the possibility for another  
 misconception and bug.

  You're correct in one respect: having the ability to express more,  
 i.e. non-existence, exists but empty, exists and not empty, adds  
 complexity, increasing the chance that someone will mistakenly use one  
 when they mean the other.
  However, as a concrete example a very common bug in C/C++ is  
 referencing a null pointer (a pointer is a good example of a type  
 which can represent non-existence, exists but empty, exists and not  
 empty).

 There is a problem with "exists but empty". What does malloc do when you  
 request 0 bytes?

Allocates a zero length item on the heap. (I checked this recently).

 As far as i can remember, the standard allows 2 options: the  
 implementation can return NULL, or it could return a tiny region of  
 memory - still not "nothing". What will it contain? My bet would be  
 "uninitialized space". This is garbage which was in the memory before it  
 was allocated, and might be zero, or might be anything else.

 So, in C there is no other way than to embed the information on the  
 non-existence into your data structure.

No, you simply use null. A non-existent string in C is a null pointer. An  
empty string in C is a non-null pointer which contains a \0 as the first  
character. The same applies to any other object. A null pointer indicates  
non-existence, and emptiness is represented in whatever fashion makes  
sense for the object i.e. a length property set to 0.

 In the case of strings, this is a string having '\0' character at the  
 very beginning.

No, that is an "empty" string, not a "non-existent" one.

 One could suggest to preallocate one data structure which will be stored  
 globally as "the empty singleton", and when one wants to distinguish, do  
 a pointer comparison, similarly to the null handling. However, in C it  
 might be bad for finding a memory management solution (as we in fact  
 deal not with a special inaccessible address in memory, but a living  
 object), while in D the solution is, apart from special cases, simply to  
 copy and forget, and make the GC do the dirty work.

None of this is necessary.

 Arrays in D do not share this problem, the array reference cannot be  
 null. At the same time, the current array implementation retains the   
 expressiveness that allows you to represent non-existence, exists but   
 empty, exists and not empty.

 What do you mean by can't be null?

char[] p = null;
if (p.length == 0) { //does not crash, p itself is never 'null' }
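
To make the "never crashes" point concrete, here is a small sketch, assuming a current D compiler; the helper name `countItems` is invented for illustration. Reading the length, iterating, and appending are all well defined on an array whose data pointer is null:

```d
/// Counts elements with foreach; safe to call on a null array.
size_t countItems(const(char)[] a)
{
    size_t n = 0;
    foreach (c; a)   // zero iterations for a null or empty array
        ++n;
    return n;
}

unittest
{
    char[] p = null;

    assert(p.length == 0);        // no null dereference here
    assert(countItems(p) == 0);   // foreach simply does nothing

    p ~= 'x';                     // appending allocates storage on demand
    assert(p.length == 1);
    assert(p.ptr !is null);
    assert(countItems(p) == 1);
}
```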

 If someone sees real technical necessity to be able to distinguish   
 between the empty and the non-existing one, is invited to show it here.

  I'm not sure there is a "necessity" as in most cases you could  
 probably "work around" the restriction (if it was added to D). Here is  
 an example where the expressiveness of representing non-existence,  
 exists but empty, exists and not empty is useful.

 Necessity is a fuzzy value which is probably best distinguished by the  
 heaviness of the workaround.

Exactly. However the other thing to consider is the price paid for it, if  
that price is smaller than the cost (as I believe it is in this case) then  
it is a point in its favour. You then factor in all the other issues,  
complexity of implementation, etc.

 This comment was posted to the DMDScript NG recently:
  <quote>
 For example, might it not be useful to return 'null' on EOF, thus  
 allowing
 this sort of construct:
      var line = readln();
      while (line != null)
     {
          ...
          line = readln();
     }
 </quote>

 As above, i think preallocated EOL line would do, as long as array  
 comparison (done on pointer and length) is a simple operation.

 Of course you could implement this in another way, removing the need  
 for  the ability to represent non-existance. You would have to if your  
 type  couldn't represent non-existance, that is the price you pay for   
 simplicity. The current price paid for the current array's  
 expressiveness  is very little IMO.
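
A minimal sketch of the trade-off described above, assuming present-day D (D2 syntax, which differs a little from the 2005 compiler discussed in this thread). The reader returns null at end of input and a non-null "" for a blank line, so the loop can tell the two apart. `LineSource`, `readLine` and `countLines` are names invented for this illustration; they are not part of any library.

```d
import std.stdio : writeln;

// Hypothetical in-memory line source, invented for this example.
struct LineSource
{
    string[] lines;
    size_t next;

    // Returns null once the input is exhausted; a blank line comes back
    // as a non-null, zero-length string.
    string readLine()
    {
        if (next >= lines.length)
            return null;
        string line = lines[next++];
        return line.length ? line : "";
    }
}

// Counts lines by looping until the reader reports "no more lines".
size_t countLines(string[] input)
{
    auto src = LineSource(input);
    size_t count;
    for (string line = src.readLine(); line !is null; line = src.readLine())
        ++count;
    return count;
}

void main()
{
    // The blank line in the middle does not end the loop.
    writeln(countLines(["first", "", "last"]));
}
```

Had the loop tested `line.length` instead of `line !is null`, it would have stopped at the blank line, which is the kind of workaround cost being weighed here.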

 Ok, given we still have the ability to manipulate the pointer and the  
 length separately, how should array conversion to boolean condition be  
 defined then?

The same way it works for every other type in D, the statement "if(x)"  
means "compare x to null or 0". In the case of a reference it compares the  
reference to null.

The confusion arises in this case because arrays in D cannot be null, and  
because arrays are in fact implemented as stack based structs in the  
background. This makes arrays appear to be a struct and not a reference,  
however currently in all (I believe) situations they behave as references.  
I believe this was done on purpose.

As I've noted in all cases where an array reference would be null, i.e.

char[] p = null;

it isn't, but instead the data pointer p.ptr is null.
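
A small sketch of that claim, assuming present-day D (D2). It only records what can be observed about an array that was assigned null; `inspectNullArray` is a name made up for the example.

```d
import std.stdio : writeln;

// Reports three observable facts about an array assigned null:
// it compares identical to null, its data pointer is null,
// and its length can be read (and is zero) without any crash.
bool[3] inspectNullArray()
{
    char[] p = null;
    return [p is null, p.ptr is null, p.length == 0];
}

void main()
{
    writeln(inspectNullArray());
}
```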

So, in order for them to behave as references it's logically consistent  
for "if(p)" to check the data ptr vs null. Change that and you need to  
code special cases for arrays vs other reference types, eg.

template doWrite(Type) { void doWrite(Type p) {
   if (p) writefln(p);
}
}

class C {
   char[] toString() { return "C"; }
}

char[] p = "test";
C c = new C();

doWrite!(char[])(p);
doWrite!(char[])(c);

 Should it query the pointer, the length, or some combination of both?

The ptr, for reasons given above. Checking both is a waste of time as when  
the pointer is null the length must be 0 (as you say below).

 If length is zero, one obviously cannot iterate over it.

Correct. One cannot iterate over an empty or a non-existent array.

 If pointer is null, the length should be invariably zero?

Indeed. It is currently.

Regan

Jul 23 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Sun, 24 Jul 2005 11:55:35 +1200, Regan Heath <regan netwin.co.nz> wrote:
 So, in order for them to behave as references it's logically consistent  
 for "if(p)" to check the data ptr vs null. Change that and you need to  
 code special cases for arrays vs other reference types, eg.

 template doWrite(Type) { void doWrite(Type p) {
    if (p) writefln(p);
 }

 class C {
    char[] toString() { return "C"; }
 }

 char[] p = "test";
 C c = new C();

 doWrite!(char[])(p);

TYPO:

 doWrite!(char[])(c);

  Should be:

doWrite!(C)(c);
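
For readers trying this today, here is a hedged D2 rendering of the same idea. It is a sketch, not the code from the post: `render` is a hypothetical stand-in for `doWrite` that returns text instead of printing it, and it uses an explicit `is null` test, which is valid for both arrays and class references.

```d
import std.conv : to;
import std.stdio : writeln;

class C
{
    override string toString() { return "C"; }
}

// One template body serves both arrays and class references, because
// `is null` is meaningful for each of them.
string render(T)(T p)
{
    if (p is null)
        return "(nothing)";
    return to!string(p);
}

void main()
{
    char[] p = "test".dup;   // D2 string literals are immutable, hence .dup
    C c = new C();
    char[] none = null;
    C noObject = null;

    writeln(render(p));
    writeln(render(c));
    writeln(render(none));
    writeln(render(noObject));
}
```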

Regan

Jul 23 2005

Holger <Holger_member pathlink.com> writes:

In article <opsud4qxii23k2f5 nrage.netwin.co.nz>, Regan Heath says...
On Sat, 23 Jul 2005 22:13:24 +0200, Ilya Minkov <minkov cs.tum.edu> wrote:
 So, in C there is no other way than to embed the information on the  
 non-existance into your data structure.

No, you simply use null. A non-existant string in C is a null pointer. An  
empty string in C is a non-null pointer which contains a \0 as the first  
character. The same applies to any other object. A null pointer indicates  
non-existance, and emptiness is represented in whatever fashion makes  
sense for the object i.e. a length property set to 0.

 In the case of strings, this is a string having '\0' character at the  
 very beginning.

No, that is an "empty" string, not a "non-existant" one.

Hi Regan, you're of course spot-on.
It's not the first time that someone has expressed misguided perceptions of
"non-existent" vs "empty" in the C language here. I really wonder how often we'll
need to discuss such basic C-isms on the D NG. People had better learn their
stuff before making such bold statements.

Cheers,
Holger

Jul 23 2005

Ilya Minkov <minkov cs.tum.edu> writes:

Holger schrieb:
 Hi Regan, you're of course spot-on.
 It's not the first time that someone expressed misguided perceptions of "not
 existant" vs "empty" in the C language here. I really wonder how often we'll
 need to discuss such basic C-isms on the D NG? People should better learn their
 stuff before making such bold statements.

Misconceptions? That was a typo, and Regan could, if he cared to read
through the beginnings of the preceding newsgroup, know me and know that,
although he pretty much popped up shortly before i disappeared. And i
really wonder why anyone would have to listen to someone who neither
leaves his complete name nor a real e-mail address, who can just drop a
bomb and disappear.

You are free to google for my name, it is not that common, and form
your own picture.

-i.

Jul 24 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Sun, 24 Jul 2005 20:21:30 +0200, Ilya Minkov <minkov cs.tum.edu> wrote:
 Holger schrieb:
 Hi Regan, you're of course spot-on.
 It's not the first time that someone expressed misguided perceptions of  
 "not
 existant" vs "empty" in the C language here. I really wonder how often  
 we'll
 need to discuss such basic C-isms on the D NG? People should better  
 learn their
 stuff before making such bold statements.

 Misconceptions? That was a typo and Regan could, if he cared to read  
 through the beginnings of the preceding newsgroup, know me

I do, in fact, "know you" well enough to have thought as I typed my reply
that you must have simply made a mistake. Nor did I make the above
statements, so I really have no idea why you'd react in such a way
towards *me*.

 , know that, although he pretty much popped up shortly before i  
 disapperared. And, i really wonder why anyone would have to listen to  
 someone who neither leaves his complete name nor a real e-mail adress,  
 who can just drop a bomb a disappear.

Are you referring to me? I'm using my complete name and my real email  
address.

 You are free to google for my name, it is not that common, and make  
 yourself a picture.

You can google mine as well; 7 of the top 10 are me.

Regan

Jul 24 2005

Holger <Holger_member pathlink.com> writes:

In article <opsufqj2rn23k2f5 nrage.netwin.co.nz>, Regan Heath says...
On Sun, 24 Jul 2005 20:21:30 +0200, Ilya Minkov <minkov cs.tum.edu> wrote:
 Holger schrieb:
 Hi Regan, you're of course spot-on.
 It's not the first time that someone expressed misguided perceptions of  
 "not
 existant" vs "empty" in the C language here. I really wonder how often  
 we'll
 need to discuss such basic C-isms on the D NG? People should better  
 learn their
 stuff before making such bold statements.

 Misconceptions? That was a typo and Regan could, if he cared to read  
 through the beginnings of the preceding newsgroup, know me

I do, in fact, "know you" well enough to have thought as I typed my reply  
that you must have simply made a mistake. Nor, did I make the above  
statements, so, I really have no idea why you'd react in such a way  
towards *me*?

 , know that, although he pretty much popped up shortly before i  
 disapperared. And, i really wonder why anyone would have to listen to  
 someone who neither leaves his complete name nor a real e-mail adress,  
 who can just drop a bomb a disappear.

Are you referring to me? I'm using my complete name and my real email  
address.

 You are free to google for my name, it is not that common, and make  
 yourself a picture.

You can google mine as well 7 of the top 10 are me.

Regan

Regan, calm down please. It's me, Holger, that is the hooligan here!
Again, I apologize for my tone.

Cheers,
Holger

Jul 24 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Sun, 24 Jul 2005 21:37:09 +0000 (UTC), Holger  
<Holger_member pathlink.com> wrote:
 In article <opsufqj2rn23k2f5 nrage.netwin.co.nz>, Regan Heath says...
 On Sun, 24 Jul 2005 20:21:30 +0200, Ilya Minkov <minkov cs.tum.edu>  
 wrote:
 Holger schrieb:
 Hi Regan, you're of course spot-on.
 It's not the first time that someone expressed misguided perceptions  
 of
 "not
 existant" vs "empty" in the C language here. I really wonder how often
 we'll
 need to discuss such basic C-isms on the D NG? People should better
 learn their
 stuff before making such bold statements.

 Misconceptions? That was a typo and Regan could, if he cared to read
 through the beginnings of the preceding newsgroup, know me

 I do, in fact, "know you" well enough to have thought as I typed my  
 reply
 that you must have simply made a mistake. Nor, did I make the above
 statements, so, I really have no idea why you'd react in such a way
 towards *me*?

 , know that, although he pretty much popped up shortly before i
 disapperared. And, i really wonder why anyone would have to listen to
 someone who neither leaves his complete name nor a real e-mail adress,
 who can just drop a bomb a disappear.

 Are you referring to me? I'm using my complete name and my real email
 address.

 You are free to google for my name, it is not that common, and make
 yourself a picture.

 You can google mine as well 7 of the top 10 are me.

 Regan

 Regan, calm down please. It's me, Holger, that is the hooligan here!

You are right. ;) I am calm, I did not intend for my comments above to  
sound angry.

 Again, I apologize for my tone.

We're all adults here, your reply shows as much (no condescension
meant/implied).

Regan

Jul 24 2005

Holger <Holger_member pathlink.com> writes:

In article <42E3DC2A.1040406 cs.tum.edu>, Ilya Minkov says...
Holger schrieb:
 Hi Regan, you're of course spot-on.
 It's not the first time that someone expressed misguided perceptions of "not
 existant" vs "empty" in the C language here. I really wonder how often we'll
 need to discuss such basic C-isms on the D NG? People should better learn their
 stuff before making such bold statements.

Misconceptions? That was a typo and Regan could, if he cared to read 
through the beginnings of the preceding newsgroup, know me, know that, 
although he pretty much popped up shortly before i disapperared. And, i 
really wonder why anyone would have to listen to someone who neither 
leaves his complete name nor a real e-mail adress, who can just drop a 
bomb a disappear.

You are free to google for my name, it is not that common, and make 
yourself a picture.

-i.


Good answer Ilya, you hit the mark. However, my philippic wasn't specifically
addressed at you. It's just that this particular misconception has popped
up quite a few times in the past and I felt annoyed. Anyway, I apologize for
being caustic. I didn't mean to question your personal abilities. Sorry ...

Still,
Holger

Jul 24 2005

Ilya Minkov <minkov cs.tum.edu> writes:

Regan Heath schrieb:
 Sure, memory management makes things complicated. But, uniqueness has  
 nothing to do with non-existance. The fact that non-existance is 
 typically  represented by null is the same regardless of memory 
 management model.

And the emptiness?

 There is a problem with "exists but empty". What does malloc do when 
 you  request 0 bytes?

 
 Allocates a zero length item on the heap. (I checked this recently).

How? If this was so, it would break the promise that malloc never
returns the same address unless this memory was returned by free. So, my
bet would be that it returns a byte or a word of memory, which basically
doesn't matter since you may not dereference it anyway. Does it segfault
or throw when you access the first byte?

Which implementation did you check? It is still possible that other 
implementations do it differently. I just checked DMC and Cygwin-GCC, 
and both returned at least 8 bytes.

 So, in C there is no other way than to embed the information on the  
 non-existance into your data structure.


ARRGH. I obviously meant the emptiness, probably was very tired when i
wrote the message.

And you, you should know that i have posts in the digitalmars newsgroups
dating as early as two and a half years back, and if you ever looked at
them you would know that you don't have to explain such stuff to me, but
that i just mixed up the words.

The current newsgroup is just too much of a bunch of people neither able nor
willing to take time for each other and the subject, which was why i
basically left. I think i was quite right about it and i don't think i'd
like to show up here ever again.

 No, you simply use null. A non-existant string in C is a null pointer. 
 An  empty string in C is a non-null pointer which contains a \0 as the 
 first  character. The same applies to any other object. A null pointer 
 indicates  non-existance, and emptiness is represented in whatever 
 fashion makes  sense for the object i.e. a length property set to 0.

Blah. True.

 In the case of strings, this is a string having '\0' character at the  
 very beginning.

 
 No, that is an "empty" string, not a "non-existant" one.

I know dammit.

 None of this is necessary.

Now, would you care to explain *how* exactly you want to distinguish the
empty something by pointer? Either (a) you embed the information into the
target, or (b) you make an empty singleton and distinguish it by pointer
comparison.

Please note that we're not talking about the D arrays here, it's about 
your statement: "a pointer is a good example of a type which can 
represent  non-existance, exists but empty, exists and not empty" and 
i'm waiting for you to either see your mistake or show me exactly how.

 What do you mean by can't be null?

 
 char[] p = null;
 if (p.length == 0) { //does not crash, p itself is never 'null' }

Ok, if you like to see it so.

I think i would call p "null" if i cannot dereference any element out of
it. Just like a null pointer is something you cannot dereference, and
where you can distinguish that by looking at the representation of the
pointer, just like you can by looking at the fields of an array slice.
Whether you get a specialized exception or a general memory protection
fault if you try to dereference it nonetheless is an implementation detail.
BTW, to add more to this similarity, you can catch the exception
resulting from dereferencing a null pointer, just as you can catch the
one resulting from dereferencing an element from the null array.

 Ok, given we still have the ability to manipulate the pointer and the  
 length separately, how should array conversion to boolean condition 
 be  defined then?

 
 The same way it works for every other type in D, the statement "if(x)"  
 means "compare x to null or 0". In the case of a reference it compares 
 the  reference to null.

 The confusion arises in this case because arrays in D cannot be null, 
 and  because arrays are in fact implemented as stack based structs in 
 the  background. This makes arrays appear to be a struct and not a 
 reference,  however currently in all (I believe) situations they behave 
 as references.  I believe this was done on purpose.

I still cannot quite grasp the statement "arrays cannot be null". :)

Yes, your writing looks very confused to me. :) It is quite correct to
think of an array (or, perhaps more correctly, a slice) as a value struct.
However, i don't see how it would "behave as a reference" as opposed to
the pointer... An array references a bunch of objects, which you don't
access directly, but you "dereference" a certain element of an array by
using operator[], much as operator* is used to dereference a
single pointer.

Though i'm not that sure whether the distinction between pointers and
references needs to be upheld. It is only of a syntactical nature,
but so many sorts of pointers in D do some sort of syntactical
forwarding to their target - the function pointer in the same way as in
C, but also a struct pointer forwards the dot '.' operator.

 As I've noted in all cases where an array reference would be null, i.e.
 
 char[] p = null;
 
 it isn't, but instead the data pointer p.ptr is null.

...

 So, in order for them to behave as references it's logically consistent  
 for "if(p)" to check the data ptr vs null. Change that and you need to  
 code special cases for arrays vs other reference types, eg.
 
 template doWrite(Type) { void doWrite(Type p) {
   if (p) writefln(p);
 }
 }
 
 class C {
   char[] toString() { return "C"; }
 }
 
 char[] p = "test";
 C c = new C();
 
 doWrite!(char[])(p);
 doWrite!(C)(c);

This just works. I don't see any *specific* problem with templates if 
array is defined to always be null when it is empty, only the *general* 
tiny loss of freedom we just discussed.

There is another thing that comes to my mind: when you do a lot of
slicing and the slices get nulled-out as soon as you cannot reference
any element through them, it can make the garbage collector reclaim the
memory sooner.

-eye

Jul 24 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Sun, 24 Jul 2005 22:09:26 +0200, Ilya Minkov <minkov cs.tum.edu> wrote:
 Regan Heath schrieb:
 Sure, memory management makes things complicated. But, uniqueness has   
 nothing to do with non-existance. The fact that non-existance is  
 typically  represented by null is the same regardless of memory  
 management model.

 And the emptyness?

 There is a problem with "exists but empty". What does malloc do when  
 you  request 0 bytes?

  Allocates a zero length item on the heap. (I checked this recently).

 How?

I have no idea, I am just repeating what the MSDN documentation says ;) I  
wrote some C to test it, and malloc(0) and new char[0] both return  
non-null. That is all I tried.

 If this was so, it would break the promise that malloc never returns the  
 same adress unless this memory was returned by free. So, my bet would be  
 that it returns a byte or a word of memory, which basically doesn't  
 matter since you may not dereference it anyway. Does it segfault or  
 throw when you access the first byte?

Not sure. I didn't check it. I suspect your guesses are correct.

 Which implementation did you check? It is still possible that other  
 implementations do it differently. I just checked DMC and Cygwin-GCC,  
 and both returned at least 8 bytes.

I tested only the M$ compiler that comes with Visual Studio 6.0. I read  
only the MSDN documentation. It appears all implementations tested so far  
(mine and yours above) return something on a malloc of 0, rather than  
nothing.
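
For anyone wanting to repeat the experiment, here is a small probe, written in D2 and calling the C runtime's `malloc` through `core.stdc.stdlib`. The C standard leaves the result of a zero-byte request implementation-defined: it may be a null pointer, or a non-null pointer that must not be dereferenced. The sketch therefore only reports which outcome the local runtime picked; `probeMallocZero` is a name invented here.

```d
import core.stdc.stdlib : free, malloc;
import std.stdio : writeln;

// Asks the C runtime for zero bytes and reports which of the two
// permitted outcomes this implementation chose.
string probeMallocZero()
{
    void* p = malloc(0);
    scope (exit) free(p);   // free(null) is a harmless no-op
    return p is null ? "null" : "non-null";
}

void main()
{
    writeln("malloc(0) returned ", probeMallocZero());
}
```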

 So, in C there is no other way than to embed the information on the   
 non-existance into your data structure.


 ARRGH. I obviously meant the emptyness, probably was very tired when i  
 wrote the message.

No problem. I half suspected as much.

 And you, you should know that i have posts in the digitalmars newsgroups  
 dating as early as two and a half years back, and if you ever looked at  
 them you would know that you don't have to explain such stuff to me, but  
 that i just mixed up the words.

I could not be certain it was a mistake. I replied not to correct *you*
but to correct the *statements* so another would not read them and take
them as truth. My reply was as much to you as to the newsgroup as a whole,
so while you may not need the explanations, others might. My comments
were never intended as personal criticism.

 The current newsgroup is just too much of a bunch of people not able nor  
 willing to take time for each other and the subject, which was why i  
 basically left. I think i was quite right about it and i don't think i'd  
 like to show here up ever again.

I'm sorry you feel that way. I'm here because I like to discuss these  
sorts of things in my spare time. (can anyone say 'geek'). Regardless of  
the quality(*) of the replies I get they all influence and refine _my_  
opinion, giving me a clearer idea of what I'm talking about and all the  
issues involved. In short, it's fun and it's self improvement.

(*) I'll leave that to the reader to define.

 None of this is necessary.

 Now, if you like to explain *how* exactly you want to distinguish the  
 empty something by pointer?

I wasn't suggesting you could. I thought you were confusing "empty" and
"non-existent" at this stage and framed my reply with that in mind;
essentially I thought you were suggesting you needed a singleton to
represent non-existent, not empty.

 You (a) embed the information into the target, or (b) you make an empty  
 singleton and distinguish it by pointer comparison.

You're correct.

 Please note that we're not talking about the D arrays here, it's about  
 your statement: "a pointer is a good example of a type which can  
 represent  non-existance, exists but empty, exists and not empty" and  
 i'm waiting for you to either see your mistake or show me exactly how.

I see the point you're making, and you are correct.

char *a = NULL;
char *b = "";
char *c = "test";

a is non-existent.
b exists but is empty.
c exists and is not empty.

It is not technically the pointer representing all 3; it only really
represents exists, or not. Whether it is empty or not is represented by
the data.

All I meant by my statement is that with a pointer you can do all 3.
Compare that to 'int', which can only do non-existence by using one of its
_values_ for non-existence (this is limiting, and sort of illogical IMO -
using a value to represent non-existence).
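
The split described above (the pointer says whether something exists, the data says whether it is empty) can be sketched with C-style strings in D2. This is an illustration only; `classify` is a hypothetical helper, and it relies on D string literals being zero-terminated and convertible to `const(char)*`.

```d
import std.stdio : writeln;

// Classifies a C-style, zero-terminated string.
string classify(const(char)* s)
{
    if (s is null)
        return "non-existent";   // the pointer itself says "nothing here"
    if (*s == '\0')
        return "empty";          // the pointed-to data says "no characters"
    return "not empty";
}

void main()
{
    const(char)* a = null;
    const(char)* b = "";         // D string literals are zero-terminated
    const(char)* c = "test";

    writeln(classify(a));
    writeln(classify(b));
    writeln(classify(c));
}
```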

 What do you mean by can't be null?

  char[] p = null;
 if (p.length == 0) { //does not crash, p itself is never 'null' }

 Ok, if you like to see it so.

 I think i would call p "null" if i cannot dereference any element out of  
 it.

I would call it "null" if it was equal to "null". eg.
   if (p is null) {}

 Just like a null pointer is something you cannot dereference, and where  
 you can distinguish that by looking at the representation of the  
 pointer, lust like you can by looking at the fields of an array slice.

There is never a problem de-referencing an array reference itself, eg.

char[] p = null;
if (p.length) { /* never crashes, so by your definition 'p' is never null, right? */ }

There _is_ a problem referencing data outside an array, but I don't think  
that's the same thing, eg.

char[] p = "one";
p[4] = 'a';  //crash (array bounds error)

this is dereferencing the data pointer (which doesn't crash) by an offset  
larger than the array (which is what crashes).
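
A hedged sketch of that distinction in D2. It assumes a build with array bounds checking enabled (the default for non-release builds), where an out-of-range index raises a `RangeError` from `core.exception`. Catching an `Error` like this is done here purely to observe the behaviour and is generally discouraged in real code; `writeIsOutOfBounds` is a made-up helper.

```d
import core.exception : RangeError;
import std.stdio : writeln;

// Returns true when writing p[i] trips the array bounds check.
bool writeIsOutOfBounds(char[] p, size_t i)
{
    try
    {
        p[i] = 'a';
        return false;
    }
    catch (RangeError)
    {
        return true;
    }
}

void main()
{
    char[] p = "one".dup;                  // length 3, valid indices 0..2
    writeln(writeIsOutOfBounds(p, 1));     // inside the array
    writeln(writeIsOutOfBounds(p, 4));     // past the end
    writeln(writeIsOutOfBounds(null, 0));  // null array: length 0, so any index fails
}
```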

 Whether you get a specialized exception or a general memory protection  
 fault if you try to dereference it nontheless, is implementation detail.

True.

 BTW, to add more to this similarity, you can catch the exception  
 resulting from dereferencing a null pointer, just as you can catch the  
 one resulting from dereferencing an element from the null array.

True.

 Ok, given we still have the ability to manipulate the pointer and the   
 length separately, how should array conversion to boolean condition  
 be  defined then?

  The same way it works for every other type in D, the statement  
 "if(x)"  means "compare x to null or 0". In the case of a reference it  
 compares the  reference to null.

 The confusion arises in this case because arrays in D cannot be null,  
 and  because arrays are in fact implemented as stack based structs in  
 the  background. This makes arrays appear to be a struct and not a  
 reference,  however currently in all (I believe) situations they behave  
 as references.  I believe this was done on purpose.

 I still cannot quite grasp the statement "arrays cannot be null". :)

See above for my meaning. The array reference itself is never "null" (and  
can always be dereferenced).

 Yes, your writing looks very confused to me. :) It is quite correct to  
 think of array (or, perhaps more correctly slice) as a value struct.

I disagree. I believe the intention is that we think of them as
references. The documentation calls them references, and their behaviour is
consistent with that of other references. I believe Walter made them behave as
references on purpose.

They are however implemented as a stack based struct. Derek has shown that.

 However, i don't see how it would "behave as a reference" as opposed to  
 the pointer... An array references a bunch of objects

I'd say an array 'contains' a bunch of objects, but that's beside the  
point. I'm referring to the array variable itself, eg.
   char[] p;

'p' behaves like a reference, just as:

class C {}
C c;

'c' behaves like a reference.

 , which you don't access directly, but you "dereference" a certain  
 element of an array by using operator[], similarly like operator* is  
 used to dereference a single pointer.

You access it directly when you use [] or any other property or member of  
it. You de-reference the array reference in order to use these properties.  
In the case of [] you dereference the data pointer in the array reference  
by an offset.

 Though i'm not that sure whether the distinction between pointers and  
 references needs to be kept upright. It is only of syntactical nature,  
 but so many sorts of pointers in D do some sort of syntactical  
 forwarding to their target - the function pointer in the same way as in  
 C, but also a struct pointer makes forwarding of the dot '.' operator.

Indeed.. when I think about pointers and references I find myself thinking  
of them as the "same thing". A reference is perhaps just a specific type  
of pointer. I mean, a reference is a pointer to an object, but a pointer  
is more general, you can have a pointer to a pointer to a ... to an  
object. You can create pointers to any other type.

 As I've noted in all cases where an array reference would be null, i.e.
  char[] p = null;
  it isn't, but instead the data pointer p.ptr is null.

 ...

 So, in order for them to behave as references it's logically  
 consistent  for "if(p)" to check the data ptr vs null. Change that and  
 you need to  code special cases for arrays vs other reference types, eg.
  template doWrite(Type) { void doWrite(Type p) {
   if (p) writefln(p);
 }

 }
  class C {
   char[] toString() { return "C"; }
 }
  char[] p = "test";
 C c = new C();
  doWrite!(char[])(p);

 doWrite!(C)(c);

 This just works. I don't see any *specific* problem with templates if  
 array is defined to always be null when it is empty, only the *general*  
 tiny loss of freedom we just discussed.

"general tiny loss of freedom"? you mean the ability to express  
non-existance?

 There is another thing that comes to my mind: when you do a lot of  
 sclicing and the slices get nulled-out as soon as you cannot reference  
 any element through them, it can make the garbage collector reclaim the  
 memory sooner.

I don't see how this is relevant, sorry.

Regan

Jul 24 2005

James McComb <ned jamesmccomb.id.au> writes:

Regan Heath wrote:

 What do you mean by can't be null?

 
 char[] p = null;
 if (p.length == 0) { //does not crash, p itself is never 'null' }

Okay... I obviously don't get D strings because this seems wildly
counter-intuitive to me. Surely if p CANNOT be null, the line

char[] p = null; // Surely this means: set p to null

should fail to compile or throw an exception or something?

If D strings truly couldn't be null I would expect something like:

// Declare p
// p is initialized to empty (not null - it can never be null)
char[] p;

// Test for empty
// A test for null makes no sense - p can never be null
if (!p) {}

What am I missing?

James McComb

Jul 24 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Mon, 25 Jul 2005 08:55:09 +1000, James McComb <ned jamesmccomb.id.au>  
wrote:
 Regan Heath wrote:

 What do you mean by can't be null?

  char[] p = null;
 if (p.length == 0) { //does not crash, p itself is never 'null' }

 Okay... I obviously don't get D strings because this seems wildly  
 counter-intuitive to me. Sure if p CANNOT be null, the line

 char[] p = null; // Surely this means: set p to null

 should fail to compile or throw an exception or something?

I understand where you're going with this, the important fact here is that  
though an array reference itself cannot be null it still behaves like any  
other reference set to null(*). eg.

char[] p = null;
if (p is null) { /* true */ }

(*) with the exception that you can de-reference an array that has been
set to null, eg.

char[] p = null;
if (p.length == 0) { /* does not crash */ }

Try that with another reference type, eg a class with a length member.
Try that with a struct (value type), you'll find you cannot assign null.
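
The two experiments suggested above can be checked at compile time without crashing anything. This is a sketch in D2 using `__traits(compiles, ...)`; the types `WithLength` and `Plain` and the function `nullAssignability` are invented for the example. Note that actually reading `c.length` through a null class reference would fault at run time, so the sketch deliberately does not do that.

```d
import std.stdio : writeln;

class WithLength { size_t length; }   // a reference type with a length member
struct Plain { size_t length; }       // a value type with the same member

// Records, in order, whether `= null` compiles for an array,
// a class reference and a struct.
bool[3] nullAssignability()
{
    enum arrayOk  = __traits(compiles, { char[] p = null; });
    enum classOk  = __traits(compiles, { WithLength c = null; });
    enum structOk = __traits(compiles, { Plain s = null; });
    return [arrayOk, classOk, structOk];
}

void main()
{
    writeln(nullAssignability());
}
```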

The problem is perhaps trying to think of an array as "strictly" a
reference type or "strictly" a value type. It's not, it's a mix of the
two, an attempt by Walter (successful IMO) to get the best performance
possible, passed by reference, stored on the stack, nice syntax, etc.

I believe you should think of arrays as references with special abilities
like overloaded = and [] operators, for example. If we had the ability to
define an opAssign for classes I believe we could emulate arrays with
classes. (I admit, I haven't tried, nor considered it thoroughly)

That said, you could perhaps equally argue that you should think of them  
as structs which are passed by reference, can be set to null, can be  
compared to null, .. I think of them as references, it makes more sense to  
me.

 If D strings truly couldn't be null I would expect something like:

 // Declare p
 // p is initialized to empty (not null - it can never be null)
 char[] p;

 // Test for empty
 // A test for null makes no sense - p can never be null
 if (!p) {}

The distinction is perhaps fine: "strings" can be non-existent.
Non-existence is typically represented with null. "Array references" cannot
be null. Yet, arrays can represent non-existence. A contradiction? No,
those statements are not directly contradictory because:
  - null is not the only way to represent non-existence (beside the point
in this case)
  - the object (arrays) need not be null in all situations.

In our case the second point is the relevant one. Arrays are null only when they
need to be in order to behave like other reference types. Arrays are not
null when you try to dereference them.

 What am I missing?

Maybe nothing, maybe something, who am I to say. I can only offer my  
opinion above.

Regan

Jul 24 2005

James McComb <ned jamesmccomb.id.au> writes:

Regan Heath wrote:

 ... the important fact here is 
 that  though an array reference itself cannot be null it still behaves 
 like any  other reference set to null ...
 with the exception that you can de-reference an array that has been  
 set to null ...

Thanks for the examples. I understand what you mean, now.

Considering that these lines are valid:

char[] p = null;
if (p is null) { /* true */ }

...it confused me that you say p cannot be null.

I know what you mean now, though. :)

James McComb

Jul 24 2005

Derek Parnell <derek psych.ward> writes:

On Mon, 25 Jul 2005 11:33:58 +1000, James McComb wrote:

 Regan Heath wrote:
 
 ... the important fact here is 
 that  though an array reference itself cannot be null it still behaves 
 like any  other reference set to null ...
 with the exception that you can de-reference an array that has been  
 set to null ...

 
 Thanks for the examples. I understand what you mean, now.
 
 Considering that these lines are valid:
 
 char[] p = null;
 if (p is null) { //true }
 
 ...it confused me that you say p cannot be null.
 
 I know what you mean now, though. :)

These lines are equivalent to ...

  char[] p;
  p.ptr = null;
  p.length = 0;
  if (p.ptr == null || p.length == 0) { /* true */ }

I know Regan talks about an 'array reference' as an abstract concept but I
just find it easier to think of it in terms of how it is actually
implemented, namely a two field struct. To me, I keep thinking of a
'reference' as the address of a pointer - but that's just me ;-)

For example, '&p' returns the address of the array reference (struct) and
you can manipulate it directly (at your peril, of course).
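
A short sketch of that view, assuming present-day D (D2): the array variable is a small struct of two machine words (a length and a pointer), and `&p` yields a `char[]*` through which the caller's variable can be rebound. `sliceWords` and `rebind` are names invented for this illustration.

```d
import std.stdio : writeln;

// A dynamic array occupies two machine words: a length and a pointer.
enum size_t sliceWords = (char[]).sizeof / size_t.sizeof;

// Rebinds the caller's array through a pointer to the slice itself.
void rebind(char[]* slice, char[] replacement)
{
    *slice = replacement;
}

void main()
{
    char[] p = "one".dup;
    char[]* handle = &p;          // address of the (length, pointer) pair
    rebind(handle, "two".dup);
    writeln(sliceWords, " ", p);
}
```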

-- 
Derek
Melbourne, Australia
25/07/2005 12:56:21 PM

Jul 24 2005

AJG <AJG_member pathlink.com> writes:

Hi,

In article <dc1689$h1q$1 digitaldaemon.com>, James McComb says...
Regan Heath wrote:

 What do you mean by can't be null?

 
 char[] p = null;
 if (p.length == 0) { //does not crash, p itself is never 'null' }

Okay... I obviously don't get D strings because this seems wildly 
counter-intuitive to me. Sure if p CANNOT be null, the line

char[] p = null; // Surely this means: set p to null

should fail to compile or throw an exception or something?

You are right, it is a little counter-intuitive. When you say array = null you
are sort of talking about the array's pointer, not the reference itself. I
suspect one of two things happens internally:

1) The " = null" is simply ignored by the compiler (for efficiency). So
char[] p = null;
becomes:
char[] p;

2) A *new* reference (null pointer, 0-length) is created, in the style of
function parameters, and assigned. In other words, it does whatever it does when
you pass "null" to a function expecting an array.

If D strings truly couldn't be null I would expect something like:

// Declare p
// p is initialized to empty (not null - it can never be null)
char[] p;

// Test for empty
// A test for null makes no sense - p can never be null
if (!p) {}

What am I missing?

That's one of the original points of this thread. I wanted D to outlaw that test
precisely because of the confusion. It didn't make sense to me either. You can
think of "if (!p)" as "if (!p.ptr)". It is _not_ a test for emptiness.
Therefore, I'm against this automatic conversion, for what it's worth.

Cheers,
--AJG.

Jul 24 2005

Ilya Minkov <minkov cs.tum.edu> writes:

AJG schrieb:
 You are right, it is a little counter-intuitive. When you say array =
  null you are sort of talking about the array's pointer, not the 
 reference itself. I suspect one of two things happens internally:

 2) A *new* reference (null pointer, 0-length) is created, in the 
 style of function parameters, and assigned. In other words, it does 
 whatever it does when you pass "null" to a function expecting an 
 array.

You cannot say "new" because the array (slice) does not have the
reference semantics by itself, and does not get allocated. It is stored
inline - that is, wherever you mention it - on stack, within a class,
etc, as a *value*.

The statement "char[]p = null" declares an array slice on the stack (the
LHS), and at the same time assigns null to its pointer field, and 0 to
its length field.

The slice has a reference semantics with respect to its referred
elements, i.e. you can manipulate the elements directly and the changes 
propagate outside. However, it does not have a reference semantics in 
respect to its fields:

void manipulate(char[] blah) {
	assert(blah.length > 1);
	blah[0] = 'x'; //This change propagates
	blah.length = blah.length-1;
	//Above change does not propagate
	blah[0] = 'y'; //This change propagates
	blah~="z";
	//Above statement decouples completely
	//Now no changes will ever propagate,
	//whatever you do to the poor array!
}

I happen to like the strange semantics, but this is in strong contrast
to a string& or vector<blah>& in C++ or the Java array.
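
The difference between element writes and field writes can be sketched in isolation. This is an illustration only, assuming a current D compiler (where `ref` parameters exist); all function names here are hypothetical.

```d
// Hypothetical helper names, for illustration only.

// The slice (pointer + length) is copied into the parameter, but the
// copy still points at the caller's elements.
void shrinkByValue(char[] s)
{
    s[0] = 'x';                 // element write: visible to the caller
    s.length = s.length - 1;    // field write: only the local copy changes
}

// With "ref" the parameter is an alias for the caller's slice, so the
// length change is visible to the caller as well.
void shrinkByRef(ref char[] s)
{
    s[0] = 'x';
    s.length = s.length - 1;
}

char[] afterByValue()
{
    char[] a = "abc".dup;   // mutable copy of the literal
    shrinkByValue(a);
    return a;               // element change kept, length unchanged
}

char[] afterByRef()
{
    char[] a = "abc".dup;
    shrinkByRef(a);
    return a;               // element change kept, length reduced
}
```

Under these assumptions `afterByValue()` yields `"xbc"` and `afterByRef()` yields `"xb"`: the first character changes in both cases, but only the `ref` version shortens the caller's array.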

 1) The " = null" is simply ignored by the compiler (for efficiency). 
 So char[] p = null; becomes: char[] p;

I don't see any special provision for efficiency, just that an array
slice has a default value tuple (null,0), and the same gets assigned to
it by an assignment of null.

-eye

Jul 27 2005

AJG <AJG_member pathlink.com> writes:

In article <dc86pk$2rta$1 digitaldaemon.com>, Ilya Minkov says...
AJG schrieb:
 You are right, it is a little counter-intuitive. When you say array =
  null you are sort of talking about the array's pointer, not the 
 reference itself. I suspect one of two things happens internally:

 2) A *new* reference (null pointer, 0-length) is created, in the 
 style of function parameters, and assigned. In other words, it does 
 whatever it does when you pass "null" to a function expecting an 
 array.

You cannot say "new" because the array (slice) does not have the
reference semantics by itself, and does not get allocated. It is stored
inline - that is, wherever you mention it - on stack, within a class,
etc, as a *value*.

I meant at compile-time, not in the "new int[5]" sense. 

The slice has a reference semantics with respect to its referred
elements, i.e. you can manipulate the elements directly and the changes 
propagate outside. However, it does not have a reference semantics in 
respect to its fields:

I had been wondering how to define reference semantics for a while, and I think
your definition is spot-on.

I happen to like the strange semantics, but this is in strong contrast
to a string& or vector<blah>& in C++ or the Java array.

I don't like the strange array field semantics. I ran into a particularly nasty
length problem a couple of days ago, and life would have been much simpler if
fields worked just like contents.

My solution was to use an array-reference pointer. E.g.
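
The idea of passing a pointer to the slice itself can be sketched as follows. This is only an illustration, assuming a current D compiler; `dropLast` and `afterDropLast` are made-up names, and this is not the example from the post.

```d
// Hypothetical sketch; not the code from the original post.

// Takes the address of the caller's slice rather than a copy of it.
void dropLast(char[]* s)
{
    // (*s) is the caller's own slice, so this length change sticks.
    (*s).length = (*s).length - 1;
}

char[] afterDropLast()
{
    char[] a = "abc".dup;
    dropLast(&a);
    return a;
}
```

With this shape the fields behave like the contents: the caller sees the shortened array after the call.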






 1) The " = null" is simply ignored by the compiler (for efficiency). 
 So char[] p = null; becomes: char[] p;

I don't see any special provision for efficiency, just that an array
slice has a default value tuple (null,0), and the same gets assigned to
it by an assignment of null.

What I meant is that char[] p; is already initialized to null. Doing an
additional = null is redundant and thus eliminating it would be a small
efficiency gain (again, to the compiler, not the runtime).

----

While we are on this topic, is this legal? (and safe?):















I think it should be, for consistency and simplicity, but I don't know about the
safety of it. I don't have DMD handy to test it, either.

Thanks,
--AJG.

Jul 27 2005

Derek Parnell <derek psych.ward> writes:

On Fri, 22 Jul 2005 15:00:51 +0200, Ilya Minkov wrote:


[snip]
 
 I believe that making this distinction, between empty and non-existent 
 arrays, just provides the possibility for another misconception and bug.

When I started seriously coding with D, I was making mistakes in my code
because I assumed that D would make this distinction.

 If someone sees a real technical necessity to be able to distinguish 
 between the empty and the non-existing one, they are invited to show it here.

One reasonable use for a non-existent string is to represent the fact that
a default value has not been supplied. As every possible string value,
including an empty string, could be the default value, I needed a way to
state that a string has no default yet.

-- 
Derek Parnell
Melbourne, Australia
24/07/2005 12:07:59 AM

Jul 23 2005

AJG <AJG_member pathlink.com> writes:

Sorry, I got the two last examples backwards. The comments should read
"the statement will print"
and then
"the statement will *not* print"

Not the other way around. The point remains the same, though.
Thanks,
--AJG.

Jul 20 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Thu, 21 Jul 2005 00:17:51 +0000 (UTC), AJG <AJG_member pathlink.com>  
wrote:
 Sorry, I got the two last examples backwards. The comments should read
 "the statement will print"
 and then
 "the statement will *not* print"

 Not the other way around. The point remains the same, though.

Sorry, I replied before seeing this post. My reply remains the same minus  
correcting your mistakes.

Regan

Jul 20 2005

Charles Hixson <charleshixsn earthlink.net> writes:

Derek Parnell wrote:
 On Wed, 20 Jul 2005 11:21:55 +0200, Dejan Lekic wrote:
 
 Mr Heath, I agree with You on this.

 
 I don't.
 
 Does ...
 
   if (array) ...
 
 test for an empty array or a non-existent array? I can't tell from the
 syntax. It is thus ambiguous.
 
   if (array.ptr == null) -- test for a non-existence.
 
   if (array.length == 0) -- test for emptiness
 
   if (array) -- test for which?

If array might be null, can you be certain that it's proper to 
dereference it, e.g. array.length would seem to presume that 
array wasn't null.  (Actually, so would array.ptr...but perhaps 
that's just me.)

Jul 20 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Wed, 20 Jul 2005 15:38:04 -0700, Charles Hixson  
<charleshixsn earthlink.net> wrote:
 Derek Parnell wrote:
 On Wed, 20 Jul 2005 11:21:55 +0200, Dejan Lekic wrote:

 Mr Heath, I agree with You on this.

  I don't.
  Does ...
    if (array) ...
  test for an empty array or a non-existent array? I can't tell from the
 syntax. It is thus ambiguous.
    if (array.ptr == null) -- test for a non-existence.
    if (array.length == 0) -- test for emptiness
    if (array) -- test for which?

 If array might be null, can you be certain that it's proper to  
 dereference it, e.g. array.length would seem to presume that array  
 wasn't null.  (Actually, so would array.ptr...but perhaps that's just  
 me.)

D guarantees an array reference is never null.

Is an array reference _the_ array? No, just like an object reference is  
not _the_ object (thus why you can have x references to the same object)

A null array, "char[] p = null;" has a null data pointer. So, to check for  
a null array you check the data pointer.

An empty array, "char[] p = "";" has a non-null data pointer but a 0  
length. So, to check for an empty array you check the length.
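
The two checks can be written out as small helpers. A minimal sketch, assuming a current D compiler, where string literals are immutable and so `const(char)[]` is used in place of `char[]`; the helper names are hypothetical.

```d
// Hypothetical helper names, for illustration only.

// "Null" in the sense used above: the data pointer is null.
bool isNullArray(const(char)[] a)
{
    return a.ptr is null;
}

// "Empty" in the sense used above: the length is zero.
bool isEmptyArray(const(char)[] a)
{
    return a.length == 0;
}

// A null array is both null and empty.
bool nullIsNullAndEmpty()
{
    const(char)[] p = null;
    return isNullArray(p) && isEmptyArray(p);
}

// An empty string literal is empty but points at real storage,
// so it is not null.
bool literalIsEmptyButNotNull()
{
    const(char)[] p = "";
    return !isNullArray(p) && isEmptyArray(p);
}
```

Both of the last two functions are expected to return true, which is the distinction between "non-existent" and "existing but empty".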

See my other posts for reasoning as to why "if(array)" checks the null  
array case.

Regan

Jul 20 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

"Regan Heath" <regan netwin.co.nz> wrote in message 
news:opst6x8cje23k2f5 nrage.netwin.co.nz...
 On Wed, 20 Jul 2005 02:15:58 +0000 (UTC), AJG <AJG_member pathlink.com> 
 wrote:
 This is a suggestion based on a thread from a couple of weeks ago. What 
 about
 making if (array) illegal in D? I think it brings ambiguity and a high 
 potential
 for errors to the language. The main two uses for this construct can 
 already be
 done with a slightly more explicit syntax:

 if (array.ptr == null) // Check for a kind of "non-existence."
 if (array.length == 0) // Check for explicit emptiness.

 On the other hand, one is not sure what if (array) by itself is supposed 
 to
 mean, since it's _not_ like C. In C, if (array), where array is 
 typically a
 pointer, means simply != NULL. The problem in D is that the array ptr is 
 tricky
 and IMHO it's best not to interface with it directly.

 I think it would be wise to remove this ambiguity. I propose two options:
 1) Make if (array) equal _always_ to if (array.length).
 2) Simply make it illegal.

 What do you guys think? Walter?

 I prefer the current behaviour (for all the reasons I mentioned in the 
 previous thread):
   http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/25804

 "if (array)" is the same as "if (array.ptr)" which acts just like it does 
 in C, comparing it to 0/null.

 Essentially the "if" statement is checking the not zero state of the 
 variable itself. In the case of value types it compares the value to 0. In 
 the case of pointers and references it compares them to null.

 In the case of an array, which (as explained in link above) is a 
 mix/pseudo value/reference type, it compares the data pointer to null.

 The reason this is the correct behaviour is that a null array has a null 
 data pointer, but, an empty array i.e. an existing set containing no 
 elements may have a non-null data pointer. In both cases they have a 0 
 length property.

 Of course we could change this, we could remove the case where an array 
 contains no items but has a non-null data pointer. This IMO would remove a 
 useful distinction, the "existing set containing no items" would be 
 un-representable with a single array variable. IMO that would be a bad 
 move, the current situation(*) is good.

 (*) there remains the problem where setting the length of an array sets 
 the data pointer to null. This can change an "existing set with no 
 elements" into a "non-existent set".

 Regan

I was poking around the Qt documentation and interestingly enough QString 
has a concept of null and empty. Here's what they say, though: "For 
historical reasons, QString distinguishes between a null string and an empty 
string. [snip] We recommend that you always use isEmpty() and avoid 
isNull()."

The exact doc is 
http://doc.trolltech.com/4.0/qstring.html#distinction-between-null-and-empty-strings

Jul 21 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Thu, 21 Jul 2005 22:31:37 -0400, Ben Hinkle <ben.hinkle gmail.com>  
wrote:
 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:opst6x8cje23k2f5 nrage.netwin.co.nz...
 [snip]

 I was poking around the Qt documentation and interestingly enough QString
 has a concept of null and empty. Here's what they say, though: "For
 historical reasons, QString distinguishes between a null string and an  
 empty
 string. [snip] We recommend that you always use isEmpty() and avoid
 isNull()."

 The exact doc is
 http://doc.trolltech.com/4.0/qstring.html#distinction-between-null-and-empty-strings

That's not too surprising. A lot of people have never seen the need for  
the distinction, and it certainly can make life "simpler". However, I  
don't believe you can argue that it doesn't exist, at least logically.  
That is why you get situations like this (stolen from a post to the  
DMDScript group):

<quote>
For example, might it not be useful to return 'null' on EOF, thus allowing
this sort of construct:

     var line = readln();

     while (line != null)
     {
          ...
          line = readln();
     }
</quote>

which is an example where there is a desire to distinguish between  
existence and emptiness.
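
The same loop can be sketched in D. This is an illustration only, assuming a current D compiler; `Lines`, `next`, and `countUntilEof` are hypothetical names, and the "input" is just an in-memory list rather than a real stream.

```d
// Hypothetical line reader over an in-memory list of lines.
struct Lines
{
    string[] pending;

    // Returns the next line, or null once the input is exhausted.
    string next()
    {
        if (pending.length == 0)
            return null;
        string line = pending[0];
        pending = pending[1 .. $];
        return line;
    }
}

// Counts lines until the reader reports end of input.
// An empty line ("") has a non-null data pointer, so it does not
// end the loop; only the null array does.
size_t countUntilEof(string[] input)
{
    auto src = Lines(input);
    size_t n = 0;
    for (string line = src.next(); line.ptr !is null; line = src.next())
        ++n;
    return n;
}
```

Given the input `["a", "", "b"]` the loop visits all three lines, including the empty one, because the test looks at the data pointer and not the length.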

Sure, you can remove the distinction, lessen the expressiveness of arrays  
and force everyone to "work around" the deficiency in other ways, it's  
possible, it can make life simpler for the general case and more  
complicated for the rest.

I think arrays in D are nearly perfect(*). They allow you to ignore the  
distinction in the general case (thus life is pretty easy already) yet you  
can tell the difference if you require it.

(*) there are only 2 problems with them IMO:

1. length = 0; resets the data pointer to null, changing empty into  
non-existent.
2. "int[0] a;" and "int[] a = new int[0];" produce different results when  
you'd expect the same thing.

Regan

Jul 21 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

"Regan Heath" <regan netwin.co.nz> wrote in message 
news:opsuaqfmcv23k2f5 nrage.netwin.co.nz...
 On Thu, 21 Jul 2005 22:31:37 -0400, Ben Hinkle <ben.hinkle gmail.com> 
 wrote:
 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:opst6x8cje23k2f5 nrage.netwin.co.nz...
 [snip]

 That's not too surprising. A lot of people have never seen the need for 
 the distinction, and it certainly can make life "simpler". However, I 
 don't believe you can argue that it doesn't exist, at least logically. 
 That is why you get situations like this (stolen from a post to the 
 DMDScript group):

 <quote>
 For example, might it not be useful to return 'null' on EOF, thus allowing
 this sort of construct:

     var line = readln();

     while (line != null)
     {
          ...
          line = readln();
     }
 </quote>

 which is an example where there is a desire to distinguish between 
 existence and emptiness.

 Sure, you can remove the distinction, lessen the expressiveness of arrays 
 and force everyone to "work around" the deficiency in other ways, it's 
 possible, it can make life simpler for the general case and more 
 complicated for the rest.

 I think arrays in D are nearly perfect(*). They allow you to ignore the 
 distinction in the general case (thus life is pretty easy already) yet you 
 can tell the difference if you require it.

 (*) there are only 2 problems with them IMO:

 1. length = 0; resets the data pointer to null, changing empty into 
 non-existent.
 2. "int[0] a;" and "int[] a = new int[0];" produce different results when 
 you'd expect the same thing.

 Regan

Sure, I agree special values can be useful and null is an easy special value 
to use. Note the same behavior can be obtained with returning a singleton 
empty just for eof, if desired. The singleton approach could arguably make 
the code more readable, too, since the reader wouldn't have to know that 
null line meant eof. For example
 char[] line = din.readLine();
 while (line !is din.eofLine()) { ... line = din.readLine(); }
where eofLine can return null or if the stream author wishes it can return 
some other unique empty string.
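
The singleton approach can be sketched like this. It is only an illustration, assuming a current D compiler; `LineSource`, `eofLine`, `readLine`, and `countLines` are hypothetical stand-ins for the stream API named above, not real library calls.

```d
// Hypothetical stream-like type, for illustration only.
struct LineSource
{
    private string[] lines;
    private size_t next;

    // One byte of static storage whose address identifies the sentinel.
    private static immutable char[1] eofStorage = [' '];

    // A unique zero-length slice: empty, but with its own data pointer.
    static string eofLine()
    {
        return eofStorage[0 .. 0];
    }

    // Returns the next line, or the sentinel at end of input.
    string readLine()
    {
        if (next >= lines.length)
            return eofLine();
        return lines[next++];
    }
}

// "!is" compares pointer and length, so an ordinary empty line is not
// mistaken for the sentinel.
size_t countLines(string[] input)
{
    auto src = LineSource(input);
    size_t n = 0;
    string line = src.readLine();
    while (line !is LineSource.eofLine())
    {
        ++n;
        line = src.readLine();
    }
    return n;
}
```

The caller never needs to know whether the sentinel is null or not; it only compares identity against whatever `eofLine` returns.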

Jul 22 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Fri, 22 Jul 2005 09:06:48 -0400, Ben Hinkle <ben.hinkle gmail.com>  
wrote:
 I was poking around the Qt documentation and interestingly enough  
 QString
 has a concept of null and empty. Here's what they say, though: "For
 historical reasons, QString distinguishes between a null string and an
 empty
 string. [snip] We recommend that you always use isEmpty() and avoid
 isNull()."

 The exact doc is
 http://doc.trolltech.com/4.0/qstring.html#distinction-between-null-and-empty-strings

 That's not too surprising. A lot of people have never seen the need for
 the distinction, and it certainly can make life "simpler". However, I
 don't believe you can argue that it doesn't exist, at least logically.
 That is why you get situations like this (stolen from a post to the
 DMDScript group):

 <quote>
 For example, might it not be useful to return 'null' on EOF, thus  
 allowing
 this sort of construct:

     var line = readln();

     while (line != null)
     {
          ...
          line = readln();
     }
 </quote>

 which is an example where there is a desire to distinguish between
 existence and emptiness.

 Sure, you can remove the distinction, lessen the expressiveness of  
 arrays
 and force everyone to "work around" the deficiency in other ways, it's
 possible, it can make life simpler for the general case and more
 complicated for the rest.

 I think arrays in D are nearly perfect(*). They allow you to ignore the
 distinction in the general case (thus life is pretty easy already) yet  
 you
 can tell the difference if you require it.

 (*) there are only 2 problems with them IMO:

 1. length = 0; resets the data pointer to null, changing empty into
 non-existent.
 2. "int[0] a;" and "int[] a = new int[0];" produce different results  
 when
 you'd expect the same thing.

 Regan

 Sure, I agree special values can be useful and null is an easy special  
 value to use.

Indeed, null and NAN have a lot in common. They indicate non-existence, or  
un-initialised. Think how much trouble we have coding with 'int' and other  
'value' types that cannot indicate non-existence, especially with container  
classes and the like. std.boxer wouldn't exist if int could indicate  
non-existence.

 Note the same behavior can be obtained with returning a singleton
 empty just for eof, if desired. The singleton approach could arguably  
 make the code more readable, too, since the reader wouldn't have to know  
 that
 null line meant eof. For example
  char[] line = din.readLine();
  while (line !is din.eofLine()) { ... line = din.readLine(); }
 where eofLine can return null or if the stream author wishes it can  
 return some other unique empty string.

That code is more descriptive, sure. However, null is more generic in  
application. You can use it 'everywhere' and everywhere it is used it can  
have the same meaning. This means no 'special case' code is required (like  
that shown above).

Regan

Jul 23 2005

AJG <AJG_member pathlink.com> writes:

Hi,

 Sure, I agree special values can be useful and null is an easy special  
 value to use.

Indeed, null and NAN have a lot in common. They indicate non-existence, or  
un-initialised. Think how much trouble we have coding with 'int' and other  
'value' types that cannot indicate non-existence, especially with container  
classes and the like. std.boxer wouldn't exist if int could indicate  
non-existence.

Yes! That is exactly right. The problem with using array.ptr as null for
existence checks is that it's not orthogonal at all. It only works with arrays.
It might also work with classes (not sure). What about primitives? No, it's back
to an additional boolean or somesuch. That's why I think it's a crappy solution,
and that's exactly the source of the if (array) dilemma in the first place. 


Even ints will be nullable. I'm not sure how this is going to work (haven't tried it), but
at least it's orthogonal. It works everywhere. Whereas array.ptr is shaky,
buggy, likely to change and IMHO unsemantic. If we at least see that this is a
problem, and that there is a need for a more complete feature, maybe we can work
towards a better solution.

 Note the same behavior can be obtained with returning a singleton
 empty just for eof, if desired. The singleton approach could arguably  
 make the code more readable, too, since the reader wouldn't have to know  
 that
 null line meant eof. For example
  char[] line = din.readLine();
  while (line !is din.eofLine()) { ... line = din.readLine(); }
 where eofLine can return null or if the stream author wishes it can  
 return some other unique empty string.

That code is more descriptive, sure. However, null is more generic in  
application. You can use it 'everywhere' and everywhere it is used it can  
have the same meaning. This means no 'special case' code is required (like  
that shown above).

That's not true. 'Everywhere' would mean complete orthogonality, and as we know,
this trick only works with certain types. But I agree with the premise, that
nullness is a great (and easy) special value that makes life simpler. Thus a
good solution should be built into the language.

Cheers,
--AJG.

Jul 23 2005

Dave <Dave_member pathlink.com> writes:

In article <dbv45d$1ju8$1 digitaldaemon.com>, AJG says...

Even ints will be nullable. I'm not sure how this is going to work (haven't tried it), 

Interesting stuff.. I looked into this a bit and apparently the underlying
implementation is done through System.Nullable<T>.

System.Nullable<int> j;
int? k;

The 'T?' form is shorthand.

As you can imagine, there appears to be quite a bit of overhead involved as the
nullable types aren't native. But there's nothing stopping you from retrieving,
say, a DB value into a nullable type, checking if it's null and then assigning
it to a native variable if it's not. But assigning it requires accessing a
property (int k = j.Value;) or a cast (int k = (int)j;).

I like the idea, but given that you will still always have to check if a
nullable variable is not null before using it or even assigning it to another
(non-nullable) variable, I'm having trouble imagining how much more productive /
readable it's going to make coding for most chores where "nullable native types"
would be useful. For example, for database applications, I can still see a need
to write a library of wrapper functions to assign a column to native data types,
or if the table was represented by a class, to check for null each time a column
was retrieved in order to assign the value to a native type. 

Either way it seems like it will require about the same amount of code to write
most applications, but with added complexity to the language.
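
For comparison, a library-level nullable can be sketched in D. This assumes a current D standard library, which provides `std.typecons.Nullable` (not something available at the time of this thread); `columnOrDefault` and `nullableIsBigger` are hypothetical names used only for illustration.

```d
import std.typecons : Nullable;

// Hypothetical stand-in for reading a database column that may be NULL.
int columnOrDefault(Nullable!int column, int fallback)
{
    // The null check is still needed before the value can be used
    // as a plain int.
    return column.isNull ? fallback : column.get;
}

// The wrapper has to store a flag next to the value, so it takes more
// room than a bare int.
bool nullableIsBigger()
{
    return Nullable!int.sizeof > int.sizeof;
}
```

As with the property and cast forms described above, the caller still has to test for null before getting at the plain value, so the amount of code at each use site stays about the same.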

Jul 23 2005

AJG <AJG_member pathlink.com> writes:

Hi,

In article <dbv8fr$1nnm$1 digitaldaemon.com>, Dave says...
In article <dbv45d$1ju8$1 digitaldaemon.com>, AJG says...

Even ints will be nullable. I'm not sure how this is going to work (haven't tried it), 

Interesting stuff.. I looked into this a bit and apparently the underlying
implementation is done through System.Nullable<T>.

System.Nullable<int> j;
int? k;

The 'T?' form is shorthand.

Interesting. I didn't know that. I was actually kinda hoping they found a magic
"native" way, but I guess not.

As you can imagine, there appears to be quite a bit of overhead involved as the
nullable types aren't native.

Yes, I agree. Though I shouldn't speculate without having even tested for
performance.

But there's nothing stopping you from retrieving,
say, a DB value into a nullable type, checking if it's null and then assigning
it to a native variable if it's not. But assigning it requires accessing a
property (int k = j.Value;) or a cast (int k = (int)j;).

This looks a little cumbersome. It will remain cumbersome without language
support, IMHO.

I like the idea, but given that you will still always have to check if a
nullable variable is not null before using it or even assigning it to another
(non-nullable) variable, I'm having trouble imagining how much more productive /
readable it's going to make coding for most chores where "nullable native types"
would be useful.

I disagree here. I think the non-existent concept is a good one. It's useful in
arrays (and possibly classes), and I think the usefulness extends across
primitives as well.

For example, for database applications, I can still see a need
to write a library of wrapper functions to assign a column to native data types,
or if the table was represented by a class, to check for null each time a column
was retrieved in order to assign the value to a native type.

For instance:












Either way it seems like it will require about the same amount of code to write
most applications, but with added complexity to the language.

Some (most?) of the complexity is already there. Arrays and Classes both are
already capable of existing vs. being empty. This would merely extend the
feature for orthogonality. I think it would be fairly useful.

Just my 2 cents.
--AJG.

Jul 23 2005

Dave <Dave_member pathlink.com> writes:

In article <dbvajm$1p2i$1 digitaldaemon.com>, AJG says...
Hi,

In article <dbv8fr$1nnm$1 digitaldaemon.com>, Dave says...
In article <dbv45d$1ju8$1 digitaldaemon.com>, AJG says...

Even ints will be nullable. I'm not sure how this is going to work (haven't tried it), 

Interesting stuff.. I looked into this a bit and apparently the underlying
implementation is done through System.Nullable<T>.

System.Nullable<int> j;
int? k;

The 'T?' form is shorthand.

Interesting. I didn't know that. I was actually kinda hoping they found a magic
"native" way, but I guess not.

As you can imagine, there appears to be quite a bit of overhead involved as the
nullable types aren't native.

Yes, I agree. Though I shouldn't speculate without having even tested for
performance.

I did run a quick test w/ a simple loop using a few assignment operators and the
penalty was on the order of 4-5x.

Even if that is actually indicative of what you could expect with 'real world'

performance penalty. Who knows, if built into D 'natively', maybe all/most of
that penalty could be (practically) optimized away or maybe there's an internal
implementation possible that wouldn't cause any 'penalty'?

But there's nothing stopping you from retrieving,
say, a DB value into a nullable type, checking if it's null and then assigning
it to a native variable if it's not. But assigning it requires accessing a
property (int k = j.Value;) or a cast (int k = (int)j;).

This looks a little cumbersome. It will remain cumbersome without language
support, IMHO.

I like the idea, but given that you will still always have to check if a
nullable variable is not null before using it or even assigning it to another
(non-nullable) variable, I'm having trouble imagining how much more productive /
readable it's going to make coding for most chores where "nullable native types"
would be useful.

I disagree here. I think the non-existent concept is a good one. It's useful in
arrays (and possibly classes), and I think the usefulness extends across
primitives as well.

For example, for database applications, I can still see a need
to write a library of wrapper functions to assign a column to native data
types, or if the table was represented by a class, to check for null each
time a column was retrieved in order to assign the value to a native type.

For instance:












Either way it seems like it will require about the same amount of code to write
most applications, but with added complexity to the language.

Some (most?) of the complexity is already there. Arrays and Classes both are
already capable of existing vs. being empty. This would merely extend the
feature for orthogonality. I think it would be fairly useful.

Just my 2 cents.
--AJG.

Jul 24 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Sun, 24 Jul 2005 14:38:05 +0000 (UTC), Dave <Dave_member pathlink.com>  
wrote:
 In article <dbv8fr$1nnm$1 digitaldaemon.com>, Dave says...
 In article <dbv45d$1ju8$1 digitaldaemon.com>, AJG says...

 Even ints
 will be nullable. I'm not sure how this is going to work (haven't  
 tried it),

 Interesting stuff.. I looked into this a bit and apparently the  
 underlying
 implementation is done through System.Nullable<T>.

 System.Nullable<int> j;
 int? k;

 The 'T?' form is shorthand.

 Interesting. I didn't know that. I was actually kinda hoping they found  
 a magic
 "native" way, but I guess not.

 As you can imagine, there appears to be quite a bit of overhead  
 involved as the
 nullable types aren't native.

 Yes, I agree. Though I shouldn't speculate without having even tested  
 for
 performance.

 I did run a quick test w/ a simple loop using a few assignment operators  
 and the
 penalty was on the order of 4-5x.

 Even if that is actually indicative of what you could expect with 'real  
 world'

 of a
 performance penalty. Who knows, if built into D 'natively', maybe  
 all/most of
 that penalty could be (practically) optimized away or maybe there's an  
 internal
 implementation possible that wouldn't cause any 'penalty'?

D's arrays are exactly this. They are a nullable type which is implemented  
as a stack based value type. They are fast and efficient. Walter could do  
the same thing with boxing i.e. build boxing into the language, implement  
it using a stack based value type.
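
As a rough illustration of that idea, a nullable int implemented as a plain value type might look something like the sketch below. The Maybe name and its members are hypothetical, invented only for this example; this is not an existing or proposed D feature, just a payload plus a flag that lives wherever the variable lives (no heap allocation).

```d
// Hypothetical value type: a payload plus an 'exists' flag.
struct Maybe(T)
{
    private T payload;
    private bool exists = false;

    // Wrap an existing value.
    static Maybe of(T v)
    {
        Maybe m;
        m.payload = v;
        m.exists = true;
        return m;
    }

    // True when no value has been stored.
    bool isNull() const { return !exists; }

    // Return the payload, or the fallback when no value exists.
    T getOr(T fallback) const { return exists ? payload : fallback; }
}

void main()
{
    Maybe!int missing;              // default state: non-existent
    auto seven = Maybe!int.of(7);   // holds a value

    assert(missing.isNull);
    assert(!seven.isNull);
    assert(missing.getOr(-1) == -1);
    assert(seven.getOr(-1) == 7);
}
```

The caller still has to check isNull (or supply a fallback) before using the value, so this addresses the cost concern rather than the amount of checking code.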

Regan

Jul 24 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Sun, 24 Jul 2005 04:07:09 +0000 (UTC), AJG <AJG_member pathlink.com>  
wrote:
 Sure, I agree special values can be useful and null is an easy special
 value to use.

 Indeed, null and NAN have a lot in common. They indicate non-existence,
 or
 un-initialised. Think how much trouble we have coding with 'int' and
 other
 'value' types that cannot indicate non-existence? esp with container
 classes and the like. std.boxer wouldn't exist if int could indicate
 non-existence.

 Yes! That is exactly right. The problem with using array.ptr as null for
 existence checks is that it's not orthogonal at all. It only works with
 arrays.

No, the key point you seem to be missing is: "if(x)" compares 'x' to null
or 0. It is _not_ intended to test for existence; that is _not_ its
purpose.

The "if(x)" rule is true *for all types* even primitives (with the  
exception of a struct - because it is user defined and cannot be compared  
to null or 0).

If the variable 'x' is a reference type it compares the reference to null.  
Arrays are references, so it compares the array reference to null. Array  
references cannot be null. When an array reference would be null, the  
array.ptr is null. Therefore to compare an array to null, you compare the  
array.ptr to null.
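
A minimal sketch of the distinction, assuming a current D compiler (the 2005 compiler discussed in this thread may differ in details): an array variable is a (length, ptr) pair, so the variable itself always exists and only its .ptr can be null. The helper names isNullArray and isEmptyArray are made up for illustration and are not part of Phobos.

```d
// Hypothetical helpers, not part of Phobos.
bool isNullArray(const(char)[] a)  { return a.ptr is null; }
bool isEmptyArray(const(char)[] a) { return a.length == 0; }

void main()
{
    char[] none = null;              // no data: ptr is null, length is 0
    char[] empty = "x".dup[0 .. 0];  // zero-length slice of real storage

    assert(isNullArray(none) && isEmptyArray(none));
    assert(!isNullArray(empty) && isEmptyArray(empty));

    // 'is' compares both ptr and length, so only 'none' is identical to null.
    assert(none is null);
    assert(empty !is null);
}
```

Both arrays have length 0; they differ only in whether the data pointer is null.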

This behaviour is _required_ to make arrays orthogonal with other  
references.

This behaviour is completely orthogonal *for all types* and this can be  
proven by example.

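class C {}

void main()
{
    char[] p = null;
    C c = null;
    int i = 0;

    // Block comments are used so the closing braces are not commented out.
    if (c) { /* not true */ }
    if (p) { /* not true */ }
    if (i) { /* not true */ }
}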

 It might also work with classes (not sure).

Yes, see above.

 What about primitives? No, it's back to an additional boolean or  
 somesuch. That's why I think it's a crappy solution, and that's exactly  
 the source of the if (array) dilemma in the first place.

The ability to express non-existence has nothing to do with the "if(x)"
statement. The "if(x)" statement's purpose is not specifically to test for
non-existence. I repeat:
   "if(x)" compares 'x' to null or 0

That's it.

Yes, you can use it to test for non-existence with reference and pointer
types. That, however, is not its purpose.



And we have std.boxer.

 Even ints will be nullable. I'm not sure how this is going to work  
 (haven't tried it), but at least it's orthogonal. It works everywhere.

Likely it's going to work like std.boxer except automagically.

 Whereas array.ptr is shaky, buggy, likely to change and IMHO unsemantic.  
 If we at least see that this is a problem, and that there is a need for  
 a more complete feature, maybe we can work towards a better solution.

IMO there is no problem with "if(x)". Not being able to represent
non-existence is a tradeoff when using value types. std.boxer is the
solution to those tradeoffs, that or using pointers.

 Note the same behavior can be obtained with returning a singleton
 empty just for eof, if desired. The singleton approach could arguably
 make the code more readable, too, since the reader wouldn't have to  
 know
 that
 null line meant eof. For example
  char[] line = din.readLine();
  while (line !is din.eofLine()) { ... line = din.readLine(); }
 where eofLine can return null or if the stream author wishes it can
 return some other unique empty string.

 That code is more descriptive, sure. However, null is more generic in
 application. You can use it 'everywhere' and everywhere it is used it  
 can
 have the same meaning. This means no 'special case' code is required  
 (like
 that shown above).

 That's not true. 'Everywhere' would mean complete orthogonality, and as  
 we know, this trick only works with certain types.

You are correct. I meant only to refer to reference and pointer types
above, which can express non-existence.

However, I repeat (because this is an important fact): The purpose of
"if(x)" is not to test for non-existence; its purpose is to compare 'x'
to null or 0. Nothing more, nothing less.

 But I agree with the premise, that nullness is a great (and easy)  
 special value that makes life simpler.

In this we agree. :)

 Thus a good solution should be built into the language.

IMO a good solution _is_ built into the language. Arrays are a good
solution to the problem posed by types which can represent non-existence,
that problem being that the added expressiveness comes with greater risk
of accidental use. Arrays cannot be null, yet can represent non-existence;
they're a great solution.

Unfortunately they're not the solution to the "non-existence of value
types" problem, which currently has 2 solutions:
   - std.boxer.
   - pointers.

Both these solutions involve references/pointers that can be null, so they  
suffer from the risk involved in using null, unlike arrays.
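
The pointer approach can be sketched as follows, assuming a current D compiler. The findAge function and the sample data are hypothetical; the sketch relies only on the built-in 'in' expression, which yields a pointer to the stored value or null when the key is absent.

```d
// Hypothetical lookup: returns null when the name has no recorded age.
int* findAge(int[string] ages, string name)
{
    return name in ages;   // pointer to the value, or null
}

void main()
{
    int[string] ages = ["ann": 30];

    int* hit = findAge(ages, "ann");
    int* miss = findAge(ages, "bob");

    assert(hit !is null && *hit == 30);
    assert(miss is null);   // must be checked before dereferencing

    if (hit) { /* compares the pointer to null, as "if(x)" always does */ }
}
```

Dereferencing miss without the check is exactly the risk mentioned above.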

Regan

Jul 24 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Sun, 24 Jul 2005 20:47:43 +1200, Regan Heath <regan netwin.co.nz> wrote:
 Thus a good solution should be built into the language.

 IMO a good solution _is_ built into the language. Arrays are a good
 solution to the problem posed by types which can represent
 non-existence, that problem being that the added expressiveness comes
 with greater risk of accidental use. Arrays cannot be null, yet can
 represent non-existence; they're a great solution.

Re-reading this I don't think I was clear enough. What I meant here is
that arrays themselves are types which can represent non-existence;
they're done in a clever way which enables the expressiveness without the
cost. I didn't mean to imply that you could store a value type in an array
and represent non-existence in some way (except of the whole array).

Regan

Jul 24 2005
