digitalmars.D - Walter - Should we use arrays as Null?

reply AJG <AJG_member pathlink.com> writes:
Hi Walter,

This is something that's confused me quite a bit and I think you are the only
one that can settle it for good. The question is whether we should be using null
as a special array value. Maybe it can be broken down to pieces:

1) Why can objects be null but arrays can't (given that _both_ are by-ref)?

# // Example:
# Object obj  = null; // This is a "proper" null.
# int[] array = null; // This is not. Only via array.ptr.

IMHO this is inconsistent. The former makes sense, the latter is weird. Another
way of looking at it is: why dumb-down arrays but not objects?

2) Is it a technical limitation (for now)?
3) Is support for "proper" null arrays planned?

I, for one, would _like_ to see support for both null arrays and continued
support for null objects. As Regan has argued (and now I'm a believer), the null
special value is very useful, and we should keep this distinction (vs. empty).
Perhaps you can clarify whether this is going to happen properly or not.

In my view, proper array nulls do _not_ exist. What we have right now is very
confusing because sometimes we can use the null value and sometimes we can't. It
is also fickle because the null value is tied to the pointer. Regan thinks that
you are planning on merging emptiness and existence into one (a bad thing).

Some of the problems (not technically "bugs"):
- array.length = 0 sets the pointer to null.
- static int[0] is not null, but new int[0] is.
- .dup of an empty string (static or not) also sets the pointer to null.
- static arrays can't have null pointers.

4) What exactly does [if (array)] mean (or theoretically should mean)?
- if (array.ptr)
- if (array.length)
- if (array == null)
- if (array is null)
- or some combination thereof?
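
For reference, the four candidate tests can be compared side by side. This is only a minimal sketch, assuming a present-day D compiler; results under the 2005 compiler discussed in this thread may differ. The helper name `describe` and the sample arrays are made up for illustration.

```d
import std.stdio;

// Hypothetical helper: reports what each candidate test says about an array.
bool[4] describe(const int[] a)
{
    return [
        a.ptr !is null,   // does the data pointer reference anything?
        a.length != 0,    // does the array hold any elements?
        a == null,        // element-wise comparison against an empty array
        a is null,        // identity comparison of the (ptr, length) pair
    ];
}

void main()
{
    int[] none = null;          // null array: no pointer, no length
    int[] data = [1, 2, 3];
    int[] empty = data[0 .. 0]; // empty slice: valid pointer, zero length

    writeln(describe(none));
    writeln(describe(empty));
}
```

In this sketch the null array and the empty slice agree on `.length` and on `== null`, and differ on `.ptr` and on `is null`, which is the distinction the question turns on.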

==============

In short, I think it would be dangerous to use this feature if you are planning
on subtly phasing it out. Could you please shed some light on the situation?

Thanks!
--AJG.
Jul 28 2005
next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Fri, 29 Jul 2005 05:30:52 +0000 (UTC), AJG wrote:

 Hi Walter,

I know I'm not the big W. but here's my take on this anyhow ;-)
 This is something that's confused me quite a bit and I think you are the only
 one that can settle it for good. The question is whether we should be using null
 as a special array value. Maybe it can be broken down to pieces:
 
 1) Why can objects be null but arrays can't (given that _both_ are by-ref)?
 
 # // Example:
 # Object obj  = null; // This is a "proper" null.
 # int[] array = null; // This is not. Only via array.ptr.

I don't understand your concern. Both these are allowed and both work as expected; that is to say in both cases, after the assignment, the variable does not reference anything. What more than this are you expecting?
 IMHO this is inconsistent. The former makes sense, the latter is weird. Another
 way of looking at it is: why dumb-down arrays but not objects?

Huh? I still can't see what you are worried about.

Object obj = null; // Makes sense.
int[] array = null; // Also makes sense (to me anyhow).
 2) Is it a technical limitation (for now)?

Is *what* a limitation?
 3) Is support for "proper" null arrays planned?
 
 I, for one, would _like_ to see support for both null arrays and continued
 support for null objects. 

Who says that this support is going away?
As Regan has argued (and now I'm a believer), the null
 special value is very useful, and we should keep this distinction (vs. empty).

Absolutely! Both are good and distinct concepts.
 Perhaps you can clarify whether this is going to happen properly or not.
 
 In my view, proper array nulls do _not_ exist. 

But they do. If array.ptr is null, then array is a null array.
What we have right now is very
 confusing because sometimes we can use the null value and sometimes we can't.

Huh? When can't you use it?
 It is also fickle because the null value is tied to the pointer. 

Huh? Of course it is. What else could it be?
 Regan thinks that
 you are planning on merging emptiness and existence into one (a bad thing).

I don't think that Walter is planning on this.
 Some of the problems (not technically "bugs"):
 - array.length = 0 sets the pointer to null.

This is a bug and Walter has said so. He will fix this.
 - static int[0] is not null, but new int[0] is.

static arrays cannot be null (not reference anything) by their very nature. A static array must always reference some RAM somewhere. What do you think that 'new int[0]' should return?
 - .dup of an empty string (static or not) also sets the pointer to null.

This is because of the bug.
 - static arrays can't have null pointers.

Of course not. The 'static' attribute means that they occupy RAM that is allocated at compile time.
 4) What exactly does [if (array)] mean (or theoretically should mean)?
 - if (array.ptr)
 - if (array.length)
 - if (array == null)
 - if (array is null)
 - or some combination thereof?

It actually means ...

if (array.ptr !is null || array.length != 0)

which is a bit redundant because we can never have the situation where the ptr is null and the length is > 0.
 ==============
 
 In short, I think it would be dangerous to use this feature if you are planning
 on subtly phasing it out. Could you please shed some light on the situation?

Once Walter fixes the bug in which setting the length to zero also clears the ptr, I think we will have what you want.

Hope I've helped.

--
Derek
Melbourne, Australia
29/07/2005 4:17:38 PM
Jul 28 2005
next sibling parent "Regan Heath" <regan netwin.co.nz> writes:
On Fri, 29 Jul 2005 16:39:54 +1000, Derek Parnell <derek psych.ward> wrote:
 Regan thinks that
 you are planning on merging emptiness and existence into one (a bad  
 thing).

I don't think that Walter is planning on this.

I hope not. Someone once mentioned it as a goal Walter had. I've not heard from Big W himself.

Regan
Jul 29 2005
prev sibling next sibling parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Fri, 29 Jul 2005 16:39:54 +1000, Derek Parnell <derek psych.ward> wrote:
 - static int[0] is not null, but new int[0] is.

static arrays cannot be null (not reference anything) by their very nature. A static array must always reference some RAM somewhere.

Why? I mean, I know what you mean: What's the point in having a non-existent static array? A static array always exists, therefore cannot be null. But doesn't that then make:

int[0] a;

illegal? I thought about this for a sec and decided that no, to make it illegal would likely annoy the heck out of a template programmer some time in the future.

But, it can be null, can't it? I mean the data pointer, not the array 'reference'. I'm not sure an 'array reference' even exists for static arrays. My impression is that a static array is simply implemented as a pointer, and the length property, which is static, is 'macro replaced' at compile time. In which case, the data pointer could be null, right? Statements like a.length would be fine, it's macro replaced after all. Statements like a[0] = 'a'; would crash, or give array bounds errors, just like any other array would. Maybe I'm missing some secret of their implementation.
 What do you think that 'new int[0]' should return?

Well, at first glance 'null'. You're asking for (0 * int.sizeof) memory which is 0 bytes. But, have you tried it in C/C++? The MSDN documentation states:

"If size is 0, malloc allocates a zero-length item in the heap and returns a valid pointer to that item"

There is nothing in the docs for "new" but a quick experiment showed the same behaviour as malloc for the statement "new int[0]". Ilya wondered immediately how it was possible to have a "zero-length item in the heap" so he tried DMC and Cygwin-GCC and found: "both returned at least 8 bytes."

If you step back and just look at it at a conceptual level you'd expect the statements:

int[0] a;
int[] a = new int[0];

to result in the same thing, surely? I mean 'a' is an instance of an 'int[0]' in both cases (whatever that is decided to be).

Currently they don't and there appear to be 3 choices:
  - leave it as is, nothing is wrong.
  - make "int[0] a" null.
  - make "new int[0]" non-null.

Regan
Jul 29 2005
parent AJG <AJG_member pathlink.com> writes:
Hi,

If you step back and just look at it at a conceptual level you'd expect the
statements:

int[0] a;
int[] a = new int[0];

to result in the same thing, surely? I mean 'a' is an instance of an
'int[0]' in both cases (whatever that is decided to be).

Exactly.
Currently they don't and there appear to be 3 choices:
  - leave it as is, nothing is wrong.

Something _is_ wrong, IMHO.
  - make "int[0] a" null.

Two wrongs don't a right make. ;)
  - make "new int[0]" non-null.

Bingo!
Regan

--AJG.
Jul 29 2005
prev sibling next sibling parent "Ben Hinkle" <ben.hinkle gmail.com> writes:
 Some of the problems (not technically "bugs"):
 - array.length = 0 sets the pointer to null.

This is a bug and Walter has said so. He will fix this.

I don't remember what Walter said but I hope he thinks about the options. There are three factors involved (that I can see):
1) setting length to/from 0
2) slicing to a 0 length array and appending to a 0 length array
3) the +1 that gets added to every array allocation which makes powers-of-2 allocations the most inefficient (takes 2x the memory of what you asked for)

Currently item 3 is added because one can slice off the end of an array and then ask to grow that. Should 2 behave like 1 or should 1 behave like 2? I could imagine a solution where appending to a zero-length array reallocs like setting the length from 0 reallocs. In any case there isn't a pain-free solution.
Jul 29 2005
prev sibling parent reply AJG <AJG_member pathlink.com> writes:
Hi Derek,

I know I'm not the big W. but here's my take on this anyhow ;-)

 This is something that's confused me quite a bit and I think you are the only
 one that can settle it for good. The question is whether we should be using null
 as a special array value. Maybe it can be broken down to pieces:
 
 1) Why can objects be null but arrays can't (given that _both_ are by-ref)?
 
 # // Example:
 # Object obj  = null; // This is a "proper" null.
 # int[] array = null; // This is not. Only via array.ptr.

I don't understand your concern. Both these are allowed and both work as expected; that is to say in both cases, after the assignment, the variable does not reference anything. What more than this are you expecting?

Sure, they both "work" to a certain extent. But if you try to access a property on a null object -that's illegal. On a null array, it's not. That's not a null array. That's a pseudo-null reference that never goes away.
 IMHO this is inconsistent. The former makes sense, the latter is weird. Another
 way of looking at it is: why dumb-down arrays but not objects?

Huh? I still can't see what you are worried about.

Arrays are dumbed-down so that you can do things like:

foreach (char c; null)
{
    // do something
}

NULL, in my opinion, is _not_ the same as empty. BUT, the above operation makes it so. Instead, it should throw an exception at the very least, or even better, it could be detected at compile-time.
 2) Is it a technical limitation (for now)?


The distinction I made before between the nullness of objects (which is complete), and that of arrays (which is incomplete). I asked this because perhaps it was due to the way D properties worked (more like functions), or somesuch, meaning, it was not _intended_ to be that way.
 3) Is support for "proper" null arrays planned?
 
 I, for one, would _like_ to see support for both null arrays and continued
 support for null objects. 

Who says that this support is going away?

If Walter decides emptiness and existence should be one. This already happens in the language. Maybe it's due to bugs, maybe not. That's why I asked Walter for his "vision," if you will, regarding arrays and nulls. If array null disappears, it's likely object null will also disappear. Both of these worry me. But if indeed they are just bugs, then why doesn't Walter say so?
As Regan has argued (and now I'm a believer), the null
 special value is very useful, and we should keep this distinction (vs. empty).

Absolutely! Both are good and distinct concepts.

In theory, yes. In D, not entirely. Please see below:
 Perhaps you can clarify whether this is going to happen properly or not.
 
 In my view, proper array nulls do _not_ exist. 

But they do. If array.ptr is null, then array is a null array.

Why should we have to resort to array.ptr for nullness? Why can't the _array itself_ be null? An object _can_ be null by itself, no need to check an "object.ptr". In fact, on a null object .ptr would throw. You have to acknowledge this is a significant difference.
What we have right now is very
 confusing because sometimes we can use the null value and sometimes we can't.

Huh? When can't you use it?

I can't use it when the "bugs" get in my way. And they just so happen to get in the way a lot. I'm working with databases right now, and essentially there's no way to have a string represent a DBNULL, because when I dup an empty string, it _too_ becomes NULL.
 It is also fickle because the null value is tied to the pointer. 

Huh? Of course it is. What else could it be?

It could be, say, a simple boolean. Or, it could be, say, like objects. Objects don't rely on object.ptr, why should arrays?
 Regan thinks that
 you are planning on merging emptiness and existence into one (a bad thing).

I don't think that Walter is planning on this.

I certainly hope not, but how can we be sure? This is why I asked.
 Some of the problems (not technically "bugs"):
 - array.length = 0 sets the pointer to null.

This is a bug and Walter has said so. He will fix this.

Just out of curiosity, is there a post that I could read regarding this? I'd really like to see what he said.
 - static int[0] is not null, but new int[0] is.

static arrays cannot be null (not reference anything) by their very nature.

Their very nature says nothing of nullness. It just means allocate in a different area of memory.
A static array must always reference some RAM somewhere.

Why? Why can't it reference null? Conceptually, I don't see a problem. But maybe this is one of the "technical limitations" I was talking about.
What do you think that 'new int[0]' should return? 

It should return a NON-null empty array. In current terminology:

int[] arr = new int[0];
if (arr) // this should be TRUE.
if (arr.length) // this should be FALSE.
 - .dup of an empty string (static or not) also sets the pointer to null.

This is because of the bug.

Well, this subtle bug renders DB-null impossible because as it happens .dups are fairly common. This is what I said about it being "fickle."

char[] s = ""; // here you have it.
char[] p = s.dup; // now you don't. very fickle.
 - static arrays can't have null pointers.

Of course not. The 'static' attribute means that they occupy RAM that is allocated at compile time.

This is a technical limitation. Once again, conceptually, it should be able to point to static null just as well. Perhaps allocate 0-bytes? In other words, this is a problem with the implementation. The language itself shouldn't be limited because of this.
 
 4) What exactly does [if (array)] mean (or theoretically should mean)?
 - if (array.ptr)
 - if (array.length)
 - if (array == null)
 - if (array is null)
 - or some combination thereof?

It actually means ... if (array.ptr !is null || array.length != 0)

which is a bit redundant because we can never have the situation where the
ptr is null and the length is > 0.

IIRC this was deduced from a single disassembly, wasn't it? Is it _always_ the same thing (static/dynamic/associative)?
 ==============
 
 In short, I think it would be dangerous to use this feature if you are planning
 on subtly phasing it out. Could you please shed some light on the situation?

Once Walter fixes the bug in which setting the length to zero also clears the ptr, I think we will have what you want. Hope I've helped.

And the very important duping "bug" (for DBs). And the inconsistency with static arrays. And I'm sure I could find some more problems. But first I need to know whether they are problems in the first place. Only Walter knows...

Cheers,
--AJG.
Jul 29 2005
parent reply Derek Parnell <derek psych.ward> writes:
On Fri, 29 Jul 2005 14:19:06 +0000 (UTC), AJG wrote:

 Hi Derek,
 
I know I'm not the big W. but here's my take on this anyhow ;-)

 This is something that's confused me quite a bit and I think you are the only
 one that can settle it for good. The question is whether we should be using null
 as a special array value. Maybe it can be broken down to pieces:
 
 1) Why can objects be null but arrays can't (given that _both_ are by-ref)?
 
 # // Example:
 # Object obj  = null; // This is a "proper" null.
 # int[] array = null; // This is not. Only via array.ptr.

I don't understand your concern. Both these are allowed and both work as expected; that is to say in both cases, after the assignment, the variable does not reference anything. What more than this are you expecting?

Sure, they both "work" to a certain extent. But if you try to access a property on a null object -that's illegal. On a null array, it's not. That's not a null array. That's a pseudo-null reference that never goes away.

Yes ... but so what? Is that behavior hurting anyone? Object properties are user-defined and all take 'this' as an automatic argument, thus they require an instance. Array properties are built-in to D. The 'array' is the instance. Don't confuse its elements as being instances of the array. Thus 'int[] array;' creates an instance of the array even though it has no data yet. And because the instance exists, you can use its properties. There is no inconsistency here. I think you have merged object nullness and array nullness into the same meaning. But they are not the same thing. A null object is a placeholder into which you can later store a reference to an object instance. A null array is an instance of an array that has no data.
 IMHO this is inconsistent. The former makes sense, the latter is weird. Another
 way of looking at it is: why dumb-down arrays but not objects?

Huh? I still can't see what you are worried about.

Arrays are dumbed-down so that you can do things like:

foreach (char c; null)
{
    // do something
}

NULL, in my opinion, is _not_ the same as empty. BUT, the above operation makes it so. Instead, it should throw an exception at the very least, or even better, it could be detected at compile-time.

You use the phrase 'dumbed-down' whereas I see this as a smart thing. And just because the coder does some weirdo foreach() statement, doesn't mean the language is wrong. And by the way, your code example does produce a compiler error - " foreach: void* is not an aggregate type". To get it to compile you have to use 'foreach(char c; cast(char[])null) {};' which shows to me that somebody with a big stick needs to chat with the coder.
 2) Is it a technical limitation (for now)?


The distinction I made before between the nullness of objects (which is complete), and that of arrays (which is incomplete). I asked this because perhaps it was due to the way D properties worked (more like functions), or somesuch, meaning, it was not _intended_ to be that way.

But arrays are not classes or objects. So what if they are both reference types. They are still not the same beasties.
 3) Is support for "proper" null arrays planned?
 
 I, for one, would _like_ to see support for both null arrays and continued
 support for null objects. 

Who says that this support is going away?

If Walter decides emptiness and existence should be one. This already happens in the language. Maybe it's due to bugs, maybe not. That's why I asked Walter for his "vision," if you will, regarding arrays and nulls. If array null disappears, it's likely object null will also disappear. Both of these worry me. But if indeed they are just bugs, then why doesn't Walter say so?

That could be said about lots of things ;-)
As Regan has argued (and now I'm a believer), the null
 special value is very useful, and we should keep this distinction (vs. empty).

Absolutely! Both are good and distinct concepts.

In theory, yes. In D, not entirely. Please see below:

I distinctly remember reading a note from Walter saying that he was surprised that setting the length to zero also nulled the pointer. He has code in Phobos that assumes that this is not the right behavior.
 Perhaps you can clarify whether this is going to happen properly or not.
 
 In my view, proper array nulls do _not_ exist. 

But they do. If array.ptr is null, then array is a null array.

Why should we have to resort to array.ptr for nullness? Why can't the _array itself_ be null?

Because it's like saying, why can't an object instance be null. The array IS an instance. A null array means something different to a null object.
An object _can_ be null by itself, no need to check an
 "object.ptr". In fact, on a null object .ptr would throw. You have to
 acknowledge this is a significant difference.

Yes it is a difference. So what? Learn it and move on. This is D, not C/C++.
What we have right now is very
 confusing because sometimes we can use the null value and sometimes we can't.

Huh? When can't you use it?

I can't use it when the "bugs" get in my way. And they just so happen to get in the way a lot. I'm working with databases right now, and essentially there's no way to have a string represent a DBNULL, because when I dup an empty string, it _too_ becomes NULL.

Yep. Been there, done that. I just wish he'd fix this bug. It's very easy to fix.
 It is also fickle because the null value is tied to the pointer. 

Huh? Of course it is. What else could it be?

It could be, say, a simple boolean. Or, it could be, say, like objects. Objects don't rely on object.ptr, why should arrays?

Because they are arrays and not class instances.
 Regan thinks that
 you are planning on merging emptiness and existence into one (a bad thing).

I don't think that Walter is planning on this.

I certainly hope not, but how can we be sure? This is why I asked.
 Some of the problems (not technically "bugs"):
 - array.length = 0 sets the pointer to null.

This is a bug and Walter has said so. He will fix this.

Just out of curiosity, is there a post that I could read regarding this? I'd really like to see what he said.

Yes, but I don't know how to search for it.
 - static int[0] is not null, but new int[0] is.

static arrays cannot be null (not reference anything) by their very nature.

Their very nature says nothing of nullness. It just means allocate in a different area of memory.

By 'static' are you meaning non-dynamic arrays or single-instance arrays? For example, which of these lines are static to you?

void func()
{
    int[] a;
    int[1] b;
    static int[1] c;
}

To me, I only call array 'c' a static array. The array 'a' is a dynamic(-length) array and array 'b' is a fixed-length array. But arrays 'a' and 'b' are not single-instance arrays. After checking with the usage in D itself, it seems that D uses static ambiguously when it comes to arrays.
A static array must always reference some RAM somewhere.

Why? Why can't it reference null? Conceptually, I don't see a problem. But maybe this is one of the "technical limitations" I was talking about.

Because static arrays are allocated RAM at compile time and they reference themselves. Because they exist they can't be null. Given ...

static int[1] x;

You will find that ...

x.ptr == &x

And because &x will always return a non-null, then x.ptr is always non-null.

--
Derek Parnell
Melbourne, Australia
30/07/2005 2:09:05 AM
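
The claim that a fixed-length array's pointer is just its own address can be checked directly. A minimal sketch, assuming a present-day D compiler; the helper `pointsAtItself` is a made-up name, and the casts to `void*` are only there so the two pointer types can be compared.

```d
// Hypothetical helper: true when a fixed-length array's data pointer
// is the address of the array's own storage.
bool pointsAtItself(ref int[1] arr)
{
    return cast(void*) arr.ptr is cast(void*) &arr;
}

void main()
{
    static int[1] x; // single-instance array, storage reserved up front
    int[1] y;        // fixed-length array living on the stack

    assert(pointsAtItself(x));
    assert(pointsAtItself(y));
    assert(x.ptr !is null && y.ptr !is null);
}
```

Under that assumption the same holds for the stack array `y`, which suggests the non-null pointer comes from the array being fixed-length rather than from the `static` attribute as such.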
Jul 29 2005
parent AJG <AJG_member pathlink.com> writes:
Hi Derek,

 Sure, they both "work" to a certain extent. But if you try to access a property
 on a null object -that's illegal. On a null array, it's not. That's not a null
 array. That's a pseudo-null reference that never goes away.

Yes ... but so what? Is that behavior hurting anyone?

Well, if we look at it that way, then everything becomes a lot easier, doesn't it? Whether it hurts anyone or not is not the way to build a language. There are things that are correct, and those that are not. Arrays, as it stands, _break_ reference semantics. I don't know whether this hurts anyone or not, but it is certainly inconsistent, and in my view simply incorrect.
Object properties are user-defined and all take 'this' as an automatic
argument, thus they require an instance. Array properties are built-in to
D. The 'array' is the instance.

Technically speaking, this is half-right:

# char[] a = "hello"; // whatever.
# char[] b = a; // This is a reference to a.
# b.length = 2; // Now b became its own instance.

Semantically speaking, I think this is wrong.

int[] arr // This is the reference.
= new int[0] // _This_ is the array.
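
What the copy-then-shrink example does can be observed with a few checks. A minimal sketch, assuming a present-day D compiler, where shrinking `.length` is specified not to reallocate; the 2005 compiler may have behaved differently. The helper name `copyIsAView` is made up, and `.dup` is used so the sketch writes to heap memory instead of a string literal.

```d
// Hypothetical helper: copies an array variable, shrinks the copy, and reports
// whether the original kept its length while both still share storage.
bool copyIsAView()
{
    char[] a = "hello".dup; // mutable copy of the literal
    char[] b = a;           // copies the (pointer, length) pair, not the characters
    b.length = 2;           // shrinks b's own length field only
    b[0] = 'J';             // write through b

    return a.length == 5      // a's length is unaffected
        && b.ptr is a.ptr     // both still view the same memory
        && a == "Jello";      // the write through b is visible through a
}

void main()
{
    assert(copyIsAView());
}
```

So in this sketch `b` carries its own length but shares `a`'s data, which is one way to read both sides of the argument: the variable behaves like a small value holding a pointer, not like a plain object reference.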
Don't confuse its elements as being
instances of the array.

I never did. Elements are fine the way they are. But by your logic, perhaps we should be able to do this:

# char[] nullArray = null;
# nullArray[5]; // valid, but returns null, since there is no item.

Or wouldn't you say this is wrong? Does it "hurt" anyone? Nah. In fact, it will help by preventing those annoying ArrayOutOfBounds thingamajigs.
Thus 'int[] array;' creates an instance of the array even though it has no
data yet. And because the instance exists, you can use its properties.

This just doesn't make sense. int[] array creates a _reference_. That's the very definition of a reference. That's why arrays are reference-types. It's essentially a nicer version of a pointer.
There is no inconsistency here. I think you have merged object nullness and
array nullness into the same meaning. But they are not the same thing. A
null object is a placeholder into which you can later store a reference to
an object instance. A null array is an instance of an array that has no
data. 

If this is so, then arrays can't be called reference types. That's not what references do. Frankly, I wouldn't know what the heck to call arrays if these are the semantics we're supposed to follow.
You use the phrase 'dumbed-down' whereas I see this as a smart thing. And
just because the coder does some weirdo foreach() statement, doesn't mean
the language is wrong. 

It's not whether it's a smart thing or a dumb thing. I see it as being a dumbing down, but that's not the point. The point is that it muddies the distinction between _empty_ and _non-existent_.

Conceptually, you can't iterate thru a non-existent array. It doesn't exist. It should be a bug.
Conceptually, you _can_ iterate thru an empty array. It exists and has no elements, thus no iteration would happen, but the construct is valid.

With this "smart" feature, the two are fused into one. Empty and non-existent can _both_ be iterated thru. They both produce the same result: 0 iterations. This I think is incorrect.

0 iterations == 0 elements // correct
0 iterations == null // incorrect
And by the way, your code example does produce a
compiler error - " foreach: void* is not an aggregate type". To get it to
compile you have to use 'foreach(char c; cast(char[])null) {};' which shows
to me that somebody with a big stick needs to chat with the coder.

Sorry, my bad. I don't have access to DMD. But you know what I meant:

char[] nullArray = null;
foreach (char c; nullArray) { /* do something */ }
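
The behaviour being objected to can be reproduced as follows. A minimal sketch, assuming a present-day D compiler; `iterations` is a made-up helper that only counts loop passes.

```d
// Hypothetical helper: counts how many times the loop body runs.
size_t iterations(const char[] s)
{
    size_t n = 0;
    foreach (char c; s)
        ++n;
    return n;
}

void main()
{
    char[] nullArray = null;
    char[] emptyArray = "x".dup[0 .. 0]; // non-null pointer, zero length

    assert(nullArray is null && emptyArray !is null);
    assert(iterations(nullArray) == 0);
    assert(iterations(emptyArray) == 0);
}
```

Both loops run zero times without any error, so `foreach` by itself cannot tell the two cases apart; a caller who cares has to test for null before looping.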
 2) Is it a technical limitation (for now)?


The distinction I made before between the nullness of objects (which is complete), and that of arrays (which is incomplete). I asked this because perhaps it was due to the way D properties worked (more like functions), or somesuch, meaning, it was not _intended_ to be that way.

But arrays are not classes or objects. So what if they are both reference types. They are still not the same beasties.

Certainly they are not the same. But they both have the same "nature" as you put it: references. As it is, arrays are breaking reference behaviour too, as my example above showed. Or do you not agree that arrays are reference types either?
I distinctly remember reading a note from Walter saying that he was
surprised that setting the length to zero also nulled the pointer. He has
code in Phobos that assumes that this is not the right behavior.

This is all very circumstantial, but oh well...
 Perhaps you can clarify whether this is going to happen properly or not.
 
 In my view, proper array nulls do _not_ exist. 

But they do. If array.ptr is null, then array is a null array.

Why should we have to resort to array.ptr for nullness? Why can't the _array itself_ be null?

Because it's like saying, why can't an object instance be null. The array IS an instance. A null array means something different to a null object.

Once again, this view is incorrect. An array can be both a reference and an instance.

int[] arr // This is the reference.
= new int[0] // _This_ is the array.
An object _can_ be null by itself, no need to check an
 "object.ptr". In fact, on a null object .ptr would throw. You have to
 acknowledge this is a significant difference.

Yes it is a difference. So what? Learn it and move on. This is D, not C/C++.

Why learn and move on from something that is clearly wrong? I'd rather fix it, thank you very much ;)
Yep. Been there, done that. I just wish he'd fix this bug. It's very easy to
fix.

It's very frustrating. I would sue this bug if I could. :p
 It is also fickle because the null value is tied to the pointer. 

Huh? Of course it is. What else could it be?

It could be, say, a simple boolean. Or, it could be, say, like objects. Objects don't rely on object.ptr, why should arrays?

Because they are arrays and not class instances.

Assuming the _wrong_ semantics stay in place, why couldn't we do something like: array.isNull or array.exists as a simple boolean check, instead of the more complicated array.ptr that is riddled with "bugs?" That way we separate the implementation details (how the array.ptr is handled internally), from the semantics (whether the array exists or not).
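
The proposed boolean checks can be approximated today with free functions called in property style. A minimal sketch, assuming a present-day D compiler; `isNull` and `exists` are hypothetical helpers named after the suggestion above, not part of D or Phobos, and they still sit on top of `.ptr`, so they inherit whatever the runtime does to the pointer.

```d
// Hypothetical helpers: they name the existence question separately
// so callers do not have to spell out `.ptr` themselves.
bool isNull(T)(T[] a) { return a.ptr is null; }
bool exists(T)(T[] a) { return a.ptr !is null; }

void main()
{
    char[] missing = null;
    char[] blank = "x".dup[0 .. 0]; // exists, but holds no elements

    assert(missing.isNull && !missing.exists);
    assert(blank.exists && !blank.isNull);
    assert(missing.length == 0 && blank.length == 0);
}
```

This only hides the spelling of the check. Separating existence from the pointer altogether, as the post asks for, would need the array to carry an extra flag or a language change.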
 Regan thinks that
 you are planning on merging emptiness and existence into one (a bad thing).

I don't think that Walter is planning on this.

I certainly hope not, but how can we be sure? This is why I asked.
 Some of the problems (not technically "bugs"):
 - array.length = 0 sets the pointer to null.

This is a bug and Walter has said so. He will fix this.

Just out of curiosity, is there a post that I could read regarding this? I'd really like to see what he said.

Yes, but I don't know how to search for it.

Well, perhaps he can clarify his position now.

----
Re: Static

<snip>
By 'static' are you meaning non-dynamic arrays or single-instance arrays.

It doesn't matter. I don't understand why you bring technical implementation details to the discussion, when I am talking solely about the concept. Conceptually, whether an array is static or not has no effect on whether the array can exist or not. Static changes allocation semantics, _not_ existence semantics. That's my point.

Now, if you tell me: We can't have that because of a technical limitation, then I would understand. However, the point in your memory argument can be fixed the way Regan and others have mentioned.

Cheers,
--AJG.
Jul 29 2005
prev sibling next sibling parent reply Niko Korhonen <niktheblak hotmail.com> writes:
AJG wrote:
 1) Why can objects be null but arrays can't (given that _both_ are by-ref)?
 
 # // Example:
 # Object obj  = null; // This is a "proper" null.
 # int[] array = null; // This is not. Only via array.ptr.

Do you want all operations on a null array, such as:

# int[] array = null;
# int len = array.length; // <-- segfault

to segfault (to throw a NullPointerException in managed environments parlance), like they do on a null object reference?

--
Niko Korhonen
SW Developer
Jul 29 2005
parent reply AJG <AJG_member pathlink.com> writes:
Hi,

In article <dcclvb$v0i$1 digitaldaemon.com>, Niko Korhonen says...
AJG wrote:
 1) Why can objects be null but arrays can't (given that _both_ are by-ref)?
 
 # // Example:
 # Object obj  = null; // This is a "proper" null.
 # int[] array = null; // This is not. Only via array.ptr.

Do you want all operations on a null array, such as:

# int[] array = null;
# int len = array.length; // <-- segfault

to segfault (to throw a NullPointerException in managed environments parlance), like they do on a null object reference?

For starters, yes. Why should objects be different than arrays when they are both reference types? This is inconsistent IMHO.

Thanks,
--AJG.
Jul 29 2005
parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"AJG" <AJG_member pathlink.com> wrote in message 
news:dcde0p$1kop$1 digitaldaemon.com...
 Hi,

 In article <dcclvb$v0i$1 digitaldaemon.com>, Niko Korhonen says...
AJG wrote:
 1) Why can objects be null but arrays can't (given that _both_ are 
 by-ref)?

 # // Example:
 # Object obj  = null; // This is a "proper" null.
 # int[] array = null; // This is not. Only via array.ptr.

Do you want all operations on a null array, such as:

# int[] array = null;
# int len = array.length; // <-- segfault

to segfault (to throw a NullPointerException in managed environments parlance), like they do on a null object reference?

For starters, yes. Why should objects be different than arrays when they are both reference types? This is inconsistent IMHO.

I think you'll have a hard time getting lots of support for that. I much prefer the current behavior and I bet there is lots of existing D code that assumes one can test the length of an array at any time. Since an array is not an object I see no problem with the "inconsistency" - an array is an array.
Jul 29 2005
next sibling parent reply AJG <AJG_member pathlink.com> writes:
Hi Ben,

Do you want all operations on a null array, such as:

# int[] array = null;
# int len = array.length; // <-- segfault

to segfault (to throw a NullPointerException in managed environments
parlance), like they do on a null object reference?

For starters, yes. Why should objects be different than arrays when they are both reference types? This is inconsistent IMHO.

I think you'll have a hard time getting lots of support for that. I much prefer the current behavior and I bet there is lots of existing D code that assumes one can test the length of an array at any time. Since an array is not an object I see no problem with the "inconsistency" - an array is an array.

I agree that there won't be much support for this. I don't suppose it will change. But ideally that's what the behaviour should be. Say you had no D code written at the moment, would you support the change?

On the other hand, would you support access to object properties that don't require an instance from a null reference? It's the same thing, isn't it? Yet aren't those illegal at the moment? (don't have DMD at hand).

Cheers,
--AJG.

"What is popular is not always right; what is right is not always popular."
Jul 29 2005
parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"AJG" <AJG_member pathlink.com> wrote in message 
news:dcdqig$1us6$1 digitaldaemon.com...
 Hi Ben,

Do you want all operations on a null array, such as:

# int[] array = null;
# int len = array.length; // <-- segfault

to segfault (to throw a NullPointerException in managed environments
parlance), like they do on a null object reference?

For starters, yes. Why should objects be different than arrays when they are both reference types? This is inconsistent IMHO.

I think you'll have a hard time getting lots of support for that. I much prefer the current behavior and I bet there is lots of existing D code that assumes one can test the length of an array at any time. Since an array is not an object I see no problem with the "inconsistency" - an array is an array.

I agree that there won't be much support for this. I don't suppose it will change. But ideally that's what the behaviour should be. Say you had no D code written at the moment, would you support the change?

no
 On the other hand, would you support access to object properties that 
 don't
 require an instance from a null reference?

no (assuming you aren't referring to static class properties)
 It's the same thing, isn't it?

no
Yet aren't those illegal at the moment? (don't have DMD at hand).

yes
 Cheers,
 --AJG.

 "What is popular is not always right; what is right is not always 
 popular."

Jul 29 2005
parent reply AJG <AJG_member pathlink.com> writes:
Hi Ben,

Ok, I don't think I said exactly what I meant before. Let's look at this piece
by piece:

1) Arrays are ("in theory") reference types.
2) Objects are reference types.
3) Arrays are not objects.
4) So, even though Arrays and Objects are different, they share (or should)
reference semantics.

I believe most of us can agree up to here.

My overall point is that D is not keeping its promise regarding Arrays obeying
reference semantics. Whether this is good or not is debatable, but at least it
should be noted. Do you agree that D's arrays break reference semantics?

Thanks,
--AJG.
Jul 29 2005
next sibling parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"AJG" <AJG_member pathlink.com> wrote in message 
news:dcdtq5$21q6$1 digitaldaemon.com...
 Hi Ben,

 Ok, I don't think I said exactly what I meant before. Let's look at this 
 piece
 by piece:

 1) Arrays are ("in theory") reference types.

no - an array is two pieces of information: (1) a pointer to the data and (2) a length. The pointer can be considered a reference but the length information is definitely not manipulated by reference. For example

int[] a,b;
a.length = 10;
b = a;
b.length = 100;
assert( a.length == 10 );

If arrays had "pure" reference semantics in the same way objects do then one would expect a.length == 100. In casual conversations one often says arrays have reference semantics but the unspoken assumption is that one is talking about the data pointer. This can confuse people who aren't used to D array semantics.
 2) Objects are reference types.
 3) Arrays are not objects.

these I agree with.
 4) So, even though Arrays and Objects are different, they share (or 
 should)
 reference semantics.

naturally I disagree given 1).
 I believe most of us can agree up to here.

 My overall point is that D is not keeping its promise regarding Arrays 
 obeying
 reference semantics. Whether this is good or not is debatable, but at 
 least it
 should be noted. Do you agree that D's arrays break reference semantics?

The length information is not manipulated with reference semantics. I think this is a good design choice that shouldn't be changed. I agree it is different than object behavior but that's well worth the benefits of the current system. If there are statements in the D doc that say "arrays have reference semantics" I think they should be changed to be more accurate and say something like "the array data has reference semantics". It's common to ignore the length field when you are casually talking about arrays.
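
A rough way to picture the "pointer plus length" description, as a sketch only: the struct below is a hypothetical model (the names IntSlice and firstN are made up for illustration, and this is not the runtime's actual definition). It assumes a current D2 compiler rather than the 2005 one. Copying the struct copies both fields, so the pointer is shared while each copy keeps its own length.

```d
// Hypothetical model of a dynamic array; not the real runtime layout.
struct IntSlice
{
    int* ptr;       // shared: every copy points at the same data
    size_t length;  // per copy: each copy carries its own length
}

// Returns a view of the first n elements without copying any data.
IntSlice firstN(IntSlice s, size_t n)
{
    assert(n <= s.length);
    return IntSlice(s.ptr, n);
}

void main()
{
    int[] storage = [1, 2, 3, 4];
    auto a = IntSlice(storage.ptr, storage.length);
    auto b = a;        // copies the pointer and the length

    b.length = 2;      // changes only b's length field
    b.ptr[0] = 99;     // writes through the shared pointer

    assert(a.length == 4);   // a's length is untouched
    assert(a.ptr[0] == 99);  // a sees the write to the shared data
}
```

Seen this way, assigning to .length behaves like assigning to any other struct field, while element writes behave like writes through a pointer.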
Jul 29 2005
next sibling parent reply Shammah Chancellor <Shammah_member pathlink.com> writes:
I've been following this, but have as yet been unable to express my problem
with this whole issue. My feelings line up with yours, Ben.

Arrays are not pointers, nor are they reference types.  In C, pointers happen to
be able to be dereferenced with the array index operator, but that's a side
effect of implementation.  If something is a true array, I think there is a
reasonable expectation that the array always points to some data.

array != null; //should always be true; Especially in the case of D where the
array is really a structure with a ptr and a length.


However, if for some reason you need the reference semantics, those are not
denied to you.  You're free to do this:

int* array = new int[100];

My 2 cents
-Sha

In article <dcdur3$232d$1 digitaldaemon.com>, Ben Hinkle says...
"AJG" <AJG_member pathlink.com> wrote in message 
news:dcdtq5$21q6$1 digitaldaemon.com...
 Hi Ben,

 Ok, I don't think I said exactly what I meant before. Let's look at this 
 piece
 by piece:

 1) Arrays are ("in theory") reference types.

no - an array is two pieces of information: (1) a pointer to the data and (2) a length. The pointer can be considered a reference but the length information is definitely not manipulated by reference. For example

int[] a,b;
a.length = 10;
b = a;
b.length = 100;
assert( a.length == 10 );

If arrays had "pure" reference semantics in the same way objects do then one would expect a.length == 100. In casual conversations one often says arrays have reference semantics but the unspoken assumption is that one is talking about the data pointer. This can confuse people who aren't used to D array semantics.
 2) Objects are reference types.
 3) Arrays are not objects.

these I agree with.
 4) So, even though Arrays and Objects are different, they share (or 
 should)
 reference semantics.

naturally I disagree given 1).
 I believe most of us can agree up to here.

 My overall point is that D is not keeping its promise regarding Arrays 
 obeying
 reference semantics. Whether this is good or not is debatable, but at 
 least it
 should be noted. Do you agree that D's arrays break reference semantics?

The length information is not manipulated with reference semantics. I think this is a good design choice that shouldn't be changed. I agree it is different than object behavior but that's well worth the benefits of the current system. If there are statements in the D doc that say "arrays have reference semantics" I think they should be changed to be more accurate and say something like "the array data has reference semantics". It's common to ignore the length field when you are casually talking about arrays.

Jul 29 2005
parent AJG <AJG_member pathlink.com> writes:
Hi,

I've been following this, but have as yet been unable to express my problem
with this whole issue. My feelings line up with yours, Ben.

Arrays [...] are not reference types.

If this is so, it is unfortunate. I'm asking Walter to clarify this, that is all.
In C, pointers happen to
be able to be dereferenced with the array index operator, but that's a side
effect of implementation.  If something is a true array, I think there is a
reasonable expectation that the array always points to some data.

This is not a reasonable expectation. We are talking about two things here:

a) Existence.
b) Emptiness.

Even in C, you can express both. I'm asking whether Walter thinks we should do that in D or not. Some examples (of the 3 possible cases):

char[] string = "hi"; // non-null non-empty array.
char[] empty = ""; // non-null empty array.
char[] cnull = null; // null array.
array != null; //should always be true; Especially in the case of D where the
array is really a structure with a ptr and a length.

This is not the case in D. array != null is sometimes false, because it's comparing the pointer. This is the very thing that allows an array to be non-existent (a true, NULL array). Thus, that was my original question to Walter, whether we should rely on this behaviour or if he's planning on phasing it out.
However, if for some reason you need the reference semantics, those are not
denied to you.  You're free to do this:

int* array = new int[100];

Yes, there's nothing like regressing a couple of decades ;)

I think one of D's design goals was to make pointer use unnecessary. Using a pointer you lose safety, lose info (.length, etc.) and lose functionality. This is not a valid solution, IMHO.

Cheers,
--AJG.
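
For what it's worth, the three cases listed earlier in this post (non-null non-empty, non-null empty, null) can be told apart in code. A minimal sketch, assuming a current D2 compiler; the 2005 compiler under discussion may well differ, for example in what .dup or .length = 0 do to the pointer. The helper name classify is made up for illustration.

```d
// Classify an array as "null", "empty" (exists, zero length) or "non-empty".
string classify(const(char)[] s)
{
    if (s is null)     return "null";   // identity test: no data pointer at all
    if (s.length == 0) return "empty";  // has a pointer, but zero elements
    return "non-empty";
}

void main()
{
    assert(classify("hi") == "non-empty");
    assert(classify("")   == "empty");
    assert(classify(null) == "null");

    // == compares contents, so a null array and "" compare equal,
    // while the identity test (is) still tells them apart.
    const(char)[] n = null;
    assert(n == "");
    assert(n !is "");
}
```

The sketch relies on the identity test rather than on == or .length, since those two cannot distinguish a null array from an empty one.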
Jul 29 2005
prev sibling parent reply AJG <AJG_member pathlink.com> writes:
Hi,

Well, this is certainly an interesting development. So, to recap, arrays in D
are not reference types. I was always under the impression that they were. This
is very saddening to me.

Is this correct? Walter, could you clarify this?

 1) Arrays are ("in theory") reference types.

no - an array is two pieces of information: (1) a pointer to the data and (2) a length. The pointer can be considered a reference but the length information is definitely not manipulated by reference. For example

What about .dup, .sort, .reverse, .sizeof? Do those have reference semantics or not?
If arrays had "pure" reference semantics in the same way objects do then one 
would expect a.length == 100. In casual conversations one often says arrays 
have reference semantics but the unspoken assumption is that one is talking 
about the data pointer. This can confuse people who aren't used to D array 
semantics.

Yes, array semantics are definitely weird. I was hoping they were references and that .length was simply buggy, but perhaps it's by design. In addition, IMO this "unspoken assumption" is not mentioned anywhere in the docs.
 My overall point is that D is not keeping its promise regarding Arrays 
 obeying
 reference semantics. Whether this is good or not is debatable, but at 
 least it
 should be noted. Do you agree that D's arrays break reference semantics?

The length information is not manipulated with reference semantics. I think this is a good design choice that shouldn't be changed.

Why is it a good design choice? Forget about legacy for a second. Wouldn't it be much simpler, more consistent and less confusing to make arrays pure reference types? It would eliminate a lot of the various special cases that we have to deal with given the current convoluted semantics. It would also align their behaviour to that of objects, much like a struct's behaviour is aligned to that of a primitive.
I agree it is 
different than object behavior but that's well worth the benefits of the 
current system.

Like what? Which benefits?
If there are statements in the D doc that say "arrays have 
reference semantics" I think they should be changed to be more accurate and 
say something like "the array data has reference semantics". It's common to 
ignore the length field when you are casually talking about arrays.

Or perhaps the arrays themselves could be changed to reference types? ;)

Cheers,
--AJG
Jul 29 2005
parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"AJG" <AJG_member pathlink.com> wrote in message 
news:dce24h$272t$1 digitaldaemon.com...
 Hi,

 Well, this is certainly an interesting development. So, to recap, arrays 
 in D
 are not reference types. I was always under the impression that they were. 
 This
 is very saddening to me.

 Is this correct? Walter, could you clarify this?

 1) Arrays are ("in theory") reference types.

no - an array is two pieces of information: (1) a pointer to the data and (2) a length. The pointer can be considered a reference but the length information is definitely not manipulated by reference. For example

What about .dup, .sort, .reverse, .sizeof? Do those have reference semantics or not?

Yes - they "have reference semantics" in the sense that they act on the data (though in the case of .dup and .sizeof the reference/value semantics is irrelevant).
If arrays had "pure" reference semantics in the same way objects do then 
one
would expect a.length == 100. In casual conversations one often says 
arrays
have reference semantics but the unspoken assumption is that one is 
talking
about the data pointer. This can confuse people who aren't used to D array
semantics.

Yes, array semantics are definitely weird. I was hoping they were references and that .length was simply buggy, but perhaps it's by design. In addition, IMO this "unspoken assumption" is not mentioned anywhere in the docs.

The first sentence of http://www.digitalmars.com/d/arrays.html section Dynamic Arrays says "Dynamic arrays consist of a length and a pointer to the array data." I agree, though, that the doc needs to emphasize this more. I added some feedback to the Wiki about arrays asking for examples illustrating how array assignment works.
 My overall point is that D is not keeping its promise regarding Arrays
 obeying
 reference semantics. Whether this is good or not is debatable, but at
 least it
 should be noted. Do you agree that D's arrays break reference semantics?

The length information is not manipulated with reference semantics. I think this is a good design choice that shouldn't be changed.

Why is it a good design choice? Forget about legacy for a second. Wouldn't it be much simpler, more consistent and less confusing to make arrays pure reference types? It would eliminate a lot of the various special cases that we have to deal with given the current convoluted semantics. It would also align their behaviour to that of objects, much like a struct's behaviour is aligned to that of a primitive.

It would be very annoying to have to check for null before asking if an array length is zero. Plus the whole design of slicing would need to be redone and probably would lose much of the efficiency it has today. I view an array as much closer to a struct than an object: an array is just like a struct with a pointer field and a length field. That's the simplest description of what an array is. Comparing them to objects is the wrong analogy.
I agree it is
different than object behavior but that's well worth the benefits of the
current system.

Like what? Which benefits?

see above - checking all the time for null would be very annoying. Almost all the time with arrays one cares if the length is zero and making people check for null before asking that question is error-prone. See Java for examples of making people check for null before asking for the length.
If there are statements in the D doc that say "arrays have
reference semantics" I think they should be changed to be more accurate 
and
say something like "the array data has reference semantics". It's common 
to
ignore the length field when you are casually talking about arrays.

Or perhaps the arrays themselves could be changed to reference types? ;)

Sure - one can change anything in D if the tradeoffs are worth it. I happen to believe D's dynamic array semantics are an excellent balance of tradeoffs.
Jul 29 2005
parent reply AJG <AJG_member pathlink.com> writes:
Hi,

 What about .dup, .sort, .reverse, .sizeof?
 Do those have reference semantics or not?

Yes - they "have reference semantics" in the sense that they act on the data (though in the case of .dup and .sizeof the reference/value semantics is irrelevant).

Just to make sure I understand:

char[] A = "123";
char[] B = A;
B.reverse;
// B will be 321
// A will be 321 also.
// correct?

BUT:

char[] A = "123";
char[] B = A;
B.length = 2;
// B will be 12
// A will remain 123.
// correct?

If this is true, then it seems rather arbitrary to me that .length should break reference semantics. Why not keep it in line to how the rest work? (Specially since it's not related to the benefits you talked about before).
The first sentence of http://www.digitalmars.com/d/arrays.html section 
Dynamic Arrays says "Dynamic arrays consist of a length and a pointer to the 
array data." I agree, though, that the doc needs to emphasize this more. I 
added some feedback to the Wiki about arrays asking for examples 
illustrating how array assignment works.

Ok. This would be an improvement.
 Why is it a good design choice?


It would be very annoying to have to check for null before asking if an 
array length is zero. Plus the whole design of slicing would need to be 
redone and probably would lose much of the efficiency it has today. 

Ok. This is a valid point. However, that's not to say the problem is insurmountable. Solutions do exist. In fact, I have thought of a couple of possible solutions, but I'm afraid it'll scare everybody so for now this would be "something to think about." I just want to say that change, even if it breaks things, can be very good. It shouldn't be automatically ruled out in fear.
I view 
an array as much closer to a struct than an object: an array is just like a 
struct with a pointer field and a length field. That's the simplest 
description of what an array is. Comparing them to objects is the wrong 
analogy.

Except that _all_ properties other than .length operate via reference semantics. Structs wouldn't do that. Objects would.
I agree it is
different than object behavior but that's well worth the benefits of the
current system.

Like what? Which benefits?

see above - checking all the time for null would be very annoying. Almost all the time with arrays one cares if the length is zero and making people check for null before asking that question is error-prone.

It wouldn't be error prone. Perhaps you mean exceptions would be thrown, and that's fine, but there wouldn't be unnoticed errors. But in general I agree with you, slicing would lose its "magic" having to check for nulls.
See Java for 
examples of making people check for null before asking for the length.

You can also learn from their mistakes and avoid them.
If there are statements in the D doc that say "arrays have
reference semantics" I think they should be changed to be more accurate 
and
say something like "the array data has reference semantics". It's common 
to
ignore the length field when you are casually talking about arrays.

Or perhaps the arrays themselves could be changed to reference types? ;)

Sure - one can change anything in D if the tradeoffs are worth it. I happen to believe D's dynamic array semantics are an excellent balance of tradeoffs.

I think the semantics could use a little rethinking and especially a bit of clarification.

Cheers,
--AJG.
Jul 29 2005
parent reply Ben Hinkle <Ben_member pathlink.com> writes:
In article <dce4up$2cbc$1 digitaldaemon.com>, AJG says...
Hi,

 What about .dup, .sort, .reverse, .sizeof?
 Do those have reference semantics or not?

Yes - they "have reference semantics" in the sense that they act on the data (though in the case of .dup and .sizeof the reference/value semantics is irrelevant).

Just to make sure I understand:

char[] A = "123";
char[] B = A;
B.reverse;
// B will be 321
// A will be 321 also.
// correct?

yes - aside from the fact that you should dup the "123" before trying to modify it since "123" is put in read-only memory. Reverse acts in-place because it is a method of the array type - like sorting is in-place.
BUT:

char[] A = "123";
char[] B = A;
B.length = 2;
// B will be 12
// A will remain 123.
// correct?

yes
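
Both answers can be checked together in one small program. A sketch only, assuming a current D2 compiler: the built-in .reverse property used in the question belongs to the 2005 compiler, so std.algorithm's reverse is swapped in here, and int[] is used instead of char[] to sidestep the read-only string literal issue.

```d
import std.algorithm : reverse;

void main()
{
    int[] a = [1, 2, 3];
    int[] b = a;          // b gets a copy of a's pointer and length

    reverse(b);           // works in place, through the shared pointer
    assert(a == [3, 2, 1]);
    assert(b == [3, 2, 1]);

    b.length = 2;         // only b's own length field changes
    assert(b == [3, 2]);
    assert(a == [3, 2, 1]);
}
```

Shrinking is used here on purpose; growing .length may or may not move the data, so nothing is asserted about that case.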
If this is true, then it seems rather arbitrary to me that .length should break
reference semantics. Why not keep it in line to how the rest work? (Specially
since it's not related to the benefits you talked about before).

It is not arbitrary. There are advantages to the current design. I don't see why you say it is not related since it would be silly to have length do something different if there weren't benefits to making length special.
 Why is it a good design choice?


It would be very annoying to have to check for null before asking if an 
array length is zero. Plus the whole design of slicing would need to be 
redone and probably would lose much of the efficiency it has today. 

Ok. This is a valid point. However, that's not to say the problem is insurmountable. Solutions do exist. In fact, I have thought of a couple of possible solutions, but I'm afraid it'll scare everybody so for now this would be "something to think about." I just want to say that change, even if it breaks things, can be very good. It shouldn't be automatically ruled out in fear.

Where is the reaction in fear? I only see people trying to explain the current design and its advantages. I said I doubt a solution exists that would have the benefits of the current design while having reference semantics (if even reference semantics for length would be desirable). If you want to present some ideas that would be great - do whatever you want and enjoy (remember we're all doing this for fun).
I view 
an array as much closer to a struct than an object: an array is just like a 
struct with a pointer field and a length field. That's the simplest 
description of what an array is. Comparing them to objects is the wrong 
analogy.

Except that _all_ properties other than .length operate via reference semantics. Structs wouldn't do that. Objects would.

uhh - the struct has a pointer to the data. The pointer part has reference semantics and the length part doesn't. A struct can easily have methods that dereference the pointer and modify shared state. I do it all the time with the MinTL containers and pretty much any struct that stores a pointer.
I agree it is
different than object behavior but that's well worth the benefits of the
current system.

Like what? Which benefits?

see above - checking all the time for null would be very annoying. Almost all the time with arrays one cares if the length is zero and making people check for null before asking that question is error-prone.

It wouldn't be error prone. Perhaps you mean exceptions would be thrown, and that's fine, but there wouldn't be unnoticed errors. But in general I agree with you, slicing would lose its "magic" having to check for nulls.

By error-prone I mean the programmer will introduce bugs into the code by forgetting to check for null every time they want to know if an array has any content (meaning non-zero length).
See Java for 
examples of making people check for null before asking for the length.

You can also learn from their mistakes and avoid them.

That's what D has now - it is avoiding the mistakes of Java by not requiring all those annoying null checks. Plus slicing is fast by not requiring memory allocations. Note in Java the length of an array is read-only so the whole question about length having value/reference semantics doesn't apply.
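
To make the two points concrete (cheap slicing, and no null check before asking for the length), here is a minimal sketch. It assumes a current D2 compiler, and the variable names are just for illustration.

```d
void main()
{
    int[] data = [10, 20, 30, 40, 50];

    // A slice is just a new (pointer, length) pair over the same memory;
    // no elements are copied.
    int[] mid = data[1 .. 4];
    assert(mid.length == 3);
    assert(mid.ptr is data.ptr + 1);

    // Writes through the slice are visible in the original array.
    mid[0] = 21;
    assert(data[1] == 21);

    // Asking a null array for its length is safe and gives 0.
    int[] none = null;
    assert(none.length == 0);
}
```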
Jul 29 2005
parent reply AJG <AJG_member pathlink.com> writes:
Hi Ben,

If this is true, then it seems rather arbitrary to me that .length should break
reference semantics. Why not keep it in line to how the rest work? (Specially
since it's not related to the benefits you talked about before).

It is not arbitrary. There are advantages to the current design. I don't see why you say it is not related since it would be silly to have length do something different if there weren't benefits to making length special.

So then .length is related to slicing? How does the semantics of .length affect slicing? Or perhaps you meant other benefits?
 Why is it a good design choice?


It would be very annoying to have to check for null before asking if an 
array length is zero. Plus the whole design of slicing would need to be 
redone and probably would lose much of the efficiency it has today. 

Ok. This is a valid point. However, that's not to say the problem is insurmountable. Solutions do exist. In fact, I have thought of a couple of possible solutions, but I'm afraid it'll scare everybody so for now this would be "something to think about." I just want to say that change, even if it breaks things, can be very good. It shouldn't be automatically ruled out in fear.

Where is the reaction in fear? I only see people trying to explain the current design and its advantages. I said I doubt a solution exists that would have the benefits of the current design while having reference semantics (if even reference semantics for length would be desirable). If you want to present some ideas that would be great - do whatever you want and enjoy (remember we're all doing this for fun).

The general impression I get is that as soon as something creates the possibility of breaking existing code, then there is backlash. This would be fine for the embedded C language that runs medical heart devices. But for a language that isn't even out the door, it's disheartening (haha, no pun intended ;). Just my 2 cents.
I view 
an array as much closer to a struct than an object: an array is just like a 
struct with a pointer field and a length field. That's the simplest 
description of what an array is. Comparing them to objects is the wrong 
analogy.

Except that _all_ properties other than .length operate via reference semantics. Structs wouldn't do that. Objects would.

uhh - the struct has a pointer to the data. The pointer part has reference semantics and the length part doesn't. A struct can easily have methods that dereference the pointer and modify shared state. I do it all the time with the MinTL containers and pretty much any struct that stores a pointer.

SomeObject A = new SomeObject;
SomeObject B = A;
B.SomeProperty; // Operates on A.

SomeStruct A;
SomeStruct B = A;
B.SomeProperty; // Operates on B.

int[] A = new int[5];
int[] B = A;
B.SomeProperty; // Operates on A;
// _Except_ if it's .length.

This behaviour seems much more in line with Objects than with Structs, to me. That's why I don't see how .length should break the current semantics.
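
The comparison above can be written as a compilable sketch. Assumptions: a current D2 compiler, and a made-up int field called value standing in for SomeProperty.

```d
class SomeObject { int value; }   // reference type
struct SomeStruct { int value; }  // value type

void main()
{
    // Object: both names refer to the same instance.
    auto objA = new SomeObject;
    auto objB = objA;
    objB.value = 1;
    assert(objA.value == 1);

    // Struct: assignment copies, so the original is untouched.
    SomeStruct strA;
    SomeStruct strB = strA;
    strB.value = 1;
    assert(strA.value == 0);

    // Array: element writes are shared, but .length is per copy.
    int[] arrA = new int[5];
    int[] arrB = arrA;
    arrB[0] = 1;
    assert(arrA[0] == 1);
    arrB.length = 2;
    assert(arrA.length == 5);
}
```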
I agree it is
different than object behavior but that's well worth the benefits of the
current system.

Like what? Which benefits?

see above - checking all the time for null would be very annoying. Almost all the time with arrays one cares if the length is zero and making people check for null before asking that question is error-prone.

It wouldn't be error prone. Perhaps you mean exceptions would be thrown, and that's fine, but there wouldn't be unnoticed errors. But in general I agree with you, slicing would lose its "magic" having to check for nulls.

By error-prone I mean the programmer will introduce bugs into the code by forgetting to check for null every time they want to know if an array has any content (meaning non-zero length).

Ok.
See Java for 
examples of making people check for null before asking for the length.

You can also learn from their mistakes and avoid them.

That's what D has now - it is avoiding the mistakes of Java by not requiring all those annoying null checks. Plus slicing is fast by not requiring memory allocations. Note in Java the length of an array is read-only so the whole question about length having value/reference semantics doesn't apply.

I'm not suggesting making .length read-only. I'm suggesting making it operate on the same data it has a pointer to. Just like .sort or .reverse would. The way I see it, if you explicitly want to make a copy of the data, that's why there is dup. Why should .length secretly call .dup sometimes, and sometimes not?

Cheers,
--AJG.
Jul 30 2005
next sibling parent reply Ben Hinkle <Ben_member pathlink.com> writes:
In article <dcgc3q$13i9$1 digitaldaemon.com>, AJG says...
Hi Ben,

If this is true, then it seems rather arbitrary to me that .length should break
reference semantics. Why not keep it in line to how the rest work? (Specially
since it's not related to the benefits you talked about before).

It is not arbitrary. There are advantages to the current design. I don't see why you say it is not related since it would be silly to have length do something different if there weren't benefits to making length special.

So then .length is related to slicing? How does the semantics of .length affect slicing? Or perhaps you meant other benefits?

I recommend you pursue some of your ideas where length is manipulated by reference and follow the dependencies to see how different dynamic arrays (and, yes, slicing) would be. In particular I recommend you learn more about slicing. I'm sorry if that sounds harsh but I've gotten the opinion now that you haven't really gotten experience with D arrays as they exist now.
 Why is it a good design choice?


It would be very annoying to have to check for null before asking if an 
array length is zero. Plus the whole design of slicing would need to be 
redone and probably would lose much of the efficiency it has today. 

Ok. This is a valid point. However, that's not to say the problem is insurmountable. Solutions do exist. In fact, I have thought of a couple of possible solutions, but I'm afraid it'll scare everybody so for now this would be "something to think about." I just want to say that change, even if it breaks things, can be very good. It shouldn't be automatically ruled out in fear.

Where is the reaction in fear? I only see people trying to explain the current design and its advantages. I said I doubt a solution exists that would have the benefits of the current design while having reference semantics (if even reference semantics for length would be desirable). If you want to present some ideas that would be great - do whatever you want and enjoy (remember we're all doing this for fun).

The general impression I get is that as soon as something creates the possibility of breaking existing code, then there is backlash. This would be fine for the embedded C language that runs medical heart devices. But for a language that isn't even out the door, it's disheartening (haha, no pun intended ;). Just my 2 cents.

For my case when I said essentially "much code will break" it wasn't meant as a backlash - just as a fact you would have to address. A proposed change that breaks lots of code is harder to push through than one that doesn't as a simple practical matter more than any emotional attachment to old code.
I view 
an array as much closer to a struct than an object: an array is just like a 
struct with a pointer field and a length field. That's the simplest 
description of what an array is. Comparing them to objects is the wrong 
analogy.

Except that _all_ properties other than .length operate via reference semantics. Structs wouldn't do that. Objects would.

uhh - the struct has a pointer to the data. The pointer part has reference semantics and the length part doesn't. A struct can easily have methods that dereference the pointer and modify shared state. I do it all the time with the MinTL containers and pretty much any struct that stores a pointer.

 SomeObject A = new SomeObject;
 SomeObject B = A;
 B.SomeProperty; // Operates on A.

 SomeStruct A;
 SomeStruct B = A;
 B.SomeProperty; // Operates on B.

 int[] A = new int[5];
 int[] B = A;
 B.SomeProperty; // Operates on A;
 // _Except_ if it's .length.

 This behaviour seems much more in line with Objects than with Structs, to me.
 That's why I don't see how .length should break the current semantics.

Please think about structs that contain pointers. [snip]
Why should .length secretly call .dup sometimes, and sometimes not?

Here I agree that the documentation should be more explicit in describing when setting the length reallocates and when it doesn't. If it is compiler-dependent the doc should say so.
Jul 30 2005
parent reply AJG <AJG_member pathlink.com> writes:
Hi Ben,

So then .length is related to slicing? How does the semantics of
.length affect
slicing? Or perhaps you meant other benefits?

I recommend you pursue some of your ideas where length is manipulated by reference and follow the dependencies to see how different dynamic arrays (and, yes, slicing) would be. In particular I recommend you learn more about slicing. I'm sorry if that sounds harsh but I've gotten the opinion now that you haven't really gotten experience with D arrays as they exist now.

Would an example do? I may not be an expert regarding slicing, but I could see a discrete problem if you point it out.
The general impression I get is that as soon as something creates the
possibility of breaking existing code, then there is backlash. This would be
fine for the embedded C language that runs medical heart devices. But for a
language that isn't even out the door, it's disheartening (haha, no pun intended
;). Just my 2 cents.

For my case when I said essentially "much code will break" it wasn't meant as a backlash - just as a fact you would have to address. A proposed change that breaks lots of code is harder to push through than one that doesn't as a simple practical matter more than any emotional attachment to old code.

This kind of thinking only works ceteris paribus. But if a solution that breaks less code is not as good, then the language loses. I think at this point the language can afford such changes before it becomes like C, where a header file was needed to introduce mere booleans.
I view 
an array as much closer to a struct than an object: an array is just like a 
struct with a pointer field and a length field. That's the simplest 
description of what an array is. Comparing them to objects is the wrong 
analogy.

Except that _all_ properties other than .length operate via reference semantics. Structs wouldn't do that. Objects would.

uhh - the struct has a pointer to the data. The pointer part has reference semantics and the length part doesn't. A struct can easily have methods that dereference the pointer and modify shared state. I do it all the time with the MinTL containers and pretty much any struct that stores a pointer.

 SomeObject A = new SomeObject;
 SomeObject B = A;
 B.SomeProperty; // Operates on A.

 SomeStruct A;
 SomeStruct B = A;
 B.SomeProperty; // Operates on B.

 int[] A = new int[5];
 int[] B = A;
 B.SomeProperty; // Operates on A;
 // _Except_ if it's .length.

 This behaviour seems much more in line with Objects than with Structs, to me.
 That's why I don't see how .length should break the current semantics.

Please think about structs that contain pointers.

Even if we see arrays as structs (which I don't, but for the sake of the argument), it doesn't explain why .length should break the other properties' semantics. If there's an obvious reason I'm blind to, could you point it out? I'm a little dense sometimes.
[snip]
Why should .length secretly call .dup sometimes, and sometimes not?

Here I agree that the documentation should be more explicit in describing when setting the length reallocates and when it doesn't. If it is compiler-dependent the doc should say so.

Ok. Cheers, --AJG.
Jul 30 2005
next sibling parent Shammah Chancellor <Shammah_member pathlink.com> writes:
In article <dcgkt5$1b4i$1 digitaldaemon.com>, AJG says...

Even if we see arrays as structs (which I don't, but for the sake of the
argument), it doesn't explain why .length should break the other properties'
semantics. If there's an obvious reason I'm blind to, could you point it out?
I'm a little dense sometimes.

Because sometimes it needs to reallocate memory. Why don't you look at `man realloc`:

The realloc() function tries to change the size of the allocation pointed to by ptr to size, and return ptr. If there is not enough room to enlarge the memory allocation pointed to by ptr, realloc() creates a new allocation, copies as much of the old data pointed to by ptr as will fit to the new allocation, frees the old allocation, and returns a pointer to the allocated memory. realloc() returns a NULL pointer if there is an error, and the allocation pointed to by ptr is still valid.

The difference is that D cannot let it free the original, because if it did then other references to the data would break. So it dups the data if a realloc is going to allocate memory in a different area. I'm not sure of the exact implementation details in D, but that's my basic understanding.

So for recap: If length increases, and there is not enough space available to grow the array, it allocates another block of memory and copies the data. It leaves the original pointer intact and lets the garbage collector decide if anybody else still has references to it.

This may seem confusing, but it's about array slicing being fast. If you don't want these mixed semantics, then always dup your data.

(P.S. You mention C++ reference semantics when you're talking about these arrays. But this isn't even legal in C++:

 int foo[10];
 foo = null;

You really can't compare the two languages in this aspect. I think D arrays are a big step forward when compared to C arrays, which literally couldn't find their ass with both hands.)

-Sha
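
For readers who want to watch the recap above happen, here is a minimal sketch in present-day D. The helper name `resizeMoved` is made up for illustration, and whether a given growth actually moves the data depends on the runtime and allocator, so the sketch only reports it and does not assume either outcome.

```d
import std.stdio;

// Hypothetical helper: resizes a local copy of the slice and reports
// whether the runtime had to move the data to a different block.
bool resizeMoved(int[] original, size_t newLength)
{
    int[] resized = original;    // copies ptr and length, not the elements
    resized.length = newLength;  // may extend in place or reallocate
    return resized.ptr !is original.ptr;
}

void main()
{
    int[] a = new int[5];
    int[] b = a;
    b.length = 1000;             // a keeps its own length and pointer either way
    writeln("a.length = ", a.length, ", b.length = ", b.length);
    writeln("data moved: ", b.ptr !is a.ptr);
    writeln("shrinking moved data: ", resizeMoved(a, 3));
}
```

Whatever the allocator decides, `a` still sees its original five elements, which is the "leaves the original pointer intact" part of the recap.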
Jul 30 2005
prev sibling parent reply Ben Hinkle <Ben_member pathlink.com> writes:
In article <dcgkt5$1b4i$1 digitaldaemon.com>, AJG says...
Hi Ben,

So then .length is related to slicing? How does the semantics of
.length affect
slicing? Or perhaps you meant other benefits?

I recommend you pursue some of your ideas where length is manipulated by reference and follow the dependencies to see how different dynamic arrays (and, yes, slicing) would be. In particular I recommend you learn more about slicing. I'm sorry if that sounds harsh but I've gotten the opinion now that you haven't really gotten experience with D arrays as they exist now.

Would an example do? I may not be an expert regarding slicing, but I could see a discrete problem if you point it out.

Let me step through some choices that I was hoping you would do. Let's start by thinking about what an array with reference-based length would look like. It would either be a pointer to today's dynamic array (a ptr and a length) or it would be a pointer to one memory block with the length stored either at the front or end of the array data. How would slicing work for those two implementations? For the first slicing would have to allocate memory to store the new ptr and new length. For the second slicing would have to be a different type since it is impossible to store the length for the slice in the middle of the original source array. So that's why I suggested you think through your initial suggestion and work out the impact on slicing and arrays in general. But to be honest I would still prefer the current behavior where the length information is always available without having to check for null first - even if you could somehow make the rest of D remain the same as today.
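
A rough sketch of why slicing is cheap under the current layout, assuming the (length, ptr) description used throughout this thread. The struct `Slice` and its `slice` method are hypothetical stand-ins written for illustration; they are not how the compiler or runtime is actually implemented.

```d
import std.stdio;

// Hypothetical model of a dynamic array: a length and a pointer, held by value.
struct Slice
{
    size_t length;
    int* ptr;

    // Slicing only computes a new (length, ptr) pair.
    // Nothing is allocated and no element is copied.
    Slice slice(size_t lo, size_t hi)
    {
        assert(lo <= hi && hi <= length);
        return Slice(hi - lo, ptr + lo);
    }
}

void main()
{
    int[] data = [10, 20, 30, 40, 50];
    auto whole = Slice(data.length, data.ptr);
    auto mid = whole.slice(1, 4);   // view of 20, 30, 40
    mid.ptr[0] = 99;                // writes through to data[1]
    writeln(mid.length, " ", data);
}
```

With a reference-based length, either of the two layouts described above would force `slice` to allocate or to return a different type, which is the cost being pointed out.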
Jul 31 2005
parent AJG <AJG_member pathlink.com> writes:
Hi Ben,

Let me step through some choices that I was hoping you would do. Let's start by
thinking about what an array with reference-based length would look like. It
would either be a pointer to today's dynamic array (a ptr and a length) or it
would be a pointer to one memory block with the length stored either at the
front or end of the array data. How would slicing work for those two
implementations? For the first slicing would have to allocate memory to store
the new ptr and new length. For the second slicing would have to be a different
type since it is impossible to store the length for the slice in the middle of
the original source array. So that's why I suggested you think through your
initial suggestion and work out the impact on slicing and arrays in general.

I don't think this change in the way arrays operate internally would be necessary. What about simply using the current data pointer as it is to implement reference semantics? A null pointer means the reference is null; and vice-versa. The problem I keep hearing comes when trying to re-size (specifically, enlarge), an array, by reference. So then what it all comes down to re: .length is the inability of realloc() to guarantee that the pointer it returns is the same one it receives. Is this correct?
But to be honest I would still prefer the current behavior where the length
information is always available without having to check for null first - even
if you could somehow make the rest of D remain the same as today.

I understand this concern, and it is a valid one. However, at this point D is trying to have its cake and eat it too: It wants to have null arrays, but not have to go thru null checks. The result is a bit confusing, IMHO. Moreover, it is buggy. Worst of all, it is not well documented. This combination of factors leads me to think something should be done. Frankly, from the docs I can't make out what the semantics of arrays are supposed to be. That was why I asked the original question: should we or shouldn't we treat arrays as null? I guess maybe not even Walter knows ;) ? Cheers, --AJG.
Jul 31 2005
prev sibling next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Sat, 30 Jul 2005 17:07:06 +0000 (UTC), AJG wrote:


[snip]

 
 SomeObject A = new SomeObject;
 SomeObject B = A;
 B.SomeProperty; // Operates on A.
 
 SomeStruct A;
 SomeStruct B = A;
 B.SomeProperty; // Operates on B.
 
 int[] A = new int[5];
 int[] B = A;
 B.SomeProperty; // Operates on A; 
 // _Except_ if it's .length.
 
 This behaviour seems much more in line with Objects than with Structs, to me.
 That's why I don't see how .length should break the current semantics.

You are wrong here because 'B.someProperty' operates on B not A. A simple proof is this ...

 int[] A = new int[5];
 int[] B = A;
 A.length = 4;
 writefln("%d", B.length); // displays 5.

In your example, it *appears* to operate on A (the 8-byte array structure) because B and A have the same values. That is A.ptr == B.ptr and A.length == B.length.

We just have to admit that arrays in D are not the classical array definition and are really a different type of thing altogether. Then get to learn the rules of D 'arrays'. If you want arrays to behave like objects, then maybe you can write an array class.

--
Derek Parnell
Melbourne, Australia
31/07/2005 8:26:46 AM
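
The proof above can be made self-checking. This is a small sketch for a current D compiler; the helper `shrink` is a hypothetical name added only so the behaviour can be asserted.

```d
import std.stdio;

// Hypothetical helper: shrinks its own by-value copy of the slice and returns it.
int[] shrink(int[] arr, size_t n)
{
    arr.length = n;   // only the local length field changes
    return arr;
}

void main()
{
    int[] b = new int[5];
    int[] a = shrink(b, 4);
    writefln("%d %d", a.length, b.length); // a is shorter, b is unchanged
    assert(a.ptr is b.ptr);                // shrinking did not move the data
    a[0] = 7;
    assert(b[0] == 7);                     // element writes are still shared
}
```

So the length lives in each variable, while the elements live in the block both variables point at.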
Jul 30 2005
parent reply AJG <AJG_member pathlink.com> writes:
Hi Derek,

 int[] A = new int[5];
 int[] B = A;
 B.SomeProperty; // Operates on A; 
 // _Except_ if it's .length.
 
 This behaviour seems much more in line with Objects than with Structs, to me.
 That's why I don't see how .length should break the current semantics.

You are wrong here because 'B.someProperty' operates on B not A. A simple proof is this ...

 int[] A = new int[5];
 int[] B = A;
 A.length = 4;
 writefln("%d", B.length); // displays 5.

In your example, it *appears* to operate on A (the 8-byte array structure) because B and A have the same values. That is A.ptr == B.ptr and A.length == B.length.

Um... I said "except .length" for a reason. That's my very point. That .length is the exception. All others operate on A.
We just have to admit that arrays in D are not the classical array
definition and are really a different type of thing altogether. Then get to
learn the rules of D 'arrays'. If you want arrays to behave like objects,
then maybe you can write an array class.

First of all, this would throw efficiency out the window. Second, let me quote you a little of the D manifesto: [Taken from "The D Programming Language" written by Walter Bright] [Arrays Section] "Arrays are enhanced from being little more than an alternative syntax for a pointer into first class objects." That's, ahem, "First Class Objects," for those that missed it. Cheers, --AJG.
Jul 30 2005
parent reply Shammah Chancellor <Shammah_member pathlink.com> writes:
In article <dch28c$1nrj$1 digitaldaemon.com>, AJG says...
Hi Derek,

 int[] A = new int[5];
 int[] B = A;
 B.SomeProperty; // Operates on A; 
 // _Except_ if it's .length.
 
 This behaviour seems much more in line with Objects than with Structs, to me.
 That's why I don't see how .length should break the current semantics.

You are wrong here because 'B.someProperty' operates on B not A. A simple proof is this ...

 int[] A = new int[5];
 int[] B = A;
 A.length = 4;
 writefln("%d", B.length); // displays 5.

In your example, it *appears* to operate on A (the 8-byte array structure) because B and A have the same values. That is A.ptr == B.ptr and A.length == B.length.

Um... I said "except .length" for a reason. That's my very point. That .length is the exception. All others operate on A.

No, All others do _NOT_ operate on A. They happen to operate on the same data that A points to. A is a struct which has an int and a ptr; obviously changing B's ptr or B's length does not affect A. You're thinking about D arrays all wrong. That's what Derek was getting at. A and B are two separate objects which happen to be able to have references to the same data. For efficiency's sake both the length and the ptr are assigned by value.

Think of it this way in C, if you have this structure:

 struct Array {
     int length;
     void* ptr;
 } a, b;

 a.ptr = new char[100];
 b = a;

What does this do? This is the semantics of D arrays. A and B are distinct structures, and if you allocate more memory for b then it's not going to change A. As you can see this is not the same as reference semantics at all, otherwise A's ptr would change as well. If you want reference semantics you are free to use an array handle. But the way D arrays are handled is not mystical or inconsistent. They're perfectly consistent with themselves, and if you understand how they operate (which is not hard) then you won't make mistakes.

As for your other issue, where array nullness and length == 0 being converged, I do not think this is an issue. length == 0 is the definition of a null set (arrays in CS seem to be more in line with sets, dunno why they're named as they are). But if you want to be consistent with terminology, technically a null array is an array with all elements set to null. Can you show me an example where it matters if length == 0 and arr.ptr == null does not denote the same thing?

-Sha
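
One way to read the "array handle" remark, sketched in present-day D: pass the array itself by reference, or keep a pointer to it, so that a resize is seen by the original variable. The function names `growByValue` and `growByRef` are made up for this illustration.

```d
import std.stdio;

// Resizes only the callee's private (length, ptr) copy.
void growByValue(int[] arr)
{
    arr.length = arr.length + 1;
}

// Resizes the caller's own array variable.
void growByRef(ref int[] arr)
{
    arr.length = arr.length + 1;
}

void main()
{
    int[] a = new int[5];
    growByValue(a);
    writeln(a.length);        // caller's length is unchanged
    growByRef(a);
    writeln(a.length);        // caller's length grew by one

    int[]* handle = &a;       // a pointer to the array acts as a handle
    (*handle).length = 10;    // resizes a itself
    writeln(a.length);
}
```

The design trade-off is the one argued in this thread: the handle costs an extra indirection and has to be checked for null, while the by-value pair is always safe to ask for its length.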
Jul 30 2005
parent reply AJG <AJG_member pathlink.com> writes:
Hi,

Um... I said "except .length" for a reason. That's my very point. That .length
is the exception. All others operate on A.

No, All others do _NOT_ operate on A. They happen to operate on the same data that A points to.

You are simply splitting hairs here. You are arguing language semantics. The fact of the matter is that for all practical purposes, EXCEPT for .length, arrays in D are by reference. This means that for all practical purposes, EXCEPT for .length, B operates on A. It doesn't matter if it's because of the pointer (an implementation, system-dependent, gory detail) or because of any other reason. If assigning an array _immediately_ copied the data, then what you said is true. But it doesn't, because (a) that would be inefficient, and (b) that would remove _all_ reference semantics. Therefore, as it is, reference semantics are broken when it comes to .length. <snip>
 RE: Arrays as structs.

This is where _you_ are wrong. Arrays are not structs. Arrays do not share the semantics of structs. Arrays share _implementation details_ with structs, and that's _it_. Didn't you see the quote from the D language doc? It clearly says "First-Class Objects." Not structs. Not primitives. Not pointers. If you, however, equate that with structs, that's fine. But I certainly do not.
They're 
perfectly consistent with themselves,

This means absolutely nothing. A bug can be perfectly consistent with itself and it is still a bug. To be meaningful, they would have to be consistent with the rest of the language. Or perhaps, consistent with another part of the language, like, say, Objects.
and if you understand how they operate
(which is not hard) then 
you won't make mistakes.

It's not about making mistakes. Sure, I can just as well avoid a function in a library that is buggy, and I'll avoid a mistake. That's not the point. If something is broken, then it needs to be fixed. If Walter could perhaps clarify the semantics of arrays, then we would get somewhere.
As for your other issue, where array nullness and length == 0 being converged, I
do not think this is an issue. length == 0 is the definition of a null set

So? What I would like to express is _No Set_.
(arrays in CS seem to be more in line with sets, dunno
why they're named as they are). But if you want to be consistent with
terminology, technically a null array is an array with all elements set to
null. Can you show me an example where it matters if length == 0 and
arr.ptr == null does not denote the same thing?

When you are returning fields from a database, for instance. If you've ever dealt with a DB, you would know fields can be NULL, meaning no value. This is different than "", which means explicitly the empty string. It is very difficult to do this because of certain bugs which meld .length == 0 and .ptr == null. They are not the same thing. Not semantically. Not technically, at the moment, except for the "bugs." That's why I'm asking Walter whether he _plans_ on merging the two into one. If that's his vision, which would be unfortunate, then those things aren't "bugs" at all, but rather the intended design. Cheers, --AJG.
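
To make the database case concrete, here is a small sketch of the distinction as it can be expressed in present-day D. The function `fetchField` is a hypothetical stand-in for a real database call, not an API from any library.

```d
import std.stdio;

// Hypothetical stand-in for a database field lookup:
// null means "no value", "" means "explicitly the empty string".
string fetchField(bool present)
{
    return present ? "" : null;
}

void main()
{
    string missing = fetchField(false);
    string empty = fetchField(true);

    writeln(missing is null);   // identity check sees the null pointer
    writeln(empty is null);     // the empty literal still has a non-null pointer
    writeln(missing == empty);  // content comparison treats both as zero-length
    writeln(missing.length, " ", empty.length);
}
```

Only `is` separates the two cases; `==` and `.length` cannot. That makes the distinction easy to lose as soon as an operation hands back a zero-length result with a different pointer, which is the fragility described in this thread.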
Jul 30 2005
parent Shammah Chancellor <Shammah_member pathlink.com> writes:
In article <dchgkl$23v5$1 digitaldaemon.com>, AJG says...
Hi,

Um... I said "except .length" for a reason. That's my very point. That .length
is the exception. All others operate on A.

No, All others do _NOT_ operate on A. They happen to operate on the same data that A points to.

You are simply splitting hairs here. You are arguing language semantics. The fact of the matter is that for all practical purposes, EXCEPT for .length, arrays in D are by reference. This means that for all practical purposes, EXCEPT for .length, B operates on A. It doesn't matter if it's because of the pointer (an implementation, system-dependent, gory detail) or because of any other reason.

I am not splitting hairs. I gave you a very valid reason why a and b are not references, not even theoretically. They happen to have a reference member that, in some cases, will point to the same data. YOU are in full control over when that happens. If that's not what you intended, then you should be using references to the ARRAY, rather than using multiple arrays which have references to the same data.

I might ask you this: What MAGIC would you like to happen with arrays? What you want is not possible without some kind of magic. Try this example on for size, from classic C:

 int* a = malloc(100 * sizeof(int));
 int* b = a;
 b = realloc( b, 1000 * sizeof(int) );

Guess what, a is most likely now a bad reference. Is this what you would like D to do? Probably not, you probably want 'a' to point to the new array of length 1000. Do you want the compiler to magically handle this for you? Would you like length to be read only? Forcing us to call b = new int[], and then manually code up the data copy to resize the array? Starting to sound like C.... What a pain arrays were. And a still didn't change automatically to where b is pointing now.
If assiging an array _immediately_ copied the data, then what you said is true.
But it doesn't, because (a) that would be inefficient, and (b) that would remove
_all_ reference semantics.

Therefore, as it is, reference semantics are broken when it comes to .length.

There are no reference semantics when it comes to arrays. Maybe what you want is D to automagically do a Copy-on-Write. Any time an array is set to a reference of another array a flag could get turned on, and when you use it as an lvalue and that flag is on, it could dup the array. But that's silly since b = new int[100]; is perfectly legal in D, and would result in a double memory access if you ever tried to assign to the array. Wonder what kind of magic would have to be done to fix this case. IMHO, better to let the programmer specify when he wants a and b to point at the same data.
<snip>
 RE: Arrays as structs.

This is where _you_ are wrong. Arrays are not structs. Arrays do not share the semantics of structs. Arrays share _implementation details_ with structs, and that's _it_. Didn't you see the quote from the D language doc? It clearly says "First-Class Objects." Not structs. Not primitives. Not pointers. If you, however, equate that with structs, that's fine. But I certainly do not.

You can't use a language to its full potential if you don't know implementation details. There will always be ambiguities of when references are by value, by ref, or whatever else. As the saying goes: the language is in the details. Here's a good example for you, from a VB.NET project I just inherited:

 If arr.Length - arr.Replace(",", "").Length <> 17 Then 'error out

What's the big deal? It's only one line of code, must be just as good as counting the number of commas in the array....
They're 
perfectly consistent with themselves,

This means absolutely nothing. A bug can be perfectly consistent with itself and it is still a bug. To be meaningful, they would have to be consistent with the rest of the language. Or perhaps, consistent with another part of the language, like, say, Objects.
and if you understand how they operate
(which is not hard) then 
you won't make mistakes.

It's not about making mistakes. Sure, I can just as well avoid a function in a library that is buggy, and I'll avoid a mistake. That's not the point. If something is broken, then it needs to be fixed. If Walter could perhaps clarify the semantics of arrays, then we would get somewhere.
As for your other issue, where array nullness and length == 0 being converged, I
do not think this is an issue. length == 0 is the definition of a null set

So? What I would like to express is _No Set_.

Not Set?
(arrays in CS seem to be more in line with sets, dunno
why they're named as they are). But if you want to be consistent with
terminology, technically a null array is an array with all elements set to
null. Can you show me an example where it matters if length == 0 and
arr.ptr == null does not denote the same thing?

When you are returning fields from a database, for instance. If you've ever dealt with a DB, you would know fields can be NULL, meaning no value. This is different than "", which means explicitly the empty string. It is very difficult to do this because of certain bugs which meld .length == 0 and .ptr == null.

I see your point, but any kind of attempt to do that would be abusing the array. There are laws against array abuse in most countries these days. </sarcasm>

Most every single database api in existence deals with that by having special objects. So you have this:

 static char[0] DBNull;

in your database module; then

 char[] foo;
 foo = dbCommand.executeScalar( );
 if( foo is DBNull ) // I'm not sure if the .ptr prop is needed here. Last I heard if you just use the array name it defaults to the ptr
   . oh noes, the field was null!
 else
   . oh good ..
They are not the same thing. Not semantically. Not technically, at the moment,
except for the "bugs." That's why I'm asking Walter whether he _plans_ on
merging the two into one.

They should never be the same thing. But there's a gotcha: if .ptr is null, then length should always be 0. The other way around is not necessarily true. Just because length == 0 the ptr isn't necessarily null. This should be the case when the array was at one point allocated, and then length was reduced. It should be that way for efficiency. That however is not useful for your example of DBNulls. It would be silly to allocate some space and then just not use it and say that's when somebody entered something, and it was nothing.
 If that's his vision, which would be unfortunate, then
those things aren't "bugs" at all, but rather the intended design.

What 'things'? Are you talking about the .ptr value being the same for two arrays?
Jul 31 2005
prev sibling parent Carlos Santander <csantander619 gmail.com> writes:
AJG escribió:
 
 I'm not suggesting making .length read-only. I'm suggesting making it operate on
 the same data it has a pointer to. Just like .sort or .reverse would. The way I
 see it, if you explicitly want to make a copy of the data, that's why there is
 dup. Why should .length secretly call .dup sometimes, and sometimes not?
 
 Cheers,
 --AJG.
 
 

First of all, I don't agree with AJG: I think D arrays are very well the way they're now.

There's something, though, and correct me if I'm wrong, but I think array.length doesn't go hand in hand with COW.

 char [] a;
 a.length = 3;
 foo(a);

 void foo(char [] b)
 {
     b[0] = 'f';   // 1
     b.length = 5; // 2
 }

COW says to do 1, you have to dup first, because you don't own the array, but when you do 2, b is automatically dupped. So, my point is that to be consistent, maybe resizing should also require dupping.

Am I right? Does it make sense?

--
Carlos Santander Bernal
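
The two numbered steps can be checked with a short runnable sketch in present-day D. It assumes nothing beyond the language's by-value passing of the (length, ptr) pair; the fill value 'x' is added only so the result is visible.

```d
import std.stdio;

void foo(char[] b)
{
    b[0] = 'f';     // 1: writes into the caller's data, no dup happens here
    b.length = 5;   // 2: changes only foo's local (length, ptr) copy
}

void main()
{
    char[] a = new char[3];
    a[] = 'x';      // fill with a visible value
    foo(a);
    writeln(a.length, " ", a);   // the write from step 1 is visible, the resize is not
}
```

So under the language rules step 1 is shared and step 2 is private. Copy-on-write is a convention the programmer follows by calling .dup, not something the compiler enforces.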
Jul 30 2005
prev sibling parent reply Derek Parnell <derek psych.ward> writes:
On Fri, 29 Jul 2005 18:50:45 +0000 (UTC), AJG wrote:

 Hi Ben,
 
 Ok, I don't think I said exactly what I meant before. Let's look at this piece
 by piece:
 
 1) Arrays are ("in theory") reference types.

This is where I think we separate. I don't think that D arrays are reference types in the same manner as objects. I think they are value types in that they always have two fields; a pointer and a length. D arrays are more like a predefined struct. Your phrase "in theory", depends on whose theory you are talking about.
 2) Objects are reference types.

Okay.
 3) Arrays are not objects.

True.
 4) So, even though Arrays and Objects are different, they share (or should)
 reference semantics.

I assume at this point that you are talking about arrays as defined in some computer science book rather than how they are implemented in D.
 I believe most of us can agree up to here.

Apparently not ;-)
 My overall point is that D is not keeping its promise regarding Arrays obeying
 reference semantics. 

"Promise"? Where is that written down?
Whether this is good or not is debatable, but at least it
 should be noted. Do you agree that D's arrays break reference semantics?

I suppose so. But it doesn't worry me because it is a pragmatic implementation that makes coding clearer (IMO) and improves performance. I'm not sorry that D doesn't have text-book arrays, in that case. In your previous example ...
 # char[] a = "hello"; // whatever.
 # char[] b = a; // This is a reference to a.
 # b.length = 2; // Now b became its own instance.
 
 Semantically speaking, I think this is wrong. 

I've adjusted my thinking when using D. To me, after the assignment 'b = a', I see that 'a' and 'b' are distinct arrays that happen to share the same data. This may be seen as twisting words or playing with semantics, but it works for me. And by the way, the 'b.length = 2' statement does not cause 'b' to become another instance. It still shares the same data as 'a'. You only get a new instance when the length increases. If D has not implemented text-book arrays, what are we losing? I can't see that we have lost anything, in fact we have gained. -- Derek Parnell Melbourne, Australia 30/07/2005 11:19:20 AM
Jul 29 2005
parent reply AJG <AJG_member pathlink.com> writes:
Hi Derek,

 Ok, I don't think I said exactly what I meant before. Let's look at this piece
 by piece:
 
 1) Arrays are ("in theory") reference types.

This is where I think we separate. I don't think that D arrays are reference types in the same manner as objects. I think they are value types in that they always have two fields; a pointer and a length. D arrays are more like a predefined struct. Your phrase "in theory", depends on whose theory you are talking about.

Well, my "in theory" is actually pretty down-to-earth. I mean reference semantics in the way C++, C#, PHP, Java, Javascript and many other languages do references. This is not an ivory tower concept. It means essentially a nicer, fancier version of a pointer. When using the languages I mentioned, if you assign a reference, it will not become its own instance spontaneously in certain cases.
 4) So, even though Arrays and Objects are different, they share (or should)
 reference semantics.

I assume at this point that you are talking about arrays as defined in some computer science book rather than how they are implemented in D.

Guilty as charged re: being a computer scientist ;). However, once again, this is not a high-brow idea. Reference semantics are very basic and are implemented fairly similarly across various mainstream languages (C++, C#, PHP, Java, Javascript). D breaks reference semantics when it comes to arrays. This leads me to believe arrays are _not_ reference types, which is not the impression I got from their description. Walter has remained conspicuously silent about the matter, and has not answered the question. Are arrays reference types or not? If yes, then they are broken.
 I believe most of us can agree up to here.

Apparently not ;-)

Indeed. The final word can only come from the Big W., I'm afraid.
 My overall point is that D is not keeping its promise regarding Arrays obeying
 reference semantics. 

"Promise"? Where is that written down?

It was a figure of speech :p. The promise "would" be written down if D agrees to implement array reference semantics and then doesn't. This is what I'm not sure about.
Whether this is good or not is debatable, but at least it
 should be noted. Do you agree that D's arrays break reference semantics?

I suppose so. But it doesn't worry me because it is a pragmatic implementation that makes coding clearer (IMO) and improves performance. I'm not sorry that D doesn't have text-book arrays, in that case.

Once more, these "text-book" arrays are fairly common across modern languages, and D's semantics are certainly a twisted variation. Also, I don't follow how that improves performance. If anything, it _decreases_ performance by spawning deep copies of array instances in certain special cases.
In your previous example ...

 # char[] a = "hello"; // whatever.
 # char[] b = a; // This is a reference to a.
 # b.length = 2; // Now b became its own instance.
 
 Semantically speaking, I think this is wrong. 

I've adjusted my thinking when using D. To me, after the assignment 'b = a', I see that 'a' and 'b' are distinct arrays that happen to share the same data. This may be seen as twisting words or playing with semantics, but it works for me.

Well, then that's not a reference. Sharing just the same data is some weird variation of array that I hadn't encountered. This is not a reference.
And by the way, the 'b.length = 2' statement does not cause 'b' to become
another instance. It still shares the same data as 'a'. You only get a new
instance when the length increases.

Great, yet another exception. Thanks for pointing it out.
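
The shrink-versus-grow distinction Derek describes can be sketched as follows. This is only an illustration, assuming a current D2 compiler and runtime; the 2005-era compiler discussed in this thread may behave differently when growing. An `int[]` built from a literal is used so the data is mutable and heap-allocated.

```d
void main()
{
    int[] a = [1, 2, 3, 4, 5];
    int[] b = a;             // b copies a's pointer and length; elements are shared

    b.length = 2;            // shrinking only changes b's own length field
    assert(b.ptr is a.ptr);  // still the same block of memory
    assert(a.length == 5);   // a's length is untouched

    b[0] = 99;               // writes through to the shared data
    assert(a[0] == 99);

    b.length = 10;           // growing over data a still uses forces a copy
    assert(b.ptr !is a.ptr);
    b[1] = -1;               // no longer visible through a
    assert(a[1] == 2);
}
```

After the shrink both variables still see the same elements; only once `b` grows does it get its own copy.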
If D has not implemented text-book arrays, what are we losing? I can't see
that we have lost anything, in fact we have gained.

Well, so what if we lost object reference semantics? Would that also be another "gain?" Less is more! Rations will be increased -33%. It's doubleplusgood! ;) Cheers, --AJG.
Jul 29 2005
next sibling parent Mike Parker <aldacron71 yahoo.com> writes:
AJG wrote:

 
 
 Well, then that's not a reference. Sharing just the same data is some weird
 variation of array that I hadn't encountered. This is not a reference.

 Great, yet another exception. Thanks for pointing it out.

 Well, so what if we lost object reference semantics? Would that also be another
 "gain?" Less is more! Rations will be increased -33%. It's doubleplusgood!
 

Wasn't it you who posted elsewhere in this thread that change is good? ;) D has changed the way we think about arrays. From my perspective, it's a good change and your desire to revert to the 'array as a reference' paradigm is not. Maybe it would help if you think of the D array as a wrapper/facade to the actual reference?
Jul 30 2005
prev sibling parent Derek Parnell <derek psych.ward> writes:
On Sat, 30 Jul 2005 02:30:17 +0000 (UTC), AJG wrote:

 Hi Derek,
 
 Ok, I don't think I said exactly what I meant before. Let's look at this piece
 by piece:
 
 1) Arrays are ("in theory") reference types.

This is where I think we separate. I don't think that D arrays are reference types in the same manner as objects. I think they are value types in that they always have two fields; a pointer and a length. D arrays are more like a predefined struct. Your phrase "in theory", depends on whose theory you are talking about.

Well, my "in theory" is actually pretty down-to-earth. I mean reference semantics in the way C++, C#, PHP, Java, Javascript and many other languages do references. This is not an ivory tower concept. It means essentially a nicer, fancier version of a pointer. When using the languages I mentioned, if you assign a reference, it will not become its own instance spontaneously in certain cases.

I think I have the solution. Rename them. Don't call them arrays. Call them something else. Then your problem goes away ;-) -- Derek Parnell Melbourne, Australia 30/07/2005 10:49:59 PM
Jul 30 2005
prev sibling parent reply Niko Korhonen <niktheblak hotmail.com> writes:
Ben Hinkle wrote:
 I think you'll have a hard time getting lots of support for that. I much 
 prefer the current behavior and I bet there is lots of existing D code that 
 assumes one can test the length of an array at any time. Since an array is 
 not an object I see no problem with the "inconsistency" - an array is an 
 array. 

Indeed. I think the array semantics where you can't access a property of the array without the Fear of the NullPointerException is the most annoying thing in the world, or at least in the field of programming. I will happily agree to this difference in semantics because the benefits far outweigh the slight inconsistency. Besides, in a way there is no inconsistency. An array reference is a value type consisting of two 4-byte integers (in 32-bit environments). This is different from an object reference. The first integer is the length of the array and the second is a pointer to the first item of the array. Whenever an array reference is created a pointer to the data exists. The .length property is just a shortcut to access the length field of the array. The .sort property is a function called on the array reference. These always work even if the array reference points to an empty array. Trying to access the elements of an empty array will segfault in the usual way. Object references stored in an array have the usual semantics. IMO nothing forces a language to treat arrays as templated instances of a class Array with regular object semantics. D's way is just better. -- Niko Korhonen SW Developer
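
A small sketch of the point about properties always being usable, assuming a current D2 compiler. It also shows the null-versus-empty distinction raised at the start of the thread: a zero-length slice of existing data is empty but not null.

```d
void main()
{
    int[] none;                  // default-initialised: null pointer, length 0
    assert(none is null);
    assert(none.ptr is null);
    assert(none.length == 0);    // reading .length on a null array is safe

    int[] src = [1, 2, 3];
    int[] empty = src[0 .. 0];   // zero-length slice of existing data
    assert(empty.length == 0);
    assert(empty.ptr is src.ptr);
    assert(empty !is null);      // empty, but not null: the pointer is set

    assert(none == empty);       // == compares contents, so the two are equal
}
```

Here `is` compares the pointer/length pair itself, while `==` compares the elements.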
Jul 31 2005
parent Derek Parnell <derek psych.ward> writes:
On Mon, 01 Aug 2005 09:56:57 +0300, Niko Korhonen wrote:

 Ben Hinkle wrote:
 I think you'll have a hard time getting lots of support for that. I much 
 prefer the current behavior and I bet there is lots of existing D code that 
 assumes one can test the length of an array at any time. Since an array is 
 not an object I see no problem with the "inconsistency" - an array is an 
 array. 

Indeed. I think the array semantics where you can't access a property of the array without the Fear of the NullPointerException is the most annoying thing in the world, or at least in the field of programming. I will happily agree to this difference in semantics because the benefits far outweigh the slight inconsistency. Besides, in a way there is no inconsistency. An array reference is a value type consisting of two 4-byte integers (in 32-bit environments). This is different from an object reference.

Agreed. The way I look at it is that a D array variable *contains* a reference to the array elements but is, in itself, not the reference. When it comes to implementation, dynamic-length arrays always have an 8-byte structure allocated to themselves, and may have more RAM allocated if there are any elements in the array. The address of the array variable is not the address of the first element; the length property is fetched at runtime from the array variable. However, fixed-length arrays always have a minimum of 8 bytes allocated regardless of the number of elements declared, and the address of the array variable is also the address of its first element; the length property is 'hard-coded' by the compiler in any expressions that use it. -- Derek Melbourne, Australia 1/08/2005 5:01:43 PM
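
The layout described here can be checked with a short sketch, assuming a current D2 compiler. The 8-byte figure applies to 32-bit targets; the checks below are written in terms of `size_t` and pointer sizes so they hold on 64-bit targets too.

```d
void main()
{
    int[] dyn = [10, 20, 30];
    int[3] fixed = [10, 20, 30];

    // A dynamic array variable holds just a length and a pointer.
    static assert((int[]).sizeof == size_t.sizeof + (int*).sizeof);

    // The variable itself lives somewhere other than its elements.
    assert(cast(void*) &dyn !is cast(void*) dyn.ptr);

    // A fixed-length array variable *is* its elements.
    static assert((int[3]).sizeof == 3 * int.sizeof);
    assert(cast(void*) &fixed is cast(void*) fixed.ptr);

    // Its length is a compile-time constant.
    static assert(fixed.length == 3);
}
```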
Aug 01 2005
prev sibling parent reply J Thomas <jtd514 ameritech.net> writes:
so wait, you basically want an array to be a pointer to data containing 
a length and a pointer? i have been following this thread somewhat but I 
can hardly find the benefit here. it seems to me you want to take 
something very straightforward and close to the metal and turn it into a 
referenced object, for some bizarre reason regarding reference 
semantics. why don't you just put your arrays in objects if you are 
having problems?
Jul 30 2005
parent reply AJG <AJG_member pathlink.com> writes:
Hi,

so wait, you basically want an array to be a pointer to data containing 
a length and a pointer? i have been following this thread somewhat but I 
can hardly find the benefit here.

No. I would like it to be that way, but I know there wouldn't be support for this. What I'd like is for all array properties to follow reference semantics.
it seems to me you want to take 
something very straightforward and close to the metal and turn it into a 
referenced object, for some bizarre reason regarding reference 
semantics. 

What is bizarre is the current array semantics, be it due to "close to the metal" requirements, or whatever. If you don't think arrays at the moment follow at least _partial_ reference semantics, then why does:

# char[] A = "123"; // Yes, it's static, bear with me.
# char[] B = A;
# B.reverse;

Reverse _also_ the contents of A? Those are reference semantics. According to Derek, the array reference itself is implemented on the stack in 8-byte chunks. That's fine. I'm not talking about making the array itself a pointer.

Now, my point is that .length breaks reference semantics in special cases, because:

# char[] A = "123";
# char[] B = A;
# B.length = 4;

A.length did not change. If it were consistent with .reverse and .sort, then A's length too would have changed.

Cheers,
--AJG.

why don't you just put your arrays in objects if you are
having problems?
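
For readers trying this today, here is a rough modern equivalent of the two snippets above. It is a sketch under assumptions: it targets a current D2 compiler, where string literals are immutable and the built-in `.reverse` property is gone, so it uses `int[]` and the `reverse` function from `std.algorithm.mutation` instead.

```d
import std.algorithm.mutation : reverse;

void main()
{
    int[] a = [1, 2, 3];
    int[] b = a;             // same elements, separate length/pointer pair

    reverse(b);              // in place: reorders the shared elements
    assert(a == [3, 2, 1]);  // so the change is visible through a

    b.length = 4;            // changes only b's length field (and may reallocate)
    assert(a.length == 3);   // a never sees the new length
    assert(b.length == 4);
}
```

The element data is shared, but each variable keeps its own length, which is the asymmetry being argued about.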

Jul 30 2005
parent reply Derek Parnell <derek psych.ward> writes:
On Sat, 30 Jul 2005 22:12:31 +0000 (UTC), AJG wrote:


[snip]
 What is bizarre is the current array semantics, be it due to "close to the
 metal" requirements, or whatever. If you don't think arrays at the moment
follow
 at least _partial_ reference semantics, then why does:
 
 # char[] A = "123"; // Yes, it's static, bear with me.
 # char[] B = A;
 # B.reverse;
 
 Reverse _also_ the contents of A?

There might have been an argument that .reverse and .sort should follow Walter's Copy-on-Write rules of engagement, but the current behavior is documented and relied upon in current code. -- Derek Parnell Melbourne, Australia 31/07/2005 8:53:41 AM
Jul 30 2005
parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:a118xxgyuee7.t1828b9vk5du$.dlg 40tude.net...
 On Sat, 30 Jul 2005 22:12:31 +0000 (UTC), AJG wrote:


 [snip]
 What is bizarre is the current array semantics, be it due to "close to 
 the
 metal" requirements, or whatever. If you don't think arrays at the moment 
 follow
 at least _partial_ reference semantics, then why does:

 # char[] A = "123"; // Yes, it's static, bear with me.
 # char[] B = A;
 # B.reverse;

 Reverse _also_ the contents of A?

There might have been an argument that .reverse and .sort should follow Walter's Copy-on-Write rules of engagement, but the current behavior is documented and relied upon in current code.

Besides those reasons writing "B.reverse" to me indicates you want to affect B hence no COW while "reverse(B)" says you want a reversed B hence COW. That's one reason why I don't really like the current syntax hack of being able to write B.tolower() to mean tolower(B).
Aug 01 2005
parent reply Shammah Chancellor <Shammah_member pathlink.com> writes:
In article <dclba9$2pif$1 digitaldaemon.com>, Ben Hinkle says...
"Derek Parnell" <derek psych.ward> wrote in message 
news:a118xxgyuee7.t1828b9vk5du$.dlg 40tude.net...
 On Sat, 30 Jul 2005 22:12:31 +0000 (UTC), AJG wrote:


 [snip]
 What is bizarre is the current array semantics, be it due to "close to 
 the
 metal" requirements, or whatever. If you don't think arrays at the moment 
 follow
 at least _partial_ reference semantics, then why does:

 # char[] A = "123"; // Yes, it's static, bear with me.
 # char[] B = A;
 # B.reverse;

 Reverse _also_ the contents of A?

There might have been an argument that .reverse and .sort should follow Walter's Copy-on-Write rules of engagement, but the current behavior is documented and relied upon in current code.

Besides those reasons writing "B.reverse" to me indicates you want to affect B hence no COW while "reverse(B)" says you want a reversed B hence COW. That's one reason why I don't really like the current syntax hack of being able to write B.tolower() to mean tolower(B).

Utterly confusing! reverse(b) and B.reverse have nothing in their name to imply that either one copies the data. By default COW should not happen. Believe me, look at .NET where everything is COW. New memory allocations all over the place.

IMHO .dup is there for a reason, and nothing is preventing you from doing:

foo.dup.reverse

If somebody else comes along, they will know you are copying the array. It's only 4 more characters of typing. Plus no confusion as to what does cow and what doesn't. I can copy the thing first with .dup if I want. This isn't C where it's 5 lines of code every time you need to copy an array!

-Sha
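
The "copy explicitly, then mutate in place" style argued for here might look like this with a current D2 compiler. This is a sketch only: `reverse` from `std.algorithm.mutation` stands in for the old built-in `.reverse` property, and `int[]` is used to keep the data mutable.

```d
import std.algorithm.mutation : reverse;

void main()
{
    int[] foo = [1, 2, 3];

    int[] copy = foo.dup;      // explicit, visible copy
    reverse(copy);             // mutate the copy in place

    assert(copy == [3, 2, 1]);
    assert(foo == [1, 2, 3]);  // the original is untouched
    assert(copy.ptr !is foo.ptr);
}
```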
Aug 01 2005
parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Shammah Chancellor" <Shammah_member pathlink.com> wrote in message 
news:dcleqr$2ti5$1 digitaldaemon.com...
 In article <dclba9$2pif$1 digitaldaemon.com>, Ben Hinkle says...
"Derek Parnell" <derek psych.ward> wrote in message
news:a118xxgyuee7.t1828b9vk5du$.dlg 40tude.net...
 On Sat, 30 Jul 2005 22:12:31 +0000 (UTC), AJG wrote:


 [snip]
 What is bizarre is the current array semantics, be it due to "close to
 the
 metal" requirements, or whatever. If you don't think arrays at the 
 moment
 follow
 at least _partial_ reference semantics, then why does:

 # char[] A = "123"; // Yes, it's static, bear with me.
 # char[] B = A;
 # B.reverse;

 Reverse _also_ the contents of A?

There might have been an argument that .reverse and .sort should follow Walter's Copy-on-Write rules of engagement, but the current behavior is documented and relied upon in current code.

Besides those reasons writing "B.reverse" to me indicates you want to affect B hence no COW while "reverse(B)" says you want a reversed B hence COW. That's one reason why I don't really like the current syntax hack of being able to write B.tolower() to mean tolower(B).

Utterly confusing! reverse(b) and B.reverse have nothing in their name to imply that either one copies the data. By default COW should not happen. Believe me, look at .NET where everything is COW. New memory allocations all over the place. IMHO .dup is there for a reason, and nothing is preventing you from doing: foo.dup.reverse If somebody else comes along, they will know you are copying the array. It's only 4 more characters of typing. Plus no confusion as to what does cow and what doesn't. I can copy the thing first with .dup if I want. This isn't C where it's 5 lines of code every time you need to copy an array! -Sha

You've lost me. Are you proposing a change to any existing behavior or coding practice (ie COW)?
Aug 01 2005
parent reply Shammah Chancellor <Shammah_member pathlink.com> writes:
In article <dclfvs$2usj$1 digitaldaemon.com>, Ben Hinkle says...
"Shammah Chancellor" <Shammah_member pathlink.com> wrote in message 
news:dcleqr$2ti5$1 digitaldaemon.com...
 In article <dclba9$2pif$1 digitaldaemon.com>, Ben Hinkle says...
"Derek Parnell" <derek psych.ward> wrote in message
news:a118xxgyuee7.t1828b9vk5du$.dlg 40tude.net...
 [snip]
Besides those reasons writing "B.reverse" to me indicates you want to 
affect
B hence no COW while "reverse(B)" says you want a reversed B hence COW.
That's one reason why I don't really like the current syntax hack of being
able to write B.tolower() to mean tolower(B).

Utterly confusing! reverse(b) and B.reverse have nothing in their name to imply that either one copies the data. By default COW should not happen. Believe me, look at .NET where everything is COW. New memory allocations all over the place. IMHO .dup is there for a reason, and nothing is preventing you from doing: foo.dup.reverse If somebody else comes along, they will know you are copying the array. It's only 4 more characters of typing. Plus no confusion as to what does cow and what doesn't. I can copy the thing first with .dup if I want. This isn't C where it's 5 lines of code every time you need to copy an array! -Sha

You've lost me. Are you proposing a change to any existing behavior or coding practice (ie COW)?

I wasn't proposing a change at all. I was disagreeing with Derek. I think COW is a bad thing for API functions to be doing mysteriously. It leads to crap like this:

foo = foo.Replace("Hello","");
dateFoo = dateFoo.AddDays(1);

If I want a duplicate of something, in D, it's as easy as saying:

# foo2 = foo.dup.replace("Hello","");

(Not that replace is a valid property for char[]s, but you get my gist)

This leads to effective memory use, and no confusion about:

reverse(b), or b.reverse

Which one does c-o-w? The name certainly doesn't say, maybe by somebody's reasoning it might make sense that one does cow and one doesn't. But certainly not mine, from the information given.

Also, you might say for consistency, always use cow. But cow is not always what you want. Since there's no way to manually un-cowify it, it would make logical sense to NEVER do cow, and let the programmer call dup first.

-Sha
Aug 01 2005
next sibling parent reply AJG <AJG_member pathlink.com> writes:
Hi,

If I want a duplicate something, in D, it's as easy as saying:
# foo2 = foo.dup.replace("Hello","");
(Not that replace is a valid property for char[]s, but you get my gist)

Exactly.
This leads to effective memory use, and no confusion about:
reverse(b), or b.reverse 

Which one does c-o-w?  The name certainly doesn't say, maybe by somebody's
reasoning it might make sense that one does cow and one doesn't.  But certainly
not mine, from the information given.

IMHO, and for consistency, it should never do COW. If a user wants to do COW, let the user do it. That's exactly what I mean by reference semantics, so it seems we are in agreement here.
Also, you might say for consistency, always use cow.  But cow is not always what
you want. Since there's no way to manually un-cowify it,  It would make logical
sense to NEVER do cow, and let the programmer call dup first.

Interestingly enough (and one of my points), .length does COW about half of the time, and there's no way to un-cowify it. That's a great word, btw, un-cowify. It had me chuckling. Cheers, --AJG.
Aug 01 2005
parent Shammah Chancellor <Shammah_member pathlink.com> writes:
In article <dclls6$37o$1 digitaldaemon.com>, AJG says...
Hi,

If I want a duplicate something, in D, it's as easy as saying:
# foo2 = foo.dup.replace("Hello","");
(Not that replace is a valid property for char[]s, but you get my gist)

Exactly.
This leads to effective memory use, and no confusion about:
reverse(b), or b.reverse 

Which one does c-o-w?  The name certainly doesn't say, maybe by somebody's
reasoning it might make sense that one does cow and one doesn't.  But certainly
not mine, from the information given.

IMHO, and for consistency, it should never do COW. If a user wants to do COW, let the user do it. That's exactly what I mean by reference semantics, so it seems we are in agreement here.
Also, you might say for consistency, always use cow.  But cow is not always what
you want. Since there's no way to manually un-cowify it,  It would make logical
sense to NEVER do cow, and let the programmer call dup first.

Interestingly enough (and one of my points), .length does COW about half of the time, and there's no way to un-cowify it.

While I agree with you that it could be annoying, the problem is that arrays are really stack variables which have a reference member. (As you well know by now.) So, in order to un-cowify .length we would have to make all arrays true references which contain references. Also, that still doesn't fix array slices. We would ALWAYS need to dup when an array slice is made. :(

However, there's an easy way to handle the first problem already:

char[] a = "Hello";
char[]* b = &a; // (I hope anyways, & shouldn't return a.ptr... I haven't checked this.)
(*b).length = 10;
writef("%i", a.length);

Although array slices won't be fixed without a special array slice type, so that it would know the start of the array and resize that.
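
The pointer-to-array idea above can be checked with a short sketch, assuming a current D2 compiler (where `&a` yields a pointer to the array variable itself, of type `int[]*`, not `a.ptr`). An `int[]` is used here because string literals are immutable in D2.

```d
void main()
{
    int[] a = [1, 2, 3];
    int[]* b = &a;           // pointer to the array variable, not to a.ptr

    (*b).length = 10;        // resizes a itself
    assert(a.length == 10);
    assert(a[0 .. 3] == [1, 2, 3]); // existing elements are preserved
    assert(a[3] == 0);              // new elements are default-initialised

    int[] c = a;             // a plain copy still gets its own length field
    c.length = 2;
    assert(a.length == 10);
}
```

Going through the pointer gives the shared-length behaviour asked for, at the cost of an explicit dereference.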
That's a great word, btw, un-cowify. It had me chuckling.

Thanks :) -Sha
Aug 01 2005
prev sibling next sibling parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Shammah Chancellor" <Shammah_member pathlink.com> wrote in message 
news:dclk4p$1o0$1 digitaldaemon.com...
 In article <dclfvs$2usj$1 digitaldaemon.com>, Ben Hinkle says...
"Shammah Chancellor" <Shammah_member pathlink.com> wrote in message
news:dcleqr$2ti5$1 digitaldaemon.com...
 In article <dclba9$2pif$1 digitaldaemon.com>, Ben Hinkle says...
"Derek Parnell" <derek psych.ward> wrote in message
news:a118xxgyuee7.t1828b9vk5du$.dlg 40tude.net...
 [snip]
Besides those reasons writing "B.reverse" to me indicates you want to
affect
B hence no COW while "reverse(B)" says you want a reversed B hence COW.
That's one reason why I don't really like the current syntax hack of 
being
able to write B.tolower() to mean tolower(B).

Utterly confusing! reverse(b) and B.reverse have nothing in their name to imply that either one copies the data. By default COW should not happen. Believe me, look at .NET where everything is COW. New memory allocations all over the place. IMHO .dup is there for a reason, and nothing is preventing you from doing: foo.dup.reverse If somebody else comes along, they will know you are copying the array. It's only 4 more characters of typing. Plus no confusion as to what does cow and what doesn't. I can copy the thing first with .dup if I want. This isn't C where it's 5 lines of code every time you need to copy an array! -Sha

You've lost me. Are you proposing a change to any existing behavior or coding practice (ie COW)?

I wasn't proposing a change at all. I was disagreeing with Derek. I think COW is a bad thing for API functions to be doing mysteriously. It leads to crap like this: foo = foo.Replace("Hello",""); dateFoo = dateFoo.AddDays(1);

I didn't read Derek's post as proposing reverse use COW. He was pointing out that it doesn't. It's too bad you see COW as mysterious.
 If I want a duplicate something, in D, it's as easy as saying:
 # foo2 = foo.dup.replace("Hello","");
 (Not that replace is a valid property for char[]s, but you get my gist)

 This leads to effective memory use, and no confusion about:

 reverse(b), or b.reverse

 Which one does c-o-w?  The name certainly doesn't say, maybe by somebody's
 reasoning it might make sense that one does cow and one doesn't.  But 
 certainly
 not mine, from the information given.

The statement about effective memory use only is true when the operation is guaranteed to change the string. If foo in the example didn't contain any Hellos then the dup would be wasteful. Plus I'm surprised you don't see any difference between reverse(b) and b.reverse since it's common in OOP to interpret b.foo as acting on b while foo(b) is just some function of b.
 Also, you might say for consistency, always use cow.  But cow is not 
 always what
 you want. Since there's no way to manually un-cowify it,  It would make 
 logical
 sense to NEVER do cow, and let the programmer call dup first.

That would be a big change in D style since many times you do not know if a dup will be needed or not (eg most of the functions in std.string might just return the original string).
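
The copy-on-write convention being described, where a function returns its input untouched unless it actually has to change something, can be sketched like this. `lowerCow` is a hypothetical helper written for illustration, not a Phobos function, and it only handles ASCII letters; a current D2 compiler is assumed.

```d
import std.ascii : isUpper, toLower;

/// Hypothetical helper: lower-cases ASCII letters using copy-on-write.
const(char)[] lowerCow(const(char)[] s)
{
    char[] copy = null;              // allocated only if a change is needed
    foreach (i, c; s)
    {
        if (isUpper(c))
        {
            if (copy is null)
                copy = s.dup;        // first change: copy the whole input once
            copy[i] = cast(char) toLower(c);
        }
    }
    return copy is null ? s : copy;  // nothing changed: hand back the original
}

void main()
{
    const(char)[] unchanged = "hello";
    const(char)[] changed = "Hello";

    assert(lowerCow(unchanged).ptr is unchanged.ptr); // no allocation, same data

    auto r = lowerCow(changed);
    assert(r == "hello");
    assert(r.ptr !is changed.ptr);   // a copy was made
    assert(changed == "Hello");      // the input is never modified
}
```

A caller who always wrote `s.dup` first would pay for a copy even in the unchanged case, which is the trade-off under discussion.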
Aug 01 2005
parent reply Shammah Chancellor <Shammah_member pathlink.com> writes:
In article <dclmn7$42s$1 digitaldaemon.com>, Ben Hinkle says...
"Shammah Chancellor" <Shammah_member pathlink.com> wrote in message 
news:dclk4p$1o0$1 digitaldaemon.com...
 In article <dclfvs$2usj$1 digitaldaemon.com>, Ben Hinkle says...
"Shammah Chancellor" <Shammah_member pathlink.com> wrote in message
news:dcleqr$2ti5$1 digitaldaemon.com...
 In article <dclba9$2pif$1 digitaldaemon.com>, Ben Hinkle says...
"Derek Parnell" <derek psych.ward> wrote in message
news:a118xxgyuee7.t1828b9vk5du$.dlg 40tude.net...
 [snip]
Besides those reasons writing "B.reverse" to me indicates you want to
affect
B hence no COW while "reverse(B)" says you want a reversed B hence COW.
That's one reason why I don't really like the current syntax hack of 
being
able to write B.tolower() to mean tolower(B).

Utterly confusing! reverse(b) and B.reverse have nothing in their name to imply that either one copies the data. By default COW should not happen. Believe me, look at .NET where everything is COW. New memory allocations all over the place. IMHO .dup is there for a reason, and nothing is preventing you from doing: foo.dup.reverse If somebody else comes along, they will know you are copying the array. It's only 4 more characters of typing. Plus no confusion as to what does cow and what doesn't. I can copy the thing first with .dup if I want. This isn't C where it's 5 lines of code every time you need to copy an array! -Sha

You've lost me. Are you proposing a change to any existing behavior or coding practice (ie COW)?

I wasn't proposing a change at all. I was disagreeing with Derek. I think COW is a bad thing for API functions to be doing mysteriously. It leads to crap like this: foo = foo.Replace("Hello",""); dateFoo = dateFoo.AddDays(1);

I didn't read Derek's post as proposing reverse use COW. He was pointing out that it doesn't.

You're right, he didn't. I was contesting that tolower(b) and b.tolower should do different things.
 It's too bad you see COW as mysterious.

I don't find anything mysterious about it. It's just not useful most every time I've had any dealing with COW functions. If I want COW, I can dupe the object first.
 If I want a duplicate something, in D, it's as easy as saying:
 # foo2 = foo.dup.replace("Hello","");
 (Not that replace is a valid property for char[]s, but you get my gist)

 This leads to effective memory use, and no confusion about:

 reverse(b), or b.reverse

 Which one does c-o-w?  The name certainly doesn't say, maybe by somebody's
 reasoning it might make sense that one does cow and one doesn't.  But 
 certainly
 not mine, from the information given.

The statement about effective memory use only is true when the operation is guaranteed to change the string. If foo in the example didn't contain any Hellos then the dup would be wasteful.

I hope you're not implying that replace should only return a new instance if something was actually changed. That is absurd. I would then need to check to see if it's given me back a reference to a new array before I could use it?
 Plus I'm surprised you don't see any 
difference between reverse(b) and b.reverse since it's common in OOP to 
interpret b.foo as acting on b while foo(b) is just some function of b.

Why don't you tell Microsoft that. Many of the examples I listed were from VB.NET, and do COW from member functions. Also, just because it is common doesn't make it logical, consistent, or obvious to somebody not familiar with these __unwritten__ agreements.
 Also, you might say for consistency, always use cow.  But cow is not 
 always what
 you want. Since there's no way to manually un-cowify it,  It would make 
 logical
 sense to NEVER do cow, and let the programmer call dup first.

That would be a big change in D style since many times you do not know if a dup will be needed or not (eg most of the functions in std.string might just return the original string).

If I'm understanding what you just said, let me say this: As I said above, I think it's silly to have non-deterministic behavior from those functions. When I say deterministic, I mean that I should be able to expect it to always return a duplicate string, or not. -Sha
Aug 01 2005
parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Shammah Chancellor" <Shammah_member pathlink.com> wrote in message 
news:dcm4an$grn$1 digitaldaemon.com...
 In article <dclmn7$42s$1 digitaldaemon.com>, Ben Hinkle says...
"Shammah Chancellor" <Shammah_member pathlink.com> wrote in message
news:dclk4p$1o0$1 digitaldaemon.com...
 In article <dclfvs$2usj$1 digitaldaemon.com>, Ben Hinkle says...
"Shammah Chancellor" <Shammah_member pathlink.com> wrote in message
news:dcleqr$2ti5$1 digitaldaemon.com...
 In article <dclba9$2pif$1 digitaldaemon.com>, Ben Hinkle says...
"Derek Parnell" <derek psych.ward> wrote in message
news:a118xxgyuee7.t1828b9vk5du$.dlg 40tude.net...
 [snip]
Besides those reasons writing "B.reverse" to me indicates you want to
affect
B hence no COW while "reverse(B)" says you want a reversed B hence 
COW.
That's one reason why I don't really like the current syntax hack of
being
able to write B.tolower() to mean tolower(B).

Utterly confusing! reverse(b) and B.reverse have nothing in their name to imply that either one copies the data. By default COW should not happen. Believe me, look at .NET where everything is COW. New memory allocations all over the place. IMHO .dup is there for a reason, and nothing is preventing you from doing: foo.dup.reverse If somebody else comes along, they will know you are copying the array. It's only 4 more characters of typing. Plus no confusion as to what does cow and what doesn't. I can copy the thing first with .dup if I want. This isn't C where it's 5 lines of code every time you need to copy an array! -Sha

You've lost me. Are you proposing a change to any existing behavior or coding practice (ie COW)?

I wasn't proposing a change at all. I was disagreeing with Derek. I think COW is a bad thing for API functions to be doing mysteriously. It leads to crap like this: foo = foo.Replace("Hello",""); dateFoo = dateFoo.AddDays(1);

I didn't read Derek's post as proposing reverse use COW. He was pointing out that it doesn't.

You're right, he didn't. I was contesting that tolower(b) and b.tolower should do different things.
 If I want a duplicate something, in D, it's as easy as saying:
 # foo2 = foo.dup.replace("Hello","");
 (Not that replace is a valid property for char[]s, but you get my gist)

 This leads to effective memory use, and no confusion about:

 reverse(b), or b.reverse

 Which one does c-o-w?  The name certainly doesn't say, maybe by 
 somebody's
 reasoning it might make sense that one does cow and one doesn't.  But
 certainly
 not mine, from the information given.

The statement about effective memory use only is true when the operation is guaranteed to change the string. If foo in the example didn't contain any Hellos then the dup would be wasteful.

I hope you're not implying that replace should only return a new instance if something was actually changed.

That is what I'm implying - and that's what many std.string functions do.
 That is absurd.  I would then need to check to
 see if it's given me back a reference to a new array before I could use 
 it?

why? The only time you would care is if you start modifying the array in-place.
 Plus I'm surprised you don't see any
difference between reverse(b) and b.reverse since it's common in OOP to
interpret b.foo as acting on b while foo(b) is just some function of b.

Why don't you tell Microsoft that. Many of the examples I listed were from VB.NET, and do COW from member functions.

Strings in VB.NET are immutable so I'm not surprised that methods return new strings - that's the definition of immutable. Mutable objects would interpret b.reverse as acting on b.
 Also, Just because it is common
 doesn't make it logical, consistent, or obvious to somebody not familiar 
 with
 these __unwritten__ agreements.

Unwritten in what sense? COW is documented in several places in D (though I would like even more documentation about it since it appears people don't know about it).
 Also, you might say for consistency, always use cow.  But cow is not
 always what
 you want. Since there's no way to manually un-cowify it,  It would make
 logical
 sense to NEVER do cow, and let the programmer call dup first.

That would be a big change in D style since many times you do not know if a dup will be needed or not (eg most of the functions in std.string might just return the original string).

If I'm understanding what you just said, let me say this: As I said above, I think it's silly to have non-deterministic behavior from those functions. When I say deterministic, I mean that I should be able to expect it to always return a duplicate string, or not.

ok - everyone is entitled to their opinions. To me it's simpler to obey COW. Changing an array in-place is rare enough that special care is ok with me.
Aug 01 2005
parent reply Shammah Chancellor <Shammah_member pathlink.com> writes:
In article <dcmaak$l5m$1 digitaldaemon.com>, Ben Hinkle says...
"Shammah Chancellor" <Shammah_member pathlink.com> wrote in message 
news:dcm4an$grn$1 digitaldaemon.com...
 In article <dclmn7$42s$1 digitaldaemon.com>, Ben Hinkle says...
"Shammah Chancellor" <Shammah_member pathlink.com> wrote in message
news:dclk4p$1o0$1 digitaldaemon.com...
 In article <dclfvs$2usj$1 digitaldaemon.com>, Ben Hinkle says...
"Shammah Chancellor" <Shammah_member pathlink.com> wrote in message
news:dcleqr$2ti5$1 digitaldaemon.com...
 In article <dclba9$2pif$1 digitaldaemon.com>, Ben Hinkle says...
"Derek Parnell" <derek psych.ward> wrote in message
news:a118xxgyuee7.t1828b9vk5du$.dlg 40tude.net...
 [snip]
Besides those reasons writing "B.reverse" to me indicates you want to
affect
B hence no COW while "reverse(B)" says you want a reversed B hence 
COW.
That's one reason why I don't really like the current syntax hack of
being
able to write B.tolower() to mean tolower(B).

Utterly confusing! reverse(b) and B.reverse have nothing in their names to imply that either one copies the data. By default COW should not happen. Believe me, look at .NET, where everything is COW: new memory allocations all over the place.

IMHO .dup is there for a reason, and nothing is preventing you from doing:

# foo.dup.reverse

If somebody else comes along, they will know you are copying the array. It's only 4 more characters of typing, plus no confusion as to what does COW and what doesn't. I can copy the thing first with .dup if I want. This isn't C, where it's 5 lines of code every time you need to copy an array!

-Sha

You've lost me. Are you proposing a change to any existing behavior or coding practice (ie COW)?

I wasn't proposing a change at all. I was disagreeing with Derek. I think COW is a bad thing for API functions to be doing mysteriously. It leads to crap like this:

# foo = foo.Replace("Hello","");
# dateFoo = dateFoo.AddDays(1);

I didn't read Derek's post as proposing reverse use COW. He was pointing out that it doesn't.

You're right, he didn't. I was contesting that tolower(b) and b.tolower should do different things.
 If I want a duplicate something, in D, it's as easy as saying:
 # foo2 = foo.dup.replace("Hello","");
 (Not that replace is a valid property for char[]s, but you get my gist)

 This leads to effective memory use, and no confusion about:

 reverse(b), or b.reverse

 Which one does c-o-w?  The name certainly doesn't say; maybe by
 somebody's
 reasoning it might make sense that one does cow and one doesn't.  But
 certainly
 not mine, from the information given.

The statement about effective memory use is only true when the operation is guaranteed to change the string. If foo in the example didn't contain any Hellos then the dup would be wasteful.

I hope you're not implying that replace should only return a new instance if something was actually changed.

That is what I'm implying - and that's what many std.string functions do.

Bah
 That is absurd.  I would then need to check to
 see if it's given me back a reference to a new array before I could use
 it?

why? The only time you would care is if you start modifying the array in-place.

Exactly. Quite often when I want to replace one thing, I want to replace A LOT of things (or take any other example). If each replace allocates a new string, that's inefficient. Maybe I only want to copy it once, and then modify it in place. When .dup is only 4 extra characters per instance of this, it does not justify having two copies of every array function, one for COW and one for in-place.
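
The "copy once, then modify in place" pattern argued for here could look roughly like the sketch below, assuming a current D compiler. `replaceCharInPlace` is a hypothetical helper, not a Phobos function; it only handles same-length (single character) substitutions, so the array never needs to grow.

```d
// Hypothetical in-place helper: overwrites every `from` with `to`
// in the caller's array. No allocation takes place.
void replaceCharInPlace(char[] s, char from, char to)
{
    foreach (ref ch; s)
        if (ch == from)
            ch = to;
}

void main()
{
    string original = "a-b_c d";

    char[] work = original.dup;          // the one explicit copy
    replaceCharInPlace(work, '-', '.');  // each call reuses the same memory
    replaceCharInPlace(work, '_', '.');
    replaceCharInPlace(work, ' ', '.');

    assert(work == "a.b.c.d");
    assert(original == "a-b_c d");       // the source is untouched
}
```

Three substitutions cost one allocation here, where a function that always returned a fresh copy would cost three.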
 Plus I'm surprised you don't see any
difference between reverse(b) and b.reverse since it's common in OOP to
interpret b.foo as acting on b while foo(b) is just some function of b.

Why don't you tell Microsoft that? Many of the examples I listed were from VB.NET, and do COW from member functions.

Strings in VB.NET are immutable so I'm not surprised that methods return new strings - that's the definition of immutable. Mutable objects would interpret b.reverse as acting on b.

True. However, for mutable objects, would you like to duplicate every function for COW and non-COW? I find it less confusing to explicitly dup. (It also clutters my namespace less!)
 Also, just because it is common
 doesn't make it logical, consistent, or obvious to somebody not familiar
 with
 these __unwritten__ agreements.

Unwritten in what sense? COW is documented in several places in D (though I would like even more documentation about it, since it appears people don't know about it).

There's barely any documentation for the API as it is. And a footnote about tolower( string ) on a man page is not enough for me.
 Also, you might say for consistency, always use cow.  But cow is not
 always what
 you want. Since there's no way to manually un-cowify it, it would make
 logical
 sense to NEVER do cow, and let the programmer call dup first.

That would be a big change in D style, since many times you do not know whether a dup will be needed (e.g. most of the functions in std.string might just return the original string).

If I'm understanding what you just said, let me say this: As I said above, I think it's silly to have non-deterministic behavior from those functions. When I say deterministic, I mean that I should be able to expect it to always return a duplicate string, or not.

ok - everyone is entitled to their opinions. To me it's simpler to obey COW. Changing an array in-place is rare enough that special care is ok with me.

Rare in what? Rare in what you're writing? I think you'll find that many projects have a lot of use for it. -Sha
Aug 01 2005
parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
 That is absurd.  I would then need to check to
 see if it's given me back a reference to a new array before I could use
 it?

why? The only time you would care is if you start modifying the array in-place.

Exactly. Quite often when I want to replace one thing, I want to replace A LOT of things (or take any other example). If each replace allocates a new string, that's inefficient. Maybe I only want to copy it once, and then modify it in place. When .dup is only 4 extra characters per instance of this, it does not justify having two copies of every array function, one for COW and one for in-place.

I don't know if you followed the recent COW/const/inplace performance discussion, but my own $0.02 is that one should use COW as a general rule and, after profiling the performance, target a (presumably) small set of routines that need more careful memory management and possibly inplace manipulations. In a "worst case" one can use one of the many other memory management techniques listed in the D docs. In any case you might want to look over those recent COW threads for more (and more and more) discussion on the topic.

On a side note, remember that operations like "replace" might increase the length of the string (if the replacement is longer than the pattern), in which case modifying it inplace becomes tricky.

A general rule like COW can take the place of lots of individual rules for each function. But you can code your app however you like or write a phobos lib that does everything inplace - there's nothing technically preventing that and it's perfectly ok if that's what you want to do.
 Also, just because it is common
 doesn't make it logical, consistent, or obvious to somebody not
 familiar
 with
 these __unwritten__ agreements.

Unwritten in what sense? COW is documented in several places in D (though I would like even more documentation about it, since it appears people don't know about it).

There's barely any documentation for the API as it is. And a footnote about tolower( string ) on a man page is not enough for me.

I'm not sure what man page you are referring to, since D doesn't have man pages (or footnotes, from what I can tell). Maybe you are speaking figuratively, in which case I recommend that if you have concrete suggestions for improving the doc, you add comments to the doc wiki. On a slight OT, I wonder if/how the doc wiki is being used. Are comments removed as they are fixed in the doc? What's the process for using the wiki? I add my comments where I think they should go, but I notice there's stuff ranging all over the map, and to be honest I have no clue if Walter ever looks at it, how often, and what happens when he does look at it.
Aug 01 2005
parent reply Shammah Chancellor <Shammah_member pathlink.com> writes:
In article <dcmpne$10pp$1 digitaldaemon.com>, Ben Hinkle says...
 That is absurd.  I would then need to check to
 see if it's given me back a reference to a new array before I could use
 it?

why? The only time you would care is if you start modifying the array in-place.

Exactly. Quite often when I want to replace one thing, I want to replace A LOT of things (or take any other example). If each replace allocates a new string, that's inefficient. Maybe I only want to copy it once, and then modify it in place. When .dup is only 4 extra characters per instance of this, it does not justify having two copies of every array function, one for COW and one for in-place.

I don't know if you followed the recent COW/const/inplace performance discussion, but my own $0.02 is that one should use COW as a general rule and, after profiling the performance, target a (presumably) small set of routines that need more careful memory management and possibly inplace manipulations. In a "worst case" one can use one of the many other memory management techniques listed in the D docs. In any case you might want to look over those recent COW threads for more (and more and more) discussion on the topic.

I think this would be a bad choice. It might be wise with respect to performance, but having different methods randomly be cow or not cow depending on how much more time they take is a bit confusing to say the least.
On a side note, remember that operations like "replace" might increase the 
length of the string (if the replacement is longer than the pattern) in 
which case modifying it inplace becomes tricky. 

Inplace may not be possible, but it could still follow the normal rule of modifying the ptr of your array to point to the new value. That way a dup only happens when it is required, and the calling function does not care. This would be ideal IMHO.
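
One way to read this suggestion is sketched below, assuming a current D compiler: the function takes the caller's slice by `ref` and re-points it at a new buffer only when a replacement actually occurs. `replaceAll` is a hypothetical name for illustration and is not how Phobos behaves.

```d
// Hypothetical sketch: replace every `from` with `to`, rebinding the
// caller's slice to a new buffer only if something was replaced.
void replaceAll(ref char[] s, const(char)[] from, const(char)[] to)
{
    if (from.length == 0 || s.length < from.length)
        return;

    char[] result;        // stays null until the first match
    bool changed = false;
    size_t start = 0;     // start of the text not yet copied to result
    size_t i = 0;

    while (i + from.length <= s.length)
    {
        if (s[i .. i + from.length] == from)
        {
            result ~= s[start .. i];  // text before the match
            result ~= to;             // the replacement (any length)
            i += from.length;
            start = i;
            changed = true;
        }
        else
            ++i;
    }

    if (changed)
    {
        result ~= s[start .. $];      // remaining tail
        s = result;                   // caller's slice now points at new data
    }
}

void main()
{
    char[] msg = "Hello world, Hello D".dup;
    char[] before = msg;              // second reference to the old buffer

    replaceAll(msg, "Hello", "Goodbye");
    assert(msg == "Goodbye world, Goodbye D");
    assert(before == "Hello world, Hello D");  // old buffer left intact

    char[] same = msg;
    replaceAll(msg, "xyz", "q");      // no match: no allocation, no rebinding
    assert(msg is same);
}
```

Note that any other slice referring to the old buffer (`before` above) still sees the old contents, so callers holding several references would have to be aware of the rebinding.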
 A general rule like COW can 
take the place of lots of individual rules for each function. But you can 
code your app however you like or write a phobos lib that does everything 
inplace - there's nothing technically preventing that and it's perfectly ok 
if that's what you want to do.

That's true, but it would be nice not to be including my own runtime in every little application I write. I suppose I could force installation of a shared library. Ugh.
 Also, just because it is common
 doesn't make it logical, consistent, or obvious to somebody not
 familiar
 with
 these __unwritten__ agreements.

Unwritten in what sense? COW is documented in several places in D (though I would like even more documentation about it, since it appears people don't know about it).

There's barely any documentation for the API as it is. And a footnote about tolower( string ) on a man page is not enough for me.

I'm not sure what man page you are referring to since D doesn't have man pages (or footnotes from what I can tell). Maybe you are speaking figuratively in which case I recommend that if you have concrete suggestions for improving the doc that you add comments to the doc wiki.

Which I have been doing when I see them. However, most of the doc that you can post on the wiki (I haven't looked at it a lot) seems to be for the language specification. Can the phobos docs be modified?
On a slight OT I wonder if/how the doc wiki is being used. Are comments 
removed as they are fixed in the doc? What's the process for using the wiki? 
I add my comments where I think they should go but I notice there's stuff 
ranging all over the map and to be honest I have no clue if Walter ever 
looks at it, how often, and what happens when he does look at it. 

Aug 01 2005
parent "Ben Hinkle" <ben.hinkle gmail.com> writes:
 There's barely any documentation for the API as it is.   And a footnote
 about tolower( string ) on a man page is not enough for me.

I'm not sure what man page you are referring to since D doesn't have man pages (or footnotes from what I can tell). Maybe you are speaking figuratively in which case I recommend that if you have concrete suggestions for improving the doc that you add comments to the doc wiki.

Which I have been doing when I see them. However, most of the doc that you can post on the wiki (I haven't looked at it a lot) seems to be for the language specification. Can the phobos docs be modified?

There's a link at the bottom of the phobos page for the wiki. I don't know if the modules with stand-alone pages have the link, though.
Aug 02 2005
prev sibling parent reply Derek Parnell <derek psych.ward> writes:
On Mon, 1 Aug 2005 16:54:49 +0000 (UTC), Shammah Chancellor wrote:


"Derek Parnell" <derek psych.ward> wrote in message
news:a118xxgyuee7.t1828b9vk5du$.dlg 40tude.net...
 [snip]
Besides those reasons writing "B.reverse" to me indicates you want to 
affect
B hence no COW while "reverse(B)" says you want a reversed B hence COW.
That's one reason why I don't really like the current syntax hack of being
able to write B.tolower() to mean tolower(B).




 I was disagreeing with Derek.  I think COW
 is a bad thing for API functions to be doing mysteriously.  It leads to crap
 like this:
 
 foo = foo.Replace("Hello","");
 dateFoo = dateFoo.AddDays(1);

Hi Shammah,
I wasn't actually saying that .reverse must use CoW. I was saying that it doesn't, and that fact seems to go counter to Walter's general principle (as I understand it) about when to use CoW. I thought that one should use CoW if the code is actually changing the data *and* the data might be accessible to the calling routine. Thus, as .reverse will change the data for lengths > 1, and the data is probably accessible to the code using .reverse, one could have expected it to CoW. Of course, I might be misunderstanding that 'general principle' ;-)

As the current behaviour is documented, we can cope with this seeming exception.

-- 
Derek Parnell
Melbourne, Australia
2/08/2005 7:21:43 AM
Aug 01 2005
parent Shammah Chancellor <Shammah_member pathlink.com> writes:
In article <1as80g46qpg5w$.1dfr6mqon4u1t$.dlg 40tude.net>, Derek Parnell says...
On Mon, 1 Aug 2005 16:54:49 +0000 (UTC), Shammah Chancellor wrote:


"Derek Parnell" <derek psych.ward> wrote in message
news:a118xxgyuee7.t1828b9vk5du$.dlg 40tude.net...
 [snip]
Besides those reasons writing "B.reverse" to me indicates you want to 
affect
B hence no COW while "reverse(B)" says you want a reversed B hence COW.
That's one reason why I don't really like the current syntax hack of being
able to write B.tolower() to mean tolower(B).




 I was disagreeing with Derek.  I think COW
 is a bad thing for API functions to be doing mysteriously.  It leads to crap
 like this:
 
 foo = foo.Replace("Hello","");
 dateFoo = dateFoo.AddDays(1);

Hi Shammah, I wasn't actually saying that .reverse must use CoW. I was saying that it doesn't, and that fact seems to go counter to Walter's general principle (as I understand it) about when to use CoW. I thought that one should use CoW if the code is actually changing the data *and* the data might be accessible to the calling routine. Thus, as .reverse will change the data for lengths > 1, and the data is probably accessible to the code using .reverse, one could have expected it to CoW. Of course, I might be misunderstanding that 'general principle' ;-) As the current behaviour is documented, we can cope with this seeming exception.

No, no, I understood that. I'm just being argumentative. I don't agree with you that tolower(b) and b.tolower should do different things. I don't agree that tolower(b) should even exist in the face of b.tolower. It clutters up my namespace. (Aside from the fact that user properties in D can't be added to a special char[] namespace =/ )

It just happened that my example from VB was using class methods. For example, in .NET, in order to round a date up from seconds to 5 minutes, you need to allocate like 3 or 4 DateTimes. Of course you don't SEE this, but .AddDays, .AddSeconds etc. all allocate a new DateTime. For example, in .NET, in order to get tomorrow's date:

# Dim tomorrow As String = DateTime.Now.Date.AddDays(1).ToLongDateString()

That required allocations of 3 DateTimes and a String. I could completely be abusing the .NET Framework, but I searched far and wide and couldn't find an alternative that worked on the original. This kind of crud is why I'm very opposed to COW, in class methods or global functions. If .NET had D-style dups and I really wanted to operate on a new object:

# Dim tomorrow As String = DateTime.Now.Duplicate.Date.AddDays(1).ToLongDateString()

One less allocation, since AddDays didn't need/get its own copy of the memory.

You might still cite tolower(b) instead of b.tolower as not being as ridiculous as what .NET wants. But I ask you this: if somebody doesn't know your COW conventions, would they know the difference in what happens? In any case arr.dup.tolower would fit the same purpose just fine, and it's more explicit.

-Sha
Aug 01 2005