digitalmars.D - Array type conversion

reply Mark Burnett <unstained gmail.com> writes:

I have spent much of the last couple of weeks trying to choose a language in
which to write the code for my PhD thesis (in computational physics).  I had
very nearly decided on using c++, when yesterday I stumbled upon D.  So far I'm
ecstatic about its feature set.

Still there are one or two things that strike me as odd:  in particular that
arrays of a derived type can be converted to an array of a base type.  As
pointed out by Marshall Cline,
[http://www.parashift.com/c++-faq-lite/proper-inheritance.html#faq-21.4] this
is dangerous.  Is this possibly a holdover from c++?  It is explicitly
mentioned in the array page that they behave this way, so I am not convinced
that is the case.

Fortunately not all of the problems associated with doing this in c++ exist in
d (see attached code).  What d seems to do is treat all derived[] as base[],
which is silly because if i want a base[], I would just declare it that way. 
Asking for a derived[] is how I say that I  *only* want derived objects in
there.

The attached code generates this output using gdc 0.23 on OSX:
Here are the different apples we have:
A P P L E -- Red
A P P L E -- Red
A P P L E -- Red
Orange -- Orange
A P P L E -- Red

Please, keep in mind that this test is the first d I have written, and I don't
claim to understand the language.  Array type promotion just seems odd to
include, and I would like to understand the motivation for doing so.
Apr 28 2007
next sibling parent reply torhu <fake address.dude> writes:
Mark Burnett wrote:
Cline, 
[http://www.parashift.com/c++-faq-lite/proper-inheritance.html#faq-21.4] 
this is dangerous.  Is this possibly a holdover from c++?  It is 
explicitly mentioned in the array page that they behave this way, so I 
am not convinced that is the case.

I guess you're already aware that objects in D are reference types, so 
the specific problem mentioned on that page does not apply?  If you want 
value types, you would use structs, which cannot be subclassed.
Apr 28 2007
parent reply Mark Burnett <unstained gmail.com> writes:

torhu Wrote:

 Mark Burnett wrote:
 Cline, 
 [http://www.parashift.com/c++-faq-lite/proper-inheritance.html#faq-21.4] 
 this is dangerous.  Is this possibly a holdover from c++?  It is 
 explicitly mentioned in the array page that they behave this way, so I 
 am not convinced that is the case.
 
 I guess you're already aware that objects in D are reference types, so 
 the specific problem mentioned on that page does not apply?  If you want 
 value types, you would use structs, which cannot be subclassed.

Right, the specific problem of size differences in c++ does not exist (Orange has extra data members to demonstrate this), but the "a container of derived is not a container of base" problem still does. Imagine adding a core function to Apple (and not to Orange, of course). Then, after an Orange is added to the array, you loop through and core all the Apples... only one of them isn't an Apple, and you have undefined behavior. The attached file again compiles with gdc and crashes at runtime (though of course it could do almost anything).
Apr 28 2007
parent reply Manfred Nowak <svv1999 hotmail.com> writes:
Mark Burnett wrote

 only one of them isn't an Apple and you have undefined behavior. 

From the specs:
| Multiple dynamic arrays can share all or parts of the array data.

With the assignment:
    justsomefruits = lotsofapples;
you did pointer assignments and thereby declared that the array data can be interpreted as both: Fruits and Apples.

The handling of this declaration is up to your intelligence. If you fail, then you might have tricked yourself.

If you wanted a componentwise array copy you should have written:
    justsomefruits[] = lotsofapples; // observe the []
and the compiler would have served you with appropriate error messages.

-manfred
Apr 28 2007
parent reply Mark Burnett <unstained gmail.com> writes:
Manfred Nowak Wrote:

 Mark Burnett wrote
 
 only one of them isn't an Apple and you have undefined behavior. 

 From the specs:
 | Multiple dynamic arrays can share all or parts of the array data.
 
 With the assignment:
     justsomefruits = lotsofapples;
 you did pointer assignments and thereby declared that the array data can be interpreted as both: Fruits and Apples.
 
 The handling of this declaration is up to your intelligence. If you fail, then you might have tricked yourself.
 
 If you wanted a componentwise array copy you should have written:
     justsomefruits[] = lotsofapples; // observe the []
 and the compiler would have served you with appropriate error messages.
 
 -manfred

Certainly making a copy prevents this problem, but what I'm really curious about is the motivation for including these implicit conversions (they are mentioned specifically on the Array description page on digitalmars.com).

They're not safe in general, so I imagine library designers will end up using their own user-defined array/vector type just as in c++, so that such errors are caught at compile time. This seems to limit the usefulness of these built-in array types. Sure they're bounds-safe, but they don't seem 100% type-safe.

For example:

std::vector<derived> ad;
std::vector<base> ab;

ab = ad; // c++ compiler error

A compiler error is more what I would expect. Treating a container of derived as a container of base is an error.

Not to speculate too much on the reasons, but is it just too much overhead for a built-in type to disallow this behavior? Am I underestimating the usefulness of these built-in arrays in library design?

Mark

PS Thanks for your replies so far ;)
Apr 28 2007
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Mark Burnett wrote:
 Certainly making a copy prevents this problem, but what I'm really curious
about is the motivation for including these implicit conversions (they are
mentioned specifically on the Array description page on digitalmars.com).
 
 They're not safe in general,

Why not?
 so I imagine library designers will end up using their own user-defined
array/vector type just as in c++, so that such errors are caught at compile
time.  This seems to limit the usefulness of these built in array types.  Sure
they're bounds-safe, but they don't seem 100% type-safe.
 
 For example:
 
 std::vector<derived> ad;
 std::vector<base> ab;
 
 ab = ad; // c++ compiler error
 
 A compiler error is more what I would expect.  Treating a container of derived
as a container of base is an error.

But that isn't what is happening with D. base[] is an array of *references* to base, so the slicing problem one has in C++ is not possible in D.
 
 Not to speculate too much on the reasons, but is it just too much overhead for
a built-in type to disallow this behavior?  Am I underestimating the usefulness
of these built-in arrays in library design?
 
 Mark
 
 PS Thanks for your replies so far ;)

Apr 28 2007
next sibling parent reply James Dennett <jdennett acm.org> writes:
Walter Bright wrote:
 Mark Burnett wrote:
 Certainly making a copy prevents this problem, but what I'm really
 curious about is the motivation for including these implicit
 conversions (they are mentioned specifically on the Array description
 page on digitalmars.com).

 They're not safe in general,

Why not?
 so I imagine library designers will end up using their own
 user-defined array/vector type just as in c++, so that such errors are
 caught at compile time.  This seems to limit the usefulness of these
 built in array types.  Sure they're bounds-safe, but they don't seem
 100% type-safe.

 For example:

 std::vector<derived> ad;
 std::vector<base> ab;

 ab = ad; // c++ compiler error

 A compiler error is more what I would expect.  Treating a container of
 derived as a container of base is an error.

But that isn't what is happening with D. base[] is an array of *references* to base, so the slicing problem one has in C++ is not possible in D.

Slicing isn't the big issue. The big issue is semantics; an array of derived is not an array of base, by LSP.

An array of (pointers/references to) derived is usable as an *immutable* array of base (for suitable English meaning of immutable, matching C++'s notion of the array (equivalently, the pointers it contains) being const).

Java has runtime checks required because it allows conversion from array of Derived to array of Base, and that (as you know) also uses reference semantics. The conversion is widely viewed as a mistake in Java; if I pass a Derived[] around, the language should not silently allow one of its elements to refer to a Base object.

-- James
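
For readers who have not seen the Java runtime check mentioned in this post, here is a minimal sketch of it. The Fruit/Apple/Orange classes mirror the names used earlier in the thread; the class name CovariantArrayDemo and the helper storeIsRejected are made up for illustration and do not come from any poster's code.

```java
public class CovariantArrayDemo {
    static class Fruit {}
    static class Apple extends Fruit {}
    static class Orange extends Fruit {}

    // Returns true if storing an Orange through a Fruit[] view of an
    // Apple[] is rejected at run time (hypothetical helper for this sketch).
    static boolean storeIsRejected() {
        Apple[] apples = new Apple[3];
        Fruit[] fruits = apples;      // compiles: Java arrays are covariant
        try {
            fruits[0] = new Orange(); // checked against the array's real element type
            return false;
        } catch (ArrayStoreException e) {
            return true;              // the Orange never reaches the Apple[]
        }
    }

    public static void main(String[] args) {
        System.out.println("store rejected: " + storeIsRejected());
    }
}
```

The conversion itself compiles silently, which is the part being criticised; the store is what fails, and only when the program runs.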
Apr 28 2007
parent reply Walter Bright <newshound1 digitalmars.com> writes:
James Dennett wrote:
 An array of (pointers/references to) derived is usable
 as an *immutable* array of base (for suitable English
 meaning of immutable, matching C++'s notion of the
 array (equivalently, the pointers it contains) being
 const.
 
 Java has runtime checks required because it allows
 conversion from array of Derived to array of Base,
 and that (as you know) also uses reference semantics.
 The conversion is widely viewed as a mistake in Java;
 if I pass a Derived[] around, the language should
 not silently allow one of its elements to refer to
 a Base object.

But a derived reference can always be implicitly converted to a base reference anyway. That's the point of polymorphism.
Apr 28 2007
next sibling parent reply James Dennett <jdennett acm.org> writes:
Walter Bright wrote:
 James Dennett wrote:
 An array of (pointers/references to) derived is usable
 as an *immutable* array of base (for suitable English
 meaning of immutable, matching C++'s notion of the
 array (equivalently, the pointers it contains) being
 const.

 Java has runtime checks required because it allows
 conversion from array of Derived to array of Base,
 and that (as you know) also uses reference semantics.
 The conversion is widely viewed as a mistake in Java;
 if I pass a Derived[] around, the language should
 not silently allow one of its elements to refer to
 a Base object.

But a derived reference can always be implicitly converted to a base reference anyway. That's the point of polymorphism.

That's not the issue here either. One more level of indirection is present when dealing with arrays of references or references to references. The point is that a reference to a derived reference must *not* be converted to a reference to a base reference, just as an array of derived references must not be converted to an array of base references in case any is changed to a reference to an object that is not a derived. -- James
Apr 28 2007
parent jovo <jovo at.home> writes:
James Dennett Wrote:
 
 The point is that a reference to a derived reference
 must *not* be converted to a reference to a base
 reference, just as an array of derived references
 must not be converted to an array of base references
 in case any is changed to a reference to an object
 that is not a derived.
 

Interestingly, given:

class B{}
class D: B{}

void f(inout B x){
    x = new B();
}

void main(){
    D[] a1 = new D[3];
    B[] a2 = a1;

    f(a1[1]); // Error: cast(B)(a1[1u]) is not an lvalue
    f(a2[1]); // naturally works
}

jovo
Apr 29 2007
prev sibling parent Mark Burnett <unstained gmail.com> writes:
Walter Bright Wrote:

 James Dennett wrote:
 An array of (pointers/references to) derived is usable
 as an *immutable* array of base (for suitable English
 meaning of immutable, matching C++'s notion of the
 array (equivalently, the pointers it contains) being
 const.
 
 Java has runtime checks required because it allows
 conversion from array of Derived to array of Base,
 and that (as you know) also uses reference semantics.
 The conversion is widely viewed as a mistake in Java;
 if I pass a Derived[] around, the language should
 not silently allow one of its elements to refer to
 a Base object.

But a derived reference can always be implicitly converted to a base reference anyway. That's the point of polymorphism.

Java can allow treating D[] as a mutable B[] only because of its runtime checks. That way it can just throw an exception when you do things like call DerivedA.foo on DerivedB. D doesn't have this, and so its arrays are just as type unsafe as C++'s.

You really should have a look at Chapter 24 of Marshall Cline's excellent FAQ again. He describes the issue perhaps better than I can.

I am actually a little surprised that there is a difference in the way conversions from D[] -> B[] and D[] -> I[] work. It seems that the easiest way to fix this is to remove the implicit D[] -> B[]. Though as James suggests, it would be safe (and useful) to pass D[] as an immutable B[] *or* immutable I[].

Mark
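
As a rough illustration of the "pass it as an immutable base container" idea, Java's generic collections (unlike its arrays) allow a covariant view that the compiler will not let you store a base object into. This is only a sketch in Java, not a proposal for D syntax; the names ReadOnlyViewDemo, describe and name are invented for the example, and a wildcard view is not fully immutable (elements can still be removed through it).

```java
import java.util.ArrayList;
import java.util.List;

public class ReadOnlyViewDemo {
    static class Fruit {
        String name() { return "fruit"; }
    }

    static class Apple extends Fruit {
        @Override
        String name() { return "apple"; }
    }

    // Accepts a List<Apple> as a covariant view of Fruit elements.
    // Reading is allowed; adding a Fruit through this view does not compile.
    static String describe(List<? extends Fruit> basket) {
        StringBuilder sb = new StringBuilder();
        for (Fruit f : basket) {
            sb.append(f.name()).append(' ');
        }
        // basket.add(new Fruit()); // compile error: cannot add through the wildcard view
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        List<Apple> apples = new ArrayList<>();
        apples.add(new Apple());
        apples.add(new Apple());
        System.out.println(describe(apples));
    }
}
```

Here the mistake is caught at compile time rather than by a runtime check, which is the behavior being asked for in this post.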
Apr 29 2007
prev sibling next sibling parent Mark Burnett <unstained gmail.com> writes:

Walter Bright Wrote:

 
 They're not safe in general,

Why not?

You can end up performing operations associated with one type on an object that is not that type, as demonstrated in fruit3.d with the core() function. I appreciate that slicing is not the problem. It's this (seemingly?) undefined behavior that is.

I did, however, find how to achieve the behavior I was looking for with interfaces. Arrays of objects are not implicitly converted to arrays of their interfaces.

Which brings me to a quick tangent question: Are there still plans to implement interface contracts? I was just reading an old usenet thread about the possibility. FYI, contracts and integrated unittest are two major feature draws for me (scientists often write awful code).

All in all I am strongly leaning toward d for my project. I hope more people start catching on to how good it looks ;)

Thanks again,
Mark
Apr 28 2007
prev sibling parent reply Mike Capp <mike.capp gmail.com> writes:
Walter Bright wrote:
 Mark Burnett wrote:

 Treating a container of derived as a container
 of base is an error.

But that isn't what is happening with D. base[] is an array of *references* to base, so the slicing problem one has in C++ is not possible in D

I don't think he's talking about slicing, I think he's talking about the type-system hole. It's not unreasonable to assume when reading/debugging code that any members of a Foo[] will be of type Foo or, if not, that a cast will have been required somewhere to indicate that fishy things are afoot (afin?). This conversion subverts that.

OP: I raised this about a year ago and nobody seemed bothered then; I don't imagine that's changed.

cheers
Mike
Apr 28 2007
parent Mark Burnett <unstained gmail.com> writes:

 Treating a container of derived as a container
 of base is an error.



  OP: I raised this about a year ago and nobody seemed bothered then; I don't
 imagine that's changed.

After playing around a bit more, I've discovered that (at least with gdc) rewriting the addfavoritefruit function to use the ~= operator to append institutes copy on write by default, while this is *not* the case for the previously posted indexed version.

If the keyword "in" guaranteed that all internal writes were copies, then it would probably be unnecessary to change the implicit conversion behavior. No one in their right mind would pass derived[] to a f(inout base[]), so the only time the issue would arise is in the local scope, which at least narrows the potential bug down for the programmer.

So basically, is there a reason for the following distinction?

foo(in baseT [] base)
{
    base[27] = new derived2; // does not create a copy -- not correct, see below
    base ~= new derived2;    // creates a copy
}

If not, a consistent change to the implementation would have practically no effect on the language specification, while (for practical purposes) fixing this problem.

Correction: as I was experimenting with this some more I noticed that base[i] = ...; *does* seem to create a copy, but it does so *after* the assignment takes place (during the return?). If this is really the case, it's almost certainly unintended, and changing it would eliminate this problem for many purposes.

Mark
Apr 28 2007
prev sibling next sibling parent janderson <askme me.com> writes:
Mark Burnett wrote:
 I have spent much of the last couple of weeks trying to choose a language in
which to write the code for my PhD thesis (in computational physics).  I had
very nearly decided on using c++, when yesterday I stumbled upon D.  So far I'm
ecstatic about its feature set.
 
 Still there are one or two things that strike me as odd:  in particular that
arrays of a derived type can be converted to an array of a base type.  As
pointed out by Marshall Cline,
[http://www.parashift.com/c++-faq-lite/proper-inheritance.html#faq-21.4] this
is dangerous.  Is this possibly a holdover from c++?  It is explicitly
mentioned in the array page that they behave this way, so I am not convinced
that is the case.
 
 Fortunately not all of the problems associated with doing this in c++ exist in
d (see attached code).  What d seems to do is treat all derived[] as base[],
which is silly because if i want a base[], I would just declare it that way. 
Asking for a derived[] is how I say that I  *only* want derived objects in
there.
 
 The attached code generates this output using gdc 0.23 on OSX:
 Here are the different apples we have:
 A P P L E -- Red
 A P P L E -- Red
 A P P L E -- Red
 Orange -- Orange
 A P P L E -- Red
 
 Please, keep in mind that this test is the first d I have written, and I don't
claim to understand the language.  Array type promotion just seems odd to
include, and I would like to understand the motivation for doing so.

What is happening is a pointer copy from lotsofapples to justsomefruits. I think that D should really enforce that the programmer writes: justsomefruits.ptr = lotsofapples.ptr.

Other than that though, I don't really have a problem with this type of conversion since it is really useful for polymorphism. Consider that you may want to write some sort of generic function that takes the derived class like:

void sort(Fruit [] basket)
{
}

Fruit [] justsomefruits; //I only have an array of fruits here because this particular class only wants to work on fruits (ie fruit has extra properties I know about).

You start to see how beneficial it can be. In my opinion it's a neat feature.

-Joel
Apr 28 2007
prev sibling next sibling parent reply janderson <askme me.com> writes:
Mark Burnett wrote:
 I have spent much of the last couple of weeks trying to choose a language in
which to write the code for my PhD thesis (in computational physics).  I had
very nearly decided on using c++, when yesterday I stumbled upon D.  So far I'm
ecstatic about its feature set.
 
 Still there are one or two things that strike me as odd:  in particular that
arrays of a derived type can be converted to an array of a base type.  As
pointed out by Marshall Cline,
[http://www.parashift.com/c++-faq-lite/proper-inheritance.html#faq-21.4] this
is dangerous.  Is this possibly a holdover from c++?  It is explicitly
mentioned in the array page that they behave this way, so I am not convinced
that is the case.
 
 Fortunately not all of the problems associated with doing this in c++ exist in
d (see attached code).  What d seems to do is treat all derived[] as base[],
which is silly because if i want a base[], I would just declare it that way. 
Asking for a derived[] is how I say that I  *only* want derived objects in
there.
 
 The attached code generates this output using gdc 0.23 on OSX:
 Here are the different apples we have:
 A P P L E -- Red
 A P P L E -- Red
 A P P L E -- Red
 Orange -- Orange
 A P P L E -- Red
 
 Please, keep in mind that this test is the first d I have written, and I don't
claim to understand the language.  Array type promotion just seems odd to
include, and I would like to understand the motivation for doing so.

Ok looking at your example again: I think the real issue is this:

// Code to test array type promotion.

import std.stdio;

class Fruit
{
    enum color { Red, Orange, Fuchsia };
    static char[] [color] colorlist;

    color mycolor;
    char [] name;

    static this() // Love these :)
    {
        colorlist[color.Red] = "Red";
        colorlist[color.Orange] = "Orange";
        colorlist[color.Fuchsia] = "Fuchsia";
    }

    this()
    {
        mycolor = color.Fuchsia;
        name = "Generic Fruit";
    }

    void whatkind()
    {
        writefln("%s -- %s", name, colorlist[mycolor]);
    }
}

class Apple : Fruit
{
    this()
    {
        mycolor = color.Red;
        name = "A P P L E";
    }

    void DropApple()
    {
        writefln("DropApple");
    }
}

class Orange : Fruit
{
    struct orangesarebiggerthanapples
    {
        int numberofbumps = 42;
        double ph = 5.3;
    }

    this()
    {
        mycolor = color.Orange;
        name = "Orange";
    }

    void EatOrange()
    {
        writefln("EatOrange");
    }
}

void addfavoritefruit(Fruit [] basket, int index)
{
    basket[index] = new Orange;
}

void main(char [] [] args)
{
    Apple [] lotsofapples;
    Fruit [] justsomefruits;

    lotsofapples.length = 5;

    lotsofapples[0] = new Apple;
    lotsofapples[1] = new Apple;
    lotsofapples[2] = new Apple;

    justsomefruits = lotsofapples; // Dangerous!
    justsomefruits.addfavoritefruit(3);

    lotsofapples[4] = new Apple;

    writefln("Here are the different apples we have:");
    foreach (apple; lotsofapples)
    {
        apple.DropApple();
    }

    while(true) {}
}

DropApple
DropApple
DropApple
EatOrange  //What the hell, I never called this function.
DropApple

//2

//Even worse, remove the EatOrange

DropApple
DropApple
DropApple
Error: Access Violation

The problem is the refcopy. I'm not sure this sort of conversion should be banned. However I'm not sure what the correct type of checking should be used. Maybe it should only be converted when passed into the function (otherwise require a .ptr qualifier). It would still have potential issues but I think it would be less error prone.

-Joel
Apr 28 2007
parent Mark Burnett <unstained gmail.com> writes:
janderson Wrote:

 DropApple
 DropApple
 DropApple
 EatOrange  //What the hell, I never called this function.
 DropApple
 
 
 //2
 
 //Even worse, remove the EatOrange
 
 DropApple
 DropApple
 DropApple
 Error: Access Violation
 
 
 The problem is the refcopy.  I'm not sure this sort of conversion should 
 be banned.  However I'm not sure what the correct type of checking should 
 be used.  Maybe it should only be converted when passed into the 
 function (otherwise require a .ptr qualifier).  It would still have 
 potential issues but I think it would be less error prone.
 
 -Joel

Bingo. D does not exhibit this behavior with interfaces, however, so designers are safe using arrays of interfaces (only!).
Apr 28 2007
prev sibling parent reply Thomas Kuehne <thomas-dloop kuehne.cn> writes:
Mark Burnett wrote on 2007-04-28:
 Still there are one or two things that strike me as odd:  in particular
 that arrays of a derived type can be converted to an array of a base type.

[...]

Below is a simplified sample:

# import std.stdio;
#
# class Base{
#     int x;
# }
#
# class Derived : Base{
#     long y;
# }
#
# void main(){
#     Derived[] derived = new Derived[1];
#     Base[] base;
#
#     derived[0] = new Derived();
#     derived[0].x = 1;
#     derived[0].y = 2;
#     writefln("derieved[0] -> (x:%s, y:%s)", derived[0].x, derived[0].y);
#
#     base  = derived; // <- this is the issue
#     writefln("base[0] -> (x:%s)", base[0].x);
#
#     base[0] = new Base();
#     base[0].x = 3;
#     writefln("base[0] -> (x:%s)", base[0].x);
#
#     writefln("derieved[0] -> (x:%s, y:%s) !!!random y!!!", derived[0].x, derived[0].y);
# }

Thomas
Apr 30 2007
parent Manfred Nowak <svv1999 hotmail.com> writes:
Thomas Kuehne wrote
 Below is a simplified sample:
 #    base  = derived; // <- this is the issue

That assignment above is totally useless, unless one wants to drop some data that can only be held by derived. But if one drops some of that data, every access through derived might behave unpredictably.

The obvious fault is that derived is not nulled immediately after that assignment in order to prevent every further access.

Is that what the compiler is supposed to do automatically: null out the RHS as a side effect?

-manfred
Apr 30 2007