digitalmars.D - Is this a design rationale? (static array object member)

Brian Hsu <brianhsu.hsu gmail.com> writes:
I posted a similar question on the D.learn newsgroup and got an explanation of this behavior.
But after reading that answer I still have a question: is this reasonable?

Here is my code:

import tango.io.Stdout;

class Test
{
    int [] z = [1,1,1,1,1];
    void addByOne () {
        // Increase all elements in z by 1
        foreach (int i, int v; z) { z[i] = z[i] + 1; }
    }
}

void fun1 (int y)
{
    int [] x = [1,1,1,1,1];
    foreach (int i, int v; x ) {
        x[i] = x[i]+y;
    }
}

void fun2 ()
{
    int [] y = [1,1,1,1,1];

    foreach (int i, int v; y ) {
        Stdout.format ("{} ", v);
    }

    Stdout.newline;
}

void main ()
{
    Test a = new Test();
    Test b = new Test();
    a.addByOne(); 
    // Now b.z will be [2,2,2,2,2]
    
    fun1(3);
    fun2(); // Still print [1,1,1,1,1]
}

So a.z and b.z point to the same array, but int [] y and int [] z are still
different array instances.

Regan mentioned that this is because the array literal [1,1,1,1,1] creates only
one array instance*, so at first I suspected that when the compiler sees
[1,1,1,1,1], it translates it to a fixed memory address or something
like that.

* http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=9580

But clearly int [] x and int [] y are still different array instances,
even though they are initialized with the same array literal.

So, after all, when exactly do a.z and b.z come to point to the same array
instance? Since I believe initialization of object members should happen at
runtime, not compile time, why are a.z and b.z the same array instance while
int [] x and int [] y are different array instances?

Finally, is this behavior reasonable? I didn't declare that int [] as a
static class member, so even though they have the same array literal as
initializer, I would expect a.z and b.z to be different arrays with the same
content. (As in Java or C++)

Or is there a special reason for this strange behavior?
Sep 29 2007
downs <default_357-line yahoo.de> writes:
Brian Hsu wrote:
[snip]
 So a.z and b.z pointed to same array, but int [] y and int [] z are still
different array instance.
 
 Regan mentioned that this is because the array literal [1,1,1,1,1] create only
one instance of array*, so at first time I suspect that when compiler see
[1,1,1,1,1], it would translate that to a fixed memory address of something
like that. 
 

a.z and b.z were initialized with the _same_ array literal, whereas y and z were initialized with _different_ array literals.
 Finally, is this behavior reasonable? Since I didn't declare that int [] as a
static class member, even though they have same initialization array literal,
but I would expect that a.z/b.z they should be different array have same
content. (As in Java or C++)

Different arrays with the same content. Of course, it helps to know that in D, an array is basically this:

struct array(T) { T *ptr; size_t length; }

So as you see, the memory area that the array uses _is_ the content :)
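
To make the pointer-plus-length picture concrete, here is a minimal sketch. It assumes a present-day D2 compiler rather than the 2007 compilers discussed in this thread, and the helper name sameStorage is made up for illustration. It shows that assigning one dynamic array to another copies only the (ptr, length) pair, while .dup allocates separate storage:

```d
import std.stdio;

// Hypothetical helper: true when both slices start at the same memory address.
bool sameStorage(const(int)[] a, const(int)[] b)
{
    return a.ptr is b.ptr;
}

void main()
{
    int[] a = [1, 1, 1, 1, 1];
    int[] b = a;        // copies ptr and length only, no elements
    b[0] = 42;          // writes through the shared storage

    assert(sameStorage(a, b));
    assert(a[0] == 42); // a sees the change made via b

    int[] c = a.dup;    // allocates a new array and copies the elements
    c[1] = 7;

    assert(!sameStorage(a, c));
    assert(a[1] == 1);  // a is unaffected by writes to c

    writeln(a, " ", c);
}
```

Two variables that share a ptr behave like the a.z / b.z case in the original post; a .dup breaks the sharing.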
 Or is there special reasons of this strange behaver?   

It's really quite consistent, once you understand what arrays _are_ in D. --downs
Sep 29 2007
Regan Heath <regan netmail.co.nz> writes:
downs wrote:
 Brian Hsu wrote:
 [snip]
 So a.z and b.z pointed to same array, but int [] y and int [] z are still
different array instance.

 Regan mentioned that this is because the array literal [1,1,1,1,1] create only
one instance of array*, so at first time I suspect that when compiler see
[1,1,1,1,1], it would translate that to a fixed memory address of something
like that. 

array literal, whereas y and z were initialized with _different_ array literals.

My understanding is that a literal gets stored in the exe when it is compiled/linked. So, every literal you code exists somewhere in the exe (perhaps this isn't true for numeric literals but I believe it's true for strings and probably arrays).

Further, the compiler could choose to load the literal data into read only memory upon execution. Also, the compiler might notice that one or more literals are identical and optimise by using the same literal loaded into the same memory for all X occurrences in code.

So, when dealing with literals:

1. Always assume they are read-only - D 2.0 string literals are invariant(char) so the compiler actually reminds/enforces this. -- perhaps there is a bug here that all array literals should be invariant types(1)?

2. Always copy literal data before attempting to modify it.

Regan

(1) eg.

invariant(int)[] myLiteral = [0,0,0,0,0,0,0,0,0,0];

class Foo
{
    //error cannot implicitly convert 'invariant(int)' to 'int'
    //int[] myArr = [0,0,0,0,0,0,0,0,0,0];

    //so you have to say...
    int[] myArr;
    this()
    {
        myArr = myLiteral.dup;  //dup
        //OR
        myArr.length = myLiteral.length;
        myArr[] = myLiteral[];  //copy
    }
}
Sep 29 2007
Brian Hsu <brianhsu.hsu gmail.com> writes:
downs Wrote:

 Finally, is this behavior reasonable? Since I didn't declare that int [] as a
static class member, even though they have same initialization array literal,
but I would expect that a.z/b.z they should be different array have same
content. (As in Java or C++)

Different arrays with the same content.

Sorry for the ambiguity. What I mean is different array _instances_ with the same value, instead of different pointers pointing to the same array instance. In the equivalent Java/C++ code, a.z and b.z would point to different array instances even though they have the same initializer, so changing an element of a.z would have no side effect on b.z.

Since I wrote Java/C++ programs before learning D, it is a little strange to me that an object member array acts like a class member array when at the semantic level they should be different array instances.

So, is the following statement true?

The compiler will create 3 array instances (Test.z, fun1.x, fun2.y) since the literal appears 3 times, and each will eventually be translated to a memory address at compile time. So in fact the code looks like:

class Test
{
    int [] z = 0xFFFFF; // memory address of the array created by the array literal
}

That's why different instances of class Test will have object member z pointing to the same array instance.

Is this correct? If it is, I will add a page at Wiki4D to mention this behavior, since it is a little strange for people who programmed in Java/C++ before.
Sep 29 2007
downs <default_357-line yahoo.de> writes:
 Brian Hsu wrote:
 The compiler will create 3 array instances (Test.z, fun1.x, fun2.y) since
 the literal appears 3 times, and each will eventually be translated to a memory
 address at compile time. So in fact the code looks like:
 class Test
 {
     int [] z = 0xFFFFF; // memory address of the array created by the array literal
 }

 That's why different instances of class Test will have object member z
 pointing to the same array instance.

Eh, I kinda assumed this was the normal behavior, but the other replies are indicating that this is merely a compiler artifact. So it would appear that the compiler _currently_ relates each literal to one area of memory, but that might change. The only safe way is probably to do what the others said - always dup literals before use. Sorry I can't help you further. --downs ._.
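
The "always dup literals before use" advice can be sketched as follows. This is only an illustration written for a present-day D2 compiler, reusing the class name Test and the method name addByOne from the original post; the constructor gives every instance its own copy of the literal's contents:

```d
import std.stdio;

class Test
{
    int[] z;

    this()
    {
        // .dup allocates a fresh array for each instance, so no two
        // objects share storage through the literal.
        z = [1, 1, 1, 1, 1].dup;
    }

    void addByOne()
    {
        foreach (ref v; z)
            v += 1;
    }
}

void main()
{
    auto a = new Test;
    auto b = new Test;
    a.addByOne();

    assert(a.z == [2, 2, 2, 2, 2]);
    assert(b.z == [1, 1, 1, 1, 1]); // b is unaffected

    writeln(a.z, " ", b.z);
}
```

The cost is one allocation per object, which is the same price Java pays for its per-instance array initializer.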
Sep 29 2007
Bill Baxter <dnewsgroup billbaxter.com> writes:
downs wrote:
 Brian Hsu wrote:
 [snip]
 So a.z and b.z pointed to same array, but int [] y and int [] z are still
different array instance.

 Regan mentioned that this is because the array literal [1,1,1,1,1] create only
one instance of array*, so at first time I suspect that when compiler see
[1,1,1,1,1], it would translate that to a fixed memory address of something
like that. 

array literal, whereas y and z were initialized with _different_ array literals.
 Finally, is this behavior reasonable? Since I didn't declare that int [] as a
static class member, even though they have same initialization array literal,
but I would expect that a.z/b.z they should be different array have same
content. (As in Java or C++)

Different arrays with the same content. Of course, it helps to know that in D, an array is basically this:

struct array(T) { T *ptr; size_t length; }

So as you see, the memory area that the array uses _is_ the content :)
 Or is there special reasons of this strange behaver?   

It's really quite consistent, once you understand what arrays _are_ in D. --downs

Except when you get to the part where arrays are initialized like classes instead of structs (x = new ....) and treated like class references when compared with null. In other words the array _syntax_ looks mostly like a class, but the _usage_ looks mostly like a struct. I don't think it's even possible for user created types to act like this. It seems a little odd to me. But maybe I'm missing something obvious.

--bb
Sep 29 2007
janderson <askme me.com> writes:
Brian Hsu wrote:
 [snip]

Maybe D should do a kinda lazy .dup in these situations. That is, sections like [1,1,1,1,1] would be marked as const somehow in the GC (or somewhere). If a modification occurs it would make a call to dup the first time.
Sep 29 2007
"Janice Caron" <caron800 googlemail.com> writes:
On 9/30/07, janderson <askme me.com> wrote:
 sections like [1,1,1,1,1] would be marked as const somehow in the GC (or
 somewhere).  If a modification occurs it would make a call to dup the
 first time.

This is all solved in D2.0, so really it's a non-issue.

int[] x = [1,1,1,1,1]; /* won't compile */

const(int)[] x = [1,1,1,1,1]; /* OK, but now you can't accidentally modify the array */

int[] x = [1,1,1,1,1].dup; /* OK */

The fact that the first version won't compile is precisely why const correctness is a good thing. Without const, there's nothing to stop you typing that line of code into a program and hitting undefined behavior, and not even realising it. const saves the day.
Sep 29 2007
Brian Hsu <brianhsu.hsu gmail.com> writes:
Janice Caron Wrote:

 On 9/30/07, janderson <askme me.com> wrote:
 sections like [1,1,1,1,1] would be marked as const somehow in the GC (or
 somewhere).  If a modification occurs it would make a call to dup the
 first time.

This is all solved in D2.0, so really it's a non-issue.

Here is the result of my experiment with DMD 2.004. It seems array literals are not marked constant by default yet; I hope they will be in a future version.

import std.stdio;

class Test
{
    //OK, all class instance.x point to same array.
    int [] x = [1,1,1,1,1];

    void addByOne ()
    {
        foreach (int i, int value; x) {
            x[i]++;
        }
    }
}

void main ()
{
    int [] x = [1,1,1,1,1]; // Compile OK
    const(int) [] y = [1,1,1,1,1]; // Compile OK

    // Compile Error
    // int [] z = [1,1,1,1,1].dup;
    int [] z = ([1,1,1,1,1]).dup; // Compile Error, can't do implicitly converting

    Test a = new Test();
    Test b = new Test();

    a.addByOne();

    // 2,2,2,2,2
    foreach (int value; b.x) {
        writef ("%d ", value);
    }

    writefln();
}
Sep 30 2007
Regan Heath <regan netmail.co.nz> writes:
Janice Caron wrote:
 On 9/30/07, janderson <askme me.com> wrote:
 sections like [1,1,1,1,1] would be marked as const somehow in the GC (or
 somewhere).  If a modification occurs it would make a call to dup the
 first time.

This is all solved in D2.0, so really it's a non-issue.

int[] x = [1,1,1,1,1]; /* won't compile */

As you've discovered, actually it will. This however won't (in D 2.0):

char[] x = "test";

and that is because string literals are invariant(char). I made a suggestion in another thread that array literals should be invariant(T) as well, then you would get the behaviour you describe below.
 const(int)[] x = [1,1,1,1,1]; /* OK, but now you can't accidently
 modify the array */
 
 int[] x = [1,1,1,1,1].dup; /* OK */
 
 The fact that the first version won't compile is precisely why const
 correctness is a good thing. Without const, there's nothing to stop
 you typing that line of code into a program and hitting undefined
 behavior, and not even realising it. const saves the day.

Not "const" specifically but "invariant" which indicates data which can never ever change (even if nasty pointer tricks are used - and these will likely cause a seg fault). This differs (as I'm sure you're aware) from "const" which is a read-only view of data which may change via another reference/pointer/etc.

Note: I am referring to D 2.0's current const implementation.

Note: I realise that you know all this, I replied for the benefit of others :)

Regan
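
The distinction between a read-only view and data that can never change can be sketched like this. It is a rough illustration for a present-day D2 compiler, where the qualifier called invariant in this thread is spelled immutable; the function names are made up for the example:

```d
import std.stdio;

// Hypothetical helper: what does a const view see after the
// underlying mutable data is changed through another reference?
int observeThroughConstView()
{
    int[] data = [1, 2, 3];
    const(int)[] view = data; // read-only view of mutable data
    // view[0] = 9;           // would not compile: cannot modify through const
    data[0] = 9;              // allowed: data itself is still mutable
    return view[0];
}

// Hypothetical helper: immutable elements cannot be changed through any
// reference, so a mutable array has to be a copy.
int[] mutableCopy(immutable(int)[] fixed)
{
    // int[] alias_ = fixed;  // would not compile: immutable to mutable
    return fixed.dup;
}

void main()
{
    assert(observeThroughConstView() == 9); // the const view saw the change

    immutable(int)[] fixed = [1, 2, 3];
    int[] copy = mutableCopy(fixed);
    copy[0] = 9;
    assert(fixed[0] == 1); // the immutable data is untouched

    writeln(fixed, " ", copy);
}
```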
Sep 30 2007
"Janice Caron" <caron800 googlemail.com> writes:
On 9/30/07, Brian Hsu <brianhsu.hsu gmail.com> wrote:
 Here is my experiment result using DMD 2.004, it seems doesn't mark array
literal constant default now, hope it will in future version.

<snip>

     //OK, all class instance.x point to same array.
     int [] x = [1,1,1,1,1];

Bugger! Well, I guess D2.0 is experimental and we're beta testing it. I guess that should be considered a bug. Const should be required, just as it is with strings.
Sep 30 2007
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Brian Hsu" <brianhsu.hsu gmail.com> wrote in message 
news:fdkvv1$2a8q$1 digitalmars.com...
 Finally, is this behavior reasonable? Since I didn't declare that int [] 
 as a static class member, even though they have same initialization array 
 literal, but I would expect that a.z/b.z they should be different array 
 have same content. (As in Java or C++)

If you examine the difference between this and the Java syntax, there lies your problem:

class Test
{
    int[] z = new int[] {1,1,1,1,1};
    ...
}

notice the new keyword. This means you are making a new array. In your example, you are not making a new array, you are setting a pointer to an existing array. In java, it is more akin to this:

class Test
{
    static int[] _z = new int[] {1,1,1,1,1};
    int[] z = _z;
}

(BTW, haven't done java coding in a while, this might be syntactically wrong)

In java, to get the behavior you want, you have to declare a new array. The same is for D. The only issue is that default initializers outside constructors are evaluated at compile-time. So to get the equivalent, you must do this:

class Test
{
    int[] z;
    this()
    {
        z = ([1,1,1,1,1]).dup;
    }
}

Note also that you are using a dynamic array. If you wanted to use a static array (one whose length is constant), this works as it copies the contents from the other array.

class Test
{
    int[5] z = [1,1,1,1,1];
}

BTW, I agree with Janice's and others' suggestions that array literals should be constant. Because this example is even more disturbing:

import tango.io.Stdout;

class Test
{
    int[] z = [1,1,1,1,1];
    void addOne()
    {
        for(int i = 0; i < z.length; i++)
        {
            z[i]++;
        }
    }
}

int main(char[][] args)
{
    Test a = new Test;
    a.addOne();
    Test b = new Test;
    // outputs 22222
    foreach(int x; b.z)
    {
        Stdout(x);
    }
    Stdout.newline();
    return 0;
}

I can see bugs creeping in too. Imagine if you originally coded this exact example with the following declaration of z:

int[5] z = [1,1,1,1,1];

Then later on you decide it is better to have z be a dynamic array. There are no compiler warnings/errors! By just removing the 5, I've introduced a super-subtle far-reaching bug.

-Steve
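
The static-array case can be checked with a small sketch, again assuming a present-day D2 compiler. The class name FixedTest is made up for illustration. Because an int[5] member stores its five elements inside the object itself, each instance gets its own copy of the initializer's contents:

```d
import std.stdio;

// Hypothetical class: same shape as Test above, but with a static array member.
class FixedTest
{
    int[5] z = [1, 1, 1, 1, 1]; // elements live inside each object

    void addOne()
    {
        foreach (ref v; z)
            v++;
    }
}

void main()
{
    auto a = new FixedTest;
    a.addOne();
    auto b = new FixedTest;

    assert(a.z == [2, 2, 2, 2, 2]);
    assert(b.z == [1, 1, 1, 1, 1]); // b starts from the original values

    int[5] copy = a.z;              // static arrays copy by value
    copy[0] = 0;
    assert(a.z[0] == 2);

    writeln(a.z, " ", b.z, " ", copy);
}
```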
Oct 01 2007
Brian Hsu <brianhsu.hsu gmail.com> writes:
Steven Schveighoffer Wrote:

 If you examine the difference between this and the Java syntax, there lies 
 your problem:
 
 class Test
 {
    int[] z = new int[] {1,1,1,1,1};
    ...
 }
 
 notice the new keyword.  This means you are making a new array.
 

It is valid in Java to initialize an object member array in a class with just an array literal (C++ seems to forbid this; sorry, I was wrong before). The following is a correct Java program that does not have the same problem, which is why D's behavior seemed a little strange to me.

public class Test
{
    int [] x = {1,1,1,1,1};

    public void addByOne ()
    {
        for (int i = 0; i < x.length; i++) {
            x[i]++;
        }
    }

    public static void main (String [] args)
    {
        Test a = new Test();
        Test b = new Test();

        a.addByOne ();
        System.out.println ("b[0]:" + b.x[0]);
    }
}

But now I understand why this happens, and I think array literals should be typed as invariant(T) [] in a future compiler, as others said. That should avoid this problem for others who use Java and want to learn D.
Oct 01 2007