digitalmars.D.bugs - [Issue 5603] New: Initialization syntax for dynamic arrays
http://d.puremagic.com/issues/show_bug.cgi?id=5603

           Summary: Initialization syntax for dynamic arrays
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: bearophile_hugs eml.cc

--- Comment #0 from bearophile_hugs eml.cc 2011-02-16 16:23:03 PST ---
Fixed-size arrays let you specify an initialization value, omit one, or leave the stack memory untouched for special situations where performance matters a lot:

// program #1
void main() {
    int[5] a2 = void;
    int[5] a1 = 1;
    int[5][5] m2 = void;
    int[5][5] m1 = 1;
}

Dynamic arrays don't let you specify an initialization value (especially after the deprecation of 'typedef', which used to allow the definition of a new int type with a different init value).

DMD has no syntax to allocate an uninitialized array, and currently it is not able to avoid double initialization of dynamic arrays. An example:

// program #2
void main() {
    auto a1 = new int[5];
    a1[] = 1;
}

Asm of program #2 (optimized build). __d_newarrayT performs a first initialization to zero, __memset32 performs a second initialization:

__Dmain comdat
L0:     sub  ESP,01Ch
        mov  EAX,offset FLAT:_D11TypeInfo_Ai6__initZ
        push 5
        push EAX
        call near ptr __d_newarrayT
        mov  0Ch[ESP],EAX
        mov  010h[ESP],EDX
        push dword ptr 0Ch[ESP]
        push 1
        push EDX
        call near ptr __memset32
        add  ESP,014h
        add  ESP,01Ch
        xor  EAX,EAX
        ret

So to translate program #1 to dynamic arrays you need something like this (I am not sure this is fully correct; GC.disable is used because m1/m2 contain uninitialized pointers):

// program #3
import core.memory: GC;
void main() {
    uint ba1 = GC.BlkAttr.NO_SCAN | GC.BlkAttr.APPENDABLE;
    int n1 = 5;
    int[] a1 = (cast(int*)GC.malloc(int.sizeof * n1, ba1))[0 .. n1];

    uint ba2 = GC.BlkAttr.NO_SCAN | GC.BlkAttr.APPENDABLE;
    int n2 = 5;
    int[] a2 = (cast(int*)GC.malloc(int.sizeof * n2, ba2))[0 .. n2];
    a2[] = 1;

    uint ba3a = GC.BlkAttr.APPENDABLE;
    uint ba3b = GC.BlkAttr.NO_SCAN | GC.BlkAttr.APPENDABLE;
    int n3 = 5;
    GC.disable();
    int[][] m1 = (cast(int[]*)GC.malloc((int[]).sizeof * n3, ba3a))[0 .. n3];
    foreach (ref row; m1)
        row = (cast(int*)GC.malloc(int.sizeof * n3, ba3b))[0 .. n3];
    GC.enable();

    uint ba4a = GC.BlkAttr.APPENDABLE;
    uint ba4b = GC.BlkAttr.NO_SCAN | GC.BlkAttr.APPENDABLE;
    int n4 = 5;
    GC.disable();
    int[][] m2 = (cast(int[]*)GC.malloc((int[]).sizeof * n4, ba4a))[0 .. n4];
    foreach (ref row; m2) {
        row = (cast(int*)GC.malloc(int.sizeof * n4, ba4b))[0 .. n4];
        row[] = 1;
    }
    GC.enable();
}

So to avoid all that bug-prone mess I suggest allowing the fixed-size array syntax for dynamic arrays too:

// program #4
void main() {
    auto a2 = new int[5] = void;
    auto a1 = new int[5] = 1;
    auto m2 = new int[][](5, 5) = void;
    auto m1 = new int[][](5, 5) = 1;
}

A use of uninitialized memory:
http://research.swtch.com/2008/03/using-uninitialized-memory-for-fun-and.html

"An Efficient Representation for Sparse Sets" (1993), by Preston Briggs, Linda Torczon:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.7319

From the Programming Pearls book:
http://books.google.it/books?id=kse_7qbWbjsC&pg=PA207&lpg=PA207&dq=programming+pearls+uninitialized&source=bl&ots=DfAXDLwT5z&sig=X53xYgD0wdn_Rwl7tFNeCiRt4No&hl=en&ei=HWVcTa35EYOdOsLI5OYL&sa=X&oi=book_result&ct=result&resnum=1&ved=0CBUQ6AEwAA

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
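For readers who want the effect of program #3 without the manual GC calls: Phobos's std.array module provides uninitializedArray, which allocates a dynamic array without initializing its elements. The sketch below is only an illustration of that library route, not the syntax proposed in this report; the helper name filledMatrix is made up for the example. Every element must be written before it is read.

```d
import std.array : uninitializedArray;

/// Hypothetical helper: build a rows x cols matrix and write each element once.
int[][] filledMatrix(size_t rows, size_t cols, int value)
{
    // The outer array holds valid slices; only the int elements are
    // left uninitialized until the loop below writes them.
    int[][] m = uninitializedArray!(int[][])(rows, cols);
    foreach (row; m)
        row[] = value;
    return m;
}

void main()
{
    // One-dimensional case: allocate, then fill in a single pass.
    int[] a2 = uninitializedArray!(int[])(5);
    a2[] = 1;
    assert(a2 == [1, 1, 1, 1, 1]);

    auto m1 = filledMatrix(5, 5, 1);
    assert(m1.length == 5 && m1[4] == [1, 1, 1, 1, 1]);
}
```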
Feb 16 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5603

Steven Schveighoffer <schveiguy yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |schveiguy yahoo.com

--- Comment #1 from Steven Schveighoffer <schveiguy yahoo.com> 2011-02-17 04:30:23 PST ---
This does not need to be a language thing, library could suffice:

auto a = createArray!(int[][])(5, 5, 1); // initialize 5x5 array with the value 1 in each cell.
auto a = createUninitArray!(int[][])(5, 5); // name needs work...

In which we can hide your shown implementation (this can be factored out a bit).

BTW, your code does not work properly for array appending. It does not initialize the hidden "allocated length" field, which would likely result in reallocation on append.

Some other functions probably needed:

a.extendUninit(size_t newlength);

which is like a.length = newlength but does not initialize the new area.

----------------

I agree a syntax change would be more in line with current array allocation operations (which are currently all syntax based), but I don't really like your proposed syntax. I would propose if we wanted to do a syntax change to do:

auto a = new int[][](5, 5, 1);
auto a = new int[][](5, 5, void);

Where the optional final argument determines the initial value. This fits perfectly with the current array creation syntax:

new T(dim1, dim2, ..., dimN)

where T is an N-dimensional array. We can just add an extra parameter for the value.

------------------

One problem with this whole proposal is the issue with struct semantics. That is, let's say a struct has a postblit, and you wanted to create an array of those structs with a default value. Should the runtime call the postblit for each element? I'd say it should.
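A rough sketch of what the library helper suggested above might look like. The name createArray comes from the comment; the implementation is an assumption made for illustration and is not an existing Phobos function. It only demonstrates the call shape (dimension lengths followed by a fill value): because it allocates with new, each leaf array is still zeroed before it is filled, so it does not remove the double initialization discussed in this report.

```d
import std.traits : isDynamicArray;

/// Hypothetical helper: all arguments but the last are dimension
/// lengths, the last one is the value stored in every cell.
T createArray(T, Args...)(Args args)
    if (isDynamicArray!T && Args.length >= 2)
{
    alias Elem = typeof(T.init[0]); // element type of this dimension
    size_t n = args[0];
    auto result = new T(n);
    static if (Args.length == 2)
    {
        // Innermost dimension: args[1] is the fill value.
        result[] = args[1];
    }
    else
    {
        // Outer dimension: build each row from the remaining arguments.
        foreach (ref row; result)
            row = createArray!Elem(args[1 .. $]);
    }
    return result;
}

void main()
{
    auto a = createArray!(int[][])(5, 5, 1);
    assert(a.length == 5);
    assert(a[0] == [1, 1, 1, 1, 1]);
}
```

Note that the readability concern raised later in the thread applies here as well: nothing at the call site distinguishes the last dimension from the fill value.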
Feb 17 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5603

--- Comment #2 from bearophile_hugs eml.cc 2011-02-17 05:00:47 PST ---
(In reply to comment #1)
> This does not need to be a language thing, library could suffice:

Of course, you may create template functions that do what I have shown (and better).

> BTW, your code does not work properly for array appending. It does not
> initialize the hidden "allocated length" field, which would likely result
> in reallocation on append.

I see, thank you. The code I have written is clearly bug-prone, which is why a built-in syntax (or functions in Phobos) would be useful.

> Some other functions probably needed:
>
> a.extendUninit(size_t newlength);
>
> which is like a.length = newlength but does not initialize the new area.

This is a possible thing to add. But it looks less useful, because when you want uninitialized memory you want maximum performance, so you probably don't want to change the array length.

> I agree a syntax change would be more in line with current array allocation
> operations (which are currently all syntax based), but I don't really like
> your proposed syntax. I would propose if we wanted to do a syntax change to do:
>
> auto a = new int[][](5, 5, 1);
> auto a = new int[][](5, 5, void);
>
> Where the optional final argument determines the initial value. This fits
> perfectly with the current array creation syntax:
>
> new T(dim1, dim2, ..., dimN)
>
> where T is an N-dimensional array. We can just add an extra parameter for
> the value.

I am strongly against this idea because it's too bug-prone. It's too easy to add or remove a [] by mistake, or to add or remove the initialization value by mistake, so you may end up with the wrong number of dimensions, etc.

auto a = new int[][](5, 5, 2);
auto a = new int[][][](5, 5, 2);
auto a = new int[][][](5, 5, 5, 2);

> One problem with this whole proposal is the issue with struct semantics.
> That is, let's say a struct has a postblit, and you wanted to create an
> array of those structs with a default value. Should the runtime call the
> postblit for each element? I'd say it should.

Exactly the same problem is present in the initialization syntax for fixed-size arrays, so the best solution is to just copy those semantics:

struct Foo {
    int x;
    this(this) { x++; }
}
void main() {
    Foo[2] foos = Foo(1);
    assert(foos[0].x == 2);
    assert(foos[1].x == 2);
}
Feb 17 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5603

--- Comment #3 from Steven Schveighoffer <schveiguy yahoo.com> 2011-02-17 05:36:45 PST ---
(In reply to comment #2)
> (In reply to comment #1)
> > BTW, your code does not work properly for array appending. It does not
> > initialize the hidden "allocated length" field, which would likely result
> > in reallocation on append.
>
> I see, thank you. That code I have written is clearly bug-prone, that's why
> a built-in syntax (or functions in Phobos) are useful.

I agree, a method to do this correctly would be good to have to avoid people doing it incorrectly.

> > Some other functions probably needed:
> >
> > a.extendUninit(size_t newlength);
> >
> > which is like a.length = newlength but does not initialize the new area.
>
> This is a possible thing to add. But it looks less useful because when you
> want uninitialized memory, you want max performance, so you probably don't
> want to change the array length.

I'm thinking of the case where I want to add N elements, but I'm going to assign them one at a time. This saves the initialization of the N elements before I write them (a useless operation).

> > I agree a syntax change would be more in line with current array allocation
> > operations (which are currently all syntax based), but I don't really like
> > your proposed syntax. I would propose if we wanted to do a syntax change
> > to do:
> >
> > auto a = new int[][](5, 5, 1);
> > auto a = new int[][](5, 5, void);
> >
> > Where the optional final argument determines the initial value. This fits
> > perfectly with the current array creation syntax:
> >
> > new T(dim1, dim2, ..., dimN)
> >
> > where T is a N dimensional array. We can just add an extra parameter for
> > the value.
>
> I am strongly against this idea because it's too much bug-prone. It's too
> much easy to add or remove a [] by mistake, or add or remove the
> initialization value by mistake, so you may end with the wrong number of
> dimensions, etc.
>
> auto a = new int[][](5, 5, 2);
> auto a = new int[][][](5, 5, 2);
> auto a = new int[][][](5, 5, 5, 2);

First, this only really happens when the type is numerical. For example, a string array would fail to compile with an integral initializer. Also, a void initializer cannot be mistaken for a dimension size.

Second, I can see what you are saying, but I don't think this error will affect much in practice. It isn't often that one changes the number of dimensions.

Readability-wise, however, it's not obvious whether the last element is an initializer (an IDE might make this clearer with syntax coloring). Your proposal clearly separates the value from the dimensions, but it probably is unacceptable due to parsing requirements. Plus it looks very bizarre.

If we are doing syntax changes, I think we need something unorthodox if we want to make this clear. What about:

auto a = new int[][](5, 5; 2);
auto a = new int[][](5, 5, =2);
auto a = new int[][](5, 5 : 2);
auto a = new int[][](5, 5) : 2;

I still think the original syntax I proposed is not different from functions that contain default parameters, it should be able to be dealt with for most people.

It is also advantageous to try and come up with a reasonable solution that would be acceptable to the language author.
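The "add N elements, assigning them one at a time" case can be approximated today with the built-in reserve function, which requests capacity without changing the array's length. This is a sketch of that alternative, not the proposed extendUninit; the function name collectSquares is made up for the example.

```d
/// Hypothetical example: produce n values, writing each element once.
int[] collectSquares(size_t n)
{
    int[] a;
    a.reserve(n); // request capacity up front; a.length stays 0
    foreach (i; 0 .. n)
        a ~= cast(int)(i * i); // each append stores one new element
    return a;
}

void main()
{
    auto squares = collectSquares(4);
    assert(squares == [0, 1, 4, 9]);
}
```

Unlike an uninitialized extension, the elements do not exist until they are appended, so there is nothing to read by accident; the cost is the per-append bookkeeping.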
Feb 17 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5603

--- Comment #4 from bearophile_hugs eml.cc 2011-02-17 10:13:58 PST ---
(In reply to comment #3)
> First, this only really happens when the type is numerical. For example, a
> string array would fail to compile with an integral initializer. Also, a
> void initializer cannot be mistaken for a dimension size.
>
> Second, I can see what you are saying, but I don't think this error will
> affect much in practice. It isn't often that one changes the number of
> dimensions.
>
> Readability-wise, however, it's not obvious whether the last element is an
> initializer (an IDE might make this clearer with syntax coloring). Your
> proposal clearly separates the value from the dimensions, but it probably
> is unacceptable due to parsing requirements. Plus it looks very bizarre.

It looks somewhat like the fixed-size initialization syntax.

> If we are doing syntax changes, I think we need something unorthodox if we
> want to make this clear. What about:
>
> auto a = new int[][](5, 5; 2);
> auto a = new int[][](5, 5, =2);
> auto a = new int[][](5, 5 : 2);
> auto a = new int[][](5, 5) : 2;

The last line is very close to my suggested syntax, which has the advantage of being intuitive for D programmers, because it's a copy of the fixed-size initialization syntax:

auto a = new int[][](5, 5) = void;

instead of:

int[5][5] a = void;

> I still think the original syntax I proposed is not different from
> functions that contain default parameters,

Named arguments (like Python's) are safer than default ones.
Feb 17 2011