www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - The demise of T[new]

reply Walter Bright <newshound1 digitalmars.com> writes:
The purpose of T[new] was to solve the problems T[] had with passing T[] 
to a function and then the function resizes the T[]. What happens with 
the original?

The solution we came up with was to create a third array type, T[new], 
which was a reference type.

Andrei had the idea that T[new] could be dispensed with by making a 
"builder" library type to handle creating arrays by doing things like 
appending, and then delivering a finished T[] type. This is similar to 
what std.outbuffer and std.array.Appender do, they just need a bit of 
refining.

The .length property of T[] would then become an rvalue only, not an 
lvalue, and ~= would no longer be allowed for T[].

We both feel that this would simplify D, make it more flexible, and 
remove some awkward corner cases like the inability to say a.length++.

What do you think?
Oct 18 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

 Andrei had the idea that T[new] could be dispensed with by making a 
 "builder" library type to handle creating arrays by doing things like 
 appending, and then delivering a finished T[] type. This is similar to 
 what std.outbuffer and std.array.Appender do, they just need a bit of 
 refining.
I like how Andrei keeps trying to invent new possible ideas. Sometimes situations can be improved adding things, and some other times removing things. So arrays will keep being kinda values, so if inside a function you use an appender to add items to an array, outside the function, when the function returns, you will keep seeing the original shorter array. This is not fully intuitive (but you can get used to it just because you use arrays all the time). The following code is not possible any more because you can't change the length: import std.stdio: writefln; void main() { int[] a1 = new int[5]; a1[] = 1; int[] a2 = a1[1..3]; writefln(a1, " ", a2); // [1,1,1,1,1] [1,1] a2.length = a2.length + 4; writefln(a1, " ", a2); // [1,1,1,1,1] [1,1,0,0,0,0] } But how do you change the length of an array (because appending many single items isn't always what you need to do)? And how does that interacts with the slicing, as in this code example? Bye, bearophile
Oct 18 2009
prev sibling next sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Mon, 19 Oct 2009 01:05:39 +0400, Walter Bright  
<newshound1 digitalmars.com> wrote:

 The purpose of T[new] was to solve the problems T[] had with passing T[]  
 to a function and then the function resizes the T[]. What happens with  
 the original?

 The solution we came up with was to create a third array type, T[new],  
 which was a reference type.

 Andrei had the idea that T[new] could be dispensed with by making a  
 "builder" library type to handle creating arrays by doing things like  
 appending, and then delivering a finished T[] type. This is similar to  
 what std.outbuffer and std.array.Appender do, they just need a bit of  
 refining.

 The .length property of T[] would then become an rvalue only, not an  
 lvalue, and ~= would no longer be allowed for T[].

 We both feel that this would simplify D, make it more flexible, and  
 remove some awkward corner cases like the inability to say a.length++.

 What do you think?
Well, I personally don't feel much of a need for T[new], and I agree T[] needs to be "fixed" the way you describe (i.e. make a shrink-only range out of it). But I believe a change like this coupled with a demise of T[new] would be way too restricting. Given a T[] array; What type would result of these operations be? 1) auto x = array.dup; 2) auto y = array ~ array; T[] is a view into someone else's container. But when you create a new array (by dup'ing or concatenating), you get a fresh copy. It points to a beginning of data sequence, and there is nothing wrong to expand it. I believe it should be T[new]. A reference type, yes, a type that would allow appending and would render Appender unnecessary.
Oct 18 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Denis Koroskin wrote:
 On Mon, 19 Oct 2009 01:05:39 +0400, Walter Bright 
 <newshound1 digitalmars.com> wrote:
 
 The purpose of T[new] was to solve the problems T[] had with passing 
 T[] to a function and then the function resizes the T[]. What happens 
 with the original?

 The solution we came up with was to create a third array type, T[new], 
 which was a reference type.

 Andrei had the idea that T[new] could be dispensed with by making a 
 "builder" library type to handle creating arrays by doing things like 
 appending, and then delivering a finished T[] type. This is similar to 
 what std.outbuffer and std.array.Appender do, they just need a bit of 
 refining.

 The .length property of T[] would then become an rvalue only, not an 
 lvalue, and ~= would no longer be allowed for T[].

 We both feel that this would simplify D, make it more flexible, and 
 remove some awkward corner cases like the inability to say a.length++.

 What do you think?
Well, I personally don't feel much of a need for T[new], and I agree T[] needs to be "fixed" the way you describe (i.e. make a shrink-only range out of it). But I believe a change like this coupled with a demise of T[new] would be way too restricting. Given a T[] array; What type would result of these operations be? 1) auto x = array.dup; 2) auto y = array ~ array; T[] is a view into someone else's container. But when you create a new array (by dup'ing or concatenating), you get a fresh copy. It points to a beginning of data sequence, and there is nothing wrong to expand it. I believe it should be T[new]. A reference type, yes, a type that would allow appending and would render Appender unnecessary.
That was the exact plan. I even have half a chapter written about it with nice figures and all, that I can make available. The problem was, it sucked. Returning a distinct type from .dup and ~ makes slices not closed over these operations, a source of complication, confusion, and bloating. Andrei
Oct 18 2009
parent "Denis Koroskin" <2korden gmail.com> writes:
On Mon, 19 Oct 2009 01:55:28 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Denis Koroskin wrote:
 On Mon, 19 Oct 2009 01:05:39 +0400, Walter Bright  
 <newshound1 digitalmars.com> wrote:

 The purpose of T[new] was to solve the problems T[] had with passing  
 T[] to a function and then the function resizes the T[]. What happens  
 with the original?

 The solution we came up with was to create a third array type, T[new],  
 which was a reference type.

 Andrei had the idea that T[new] could be dispensed with by making a  
 "builder" library type to handle creating arrays by doing things like  
 appending, and then delivering a finished T[] type. This is similar to  
 what std.outbuffer and std.array.Appender do, they just need a bit of  
 refining.

 The .length property of T[] would then become an rvalue only, not an  
 lvalue, and ~= would no longer be allowed for T[].

 We both feel that this would simplify D, make it more flexible, and  
 remove some awkward corner cases like the inability to say a.length++.

 What do you think?
Well, I personally don't feel much of a need for T[new], and I agree T[] needs to be "fixed" the way you describe (i.e. make a shrink-only range out of it). But I believe a change like this coupled with a demise of T[new] would be way too restricting. Given a T[] array; What type would result of these operations be? 1) auto x = array.dup; 2) auto y = array ~ array; T[] is a view into someone else's container. But when you create a new array (by dup'ing or concatenating), you get a fresh copy. It points to a beginning of data sequence, and there is nothing wrong to expand it. I believe it should be T[new]. A reference type, yes, a type that would allow appending and would render Appender unnecessary.
That was the exact plan. I even have half a chapter written about it with nice figures and all, that I can make available. The problem was, it sucked. Returning a distinct type from .dup and ~ makes slices not closed over these operations, a source of complication, confusion, and bloating. Andrei
What's problem with implicit cast of T[new] to T[]? T[new] container = [1, 2, 3].dup; T[] range1 = container; // implicit container[] call via alias this T[] range2 = [1, 2, 3].dup; // same here An other option would be to declare // duplicates a range T[] rdup(T[] range) { return range.dup[]; } but it's less likely to happen.
Oct 18 2009
prev sibling next sibling parent reply grauzone <none example.net> writes:
Walter Bright wrote:
 We both feel that this would simplify D, make it more flexible, and 
 remove some awkward corner cases like the inability to say a.length++.
How would this remove this corner case?
 What do you think?
Whether T[new] is bultin (mostly implemented in the runtime), or a library type (implemented in Phobos) doesn't make much a difference; it's pretty much the same, isn't it? I don't understand what the exact reasons for this decision. If anything, the array type should be available without additional Phobos imports (i.e. it should be declared in object.d). Ease of access and ease of use is the key thing here.
Oct 18 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
grauzone wrote:
 Walter Bright wrote:
 We both feel that this would simplify D, make it more flexible, and 
 remove some awkward corner cases like the inability to say a.length++.
How would this remove this corner case?
Can't increment length if it's read-only. Andrei
Oct 18 2009
parent reply Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu Wrote:

 grauzone wrote:
 Walter Bright wrote:
 We both feel that this would simplify D, make it more flexible, and 
 remove some awkward corner cases like the inability to say a.length++.
How would this remove this corner case?
Can't increment length if it's read-only. Andrei
removing syntactic sugar doesn't really remove corner cases. T[new] is being replaced with an array builder. What usage semantics are given up with an array builder? How will implicit conversions be handled? (alias this, etc...). How will the Phobos array buiilder support user specified lengths (for efficiency)?
Oct 18 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jason House wrote:
 Andrei Alexandrescu Wrote:
 
 grauzone wrote:
 Walter Bright wrote:
 We both feel that this would simplify D, make it more flexible,
 and remove some awkward corner cases like the inability to say
 a.length++.
How would this remove this corner case?
Can't increment length if it's read-only. Andrei
removing syntactic sugar doesn't really remove corner cases. T[new] is being replaced with an array builder. What usage semantics are given up with an array builder? How will implicit conversions be handled? (alias this, etc...). How will the Phobos array buiilder support user specified lengths (for efficiency)?
I think ArrayBuilder could a logic similar to Appender for "~=" and otherwise offer regular array primitives. I don't think implicit conversion to T[] is an essential feature, but it can be done. Andrei
Oct 18 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Jason House wrote:
 Andrei Alexandrescu Wrote:

 grauzone wrote:
 Walter Bright wrote:
 We both feel that this would simplify D, make it more flexible,
 and remove some awkward corner cases like the inability to say
 a.length++.
How would this remove this corner case?
Can't increment length if it's read-only. Andrei
removing syntactic sugar doesn't really remove corner cases. T[new] is being replaced with an array builder. What usage semantics are given up with an array builder? How will implicit conversions be handled? (alias this, etc...). How will the Phobos array buiilder support user specified lengths (for efficiency)?
I think ArrayBuilder could a logic similar to Appender for "~=" and otherwise offer regular array primitives. I don't think implicit conversion to T[] is an essential feature, but it can be done. Andrei
So basically, ArrayBuilder would be like T[new], except: 1. It would be a plain old library type, and the core language would know nothing about it. 2. T[].dup, T[] ~ T[], new T[3], etc. would return T[], not ArrayBuilder/T[new]. Is this basically correct?
Oct 18 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Jason House wrote:
 Andrei Alexandrescu Wrote:

 grauzone wrote:
 Walter Bright wrote:
 We both feel that this would simplify D, make it more flexible,
 and remove some awkward corner cases like the inability to say
 a.length++.
How would this remove this corner case?
Can't increment length if it's read-only. Andrei
removing syntactic sugar doesn't really remove corner cases. T[new] is being replaced with an array builder. What usage semantics are given up with an array builder? How will implicit conversions be handled? (alias this, etc...). How will the Phobos array buiilder support user specified lengths (for efficiency)?
I think ArrayBuilder could a logic similar to Appender for "~=" and otherwise offer regular array primitives. I don't think implicit conversion to T[] is an essential feature, but it can be done. Andrei
So basically, ArrayBuilder would be like T[new], except: 1. It would be a plain old library type, and the core language would know nothing about it. 2. T[].dup, T[] ~ T[], new T[3], etc. would return T[], not ArrayBuilder/T[new]. Is this basically correct?
Yes, that's the plan. I hope you're not setting me up or something :o). Andrei
Oct 18 2009
next sibling parent Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu Wrote:

 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Jason House wrote:
 Andrei Alexandrescu Wrote:

 grauzone wrote:
 Walter Bright wrote:
 We both feel that this would simplify D, make it more flexible,
 and remove some awkward corner cases like the inability to say
 a.length++.
How would this remove this corner case?
Can't increment length if it's read-only. Andrei
removing syntactic sugar doesn't really remove corner cases. T[new] is being replaced with an array builder. What usage semantics are given up with an array builder? How will implicit conversions be handled? (alias this, etc...). How will the Phobos array buiilder support user specified lengths (for efficiency)?
I think ArrayBuilder could a logic similar to Appender for "~=" and otherwise offer regular array primitives. I don't think implicit conversion to T[] is an essential feature, but it can be done. Andrei
So basically, ArrayBuilder would be like T[new], except: 1. It would be a plain old library type, and the core language would know nothing about it. 2. T[].dup, T[] ~ T[], new T[3], etc. would return T[], not ArrayBuilder/T[new]. Is this basically correct?
Yes, that's the plan. I hope you're not setting me up or something :o). Andrei
I was ;) Well, only sort of. I'm undecided if array-like syntactic sugar makes sense for array builder. Afterall, AA's fit into a similar category. I did notice you wanted to make array builder a struct which implies it won't be used in the same manner as T[new]
Oct 18 2009
prev sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Jason House wrote:
 Andrei Alexandrescu Wrote:

 grauzone wrote:
 Walter Bright wrote:
 We both feel that this would simplify D, make it more flexible,
 and remove some awkward corner cases like the inability to say
 a.length++.
How would this remove this corner case?
Can't increment length if it's read-only. Andrei
removing syntactic sugar doesn't really remove corner cases. T[new] is being replaced with an array builder. What usage semantics are given up with an array builder? How will implicit conversions be handled? (alias this, etc...). How will the Phobos array buiilder support user specified lengths (for efficiency)?
I think ArrayBuilder could a logic similar to Appender for "~=" and otherwise offer regular array primitives. I don't think implicit conversion to T[] is an essential feature, but it can be done. Andrei
So basically, ArrayBuilder would be like T[new], except: 1. It would be a plain old library type, and the core language would know nothing about it. 2. T[].dup, T[] ~ T[], new T[3], etc. would return T[], not ArrayBuilder/T[new]. Is this basically correct?
Yes, that's the plan. I hope you're not setting me up or something :o). Andrei
Given the issues surrounding T[new] and how ugly some cases of the implicit conversion can get, you've convinced me that this may be a good solution as long as: 1. ArrayBuilder implements full array semantics and is usable as a "real" array as you suggest. I wasn't aware that this was what was being proposed, as ArrayBuilder implied to me that it's only good for building arrays as opposed to being a full-fledged array library type. Ideally, for convenience it should be implicitly convertible to T[]. 2. The semantics are such that bug 2093 (http://d.puremagic.com/issues/show_bug.cgi?id=2093) is fixed, while still allowing appending to arrays of immutables. This would probably require some casting under the hood, but the standard lib. is only one step above the core language, so it's acceptable. The semantics of the length property would probably have to be reference semantics, any block of array memory would have to be owned exclusively by one ArrayBuilder, etc. just like with T[new].
Oct 18 2009
prev sibling next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
 The purpose of T[new] was to solve the problems T[] had with passing T[]
 to a function and then the function resizes the T[]. What happens with
 the original?
 The solution we came up with was to create a third array type, T[new],
 which was a reference type.
 Andrei had the idea that T[new] could be dispensed with by making a
 "builder" library type to handle creating arrays by doing things like
 appending, and then delivering a finished T[] type. This is similar to
 what std.outbuffer and std.array.Appender do, they just need a bit of
 refining.
 The .length property of T[] would then become an rvalue only, not an
 lvalue, and ~= would no longer be allowed for T[].
 We both feel that this would simplify D, make it more flexible, and
 remove some awkward corner cases like the inability to say a.length++.
 What do you think?
This is ridiculous. The status quo works well most of the time and just has a few really ugly corner cases. As long as you're only appending to one array at a time, appending isn't even that slow anymore now that bug 2900 (http://d.puremagic.com/issues/show_bug.cgi?id=2900) is fixed. I frankly think it's absurd to make working with arrays an order of magnitude harder and less elegant in the 90+% of cases where they work fine just to fix a few corner case bugs. Don't get me wrong, the corner case bugs should be fixed because they're pretty nasty safety issues. They just shouldn't be fixed in a way that makes arrays substantially harder to use, or even syntactically uglier, in the cases where they already work well. As good a programmer as Andrei is, I'm sure whatever he comes up with will be much less syntactially pleasing and easy to use than something the core language understands. Here's my proposal for how T[new] should work: 1. It should be a reference type to be consistent with slices. Yes, slices are kind of a hybrid, but they're more semantically similar to reference types than value types. If you can't modify the length of a slice anymore, then for all practical purposes it will be a reference type. 2. A T[new] should support all the same operations as a T[] with semantics as similar as common sense will allow, including indexing, ~, .dup, .idup, slice assign, etc. Basically, it should have, to the greatest degree possible without defeating the purpose, the same compile time interface. 3. A T[new] should be implicitly convertible to a slice. For example: auto foo = someFunctionThatReturnsTnew(); // foo is a T[new]. T[] bar = someFunctionThatReturnsTnew(); // Works. bar is a T[]. The T[new] went into oblivion. This solves the problem of slices not being closed over .dup and ~. 4. It should be guaranteed that no block of memory is ever referenced by more than one T[new] instance. This is needed to guarantee safety when appending to immutable arrays, etc. 5. Assigning a T[new] to another T[new] should be by reference, just like assigning a class instance to another class instance. Assigning a T[] to a T[new] should duplicate the memory block referenced by the T[] because this is probably the only way to guarantee (4). 6. Since T[new] guarantees unique access to a memory block, it should have an assumeUnique() method that returns an immutable slice and sets the T[new]'s reference to the memory block to null. This solves the problem of building immutable arrays without the performance penalty of not being able to pre-allocate or the unsafeness of having to cowboy cast it to immutable. 7. As long as the GC is conservative, there absolutely *must* be a method of manually freeing the memory block referenced by a T[new] provided that the GC supports this operation, though it doesn't have to be particularly pretty. In general, since D is a systems language, T[new] should not be too opaque. A good way to do this might be to make all of the fields of the T[new] public but undocumented. If you *really* want to mess with it, you'll read the source code and figure it out. 8. The first call to opSlice on a T[new] should set a flag that indicates that there may be multiple pointers to the underlying memory block. Before that flag is set, appends to a T[new] should result in calls to GC.free() to free the old block whenever it needs to be expanded (since we can guarantee that we own it exclusively). This will help deal with false pointer issues, since D's GC looks like it will remain conservative for the foreseeable future.
Oct 18 2009
next sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from dsimcha (dsimcha yahoo.com)'s article
 8.  The first call to opSlice on a T[new] should set a flag that indicates that
 there may be multiple pointers to the underlying memory block.  Before that
flag
 is set, appends to a T[new] should result in calls to GC.free() to free the old
 block whenever it needs to be expanded (since we can guarantee that we own it
 exclusively).  This will help deal with false pointer issues, since D's GC
looks
 like it will remain conservative for the foreseeable future.
Scratch this one. It breaks on the following: T[new] stuff = somethingThatReturnsTnew(); T* ptrToStuffMemory = &(stuff[8]); stuff ~= otherStuff; // May delete memory referred to by ptrToStuffMemory.
Oct 18 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 Here's my proposal for how T[new] should work:
 
 1.  It should be a reference type to be consistent with slices.  Yes, slices
are
 kind of a hybrid, but they're more semantically similar to reference types than
 value types.  If you can't modify the length of a slice anymore, then for all
 practical purposes it will be a reference type.
Check.
 2.  A T[new] should support all the same operations as a T[] with semantics as
 similar as common sense will allow, including indexing, ~, .dup, .idup, slice
 assign, etc.  Basically, it should have, to the greatest degree possible
without
 defeating the purpose, the same compile time interface.
Check.
 3.  A T[new] should be implicitly convertible to a slice.  For example:
 
 auto foo = someFunctionThatReturnsTnew();
 // foo is a T[new].
 T[] bar = someFunctionThatReturnsTnew();
 // Works.  bar is a T[].  The T[new] went into oblivion.
 
 This solves the problem of slices not being closed over .dup and ~.
Check.
 4.  It should be guaranteed that no block of memory is ever referenced by more
 than one T[new] instance.  This is needed to guarantee safety when appending to
 immutable arrays, etc.
That doesn't go with reference semantics. Uncheck.
 5.  Assigning a T[new] to another T[new] should be by reference, just like
 assigning a class instance to another class instance.
Check. (BTW contradicts 4)
 Assigning a T[] to a T[new]
 should duplicate the memory block referenced by the T[] because this is
probably
 the only way to guarantee (4).
No check, but could have been done.
 6.  Since T[new] guarantees unique access to a memory block, it should have an
 assumeUnique() method that returns an immutable slice and sets the T[new]'s
 reference to the memory block to null.  This solves the problem of building
 immutable arrays without the performance penalty of not being able to
pre-allocate
 or the unsafeness of having to cowboy cast it to immutable.
Uncheck.
 7.  As long as the GC is conservative, there absolutely *must* be a method of
 manually freeing the memory block referenced by a T[new] provided that the GC
 supports this operation, though it doesn't have to be particularly pretty.  In
 general, since D is a systems language, T[new] should not be too opaque.  A
good
 way to do this might be to make all of the fields of the T[new] public but
 undocumented.  If you *really* want to mess with it, you'll read the source
code
 and figure it out.
Check while delete still exists. Please use malloc for that stuff.
 8.  The first call to opSlice on a T[new] should set a flag that indicates that
 there may be multiple pointers to the underlying memory block.  Before that
flag
 is set, appends to a T[new] should result in calls to GC.free() to free the old
 block whenever it needs to be expanded (since we can guarantee that we own it
 exclusively).  This will help deal with false pointer issues, since D's GC
looks
 like it will remain conservative for the foreseeable future.
Uncheck. So now: every place I've said "check" means there was implementation and book-quality illustrated documentation that Walter and I have done with the sweat of our brow. At the end we looked at the result and concluded we should throw away all that work. Andrei
Oct 18 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 3.  A T[new] should be implicitly convertible to a slice.  For example:

 auto foo = someFunctionThatReturnsTnew();
 // foo is a T[new].
 T[] bar = someFunctionThatReturnsTnew();
 // Works.  bar is a T[].  The T[new] went into oblivion.

 This solves the problem of slices not being closed over .dup and ~.
Check.
So then why is slices not being closed over .dup, ~, etc. still a problem? With implicit conversion, they for all practical purposes are.
 4.  It should be guaranteed that no block of memory is ever referenced by more
 than one T[new] instance.  This is needed to guarantee safety when appending to
 immutable arrays, etc.
That doesn't go with reference semantics. Uncheck.
 5.  Assigning a T[new] to another T[new] should be by reference, just like
 assigning a class instance to another class instance.
Check. (BTW contradicts 4)
There was a slight misunderstanding here. When I say an instance, I mean one T[new] heap object == one instance, no matter how many pointers/references to this instance you have. The assumption is that T[new] objects live entirely on the heap and have semantics similar to classes. For example: uint[new] foo = new uint[100]; uint[new] bar = foo; // Really just a pointer assignment. Now, both foo and bar point to the same uint[new] instance. If anything is modified via foo, it is seen when viewed from bar and vice-versa. To clarify, no *main block of memory that holds the data in the array* is ever referenced by more than one T[new] *heap object*.
 Assigning a T[] to a T[new]
 should duplicate the memory block referenced by the T[] because this is
probably
 the only way to guarantee (4).
No check, but could have been done.
Actually, this can even be done by COW. Initially, assigning a T[] to a T[new] could just be a pointer assignment and setting a flag, as long as a copy of the data is made when you try to increase the length of the T[new] or append to it. Basically, as long as you use the T[new] as if it were a T[], no copying needs to be done.
 6.  Since T[new] guarantees unique access to a memory block, it should have an
 assumeUnique() method that returns an immutable slice and sets the T[new]'s
 reference to the memory block to null.  This solves the problem of building
 immutable arrays without the performance penalty of not being able to
pre-allocate
 or the unsafeness of having to cowboy cast it to immutable.
Uncheck.
Now that I've explained my uniqueness thing better, does this sound feasible?
 7.  As long as the GC is conservative, there absolutely *must* be a method of
 manually freeing the memory block referenced by a T[new] provided that the GC
 supports this operation, though it doesn't have to be particularly pretty.  In
 general, since D is a systems language, T[new] should not be too opaque.  A
good
 way to do this might be to make all of the fields of the T[new] public but
 undocumented.  If you *really* want to mess with it, you'll read the source
code
 and figure it out.
Check while delete still exists. Please use malloc for that stuff.
As long as I can get at the pointer to the memory block held by T[new] I can pass it to GC.free. All that would really be needed is a .ptr property that does something like what .ptr does for T[]s.
 So now: every place I've said "check" means there was implementation and
 book-quality illustrated documentation that Walter and I have done with
 the sweat of our brow. At the end we looked at the result and concluded
 we should throw away all that work.
 Andrei
This begs the question: Why? Walter's post on the subject was rather brief and I can't understand for the life of me why you guys would throw away such an elegant solution. Given that we already agree that a T[new] can be implicitly cast to a T[], the lack of closure under ~, .dup, etc. seems like a non-issue. When I was a beginner, before I got into templates, a major attraction to D was that it was a fast, compiled language where basic things like arrays (mostly) just worked. To take away all the syntactic sugar for appending and lengthening from arrays and push this into a separate library type that doesn't feel like a first class object or (I assume) even support most array semantics would be a massive, massive kludge no matter how well-implemented that library type was.
Oct 18 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 3.  A T[new] should be implicitly convertible to a slice.  For example:

 auto foo = someFunctionThatReturnsTnew();
 // foo is a T[new].
 T[] bar = someFunctionThatReturnsTnew();
 // Works.  bar is a T[].  The T[new] went into oblivion.

 This solves the problem of slices not being closed over .dup and ~.
Check.
So then why is slices not being closed over .dup, ~, etc. still a problem? With implicit conversion, they for all practical purposes are.
The problems are with auto and template argument deduction.
 4.  It should be guaranteed that no block of memory is ever referenced by more
 than one T[new] instance.  This is needed to guarantee safety when appending to
 immutable arrays, etc.
That doesn't go with reference semantics. Uncheck.
 5.  Assigning a T[new] to another T[new] should be by reference, just like
 assigning a class instance to another class instance.
Check. (BTW contradicts 4)
There was a slight misunderstanding here. When I say an instance, I mean one T[new] heap object == one instance, no matter how many pointers/references to this instance you have. The assumption is that T[new] objects live entirely on the heap and have semantics similar to classes. For example: uint[new] foo = new uint[100]; uint[new] bar = foo; // Really just a pointer assignment. Now, both foo and bar point to the same uint[new] instance. If anything is modified via foo, it is seen when viewed from bar and vice-versa. To clarify, no *main block of memory that holds the data in the array* is ever referenced by more than one T[new] *heap object*.
Oh yah. Check that sucker.
 Assigning a T[] to a T[new]
 should duplicate the memory block referenced by the T[] because this is
probably
 the only way to guarantee (4).
No check, but could have been done.
Actually, this can even be done by COW. Initially, assigning a T[] to a T[new] could just be a pointer assignment and setting a flag, as long as a copy of the data is made when you try to increase the length of the T[new] or append to it. Basically, as long as you use the T[new] as if it were a T[], no copying needs to be done.
 6.  Since T[new] guarantees unique access to a memory block, it should have an
 assumeUnique() method that returns an immutable slice and sets the T[new]'s
 reference to the memory block to null.  This solves the problem of building
 immutable arrays without the performance penalty of not being able to
pre-allocate
 or the unsafeness of having to cowboy cast it to immutable.
Uncheck.
Now that I've explained my uniqueness thing better, does this sound feasible?
Nah, there are still problems with Ts that themselves are elaborate types containing references, aliasing etc. assumeUnique makes a rather strong claim.
 7.  As long as the GC is conservative, there absolutely *must* be a method of
 manually freeing the memory block referenced by a T[new] provided that the GC
 supports this operation, though it doesn't have to be particularly pretty.  In
 general, since D is a systems language, T[new] should not be too opaque.  A
good
 way to do this might be to make all of the fields of the T[new] public but
 undocumented.  If you *really* want to mess with it, you'll read the source
code
 and figure it out.
Check while delete still exists. Please use malloc for that stuff.
As long as I can get at the pointer to the memory block held by T[new] I can pass it to GC.free. All that would really be needed is a .ptr property that does something like what .ptr does for T[]s.
Alrighty. Check that.
 So now: every place I've said "check" means there was implementation and
 book-quality illustrated documentation that Walter and I have done with
 the sweat of our brow. At the end we looked at the result and concluded
 we should throw away all that work.
 Andrei
This begs the question: Why? Walter's post on the subject was rather brief and I can't understand for the life of me why you guys would throw away such an elegant solution.
Here's what I wrote to Walter: ==================== I'm going to suggest something terrible - let's get rid of T[new]. I know it's difficult to throw away work you've already done, but really things with T[new] start to look like a Pyrrhic victory. Here are some issues: * The abstraction doesn't seem to come off as crisp and clean as we both wanted; * There are efficiency issues, such as the two allocations that you valiantly tried to eliminate in a subset of cases; * Explaining two very similar but subtly different types to newcomers is excruciatingly difficult (I'll send you a draft of the chapter - it looks like a burn victim who didn't make it); * Furthermore, explaining people when to use one vs. the other is much more difficult than it seems. On the surface, it goes like this: "If you need to append stuff, use T[new]. If not, use T[]." Reality is much more subtle. For one thing, T[new] does not allow contraction from the left, whereas T[] does. That puts T[] at an advantage. So if you want to append stuff and also contract from the left, there's nothing our abstractions can help you with. Instead of all T[new] stuff, I suggest the following: 1. We stay with T[] and we define a struct ArrayBuilder that replaces T[new] with a much more clear name and charter. Phobos already has Appender which works very well. We can beef that up to allow array-like primitives. 2. Assigning to a slice's .length allocates a new slice if growth is needed. 3. Disallow ~= for slices. ArrayBuilder will define it. 4. That's it. Java got away with a similar approach using StringBuilder: http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StringBuilder.html Scala has something very similar called ArrayBuffer: http://www.nabble.com/ArrayList-and-ArrayBuffer-td15448842.html http://msdn.microsoft.com/en-us/library/2839d5h5%28VS.71%29.aspx So it looks like many programmers coming from other languages will already be familiar with the idea that you use a "builder" to grow an array, and then you use a non-growable array. One thing that Appender has and is really cool is that it can grow an already-existing slice. So you can grow a slice, play with it for a while, and then grow it again at low cost. I don't think the other languages allow that. I understand how you must feel about having implemented T[new] and all, but really please please try to detach for a minute and think back. Does what we've got now with T[new] make D a much better place? Between the increase of the language, the difficulty to explain the minute subtleties, and the annoying corner cases and oddities, I think it might be good to reconsider. ================
 Given that we already agree that a T[new] can be implicitly cast to a
 T[], the lack of closure under ~, .dup, etc. seems like a non-issue.  When I
was a
 beginner, before I got into templates, a major attraction to D was that it was
a
 fast, compiled language where basic things like arrays (mostly) just worked. 
To
 take away all the syntactic sugar for appending and lengthening from arrays and
 push this into a separate library type that doesn't feel like a first class
object
 or (I assume) even support most array semantics would be a massive, massive
kludge
 no matter how well-implemented that library type was.
I think the language will be in considerably better shape without T[new] than with it. I understand this is a subjective point. You can argue against on virtually every one of my objections. Andrei
Oct 18 2009
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 3.  A T[new] should be implicitly convertible to a slice.  For example:

 auto foo = someFunctionThatReturnsTnew();
 // foo is a T[new].
 T[] bar = someFunctionThatReturnsTnew();
 // Works.  bar is a T[].  The T[new] went into oblivion.

 This solves the problem of slices not being closed over .dup and ~.
Check.
So then why is slices not being closed over .dup, ~, etc. still a problem? With implicit conversion, they for all practical purposes are.
The problems are with auto and template argument deduction.
Ok, now I can see how this would be a legitimate problem, but only in a few corner cases: void doStuff(T)(T someArrayLikeObject) { someArrayLikeObject = someArrayLikeObject[0..$ - 1]; } int[] foo = someFunction(); int[] bar = someFunction(); auto baz = foo ~ bar; // baz is an int[new]. This doesn't usually matter, since you can use it // just like an int[] and if you care about the small performance difference, // you should be more careful. doStuff(baz); // Are the effects of doStuff() visible in this scope? // Depends if baz is an int[] or an int[new]. However, this is really, really, *really* a corner case. Any reasonable template would either slice the array to make sure it owns the slice information or pass by reference to make sure the change is propagated, so that there is no ambiguity.
 Here's what I wrote to Walter:
 ====================
 I'm going to suggest something terrible - let's get rid of T[new]. I
 know it's difficult to throw away work you've already done, but really
 things with T[new] start to look like a Pyrrhic victory. Here are some
 issues:
 * The abstraction doesn't seem to come off as crisp and clean as we both
 wanted;
 * There are efficiency issues, such as the two allocations that you
 valiantly tried to eliminate in a subset of cases;
Once you've opened the can of worms of having to perform allocations, one versus two allocations isn't very important.
 * Explaining two very similar but subtly different types to newcomers is
 excruciatingly difficult (I'll send you a draft of the chapter - it
 looks like a burn victim who didn't make it);
This is admittedly a legitimate concern. However, you can get off the ground in D, or any language for that matter, without being a full-fledged language lawyer. I frankly don't think it's important for beginners to understand every subtlety and corner case. Heck, I've been using D for a while and have done some non-trivial projects in it and there are definitely dark corners that I don't understand. (Complex numbers come to mind.) I just don't care because I figure I'll learn them if/when I need to know them.
 * Furthermore, explaining people when to use one vs. the other is much
 more difficult than it seems. On the surface, it goes like this: "If you
 need to append stuff, use T[new]. If not, use T[]." Reality is much more
 subtle. For one thing, T[new] does not allow contraction from the left,
 whereas T[] does. That puts T[] at an advantage. So if you want to
 append stuff and also contract from the left, there's nothing our
 abstractions can help you with.
Why can't a T[new] contract from the left? As far as I can tell, you could do something like: struct TNew(T) { typeof(this) opAssign(T[] slice) { if(slice.ptr >= this.ptr && slice.ptr < this.ptr + capacity) { // Then we own this block of memory and can assign // a slice by reference. // Adjust effective capacity. capacity -= cast(size_t) (slice.ptr - this.ptr); length = slice.length; this.ptr = slice.ptr; } else { // Assign the slice by copying. } } // Other stuff. } You would then contract from the left the same way as for a slice: int[new] foo = new int[5]; foo = foo[1..$]; It would simply require a few integer/pointer comparisons and still be reasonably efficient.
 Instead of all T[new] stuff, I suggest the following:
 1. We stay with T[] and we define a struct ArrayBuilder that replaces
 T[new] with a much more clear name and charter. Phobos already has
 Appender which works very well. We can beef that up to allow array-like
 primitives.
 2. Assigning to a slice's .length allocates a new slice if growth is needed.
 3. Disallow ~= for slices. ArrayBuilder will define it.
 4. That's it.
 Java got away with a similar approach using StringBuilder:
 http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StringBuilder.html
 Scala has something very similar called ArrayBuffer:
 http://www.nabble.com/ArrayList-and-ArrayBuffer-td15448842.html

 http://msdn.microsoft.com/en-us/library/2839d5h5%28VS.71%29.aspx
 So it looks like many programmers coming from other languages will
 already be familiar with the idea that you use a "builder" to grow an
 array, and then you use a non-growable array. One thing that Appender
 has and is really cool is that it can grow an already-existing slice. So
 you can grow a slice, play with it for a while, and then grow it again
 at low cost. I don't think the other languages allow that.
 I understand how you must feel about having implemented T[new] and all,
 but really please please try to detach for a minute and think back. Does
 what we've got now with T[new] make D a much better place? Between the
 increase of the language, the difficulty to explain the minute
 subtleties, and the annoying corner cases and oddities, I think it might
 be good to reconsider.
Yes, Java, etc. get away with this because Java has no pretense of being a make simple things simple kind of language, but one big thing that D has over Java, etc. is that it can be used almost like a scripting language for simple stuff. It is about the only language that scales well all the way from simple scripts to uber-complicated metaprogramming. If we get rid of nice builtin arrays that support everything with clean syntax, we're throwing out making simple things simple in exchange for fixing a few corner cases. IMHO this is a terrible tradeoff. If you and Walter *really* despise T[new], I would prefer to see builtin arrays kept exactly the way they are, bugs and all, and for array builders/appenders/whatever to be improved but still be considered purely a performance hack. The bugs in the current arrays are pretty nasty from a theoretical safety/purity point of view (esp. the one that's a hole in immutability), but are seldom run into in practice.
Oct 18 2009
parent Walter Bright <newshound1 digitalmars.com> writes:
dsimcha wrote:
 The bugs in the current arrays are pretty nasty from a
 theoretical safety/purity point of view (esp. the one that's a hole in
 immutability), but are seldom run into in practice.
We'd like to get D to the point where guarantees can be offered. This really sets it apart from C++, which offers no guarantees, and safety only happens if the programmer carefully follows convention. We need to do something about those nasty corner cases. Another issue is that D currently has two array types. Adding T[new] makes for 3 array types, and that starts to look like we couldn't figure out what we were doing. And lastly, the reason to make a type a built-in one is in order to offer substantial advantages over a library one. If a library one can do a good job, it is better to push it into the library. Making for a smaller core language makes it easier for people to get into D and feel comfortable with it. I was getting hard pressed to find significant advantages for T[new] over a library type. In particular, I find few uses for resizeable arrays as opposed to slices which are everywhere. When resizeable arrays are needed, they are usually isolated in one function, and when the function returns the resizeable array no longer needs to be resized - it can be converted to a slice type. Another interesting advantage to the library type for T[new] is it solves the problem of creating a safe immutable string. When the library T[new] is converted to an immutable string, the T[new] contents are removed - hence it won't be possible to create a mutable alias of an immutable string without a user-applied cast, which would not be allowed in safe mode.
Oct 18 2009
prev sibling next sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Mon, 19 Oct 2009 05:16:45 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s  
 article
 3.  A T[new] should be implicitly convertible to a slice.  For  
 example:

 auto foo = someFunctionThatReturnsTnew();
 // foo is a T[new].
 T[] bar = someFunctionThatReturnsTnew();
 // Works.  bar is a T[].  The T[new] went into oblivion.

 This solves the problem of slices not being closed over .dup and ~.
Check.
So then why is slices not being closed over .dup, ~, etc. still a problem? With implicit conversion, they for all practical purposes are.
The problems are with auto and template argument deduction.
Could you please bring any example? You talk a lot about corner cases and inconsistencies, but didn't bring a single example.
  This begs the question:  Why?  Walter's post on the subject was rather  
 brief and I
 can't understand for the life of me why you guys would throw away such  
 an elegant
 solution.
Here's what I wrote to Walter: ==================== I'm going to suggest something terrible - let's get rid of T[new]. I know it's difficult to throw away work you've already done, but really things with T[new] start to look like a Pyrrhic victory. Here are some issues: * The abstraction doesn't seem to come off as crisp and clean as we both wanted; * There are efficiency issues, such as the two allocations that you valiantly tried to eliminate in a subset of cases; * Explaining two very similar but subtly different types to newcomers is excruciatingly difficult (I'll send you a draft of the chapter - it looks like a burn victim who didn't make it); * Furthermore, explaining people when to use one vs. the other is much more difficult than it seems. On the surface, it goes like this: "If you need to append stuff, use T[new]. If not, use T[]." Reality is much more subtle. For one thing, T[new] does not allow contraction from the left, whereas T[] does. That puts T[] at an advantage. So if you want to append stuff and also contract from the left, there's nothing our abstractions can help you with.
An Array!(T) is really just a different name to a T[new]. You'll have the same problem explaining difference between Array!(T) and T[]. But you are also creating a nightmare for CTFE. Since you can't use "a ~= b;" anymore, you'll have to use "a = a ~ b;" which *always* allocates. Not only it is syntactically less pleasant, this way you render this function useless at run-time - who in the sane mind will use such an inefficient stuff?
 Instead of all T[new] stuff, I suggest the following:

 1. We stay with T[] and we define a struct ArrayBuilder that replaces  
 T[new] with a much more clear name and charter. Phobos already has  
 Appender which works very well. We can beef that up to allow array-like  
 primitives.

 2. Assigning to a slice's .length allocates a new slice if growth is  
 needed.

 3. Disallow ~= for slices. ArrayBuilder will define it.

 4. That's it.

 Java got away with a similar approach using StringBuilder:

 http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StringBuilder.html

 Scala has something very similar called ArrayBuffer:

 http://www.nabble.com/ArrayList-and-ArrayBuffer-td15448842.html



 http://msdn.microsoft.com/en-us/library/2839d5h5%28VS.71%29.aspx
This is not a fair comparison. StringBuilders only exists because strings are immutable and any operation on them create a new object. They are just a performance optimization, nothing else. StringBuilder is great, but its functionality doesn't really overlap with a general-purpose array.
 So it looks like many programmers coming from other languages will  
 already be familiar with the idea that you use a "builder" to grow an  
 array, and then you use a non-growable array. One thing that Appender  
 has and is really cool is that it can grow an already-existing slice. So  
 you can grow a slice, play with it for a while, and then grow it again  
 at low cost. I don't think the other languages allow that.

 I understand how you must feel about having implemented T[new] and all,  
 but really please please try to detach for a minute and think back. Does  
 what we've got now with T[new] make D a much better place? Between the  
 increase of the language, the difficulty to explain the minute  
 subtleties, and the annoying corner cases and oddities, I think it might  
 be good to reconsider.
 ================

 Given that we already agree that a T[new] can be implicitly cast to a
 T[], the lack of closure under ~, .dup, etc. seems like a non-issue.   
 When I was a
 beginner, before I got into templates, a major attraction to D was that  
 it was a
 fast, compiled language where basic things like arrays (mostly) just  
 worked.  To
 take away all the syntactic sugar for appending and lengthening from  
 arrays and
 push this into a separate library type that doesn't feel like a first  
 class object
 or (I assume) even support most array semantics would be a massive,  
 massive kludge
 no matter how well-implemented that library type was.
I think the language will be in considerably better shape without T[new] than with it. I understand this is a subjective point. You can argue against on virtually every one of my objections.
It is a shame that D will have built-in associative arrays and no dynamic arrays. After all, which one of them do you use more frequently?
Oct 19 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
This discussion originated in the T[new] thread, but I think it deserves its own
thread.

== Quote from Denis Koroskin (2korden gmail.com)'s article
 An Array!(T) is really just a different name to a T[new]. You'll have the
 same problem explaining difference between Array!(T) and T[].
 But you are also creating a nightmare for CTFE. Since you can't use "a ~=
 b;" anymore, you'll have to use "a = a ~ b;" which *always* allocates. Not
 only it is syntactically less pleasant, this way you render this function
 useless at run-time - who in the sane mind will use such an inefficient
 stuff?
Maybe what we need is a version(ctfe) statement. Stuff inside such a block would be executed only if a function is being compile time evaluated. When code is generated for runtime evaluation the else block would be used. This would allow problems like this to be solved in a well-encapsulated way. Example: uint[] findPrimes(uint maxPrime) { version(ctfe) { uint[] ret; } else { ArrayBuilder!uint ret; } foreach(i; 0..maxPrime) { if(!isPrime(i)) { continue; } version(ctfe) { ret = ret ~ i; } else { ret ~= i; } } } Given that CTFE will likely never support everything that is supported at runtime, this will likely make it much more useful.
Oct 19 2009
next sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Mon, 19 Oct 2009 17:13:46 +0400, dsimcha <dsimcha yahoo.com> wrote:

 This discussion originated in the T[new] thread, but I think it deserves  
 its own
 thread.

 == Quote from Denis Koroskin (2korden gmail.com)'s article
 An Array!(T) is really just a different name to a T[new]. You'll have  
 the
 same problem explaining difference between Array!(T) and T[].
 But you are also creating a nightmare for CTFE. Since you can't use "a  
 ~=
 b;" anymore, you'll have to use "a = a ~ b;" which *always* allocates.  
 Not
 only it is syntactically less pleasant, this way you render this  
 function
 useless at run-time - who in the sane mind will use such an inefficient
 stuff?
Maybe what we need is a version(ctfe) statement. Stuff inside such a block would be executed only if a function is being compile time evaluated. When code is generated for runtime evaluation the else block would be used. This would allow problems like this to be solved in a well-encapsulated way. Example: uint[] findPrimes(uint maxPrime) { version(ctfe) { uint[] ret; } else { ArrayBuilder!uint ret; } foreach(i; 0..maxPrime) { if(!isPrime(i)) { continue; } version(ctfe) { ret = ret ~ i; } else { ret ~= i; } } } Given that CTFE will likely never support everything that is supported at runtime, this will likely make it much more useful.
It was suggested before and IIRC Walter said it is next to impossible to implement. What you suggest is almost the same as writing 2 different functions. The killer feature of CTFE is that you write it just once and use both at compile and run-time without modifications. I'd like to write it as immutable(uint)[] findPrimes(uint maxPrime) { ArrayOfInt ret; foreach(i; 0..maxPrime) { if(!isPrime(i)) { continue; } ret ~= i; } return assumeUnique(ret[]); } ArrayOfInt may be an alias to Array!(int), int[] or int[new], or whatever else. All I care is it should be valid and efficient for both run- and compile-time cases.
Oct 19 2009
parent reply Don <nospam nospam.com> writes:
Denis Koroskin wrote:
 On Mon, 19 Oct 2009 17:13:46 +0400, dsimcha <dsimcha yahoo.com> wrote:
 
 This discussion originated in the T[new] thread, but I think it 
 deserves its own
 thread.

 == Quote from Denis Koroskin (2korden gmail.com)'s article
 An Array!(T) is really just a different name to a T[new]. You'll have 
 the
 same problem explaining difference between Array!(T) and T[].
 But you are also creating a nightmare for CTFE. Since you can't use 
 "a ~=
 b;" anymore, you'll have to use "a = a ~ b;" which *always* 
 allocates. Not
 only it is syntactically less pleasant, this way you render this 
 function
 useless at run-time - who in the sane mind will use such an inefficient
 stuff?
Maybe what we need is a version(ctfe) statement. Stuff inside such a block would be executed only if a function is being compile time evaluated. When code is generated for runtime evaluation the else block would be used. This would allow problems like this to be solved in a well-encapsulated way. Example: uint[] findPrimes(uint maxPrime) { version(ctfe) { uint[] ret; } else { ArrayBuilder!uint ret; } foreach(i; 0..maxPrime) { if(!isPrime(i)) { continue; } version(ctfe) { ret = ret ~ i; } else { ret ~= i; } } } Given that CTFE will likely never support everything that is supported at runtime, this will likely make it much more useful.
It was suggested before and IIRC Walter said it is next to impossible to implement.
I had a bit of an attempt at it. version(ctfe) seems to be nearly impossible. But I'm almost certain I could get a magic bool __ctfe to work: if (__ctfe) { ... // only contains D code } else { asm { .... } } It would only be accessible inside unsafe modules (which are the only place you'd need it).
 What you suggest is almost the same as writing 2 different functions. 
 The killer feature of CTFE is that you write it just once and use both 
 at compile and run-time without modifications.
Yes, but there are problems. A few low-level, high-speed functions can't be used in CTFE because they use asm or nasty unions. eg, std.math.poly()
Oct 19 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Don (nospam nospam.com)'s article
 Denis Koroskin wrote:
 On Mon, 19 Oct 2009 17:13:46 +0400, dsimcha <dsimcha yahoo.com> wrote:

 This discussion originated in the T[new] thread, but I think it
 deserves its own
 thread.

 == Quote from Denis Koroskin (2korden gmail.com)'s article
 An Array!(T) is really just a different name to a T[new]. You'll have
 the
 same problem explaining difference between Array!(T) and T[].
 But you are also creating a nightmare for CTFE. Since you can't use
 "a ~=
 b;" anymore, you'll have to use "a = a ~ b;" which *always*
 allocates. Not
 only it is syntactically less pleasant, this way you render this
 function
 useless at run-time - who in the sane mind will use such an inefficient
 stuff?
Maybe what we need is a version(ctfe) statement. Stuff inside such a block would be executed only if a function is being compile time evaluated. When code is generated for runtime evaluation the else block would be used. This would allow problems like this to be solved in a well-encapsulated way. Example: uint[] findPrimes(uint maxPrime) { version(ctfe) { uint[] ret; } else { ArrayBuilder!uint ret; } foreach(i; 0..maxPrime) { if(!isPrime(i)) { continue; } version(ctfe) { ret = ret ~ i; } else { ret ~= i; } } } Given that CTFE will likely never support everything that is supported at runtime, this will likely make it much more useful.
It was suggested before and IIRC Walter said it is next to impossible to implement.
I had a bit of an attempt at it. version(ctfe) seems to be nearly impossible. But I'm almost certain I could get a magic bool __ctfe to work: if (__ctfe) { ... // only contains D code } else { asm { .... } } It would only be accessible inside unsafe modules (which are the only place you'd need it).
 What you suggest is almost the same as writing 2 different functions.
 The killer feature of CTFE is that you write it just once and use both
 at compile and run-time without modifications.
Yes, but there are problems. A few low-level, high-speed functions can't be used in CTFE because they use asm or nasty unions. eg, std.math.poly()
So this magic bool would be like a regular if statement in that it would create a new scope? I guess if you could do it such that it worked with static if, you'd be able to get it to work with version.
Oct 19 2009
parent Don <nospam nospam.com> writes:
dsimcha wrote:
 == Quote from Don (nospam nospam.com)'s article
 Denis Koroskin wrote:
 On Mon, 19 Oct 2009 17:13:46 +0400, dsimcha <dsimcha yahoo.com> wrote:

 This discussion originated in the T[new] thread, but I think it
 deserves its own
 thread.

 == Quote from Denis Koroskin (2korden gmail.com)'s article
 An Array!(T) is really just a different name to a T[new]. You'll have
 the
 same problem explaining difference between Array!(T) and T[].
 But you are also creating a nightmare for CTFE. Since you can't use
 "a ~=
 b;" anymore, you'll have to use "a = a ~ b;" which *always*
 allocates. Not
 only it is syntactically less pleasant, this way you render this
 function
 useless at run-time - who in the sane mind will use such an inefficient
 stuff?
Maybe what we need is a version(ctfe) statement. Stuff inside such a block would be executed only if a function is being compile time evaluated. When code is generated for runtime evaluation the else block would be used. This would allow problems like this to be solved in a well-encapsulated way. Example: uint[] findPrimes(uint maxPrime) { version(ctfe) { uint[] ret; } else { ArrayBuilder!uint ret; } foreach(i; 0..maxPrime) { if(!isPrime(i)) { continue; } version(ctfe) { ret = ret ~ i; } else { ret ~= i; } } } Given that CTFE will likely never support everything that is supported at runtime, this will likely make it much more useful.
It was suggested before and IIRC Walter said it is next to impossible to implement.
I had a bit of an attempt at it. version(ctfe) seems to be nearly impossible. But I'm almost certain I could get a magic bool __ctfe to work: if (__ctfe) { ... // only contains D code } else { asm { .... } } It would only be accessible inside unsafe modules (which are the only place you'd need it).
 What you suggest is almost the same as writing 2 different functions.
 The killer feature of CTFE is that you write it just once and use both
 at compile and run-time without modifications.
Yes, but there are problems. A few low-level, high-speed functions can't be used in CTFE because they use asm or nasty unions. eg, std.math.poly()
So this magic bool would be like a regular if statement in that it would create a new scope? I guess if you could do it such that it worked with static if, you'd be able to get it to work with version.
It'd just be a variable, which the CTFE interpreter would evaluate as 'true', but would be a normal bool variable everywhere else. Similar to the magic variable __dollar. It'd only become false just before being sent to the code generator. Dead code elimination would remove the CTFE part. The important thing is that the AST must be identical for both CTFE and run-time.
Oct 19 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 This discussion originated in the T[new] thread, but I think it deserves its
own
 thread.
 
 == Quote from Denis Koroskin (2korden gmail.com)'s article
 An Array!(T) is really just a different name to a T[new]. You'll have the
 same problem explaining difference between Array!(T) and T[].
 But you are also creating a nightmare for CTFE. Since you can't use "a ~=
 b;" anymore, you'll have to use "a = a ~ b;" which *always* allocates. Not
 only it is syntactically less pleasant, this way you render this function
 useless at run-time - who in the sane mind will use such an inefficient
 stuff?
Maybe what we need is a version(ctfe) statement. Stuff inside such a block would be executed only if a function is being compile time evaluated. When code is generated for runtime evaluation the else block would be used. This would allow problems like this to be solved in a well-encapsulated way. Example: uint[] findPrimes(uint maxPrime) { version(ctfe) { uint[] ret; } else { ArrayBuilder!uint ret; } foreach(i; 0..maxPrime) { if(!isPrime(i)) { continue; } version(ctfe) { ret = ret ~ i; } else { ret ~= i; } } } Given that CTFE will likely never support everything that is supported at runtime, this will likely make it much more useful.
version(ctfe) would be helpful in many other situations (e.g. asm-optimized routines etc.) I thought more about it and realized that ~= can be still implemented for arrays if the GC cooperates. For relatively large chunks of memory, the GC keeps a control block. We could add a member size_t requestedSize that keeps the size that was requested with an allocation/reallocation call. The GC can take initiative in overallocating memory and expose primitives such as requestedSize (how much did the program ask for?) and capacity (how much is really allocated?). With that API, it is possible to implement ~= efficiently. Andrei
Oct 19 2009
parent reply Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, el 19 de octubre a las 08:38 me escribiste:
 For relatively large chunks of memory, the GC keeps a control block.
 We could add a member size_t requestedSize that keeps the size that
 was requested with an allocation/reallocation call. The GC can take
 initiative in overallocating memory and expose primitives such as
 requestedSize (how much did the program ask for?) and capacity (how
 much is really allocated?).  With that API, it is possible to
 implement ~= efficiently.
That would mean moving the overhead of arrays to the GC, but that overhead will be present for *all* objects, arrays or not, so I don't think it will be a good idea... Having that information could be used if present though, for example, to avoid scanning chunks of memory that are not used. That could help avoiding false positives (if that chunks of memory are not zeroed already). -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- Yo soy Peperino, mártir latino, venid al asado pero traed el vino. -- Peperino Pómoro
Oct 19 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Leandro Lucarella wrote:
 Andrei Alexandrescu, el 19 de octubre a las 08:38 me escribiste:
 For relatively large chunks of memory, the GC keeps a control block.
 We could add a member size_t requestedSize that keeps the size that
 was requested with an allocation/reallocation call. The GC can take
 initiative in overallocating memory and expose primitives such as
 requestedSize (how much did the program ask for?) and capacity (how
 much is really allocated?).  With that API, it is possible to
 implement ~= efficiently.
That would mean moving the overhead of arrays to the GC, but that overhead will be present for *all* objects, arrays or not, so I don't think it will be a good idea...
I forgot to mention that ~= would also have a means to communicate the GC to overallocate. My point is simple: the problem is that slices don't store enough info. That info could get in the memory control block.
 Having that information could be used if present though, for example, to
 avoid scanning chunks of memory that are not used. That could help
 avoiding false positives (if that chunks of memory are not zeroed
 already).
Good point. Andrei
Oct 19 2009
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Leandro Lucarella wrote:
 Andrei Alexandrescu, el 19 de octubre a las 08:38 me escribiste:
 For relatively large chunks of memory, the GC keeps a control block.
 We could add a member size_t requestedSize that keeps the size that
 was requested with an allocation/reallocation call. The GC can take
 initiative in overallocating memory and expose primitives such as
 requestedSize (how much did the program ask for?) and capacity (how
 much is really allocated?).  With that API, it is possible to
 implement ~= efficiently.
That would mean moving the overhead of arrays to the GC, but that overhead will be present for *all* objects, arrays or not, so I don't think it will be a good idea...
I forgot to mention that ~= would also have a means to communicate the GC to overallocate. My point is simple: the problem is that slices don't store enough info. That info could get in the memory control block.
This is the part of this thread I'm not understanding: Isn't this info stored already? The only problem is querying it is slow. See GC.sizeOf(), and bug 2900. (2900 is fixed, but it's still slow, and there's still bug 2093.) What are you proposing that's different from the status quo?
Oct 19 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Leandro Lucarella wrote:
 Andrei Alexandrescu, el 19 de octubre a las 08:38 me escribiste:
 For relatively large chunks of memory, the GC keeps a control block.
 We could add a member size_t requestedSize that keeps the size that
 was requested with an allocation/reallocation call. The GC can take
 initiative in overallocating memory and expose primitives such as
 requestedSize (how much did the program ask for?) and capacity (how
 much is really allocated?).  With that API, it is possible to
 implement ~= efficiently.
That would mean moving the overhead of arrays to the GC, but that overhead will be present for *all* objects, arrays or not, so I don't think it will be a good idea...
I forgot to mention that ~= would also have a means to communicate the GC to overallocate. My point is simple: the problem is that slices don't store enough info. That info could get in the memory control block.
This is the part of this thread I'm not understanding: Isn't this info stored already? The only problem is querying it is slow. See GC.sizeOf(), and bug 2900. (2900 is fixed, but it's still slow, and there's still bug 2093.) What are you proposing that's different from the status quo?
Sorry, I got confused a bit: you have the size in the array and you can get the capacity from the GC, just slowly. Let's sleep on this some more, it may not be impossible to find a growth scheme that works fast. Andrei
Oct 19 2009
parent Fawzi Mohamed <fmohamed mac.com> writes:
On 2009-10-19 18:19:34 +0200, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Leandro Lucarella wrote:
 Andrei Alexandrescu, el 19 de octubre a las 08:38 me escribiste:
 For relatively large chunks of memory, the GC keeps a control block.
 We could add a member size_t requestedSize that keeps the size that
 was requested with an allocation/reallocation call. The GC can take
 initiative in overallocating memory and expose primitives such as
 requestedSize (how much did the program ask for?) and capacity (how
 much is really allocated?).  With that API, it is possible to
 implement ~= efficiently.
That would mean moving the overhead of arrays to the GC, but that overhead will be present for *all* objects, arrays or not, so I don't think it will be a good idea...
I forgot to mention that ~= would also have a means to communicate the GC to overallocate. My point is simple: the problem is that slices don't store enough info. That info could get in the memory control block.
This is the part of this thread I'm not understanding: Isn't this info stored already? The only problem is querying it is slow. See GC.sizeOf(), and bug 2900. (2900 is fixed, but it's still slow, and there's still bug 2093.) What are you proposing that's different from the status quo?
Sorry, I got confused a bit: you have the size in the array and you can get the capacity from the GC, just slowly. Let's sleep on this some more, it may not be impossible to find a growth scheme that works fast. Andrei
The problem with the current state, apart slow access to the capacity, is an ownership problem: which of the possibly multiple slices owns the data after the slice? what happens when multiple slices grow? In some way the current state is the optimal solution with slice=array. To improve on this the only way is (I think) to have slice!=array, and this was basically what T[new] was all about (at least in my understanding). Ownership of data is a difficult problem with arrays, and clearly T[new] and slices is more complex than having slice=array, I would be tempted to say that on the whole having T[new] (or even a library vector type, if T[new] does not improve upon what a library type does), might be a good idea. Fawzi
Oct 19 2009
prev sibling parent Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, el 19 de octubre a las 10:33 me escribiste:
 Leandro Lucarella wrote:
Andrei Alexandrescu, el 19 de octubre a las 08:38 me escribiste:
For relatively large chunks of memory, the GC keeps a control block.
We could add a member size_t requestedSize that keeps the size that
was requested with an allocation/reallocation call. The GC can take
initiative in overallocating memory and expose primitives such as
requestedSize (how much did the program ask for?) and capacity (how
much is really allocated?).  With that API, it is possible to
implement ~= efficiently.
That would mean moving the overhead of arrays to the GC, but that overhead will be present for *all* objects, arrays or not, so I don't think it will be a good idea...
I forgot to mention that ~= would also have a means to communicate the GC to overallocate. My point is simple: the problem is that slices don't store enough info. That info could get in the memory control block.
The control block is not part of the object's memory, so I don't think it's something you want to manipulate frequently (because of locality issues). This works well for the GC because when you manipute control information often is when you are collecting, so for collection most of the control info should be cached and have good locality. In collections you have the exact opposite scheme, you don't look at the data, only the control info. When you manipulate an array you will be probably touching both. That's why I think you should keep all the array data in one place. Well, that and because I don't think is a good idea to transfer array complexities to the GC. It's just not the place for that. -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- Si por el chancho fuera, se autocomería con chimichurri Worshestershire!
Oct 19 2009
prev sibling parent reply Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, el 18 de octubre a las 20:16 me escribiste:
 Here's what I wrote to Walter:
 
 ====================
 I'm going to suggest something terrible - let's get rid of T[new]. I
 know it's difficult to throw away work you've already done, but
 really things with T[new] start to look like a Pyrrhic victory. Here
 are some issues:
 
 * The abstraction doesn't seem to come off as crisp and clean as we
 both wanted;
 
 * There are efficiency issues, such as the two allocations that you
 valiantly tried to eliminate in a subset of cases;
 
 * Explaining two very similar but subtly different types to
 newcomers is excruciatingly difficult (I'll send you a draft of the
 chapter - it looks like a burn victim who didn't make it);
 
 * Furthermore, explaining people when to use one vs. the other is
 much more difficult than it seems. On the surface, it goes like
 this: "If you need to append stuff, use T[new]. If not, use T[]."
 Reality is much more subtle. For one thing, T[new] does not allow
 contraction from the left, whereas T[] does. That puts T[] at an
 advantage. So if you want to append stuff and also contract from the
 left, there's nothing our abstractions can help you with.
I think this is getting overcomplicated. I don't see it as complex, I see it like this: 2 types should be provided: array and slice. array is a *real* type, storing and owning memory, it should be something like this (conceptually): class array(T) { size_t length; size_t capacity; T[capacity] elements; } 1) a pure reference type. 2) 1 allocation only (interior pointers are not a problem, the GC have to support them anyways). 3) easily appendable (capacity field). slice should be something like this: struct slice(T) { size_t length; T* ptr; } 1) a pure value type. 2) no allocation at all, *ever*. 3) not appendable at all. 4) You can change both ptr and length, and you can mutate the elements (if not immutable). So a slice is a window to peek a chunk of data. Then you have the syntax. In this discussion, T[new] is array and T[] is slice. I find that syntax very confusing. I think it could be even better to just put those 2 types in object.d (well, do public imports of those 2 types in object.d) and let the people write: array!T a; slice!T s; The array literals should be immutable chunks of memory in the static memory, as ClassInfo.init. The type of [1,2,3] should be, for example, slice!(immutable int), so: auto s = [1,2,3]; should create a slice!(immutable int) with length 3 and ptr pointing to the static memory where the [1,2,3] was stored (this is what happens with strings, right? So no surprises here, they are consistent). slice.dup should return an array. If you want to just copy the length and ptr members, use assignment (it's a value type, remember)? auto s = [1,2,3]; auto t = s; assert(s.length == t.length); assert(s.ptr == t.ptr); assert(&s != &t); Assignment of arrays is just a pointer assignment (is a reference type): auto a = [1,2,3].dup; auto b = a; assert(a is b); array.dup returns another array. array[] returns a slice though (you can allow implicit casting from array to slice but I don't see the point as array[] is short enough and more explicit). slices should not be convertible to arrays (use slice.dup for that). Back to the syntax, I think T[new] is *really* confusing. It tell me nothing about the type, it provides the same information as calling it it wazzzaaap!T for me. I *really* think it would be better to call it array!T. But if we have both types, T[] is not clear anymore that is a slice, again, you can't figure that out. So maybe T[] should be used for arrays, not slices; if you want some syntax sugar I think T[] is more guessable to be an array than a slice, and it would be a little more backward compatible. Maybe slices can be T[..], but I'm not sure is clear enough, again I think we could live with something more explicit like array!T and slice!T. But at least slices should be part of the language, because that should be the type of "array literals" (damn! that didn't sound right =P). I'm missing something? Why this shouldn't work? -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- - Que hacés, ratita? - Espero un ratito...
Oct 19 2009
next sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Mon, 19 Oct 2009 18:57:45 +0400, Leandro Lucarella <llucax gmail.com>  
wrote:

 Andrei Alexandrescu, el 18 de octubre a las 20:16 me escribiste:
 Here's what I wrote to Walter:

 ====================
 I'm going to suggest something terrible - let's get rid of T[new]. I
 know it's difficult to throw away work you've already done, but
 really things with T[new] start to look like a Pyrrhic victory. Here
 are some issues:

 * The abstraction doesn't seem to come off as crisp and clean as we
 both wanted;

 * There are efficiency issues, such as the two allocations that you
 valiantly tried to eliminate in a subset of cases;

 * Explaining two very similar but subtly different types to
 newcomers is excruciatingly difficult (I'll send you a draft of the
 chapter - it looks like a burn victim who didn't make it);

 * Furthermore, explaining people when to use one vs. the other is
 much more difficult than it seems. On the surface, it goes like
 this: "If you need to append stuff, use T[new]. If not, use T[]."
 Reality is much more subtle. For one thing, T[new] does not allow
 contraction from the left, whereas T[] does. That puts T[] at an
 advantage. So if you want to append stuff and also contract from the
 left, there's nothing our abstractions can help you with.
I think this is getting overcomplicated. I don't see it as complex, I see it like this: 2 types should be provided: array and slice. array is a *real* type, storing and owning memory, it should be something like this (conceptually): class array(T) { size_t length; size_t capacity; T[capacity] elements; } 1) a pure reference type. 2) 1 allocation only (interior pointers are not a problem, the GC have to support them anyways). 3) easily appendable (capacity field). slice should be something like this: struct slice(T) { size_t length; T* ptr; } 1) a pure value type. 2) no allocation at all, *ever*. 3) not appendable at all. 4) You can change both ptr and length, and you can mutate the elements (if not immutable). So a slice is a window to peek a chunk of data. Then you have the syntax. In this discussion, T[new] is array and T[] is slice. I find that syntax very confusing. I think it could be even better to just put those 2 types in object.d (well, do public imports of those 2 types in object.d) and let the people write: array!T a; slice!T s; The array literals should be immutable chunks of memory in the static memory, as ClassInfo.init. The type of [1,2,3] should be, for example, slice!(immutable int), so: auto s = [1,2,3]; should create a slice!(immutable int) with length 3 and ptr pointing to the static memory where the [1,2,3] was stored (this is what happens with strings, right? So no surprises here, they are consistent). slice.dup should return an array. If you want to just copy the length and ptr members, use assignment (it's a value type, remember)? auto s = [1,2,3]; auto t = s; assert(s.length == t.length); assert(s.ptr == t.ptr); assert(&s != &t); Assignment of arrays is just a pointer assignment (is a reference type): auto a = [1,2,3].dup; auto b = a; assert(a is b); array.dup returns another array. array[] returns a slice though (you can allow implicit casting from array to slice but I don't see the point as array[] is short enough and more explicit). slices should not be convertible to arrays (use slice.dup for that). Back to the syntax, I think T[new] is *really* confusing. It tell me nothing about the type, it provides the same information as calling it it wazzzaaap!T for me. I *really* think it would be better to call it array!T. But if we have both types, T[] is not clear anymore that is a slice, again, you can't figure that out. So maybe T[] should be used for arrays, not slices; if you want some syntax sugar I think T[] is more guessable to be an array than a slice, and it would be a little more backward compatible. Maybe slices can be T[..], but I'm not sure is clear enough, again I think we could live with something more explicit like array!T and slice!T. But at least slices should be part of the language, because that should be the type of "array literals" (damn! that didn't sound right =P). I'm missing something? Why this shouldn't work?
That's what was initially planned (except some nuances). I also think we need a precedent of mapping built-in language type to a library type, starting with Array!(T) and Range!(T). We could then map V[K] to AArray!(K,V), T? to Nullable!(T) etc. There is one big issue, though: classes and allocations via 'new' don't work with CTFE. I believe this is something that is planned (Walter once said that an entire subset of D called SafeD should work with CTFE), but it is *very* hard to implement, not feasible in a nearest future for sure.
Oct 19 2009
parent Don <nospam nospam.com> writes:
Denis Koroskin wrote:
 On Mon, 19 Oct 2009 18:57:45 +0400, Leandro Lucarella <llucax gmail.com> 
 wrote:
 
 Andrei Alexandrescu, el 18 de octubre a las 20:16 me escribiste:
 Here's what I wrote to Walter:

 ====================
 I'm going to suggest something terrible - let's get rid of T[new]. I
 know it's difficult to throw away work you've already done, but
 really things with T[new] start to look like a Pyrrhic victory. Here
 are some issues:

 * The abstraction doesn't seem to come off as crisp and clean as we
 both wanted;

 * There are efficiency issues, such as the two allocations that you
 valiantly tried to eliminate in a subset of cases;

 * Explaining two very similar but subtly different types to
 newcomers is excruciatingly difficult (I'll send you a draft of the
 chapter - it looks like a burn victim who didn't make it);

 * Furthermore, explaining people when to use one vs. the other is
 much more difficult than it seems. On the surface, it goes like
 this: "If you need to append stuff, use T[new]. If not, use T[]."
 Reality is much more subtle. For one thing, T[new] does not allow
 contraction from the left, whereas T[] does. That puts T[] at an
 advantage. So if you want to append stuff and also contract from the
 left, there's nothing our abstractions can help you with.
I think this is getting overcomplicated. I don't see it as complex, I see it like this: 2 types should be provided: array and slice. array is a *real* type, storing and owning memory, it should be something like this (conceptually): class array(T) { size_t length; size_t capacity; T[capacity] elements; } 1) a pure reference type. 2) 1 allocation only (interior pointers are not a problem, the GC have to support them anyways). 3) easily appendable (capacity field). slice should be something like this: struct slice(T) { size_t length; T* ptr; } 1) a pure value type. 2) no allocation at all, *ever*. 3) not appendable at all. 4) You can change both ptr and length, and you can mutate the elements (if not immutable). So a slice is a window to peek a chunk of data. Then you have the syntax. In this discussion, T[new] is array and T[] is slice. I find that syntax very confusing. I think it could be even better to just put those 2 types in object.d (well, do public imports of those 2 types in object.d) and let the people write: array!T a; slice!T s; The array literals should be immutable chunks of memory in the static memory, as ClassInfo.init. The type of [1,2,3] should be, for example, slice!(immutable int), so: auto s = [1,2,3]; should create a slice!(immutable int) with length 3 and ptr pointing to the static memory where the [1,2,3] was stored (this is what happens with strings, right? So no surprises here, they are consistent). slice.dup should return an array. If you want to just copy the length and ptr members, use assignment (it's a value type, remember)? auto s = [1,2,3]; auto t = s; assert(s.length == t.length); assert(s.ptr == t.ptr); assert(&s != &t); Assignment of arrays is just a pointer assignment (is a reference type): auto a = [1,2,3].dup; auto b = a; assert(a is b); array.dup returns another array. array[] returns a slice though (you can allow implicit casting from array to slice but I don't see the point as array[] is short enough and more explicit). slices should not be convertible to arrays (use slice.dup for that). Back to the syntax, I think T[new] is *really* confusing. It tell me nothing about the type, it provides the same information as calling it it wazzzaaap!T for me. I *really* think it would be better to call it array!T. But if we have both types, T[] is not clear anymore that is a slice, again, you can't figure that out. So maybe T[] should be used for arrays, not slices; if you want some syntax sugar I think T[] is more guessable to be an array than a slice, and it would be a little more backward compatible. Maybe slices can be T[..], but I'm not sure is clear enough, again I think we could live with something more explicit like array!T and slice!T. But at least slices should be part of the language, because that should be the type of "array literals" (damn! that didn't sound right =P). I'm missing something? Why this shouldn't work?
That's what was initially planned (except some nuances). I also think we need a precedent of mapping built-in language type to a library type, starting with Array!(T) and Range!(T). We could then map V[K] to AArray!(K,V), T? to Nullable!(T) etc. There is one big issue, though: classes and allocations via 'new' don't work with CTFE. I believe this is something that is planned (Walter once said that an entire subset of D called SafeD should work with CTFE), but it is *very* hard to implement, not feasible in a nearest future for sure.
Don't worry. It's not hugely difficult to implement, though it'll be a while before all the bugs are ironed out. Most of the difficult stuff is already in place. (The CTFE memory bug is the difficult one).
Oct 19 2009
prev sibling next sibling parent reply Leandro Lucarella <llucax gmail.com> writes:
Leandro Lucarella, el 19 de octubre a las 11:57 me escribiste:
 slice.dup should return an array.
To avoid making the language aware of the array type, slice.dup can be removed and use an array constructor instead: auto a = slice.dup; should be: auto a = array!T(slice); This way, I think the language only have to know about slices. -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- He cometido pecados, he hecho el mal, he sido víctima de la envidia, el egoísmo, la ambición, la mentira y la frivolidad, pero siempre he sido un padre argentino que quiere que su hijo triunfe en la vida. -- Ricardo Vaporeso
Oct 19 2009
parent Leandro Lucarella <llucax gmail.com> writes:
Leandro Lucarella, el 19 de octubre a las 12:13 me escribiste:
 Leandro Lucarella, el 19 de octubre a las 11:57 me escribiste:
 slice.dup should return an array.
To avoid making the language aware of the array type, slice.dup can be removed and use an array constructor instead: auto a = slice.dup; should be: auto a = array!T(slice); This way, I think the language only have to know about slices.
Unless you want to have an appendable array type in CTFE. I guess the language should know about array anyways :S (I should read the entire thread before posting) -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- Si el enemigo se siente abatido, recuérdele que su familia está lejos y que mientras siga la pelea él se estará perdiendo el crecimiento de sus hijos. Si está feliz, recuerdele de la existencia de su familia política, en especial de los cuñados que tienen sexo con sus hermanas. -- Ricardo Vaporeso. De la desmoralización del enemigo.
Oct 19 2009
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Leandro Lucarella wrote:
 Andrei Alexandrescu, el 18 de octubre a las 20:16 me escribiste:
 Here's what I wrote to Walter:

 ====================
 I'm going to suggest something terrible - let's get rid of T[new]. I
 know it's difficult to throw away work you've already done, but
 really things with T[new] start to look like a Pyrrhic victory. Here
 are some issues:

 * The abstraction doesn't seem to come off as crisp and clean as we
 both wanted;

 * There are efficiency issues, such as the two allocations that you
 valiantly tried to eliminate in a subset of cases;

 * Explaining two very similar but subtly different types to
 newcomers is excruciatingly difficult (I'll send you a draft of the
 chapter - it looks like a burn victim who didn't make it);

 * Furthermore, explaining people when to use one vs. the other is
 much more difficult than it seems. On the surface, it goes like
 this: "If you need to append stuff, use T[new]. If not, use T[]."
 Reality is much more subtle. For one thing, T[new] does not allow
 contraction from the left, whereas T[] does. That puts T[] at an
 advantage. So if you want to append stuff and also contract from the
 left, there's nothing our abstractions can help you with.
I think this is getting overcomplicated. I don't see it as complex, I see it like this: 2 types should be provided: array and slice. array is a *real* type, storing and owning memory, it should be something like this (conceptually): class array(T) { size_t length; size_t capacity; T[capacity] elements; }
At the point I introduce arrays I hadn't described classes. One major problem with writing a "The X Programming Language" book is sequencing. It's very easy to explain a complicated feature if the listener knows all others. It is very difficult to introduce them serially.
 1) a pure reference type.
 2) 1 allocation only (interior pointers are not a problem, the GC have to
    support them anyways).
 3) easily appendable (capacity field).
 
 slice should be something like this:
 
 struct slice(T)
 {
 	size_t length;
 	T* ptr;
 }
structs and pointers have also not been introduced yet.
 I'm missing something? Why this shouldn't work?
It may work, but I was unable to pull it off reasonably well. Andrei
Oct 19 2009
parent reply Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, el 19 de octubre a las 10:18 me escribiste:
 Leandro Lucarella wrote:
Andrei Alexandrescu, el 18 de octubre a las 20:16 me escribiste:
Here's what I wrote to Walter:

====================
I'm going to suggest something terrible - let's get rid of T[new]. I
know it's difficult to throw away work you've already done, but
really things with T[new] start to look like a Pyrrhic victory. Here
are some issues:

* The abstraction doesn't seem to come off as crisp and clean as we
both wanted;

* There are efficiency issues, such as the two allocations that you
valiantly tried to eliminate in a subset of cases;

* Explaining two very similar but subtly different types to
newcomers is excruciatingly difficult (I'll send you a draft of the
chapter - it looks like a burn victim who didn't make it);

* Furthermore, explaining people when to use one vs. the other is
much more difficult than it seems. On the surface, it goes like
this: "If you need to append stuff, use T[new]. If not, use T[]."
Reality is much more subtle. For one thing, T[new] does not allow
contraction from the left, whereas T[] does. That puts T[] at an
advantage. So if you want to append stuff and also contract from the
left, there's nothing our abstractions can help you with.
I think this is getting overcomplicated. I don't see it as complex, I see it like this: 2 types should be provided: array and slice. array is a *real* type, storing and owning memory, it should be something like this (conceptually): class array(T) { size_t length; size_t capacity; T[capacity] elements; }
At the point I introduce arrays I hadn't described classes.
You are talking about the book, right? You don't have to! You can explain arrays without talking about classes at all! See the Python Tutorial for example: http://docs.python.org/tutorial/index.html You see all kind of complex data structures (lists, dictionaries, tuples, strings, sets), input and output, and even exceptions! before classes. And I read it, and was a very easy and pleasant reading. Talking about a class here was just to prove my point more easily because I *know* everybody here knows very well about classes semantics. I was not describing it for a book.
 One major problem with writing a "The X Programming Language" book
 is sequencing. It's very easy to explain a complicated feature if
 the listener knows all others. It is very difficult to introduce
 them serially.
I don't think an array would be the case, come on! =)
1) a pure reference type.
2) 1 allocation only (interior pointers are not a problem, the GC have to
   support them anyways).
3) easily appendable (capacity field).

slice should be something like this:

struct slice(T)
{
	size_t length;
	T* ptr;
}
structs and pointers have also not been introduced yet.
I don't know what this have to do with the book. Should be D designed based on how good are you to explain the concepts on the book? I think this is getting ridiculous... seriously. I don't want to be rude, I know you're putting a lot of work in the book and we all appreciate that, but you didn't mention any technical point about what I wrote.
I'm missing something? Why this shouldn't work?
It may work, but I was unable to pull it off reasonably well.
What problems did you find? -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- DESCARTAN BIDET VIRTUAL PORQUE NO LAVA -- Cronista TV
Oct 19 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Leandro Lucarella wrote:
 Andrei Alexandrescu, el 19 de octubre a las 10:18 me escribiste:
 2 types should be provided: array and slice.

 array is a *real* type, storing and owning memory, it should be something
 like this (conceptually):

 class array(T)
 {
 	size_t length;
 	size_t capacity;
 	T[capacity] elements;
 }
At the point I introduce arrays I hadn't described classes.
You are talking about the book, right? You don't have to! You can explain arrays without talking about classes at all!
I was just replying to your suggestion.
 One major problem with writing a "The X Programming Language" book
 is sequencing. It's very easy to explain a complicated feature if
 the listener knows all others. It is very difficult to introduce
 them serially.
I don't think an array would be the case, come on! =)
 1) a pure reference type.
 2) 1 allocation only (interior pointers are not a problem, the GC have to
   support them anyways).
 3) easily appendable (capacity field).

 slice should be something like this:

 struct slice(T)
 {
 	size_t length;
 	T* ptr;
 }
structs and pointers have also not been introduced yet.
I don't know what this have to do with the book. Should be D designed based on how good are you to explain the concepts on the book? I think this is getting ridiculous... seriously. I don't want to be rude, I know you're putting a lot of work in the book and we all appreciate that, but you didn't mention any technical point about what I wrote.
 I'm missing something? Why this shouldn't work?
It may work, but I was unable to pull it off reasonably well.
What problems did you find?
I thought I explained that above and in several other posts. I have the feeling this is going in circles, but let me add one more thing. People would want to have a reasonable way of choosing between T[new] and T[]. The differences between them are subtle (I have two tables showing the primitives of T[] and T[new], which are very similar). That static decision concerns future appends to the array, which doesn't strike me as something you know from the get-go through future iterations of a design. Use of "auto" messes up things further: a nice function may choose to return T[new] because it just created an array (an implementation detail), but clients may find that unexpected. Andrei
Oct 19 2009
next sibling parent reply Johan Granberg <lijat.meREM OVEgmail.com> writes:
Andrei Alexandrescu wrote:

 Leandro Lucarella wrote:
 Andrei Alexandrescu, el 19 de octubre a las 10:18 me escribiste:
 2 types should be provided: array and slice.

 array is a *real* type, storing and owning memory, it should be
 something like this (conceptually):

 class array(T)
 {
 size_t length;
 size_t capacity;
 T[capacity] elements;
 }
At the point I introduce arrays I hadn't described classes.
You are talking about the book, right? You don't have to! You can explain arrays without talking about classes at all!
I was just replying to your suggestion.
 One major problem with writing a "The X Programming Language" book
 is sequencing. It's very easy to explain a complicated feature if
 the listener knows all others. It is very difficult to introduce
 them serially.
I don't think an array would be the case, come on! =)
 1) a pure reference type.
 2) 1 allocation only (interior pointers are not a problem, the GC have
 to
   support them anyways).
 3) easily appendable (capacity field).

 slice should be something like this:

 struct slice(T)
 {
 size_t length;
 T* ptr;
 }
structs and pointers have also not been introduced yet.
I don't know what this have to do with the book. Should be D designed based on how good are you to explain the concepts on the book? I think this is getting ridiculous... seriously. I don't want to be rude, I know you're putting a lot of work in the book and we all appreciate that, but you didn't mention any technical point about what I wrote.
 I'm missing something? Why this shouldn't work?
It may work, but I was unable to pull it off reasonably well.
What problems did you find?
I thought I explained that above and in several other posts. I have the feeling this is going in circles, but let me add one more thing. People would want to have a reasonable way of choosing between T[new] and T[]. The differences between them are subtle (I have two tables showing the primitives of T[] and T[new], which are very similar). That static decision concerns future appends to the array, which doesn't strike me as something you know from the get-go through future iterations of a design. Use of "auto" messes up things further: a nice function may choose to return T[new] because it just created an array (an implementation detail), but clients may find that unexpected. Andrei
I think you are seeing a larger problem than their is. But consider this, who is D a language for, is it for beginers only? advanced users only? or everyone, if it is a language for everyone don't complicate the language for the advanced users by rejecting nice syntax just because it is hard to explain. The beginers already will just pick one and write ineficient code and I don't think one more type will manage to confuse them more (as most beginers I thaught is already maximaly confused).
Oct 19 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Johan Granberg wrote:
 I think you are seeing a larger problem than their is. But consider this,
 who is D a language for, is it for beginers only? advanced users only? or
 everyone, if it is a language for everyone don't complicate the language
 for the advanced users by rejecting nice syntax just because it is hard to
 explain.
It's not the nice syntax that's hard to explain. We all have the same goal - make the language easy to acquire, easy to use, and reasonably efficient. What I can say is that T[new] is not a good step towards achieving those goals. Andrei
Oct 19 2009
prev sibling next sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Mon, 19 Oct 2009 20:16:50 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I thought I explained that above and in several other posts. I have the  
 feeling this is going in circles, but let me add one more thing. People  
 would want to have a reasonable way of choosing between T[new] and T[].  
 The differences between them are subtle (I have two tables showing the  
 primitives of T[] and T[new], which are very similar). That static  
 decision concerns future appends to the array, which doesn't strike me  
 as something you know from the get-go through future iterations of a  
 design. Use of "auto" messes up things further: a nice function may  
 choose to return T[new] because it just created an array (an  
 implementation detail), but clients may find that unexpected.
Put it simple: T[] is a range, and T[new] is a container. They belong to different leagues. Yes, there is a lot common between them, because T[] supports some subset of operations that T[new] support. We are going in circles because you couldn't convince anyone that we don't need dynamic array type in language core. Not unless library types are usable in CTFE (and have nice syntax).
Oct 19 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Denis Koroskin wrote:
 On Mon, 19 Oct 2009 20:16:50 +0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 I thought I explained that above and in several other posts. I have 
 the feeling this is going in circles, but let me add one more thing. 
 People would want to have a reasonable way of choosing between T[new] 
 and T[]. The differences between them are subtle (I have two tables 
 showing the primitives of T[] and T[new], which are very similar). 
 That static decision concerns future appends to the array, which 
 doesn't strike me as something you know from the get-go through future 
 iterations of a design. Use of "auto" messes up things further: a nice 
 function may choose to return T[new] because it just created an array 
 (an implementation detail), but clients may find that unexpected.
Put it simple: T[] is a range, and T[new] is a container. They belong to different leagues.
Define ranges and define containers.
 Yes, there is a lot common between them, because T[] supports some 
 subset of operations that T[new] support.
No. You see, even you who have a close perspective on the issue have gotten confused. T[new] does not support assignment from range. So T[] and T[new] are not in a subtyping relationship. They support subtly different primitives.
 We are going in circles because you couldn't convince anyone that we 
 don't need dynamic array type in language core. Not unless library types 
 are usable in CTFE (and have nice syntax).
"Anyone" is a bit assuming I think. Andrei
Oct 19 2009
parent Fawzi Mohamed <fmohamed mac.com> writes:
On 2009-10-19 19:34:04 +0200, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Denis Koroskin wrote:
 Put it simple: T[] is a range, and T[new] is a container. They belong 
 to different leagues.
Define ranges and define containers.
 Yes, there is a lot common between them, because T[] supports some 
 subset of operations that T[new] support.
No. You see, even you who have a close perspective on the issue have gotten confused. T[new] does not support assignment from range. So T[] and T[new] are not in a subtyping relationship. They support subtly different primitives.
unless I miss something a_new[]=slice; works just as a[]=slice; as in any case you have a slice on the left a=slice; // should redefine the slice and yes that is (correctly) not supported by T[new] a=range; should not be used in general imho because a slice should be a valid range, and I don't want similar operations to have very different results (changing the data or just changing the slice object itself). Fawzi
Oct 19 2009
prev sibling parent reply Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, el 19 de octubre a las 11:16 me escribiste:
I'm missing something? Why this shouldn't work?
It may work, but I was unable to pull it off reasonably well.
What problems did you find?
I thought I explained that above and in several other posts. I have the feeling this is going in circles, but let me add one more thing. People would want to have a reasonable way of choosing between T[new] and T[]. The differences between them are subtle (I have two tables showing the primitives of T[] and T[new], which are very similar).
If T[new] is "array" and T[] is "slice" as I described them, I see no subtlety, they are huge differences, one is a dynamic array, the other is a view on somebody else's memory. I know you probably hate when people says this (that's what I avoided doing it before :): I works very well in Python. You have arrays (well, lists, but the interface is the same as an array) and slices. I never heard anybody complaining about that being confusing.
 That static decision concerns future appends to the array,
 which doesn't strike me as something you know from the get-go
 through future iterations of a design. Use of "auto" messes up
 things further: a nice function may choose to return T[new] because
 it just created an array (an implementation detail), but clients may
 find that unexpected.
I don't see much problem. You should always return an array (T[new]) if you have one, because you can get an slice from it (the inverse is not true). Because of this, implicit conversion from array to slice can be a good idea, so people expecting a slice when an array is returned will never know an array was returned anyways. If you need to keep appending to the array, you can have the original array and keep appending to it without any performance hit (no copying, no new allocations). What's wrong with that? -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- Cuando intenté arrimarle mi brazo Se puso a hablar de Miller, de Anais Nin y Picasso Y si osaba intentar robarle un beso Se ponía a leer de Neruda unos versos Me hizo mucho mal la cumbiera intelectual
Oct 19 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Leandro Lucarella wrote:
 Andrei Alexandrescu, el 19 de octubre a las 11:16 me escribiste:
 I'm missing something? Why this shouldn't work?
It may work, but I was unable to pull it off reasonably well.
What problems did you find?
I thought I explained that above and in several other posts. I have the feeling this is going in circles, but let me add one more thing. People would want to have a reasonable way of choosing between T[new] and T[]. The differences between them are subtle (I have two tables showing the primitives of T[] and T[new], which are very similar).
If T[new] is "array" and T[] is "slice" as I described them, I see no subtlety, they are huge differences, one is a dynamic array, the other is a view on somebody else's memory.
Their primitives overlap 90% in syntax and semantics.
 I know you probably hate when people says this (that's what I avoided
 doing it before :):
 I works very well in Python. You have arrays (well, lists, but the
 interface is the same as an array) and slices. I never heard anybody
 complaining about that being confusing.
Python lists and slices are quite different from D arrays.
 That static decision concerns future appends to the array,
 which doesn't strike me as something you know from the get-go
 through future iterations of a design. Use of "auto" messes up
 things further: a nice function may choose to return T[new] because
 it just created an array (an implementation detail), but clients may
 find that unexpected.
I don't see much problem. You should always return an array (T[new]) if you have one, because you can get an slice from it (the inverse is not true). Because of this, implicit conversion from array to slice can be a good idea, so people expecting a slice when an array is returned will never know an array was returned anyways. If you need to keep appending to the array, you can have the original array and keep appending to it without any performance hit (no copying, no new allocations). What's wrong with that?
People expecting a slice and using "auto" or templates are in for a rude awakening. Andrei
Oct 19 2009
next sibling parent reply Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, el 19 de octubre a las 12:40 me escribiste:
 Leandro Lucarella wrote:
Andrei Alexandrescu, el 19 de octubre a las 11:16 me escribiste:
I'm missing something? Why this shouldn't work?
It may work, but I was unable to pull it off reasonably well.
What problems did you find?
I thought I explained that above and in several other posts. I have the feeling this is going in circles, but let me add one more thing. People would want to have a reasonable way of choosing between T[new] and T[]. The differences between them are subtle (I have two tables showing the primitives of T[] and T[new], which are very similar).
If T[new] is "array" and T[] is "slice" as I described them, I see no subtlety, they are huge differences, one is a dynamic array, the other is a view on somebody else's memory.
Their primitives overlap 90% in syntax and semantics.
So what? Ranges and slices too.
I know you probably hate when people says this (that's what I avoided
doing it before :):
I works very well in Python. You have arrays (well, lists, but the
interface is the same as an array) and slices. I never heard anybody
complaining about that being confusing.
Python lists and slices are quite different from D arrays.
Not the API.
I don't see much problem. You should always return an array (T[new]) if
you have one, because you can get an slice from it (the inverse is not
true). Because of this, implicit conversion from array to slice can be
a good idea, so people expecting a slice when an array is returned will
never know an array was returned anyways. If you need to keep appending to
the array, you can have the original array and keep appending to it
without any performance hit (no copying, no new allocations).

What's wrong with that?
People expecting a slice and using "auto" or templates are in for a rude awakening.
Ok, you are in one of those days when you don't listen to reasons. I give up! =) -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- Si ella es el sol, yo soy la luna Si ella es el mar, soy el desierto Y estamos en eclipse total Y estamos en eclipse total
Oct 19 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Leandro Lucarella wrote:
 I don't see much problem. You should always return an array (T[new]) if
 you have one, because you can get an slice from it (the inverse is not
 true). Because of this, implicit conversion from array to slice can be
 a good idea, so people expecting a slice when an array is returned will
 never know an array was returned anyways. If you need to keep appending to
 the array, you can have the original array and keep appending to it
 without any performance hit (no copying, no new allocations).

 What's wrong with that?
People expecting a slice and using "auto" or templates are in for a rude awakening.
Ok, you are in one of those days when you don't listen to reasons. I give up! =)
At least in that case I had an objective point! I'd agree many others are subjective. Andrei
Oct 19 2009
prev sibling parent reply Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
Andrei Alexandrescu wrote:
 Leandro Lucarella wrote:
 Andrei Alexandrescu, el 19 de octubre a las 11:16 me escribiste:
 I'm missing something? Why this shouldn't work?
It may work, but I was unable to pull it off reasonably well.
What problems did you find?
I thought I explained that above and in several other posts. I have the feeling this is going in circles, but let me add one more thing. People would want to have a reasonable way of choosing between T[new] and T[]. The differences between them are subtle (I have two tables showing the primitives of T[] and T[new], which are very similar).
If T[new] is "array" and T[] is "slice" as I described them, I see no subtlety, they are huge differences, one is a dynamic array, the other is a view on somebody else's memory.
Their primitives overlap 90% in syntax and semantics.
 I know you probably hate when people says this (that's what I avoided
 doing it before :):
 I works very well in Python. You have arrays (well, lists, but the
 interface is the same as an array) and slices. I never heard anybody
 complaining about that being confusing.
Python lists and slices are quite different from D arrays.
 That static decision concerns future appends to the array,
 which doesn't strike me as something you know from the get-go
 through future iterations of a design. Use of "auto" messes up
 things further: a nice function may choose to return T[new] because
 it just created an array (an implementation detail), but clients may
 find that unexpected.
I don't see much problem. You should always return an array (T[new]) if you have one, because you can get an slice from it (the inverse is not true). Because of this, implicit conversion from array to slice can be a good idea, so people expecting a slice when an array is returned will never know an array was returned anyways. If you need to keep appending to the array, you can have the original array and keep appending to it without any performance hit (no copying, no new allocations). What's wrong with that?
People expecting a slice and using "auto" or templates are in for a rude awakening. Andrei
I would argue that, if they truly *needed* a slice and are relying on "auto" then they implicitly accept responsibility for putting the [] in place to guarantee that. In other words: auto foo = someFunc()[]; Or T[]'s and T[new]'s could have an alternative to .dup that does the same thing (return self for T[], and return self[] for T[new]). Either way, it would be something to point out in the book as a best practice: when expecting a slice, guarantee a slice with []'s. -- Chris Nicholson-Sauls
Oct 19 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Chris Nicholson-Sauls wrote:
 People expecting a slice and using "auto" or templates are in for a 
 rude awakening.


 Andrei
I would argue that, if they truly *needed* a slice and are relying on "auto" then they implicitly accept responsibility for putting the [] in place to guarantee that. In other words: auto foo = someFunc()[]; Or T[]'s and T[new]'s could have an alternative to .dup that does the same thing (return self for T[], and return self[] for T[new]). Either way, it would be something to point out in the book as a best practice: when expecting a slice, guarantee a slice with []'s.
I understand that, but I also believe it's a symptom of a problem instead of a solution. One of our goals is to make the best practices book as high-level as possible, and integrate many best practices in the language. The above code, therefore, I put in the bin "stuff that the language's user must keep in mind". At best that should go away. Andrei
Oct 19 2009
prev sibling parent Yigal Chripun <yigal100 gmail.com> writes:
On 19/10/2009 16:57, Leandro Lucarella wrote:
 Andrei Alexandrescu, el 18 de octubre a las 20:16 me escribiste:
 Here's what I wrote to Walter:

 ====================
 I'm going to suggest something terrible - let's get rid of T[new]. I
 know it's difficult to throw away work you've already done, but
 really things with T[new] start to look like a Pyrrhic victory. Here
 are some issues:

 * The abstraction doesn't seem to come off as crisp and clean as we
 both wanted;

 * There are efficiency issues, such as the two allocations that you
 valiantly tried to eliminate in a subset of cases;

 * Explaining two very similar but subtly different types to
 newcomers is excruciatingly difficult (I'll send you a draft of the
 chapter - it looks like a burn victim who didn't make it);

 * Furthermore, explaining people when to use one vs. the other is
 much more difficult than it seems. On the surface, it goes like
 this: "If you need to append stuff, use T[new]. If not, use T[]."
 Reality is much more subtle. For one thing, T[new] does not allow
 contraction from the left, whereas T[] does. That puts T[] at an
 advantage. So if you want to append stuff and also contract from the
 left, there's nothing our abstractions can help you with.
I think this is getting overcomplicated. I don't see it as complex, I see it like this: 2 types should be provided: array and slice. array is a *real* type, storing and owning memory, it should be something like this (conceptually): class array(T) { size_t length; size_t capacity; T[capacity] elements; } 1) a pure reference type. 2) 1 allocation only (interior pointers are not a problem, the GC have to support them anyways). 3) easily appendable (capacity field). slice should be something like this: struct slice(T) { size_t length; T* ptr; } 1) a pure value type. 2) no allocation at all, *ever*. 3) not appendable at all. 4) You can change both ptr and length, and you can mutate the elements (if not immutable). So a slice is a window to peek a chunk of data. Then you have the syntax. In this discussion, T[new] is array and T[] is slice. I find that syntax very confusing. I think it could be even better to just put those 2 types in object.d (well, do public imports of those 2 types in object.d) and let the people write: array!T a; slice!T s; The array literals should be immutable chunks of memory in the static memory, as ClassInfo.init. The type of [1,2,3] should be, for example, slice!(immutable int), so: auto s = [1,2,3]; should create a slice!(immutable int) with length 3 and ptr pointing to the static memory where the [1,2,3] was stored (this is what happens with strings, right? So no surprises here, they are consistent). slice.dup should return an array. If you want to just copy the length and ptr members, use assignment (it's a value type, remember)? auto s = [1,2,3]; auto t = s; assert(s.length == t.length); assert(s.ptr == t.ptr); assert(&s !=&t); Assignment of arrays is just a pointer assignment (is a reference type): auto a = [1,2,3].dup; auto b = a; assert(a is b); array.dup returns another array. array[] returns a slice though (you can allow implicit casting from array to slice but I don't see the point as array[] is short enough and more explicit). slices should not be convertible to arrays (use slice.dup for that). Back to the syntax, I think T[new] is *really* confusing. It tell me nothing about the type, it provides the same information as calling it it wazzzaaap!T for me. I *really* think it would be better to call it array!T. But if we have both types, T[] is not clear anymore that is a slice, again, you can't figure that out. So maybe T[] should be used for arrays, not slices; if you want some syntax sugar I think T[] is more guessable to be an array than a slice, and it would be a little more backward compatible. Maybe slices can be T[..], but I'm not sure is clear enough, again I think we could live with something more explicit like array!T and slice!T. But at least slices should be part of the language, because that should be the type of "array literals" (damn! that didn't sound right =P). I'm missing something? Why this shouldn't work?
Vote += 1_000_000; Also, slices are Ranges so this should integrate with the Ranges in phobos.
Oct 19 2009
prev sibling next sibling parent reply downs <default_357-line yahoo.de> writes:
Walter Bright wrote:
 The purpose of T[new] was to solve the problems T[] had with passing T[]
 to a function and then the function resizes the T[]. What happens with
 the original?
 
 The solution we came up with was to create a third array type, T[new],
 which was a reference type.
 
 Andrei had the idea that T[new] could be dispensed with by making a
 "builder" library type to handle creating arrays by doing things like
 appending, and then delivering a finished T[] type. This is similar to
 what std.outbuffer and std.array.Appender do, they just need a bit of
 refining.
 
 The .length property of T[] would then become an rvalue only, not an
 lvalue, and ~= would no longer be allowed for T[].
 
 We both feel that this would simplify D, make it more flexible, and
 remove some awkward corner cases like the inability to say a.length++.
 
 What do you think?
I think ArrayBuilder can work almost entirely transparently provided the following conditions are met: 1) when trying to cat to an array, the first array automatically promotes to ArrayBuilder 2) Setting .length is, depending on new length, either a slice or a cat operation with the length difference. 2) ArrayBuilders implicitly cast to string. This should allow a syntax that is almost entirely identical to the one used in D1, aside from typeof("a"~"b").stringof :) Is this feasible in D2?
Oct 18 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from downs (default_357-line yahoo.de)'s article
 Walter Bright wrote:
 The purpose of T[new] was to solve the problems T[] had with passing T[]
 to a function and then the function resizes the T[]. What happens with
 the original?

 The solution we came up with was to create a third array type, T[new],
 which was a reference type.

 Andrei had the idea that T[new] could be dispensed with by making a
 "builder" library type to handle creating arrays by doing things like
 appending, and then delivering a finished T[] type. This is similar to
 what std.outbuffer and std.array.Appender do, they just need a bit of
 refining.

 The .length property of T[] would then become an rvalue only, not an
 lvalue, and ~= would no longer be allowed for T[].

 We both feel that this would simplify D, make it more flexible, and
 remove some awkward corner cases like the inability to say a.length++.

 What do you think?
I think ArrayBuilder can work almost entirely transparently provided the
following conditions are met:
 1) when trying to cat to an array, the first array automatically promotes to
ArrayBuilder
 2) Setting .length is, depending on new length, either a slice or a cat
operation with the length difference.
 2) ArrayBuilders implicitly cast to string.
 This should allow a syntax that is almost entirely identical to the one used in
D1, aside from typeof("a"~"b").stringof :)
 Is this feasible in D2?
Once you throw in all the implicit conversions to ArrayBuilder and having the core language know about ArrayBuilder, aren't you, for all practical purposes, implementing T[new] but calling it ArrayBuilder?
Oct 18 2009
parent downs <default_357-line yahoo.de> writes:
dsimcha wrote:
 == Quote from downs (default_357-line yahoo.de)'s article
 Walter Bright wrote:
 The purpose of T[new] was to solve the problems T[] had with passing T[]
 to a function and then the function resizes the T[]. What happens with
 the original?

 The solution we came up with was to create a third array type, T[new],
 which was a reference type.

 Andrei had the idea that T[new] could be dispensed with by making a
 "builder" library type to handle creating arrays by doing things like
 appending, and then delivering a finished T[] type. This is similar to
 what std.outbuffer and std.array.Appender do, they just need a bit of
 refining.

 The .length property of T[] would then become an rvalue only, not an
 lvalue, and ~= would no longer be allowed for T[].

 We both feel that this would simplify D, make it more flexible, and
 remove some awkward corner cases like the inability to say a.length++.

 What do you think?
I think ArrayBuilder can work almost entirely transparently provided the
following conditions are met:
 1) when trying to cat to an array, the first array automatically promotes to
ArrayBuilder
 2) Setting .length is, depending on new length, either a slice or a cat
operation with the length difference.
 2) ArrayBuilders implicitly cast to string.
 This should allow a syntax that is almost entirely identical to the one used in
D1, aside from typeof("a"~"b").stringof :)
 Is this feasible in D2?
Once you throw in all the implicit conversions to ArrayBuilder and having the core language know about ArrayBuilder, aren't you, for all practical purposes, implementing T[new] but calling it ArrayBuilder?
Yes but changing the semantics of a library object is easier than changing the semantics of a language feature. Besides, I think the builder works differently internally than T[new], although I'm not sure - storing lists or trees of sub-arrays and concatenating them lazily? At least that's what I would do :p
Oct 18 2009
prev sibling next sibling parent reply Ary Borenszweig <ary esperanto.org.ar> writes:
Walter Bright wrote:
 The purpose of T[new] was to solve the problems T[] had with passing T[] 
 to a function and then the function resizes the T[]. What happens with 
 the original?
 
 The solution we came up with was to create a third array type, T[new], 
 which was a reference type.
 
 Andrei had the idea that T[new] could be dispensed with by making a 
 "builder" library type to handle creating arrays by doing things like 
 appending, and then delivering a finished T[] type. This is similar to 
 what std.outbuffer and std.array.Appender do, they just need a bit of 
 refining.
 
 The .length property of T[] would then become an rvalue only, not an 
 lvalue, and ~= would no longer be allowed for T[].
 
 We both feel that this would simplify D, make it more flexible, and 
 remove some awkward corner cases like the inability to say a.length++.
 
 What do you think?
I remember seeing a lot of CTFE code that created a dynamic array and then appended stuff to it, like for example to build a list of prime numbers. Would that still work with ArrayBuilder?
Oct 19 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Ary Borenszweig wrote:
 I remember seeing a lot of CTFE code that created a dynamic array and 
 then appended stuff to it, like for example to build a list of prime 
 numbers. Would that still work with ArrayBuilder?
Probably not. But you can rewrite: a ~= stuff; as: a = a ~ stuff; to make it work.
Oct 19 2009
next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Mon, Oct 19, 2009 at 2:48 AM, Walter Bright
<newshound1 digitalmars.com> wrote:
 Ary Borenszweig wrote:
 I remember seeing a lot of CTFE code that created a dynamic array and th=
en
 appended stuff to it, like for example to build a list of prime numbers.
 Would that still work with ArrayBuilder?
Probably not. But you can rewrite: =A0a ~=3D stuff; as: =A0a =3D a ~ stuff; to make it work.
Ouch. Gives me Matlab nightmares. --bb
Oct 19 2009
prev sibling parent reply downs <default_357-line yahoo.de> writes:
Walter Bright wrote:
 Ary Borenszweig wrote:
 I remember seeing a lot of CTFE code that created a dynamic array and
 then appended stuff to it, like for example to build a list of prime
 numbers. Would that still work with ArrayBuilder?
Probably not. But you can rewrite: a ~= stuff; as: a = a ~ stuff; to make it work.
Is there any reason the first can't be a short-hand for the second?
Oct 19 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from downs (default_357-line yahoo.de)'s article
 Walter Bright wrote:
 Ary Borenszweig wrote:
 I remember seeing a lot of CTFE code that created a dynamic array and
 then appended stuff to it, like for example to build a list of prime
 numbers. Would that still work with ArrayBuilder?
Probably not. But you can rewrite: a ~= stuff; as: a = a ~ stuff; to make it work.
Is there any reason the first can't be a short-hand for the second?
Devil's advocate because I somewhat agree with you: a = a ~ stuff; is so inefficient that it should be ugly.
Oct 19 2009
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-10-19 10:14:02 -0400, dsimcha <dsimcha yahoo.com> said:

 == Quote from downs (default_357-line yahoo.de)'s article
 Walter Bright wrote:
 
 Probably not. But you can rewrite:
 
 a ~= stuff;
 
 as:
 
 a = a ~ stuff;
 
 to make it work.
Is there any reason the first can't be a short-hand for the second?
Devil's advocate because I somewhat agree with you: a = a ~ stuff; is so inefficient that it should be ugly.
I'd call that a premature optimization of the programmer's behavior. :-) -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Oct 19 2009
prev sibling next sibling parent Kagamin <spam here.lot> writes:
Walter Bright Wrote:

 The purpose of T[new] was to solve the problems T[] had with passing T[] 
 to a function and then the function resizes the T[]. What happens with 
 the original?
Why do you need such feature built into language? So many proposals to implement this feature and that feature in a library, and unable to implement reftype array?
 We both feel that this would simplify D, make it more flexible, and 
 remove some awkward corner cases like the inability to say a.length++.
You write it as if you can do it with T[new].
Oct 19 2009
prev sibling next sibling parent reply Yigal Chripun <yigal100 gmail.com> writes:
Walter Bright Wrote:

 The purpose of T[new] was to solve the problems T[] had with passing T[] 
 to a function and then the function resizes the T[]. What happens with 
 the original?
 
 The solution we came up with was to create a third array type, T[new], 
 which was a reference type.
 
 Andrei had the idea that T[new] could be dispensed with by making a 
 "builder" library type to handle creating arrays by doing things like 
 appending, and then delivering a finished T[] type. This is similar to 
 what std.outbuffer and std.array.Appender do, they just need a bit of 
 refining.
 
 The .length property of T[] would then become an rvalue only, not an 
 lvalue, and ~= would no longer be allowed for T[].
 
 We both feel that this would simplify D, make it more flexible, and 
 remove some awkward corner cases like the inability to say a.length++.
 
 What do you think?
I think that Arrays and AAs need to be removed from the language. I prefer to have a collections framework as part of Phobos without special cases in the language for specific containers. The only thing that needs to remain on the language side is support for user defined literals and operators so that it would be straight forward to use (as opposed to the clumsiness of C++). This syntax sugar is the *only* benefit of built in types and this should be possible to have without cluttering the language. such a framework would ideally support polymorphism. List!(Foo) list = new Array!(Foo); // instead of Foo[new] auto arr = new Array!(Foo); ... auto slice = arr[1..4]; // slice is a Range as in Andrei's Ranges in Phobos Foo[] a; // compile-error Foo[] is not a type, use Range!Foo instead Foo[3] a; // this is still perfectly legal
Oct 19 2009
parent reply BCS <none anon.com> writes:
Hello Yigal,

 I think that Arrays and AAs need to be removed from the language. I
 prefer to have a collections framework as part of Phobos without
 special cases in the language for specific containers.
If it can be assured that all the current array ops will get inlined is all cases and can result in identical asm I'd be willing to consider this.
 List!(Foo) list = new Array!(Foo); // instead of Foo[new]
For me to not dislike the idea you would also need to find a cleaner syntax.
Oct 19 2009
parent reply Yigal Chripun <yigal100 gmail.com> writes:
On 19/10/2009 19:10, BCS wrote:
 Hello Yigal,

 I think that Arrays and AAs need to be removed from the language. I
 prefer to have a collections framework as part of Phobos without
 special cases in the language for specific containers.
If it can be assured that all the current array ops will get inlined is all cases and can result in identical asm I'd be willing to consider this.
 List!(Foo) list = new Array!(Foo); // instead of Foo[new]
For me to not dislike the idea you would also need to find a cleaner syntax.
See Leandro Lucarella's posts which describe the same idea. currently, AAs and arrays are implemented in the run-time. What I suggest is to move them to the stdlib. I don't see how such a move can impact performance which you're concerned with above. on the contrary, moving these types outside the runtime will make it easier to improve their implementations. regarding syntax, what's cleaner than calling an array an "Array"? T[new] is less clear and hackish.
Oct 19 2009
parent reply BCS <none anon.com> writes:
Hello Yigal,

 On 19/10/2009 19:10, BCS wrote:
 
 Hello Yigal,
 
 I think that Arrays and AAs need to be removed from the language. I
 prefer to have a collections framework as part of Phobos without
 special cases in the language for specific containers.
 
If it can be assured that all the current array ops will get inlined is all cases and can result in identical asm I'd be willing to consider this.
 List!(Foo) list = new Array!(Foo); // instead of Foo[new]
 
For me to not dislike the idea you would also need to find a cleaner syntax.
See Leandro Lucarella's posts which describe the same idea. currently, AAs and arrays are implemented in the run-time. What I suggest is to move them to the stdlib. I don't see how such a move can impact performance which you're concerned with above.
// a is an array of ints int b; if(a.length > 5) b = a[5]; else b = a[$/2]; No runtime in that at all now.
 on the contrary,
 moving these types outside the runtime will make it easier to improve
 their implementations.
 
 regarding syntax, what's cleaner than calling an array an "Array"?
 T[new] is less clear and hackish.
 
if T[new] gets done as Array!T, what does T[] look like? I'm not to worried about the fancy feature, it the bone simple stuff staying bone simple that I'm looking after.
Oct 19 2009
parent reply Yigal Chripun <yigal100 gmail.com> writes:
On 19/10/2009 22:53, BCS wrote:
 Hello Yigal,

 On 19/10/2009 19:10, BCS wrote:

 Hello Yigal,

 I think that Arrays and AAs need to be removed from the language. I
 prefer to have a collections framework as part of Phobos without
 special cases in the language for specific containers.
If it can be assured that all the current array ops will get inlined is all cases and can result in identical asm I'd be willing to consider this.
 List!(Foo) list = new Array!(Foo); // instead of Foo[new]
For me to not dislike the idea you would also need to find a cleaner syntax.
See Leandro Lucarella's posts which describe the same idea. currently, AAs and arrays are implemented in the run-time. What I suggest is to move them to the stdlib. I don't see how such a move can impact performance which you're concerned with above.
// a is an array of ints int b; if(a.length > 5) b = a[5]; else b = a[$/2]; No runtime in that at all now.
I don't see a reason why this can't be inlined by the compiler if Array is a lib type.
 on the contrary,
 moving these types outside the runtime will make it easier to improve
 their implementations.

 regarding syntax, what's cleaner than calling an array an "Array"?
 T[new] is less clear and hackish.
if T[new] gets done as Array!T, what does T[] look like?
T[] should be replaced by Ranges which are already implemented in Phobos.
 I'm not to worried about the fancy feature, it the bone simple stuff
 staying bone simple that I'm looking after.
Oct 19 2009
parent BCS <none anon.com> writes:
Hello Yigal,

 On 19/10/2009 22:53, BCS wrote:
 
 int b;
 if(a.length > 5) b = a[5];
 else b = a[$/2];
 No runtime in that at all now.
 
I don't see a reason why this can't be inlined by the compiler if Array is a lib type.
Yes, it should inline. What I'm saying is that unless it is /guaranteed/ that it will /always/ inline, I'm going to have an issue.
 if T[new] gets done as Array!T, what does T[] look like?
 
T[] should be replaced by Ranges which are already implemented in Phobos.
I think that T[] should be (at a minimum) the current slice semantics and implementation (the lvalue-length and ~/~= stuff is another question). If this happens to have the needed semantics to act as a range, all the better. Having a clean, light construct to deal with low level memory access is just to handy to give it up completely.
Oct 19 2009
prev sibling next sibling parent gzp <galap freemail.hu> writes:
Walter Bright írta:
 The purpose of T[new] was to solve the problems T[] had with passing T[] 
 to a function and then the function resizes the T[]. What happens with 
 the original?
I've tried to follow the T[], T[new], T+slice discussion, but i'm lost, I don't think if it's a good idea, to extend the language with all kind of decorated version of the same object ( scope (*T)[scope new][1..$] :) ) "It will tell you for instance how to describe something that was about to happen to you in the past before you avoided it by time-jumping forward two days in order to avoid it. The event will be described differently according to whether you are talking about it from the standpoint of your own natural time, from a time in the further future, or a time in the further past and is further complicated by the possibility of conducting conversations whilst you are actually travelling from one time to another with the intention of becoming your own father or mother." Douglas Adams - The Hitchhiker's Guide to the Galaxy Just an idea to simplify this whole thing, maybe it's whole wrong, but who knows... :) Instead of creating new types of arrays we have already two: static and dynamic. A dynamic version can be resized (restructured) any time, the static has a fixed size, thus the dynamic version works as a class and the static version works as a struct for either in,out,ref/etc. To create slices, slices those references the whole array i'd introduce a new type (as there is something like this): range!(T). A range is actually an inner class of the array implementing an IRange(T) interface with the usual popFront, front, etc. For a general class if one wants to add range, he just have to provide an onRange (onSlice) function. As the range interface itself provides the opIndex, opAssign, opXAssign etc., they can be implemented as desired and no more opSliceAddAssign and other decorated function's are required. Even more as the range is an inner class of the array (and knows about the array), a finer tune of what_shall_happen_when_my_referrer_array_is_resize policy can be created. As the range itself can work as an array, range[1] (or maybe even rang[1..2], calling the IRange!(T).onRange() ) it can be used as an array whose structure (length) cannot be altered as desired for T[] ex: T[] a; // is a dynamic array as before range(T) s1 = T[1..2]; // the slice from 1..2 range(T) s1 = T.opRange(); // the slice representing the whole range foo( T[new] a ) is same as foo( T[] ) foo( T[] ) would be replaced by foo( range(T) ) and thus cannot be resized furthermore foreach( a; array ) translates to something like -> foreach( a; a.onRange() )... foreach( a; array[1..2] ) translates to something like -> foreach( a; a.onRange(1,2) )... foreach( a; array[1..2,3..4] ) translates to something like -> foreach( a; a.onRange(1,2,3,4) )... array[1..2] += 1 translates to something like -> array.onRange(1,2).opAddAssign(1); It'd be also great to create more then one onRange implementation of a class ex: matrix.opRangeRowWise matrix.opRangeColumnWise, but i don't know how to select the appropriate one in a foreach loop "automatically": foreach( a; mx.rowWise ) -> mx.selectRange(name : string) { mixin( "onRange" ~ name ) } foreach( a; ma.selectRange("rowWise")(1,2) )... As a conclusion: please don't create any more array type, we have just enough. We have already all the features in the language to express both the desired and explained (above) behaviour. Only the syntax should be cleared out. Using the same notation multiple times with small decorations is not a good idea, and makes things confusing. Thanks.
Oct 19 2009
prev sibling next sibling parent reply Don <nospam nospam.com> writes:
Walter Bright wrote:
 The purpose of T[new] was to solve the problems T[] had with passing T[] 
 to a function and then the function resizes the T[]. What happens with 
 the original?
 
 The solution we came up with was to create a third array type, T[new], 
 which was a reference type.
 
 Andrei had the idea that T[new] could be dispensed with by making a 
 "builder" library type to handle creating arrays by doing things like 
 appending, and then delivering a finished T[] type. This is similar to 
 what std.outbuffer and std.array.Appender do, they just need a bit of 
 refining.
 
 The .length property of T[] would then become an rvalue only, not an 
 lvalue, and ~= would no longer be allowed for T[].
 
 We both feel that this would simplify D, make it more flexible, and 
 remove some awkward corner cases like the inability to say a.length++.
 
 What do you think?
Since noone else seems to have said it: The fact that you're both willing to let it go, after having already invested a lot of time in it, is a good sign for the language. Well done.
Oct 19 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Don wrote:
 Walter Bright wrote:
 The purpose of T[new] was to solve the problems T[] had with passing 
 T[] to a function and then the function resizes the T[]. What happens 
 with the original?

 The solution we came up with was to create a third array type, T[new], 
 which was a reference type.

 Andrei had the idea that T[new] could be dispensed with by making a 
 "builder" library type to handle creating arrays by doing things like 
 appending, and then delivering a finished T[] type. This is similar to 
 what std.outbuffer and std.array.Appender do, they just need a bit of 
 refining.

 The .length property of T[] would then become an rvalue only, not an 
 lvalue, and ~= would no longer be allowed for T[].

 We both feel that this would simplify D, make it more flexible, and 
 remove some awkward corner cases like the inability to say a.length++.

 What do you think?
Since noone else seems to have said it: The fact that you're both willing to let it go, after having already invested a lot of time in it, is a good sign for the language. Well done.
I'm relieved that somebody mentioned that :o). As soon as we gave up with T[new], people started to sell it to us. We should preemptively post about eliminating feature plans before actually implementing them. By the way: implementation of property has been canceled. Andrei
Oct 19 2009
next sibling parent Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, el 19 de octubre a las 14:31 me escribiste:
 By the way: implementation of  property has been canceled.
Keep throwing the good news :( -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- Es mas posible, que un elefante maneje un cero km a que un camello habite un departamento de un ambiente. -- Peperino Pómoro
Oct 19 2009
prev sibling next sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Mon, 19 Oct 2009 23:31:37 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Don wrote:
 Walter Bright wrote:
 The purpose of T[new] was to solve the problems T[] had with passing  
 T[] to a function and then the function resizes the T[]. What happens  
 with the original?

 The solution we came up with was to create a third array type, T[new],  
 which was a reference type.

 Andrei had the idea that T[new] could be dispensed with by making a  
 "builder" library type to handle creating arrays by doing things like  
 appending, and then delivering a finished T[] type. This is similar to  
 what std.outbuffer and std.array.Appender do, they just need a bit of  
 refining.

 The .length property of T[] would then become an rvalue only, not an  
 lvalue, and ~= would no longer be allowed for T[].

 We both feel that this would simplify D, make it more flexible, and  
 remove some awkward corner cases like the inability to say a.length++.

 What do you think?
Since noone else seems to have said it: The fact that you're both willing to let it go, after having already invested a lot of time in it, is a good sign for the language. Well done.
I'm relieved that somebody mentioned that :o). As soon as we gave up with T[new], people started to sell it to us. We should preemptively post about eliminating feature plans before actually implementing them. By the way: implementation of property has been canceled. Andrei
I *really* hope you told it so that we encourage actually implementing it! I strongly believe attributes and properties are very important for D.
Oct 19 2009
parent Ary Borenszweig <ary esperanto.org.ar> writes:
Denis Koroskin wrote:
 On Mon, 19 Oct 2009 23:31:37 +0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Don wrote:
 Walter Bright wrote:
 The purpose of T[new] was to solve the problems T[] had with passing 
 T[] to a function and then the function resizes the T[]. What 
 happens with the original?

 The solution we came up with was to create a third array type, 
 T[new], which was a reference type.

 Andrei had the idea that T[new] could be dispensed with by making a 
 "builder" library type to handle creating arrays by doing things 
 like appending, and then delivering a finished T[] type. This is 
 similar to what std.outbuffer and std.array.Appender do, they just 
 need a bit of refining.

 The .length property of T[] would then become an rvalue only, not an 
 lvalue, and ~= would no longer be allowed for T[].

 We both feel that this would simplify D, make it more flexible, and 
 remove some awkward corner cases like the inability to say a.length++.

 What do you think?
Since noone else seems to have said it: The fact that you're both willing to let it go, after having already invested a lot of time in it, is a good sign for the language. Well done.
I'm relieved that somebody mentioned that :o). As soon as we gave up with T[new], people started to sell it to us. We should preemptively post about eliminating feature plans before actually implementing them. By the way: implementation of property has been canceled. Andrei
I *really* hope you told it so that we encourage actually implementing it! I strongly believe attributes and properties are very important for D.
Not if the ones that implement it didn't use attributes before or don't know what they are useful for or how they could be implemented. :-)
Oct 19 2009
prev sibling parent reply grauzone <none example.net> writes:
Andrei Alexandrescu wrote:
 Don wrote:
 Walter Bright wrote:
 The purpose of T[new] was to solve the problems T[] had with passing 
 T[] to a function and then the function resizes the T[]. What happens 
 with the original?

 The solution we came up with was to create a third array type, 
 T[new], which was a reference type.

 Andrei had the idea that T[new] could be dispensed with by making a 
 "builder" library type to handle creating arrays by doing things like 
 appending, and then delivering a finished T[] type. This is similar 
 to what std.outbuffer and std.array.Appender do, they just need a bit 
 of refining.

 The .length property of T[] would then become an rvalue only, not an 
 lvalue, and ~= would no longer be allowed for T[].

 We both feel that this would simplify D, make it more flexible, and 
 remove some awkward corner cases like the inability to say a.length++.

 What do you think?
Since noone else seems to have said it: The fact that you're both willing to let it go, after having already invested a lot of time in it, is a good sign for the language. Well done.
I'm relieved that somebody mentioned that :o). As soon as we gave up with T[new], people started to sell it to us. We should preemptively post about eliminating feature plans before actually implementing them. By the way: implementation of property has been canceled.
Yeah, let's just keep the language in the broken state it is, because we can't think of a better solution.
 Andrei
Oct 19 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
grauzone wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Walter Bright wrote:
 The purpose of T[new] was to solve the problems T[] had with passing 
 T[] to a function and then the function resizes the T[]. What 
 happens with the original?

 The solution we came up with was to create a third array type, 
 T[new], which was a reference type.

 Andrei had the idea that T[new] could be dispensed with by making a 
 "builder" library type to handle creating arrays by doing things 
 like appending, and then delivering a finished T[] type. This is 
 similar to what std.outbuffer and std.array.Appender do, they just 
 need a bit of refining.

 The .length property of T[] would then become an rvalue only, not an 
 lvalue, and ~= would no longer be allowed for T[].

 We both feel that this would simplify D, make it more flexible, and 
 remove some awkward corner cases like the inability to say a.length++.

 What do you think?
Since noone else seems to have said it: The fact that you're both willing to let it go, after having already invested a lot of time in it, is a good sign for the language. Well done.
I'm relieved that somebody mentioned that :o). As soon as we gave up with T[new], people started to sell it to us. We should preemptively post about eliminating feature plans before actually implementing them. By the way: implementation of property has been canceled.
Yeah, let's just keep the language in the broken state it is, because we can't think of a better solution.
Silly me, I was thinking the humor was all too obvious. Andrei
Oct 19 2009
parent grauzone <none example.net> writes:
Andrei Alexandrescu wrote:
 grauzone wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Walter Bright wrote:
 The purpose of T[new] was to solve the problems T[] had with 
 passing T[] to a function and then the function resizes the T[]. 
 What happens with the original?

 The solution we came up with was to create a third array type, 
 T[new], which was a reference type.

 Andrei had the idea that T[new] could be dispensed with by making a 
 "builder" library type to handle creating arrays by doing things 
 like appending, and then delivering a finished T[] type. This is 
 similar to what std.outbuffer and std.array.Appender do, they just 
 need a bit of refining.

 The .length property of T[] would then become an rvalue only, not 
 an lvalue, and ~= would no longer be allowed for T[].

 We both feel that this would simplify D, make it more flexible, and 
 remove some awkward corner cases like the inability to say a.length++.

 What do you think?
Since noone else seems to have said it: The fact that you're both willing to let it go, after having already invested a lot of time in it, is a good sign for the language. Well done.
I'm relieved that somebody mentioned that :o). As soon as we gave up with T[new], people started to sell it to us. We should preemptively post about eliminating feature plans before actually implementing them. By the way: implementation of property has been canceled.
Yeah, let's just keep the language in the broken state it is, because we can't think of a better solution.
Silly me, I was thinking the humor was all too obvious.
It was only a joke? That's a relief.
 Andrei
Oct 19 2009
prev sibling parent Piotrek <starpit tlen.pl> writes:
Don pisze:
 
 Since noone else seems to have said it: The fact that you're both 
 willing to let it go, after having already invested a lot of time in it, 
 is a good sign for the language. Well done.
Hey Don, you speak my words. Specially that I can't see good reason for T[new] with present array language support (but that's probably my poor knowledge). Walter's main post and one of the later are reflecting my feelings. We need some polishing and good library now. Cheers Piotrek
Oct 19 2009
prev sibling next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-10-18 17:05:39 -0400, Walter Bright <newshound1 digitalmars.com> said:

 The purpose of T[new] was to solve the problems T[] had with passing 
 T[] to a function and then the function resizes the T[]. What happens 
 with the original?
 
 The solution we came up with was to create a third array type, T[new], 
 which was a reference type.
 
 Andrei had the idea that T[new] could be dispensed with by making a 
 "builder" library type to handle creating arrays by doing things like 
 appending, and then delivering a finished T[] type. This is similar to 
 what std.outbuffer and std.array.Appender do, they just need a bit of 
 refining.
 
 The .length property of T[] would then become an rvalue only, not an 
 lvalue, and ~= would no longer be allowed for T[].
 
 We both feel that this would simplify D, make it more flexible, and 
 remove some awkward corner cases like the inability to say a.length++.
 
 What do you think?
I never liked T[new] much from the beginning, so good riddance. :-) But seriously, disallowing '~=' but not '~'? I can already see the newbies: Q: Why can't I write '~=' as a shortcut for '~' on a slice? It works fine for every other operator, everywhere else. A: Because appending is not very efficient. Q: So '~' is more efficient than '~='? A: Hum, no. They're both as inefficient, but... -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Oct 19 2009
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 18 Oct 2009 17:05:39 -0400, Walter Bright  
<newshound1 digitalmars.com> wrote:

 The purpose of T[new] was to solve the problems T[] had with passing T[]  
 to a function and then the function resizes the T[]. What happens with  
 the original?

 The solution we came up with was to create a third array type, T[new],  
 which was a reference type.

 Andrei had the idea that T[new] could be dispensed with by making a  
 "builder" library type to handle creating arrays by doing things like  
 appending, and then delivering a finished T[] type. This is similar to  
 what std.outbuffer and std.array.Appender do, they just need a bit of  
 refining.

 The .length property of T[] would then become an rvalue only, not an  
 lvalue, and ~= would no longer be allowed for T[].

 We both feel that this would simplify D, make it more flexible, and  
 remove some awkward corner cases like the inability to say a.length++.

 What do you think?
At the risk of sounding like bearophile -- I've proposed 2 solutions in the past for this that *don't* involve creating a T[new] type. 1. Store the allocated length in the GC structure, then only allow appending when the length of the array being appended matches the allocated length. 2. Store the allocated length at the beginning of the array, and use a bit in the array length to determine if it starts at the beginning of the block. The first solution has space concerns, and the second has lots more concerns, but can help in the case of having to do a GC lookup to determine if a slice can be appended (you'd still have to lock the GC to do an actual append or realloc). I prefer the first solution over the second. I like the current behavior *except* for appending. Most of the time it does what you want, and the syntax is beautiful. In regards to disallowing x ~= y, I'd propose you at least make it equivalent to x = x ~ y instead of removing it. -Steve
Oct 20 2009
parent reply Bill Baxter <wbaxter gmail.com> writes:
On Tue, Oct 20, 2009 at 6:25 AM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 On Sun, 18 Oct 2009 17:05:39 -0400, Walter Bright
 <newshound1 digitalmars.com> wrote:

 The purpose of T[new] was to solve the problems T[] had with passing T[]
 to a function and then the function resizes the T[]. What happens with t=
he
 original?

 The solution we came up with was to create a third array type, T[new],
 which was a reference type.

 Andrei had the idea that T[new] could be dispensed with by making a
 "builder" library type to handle creating arrays by doing things like
 appending, and then delivering a finished T[] type. This is similar to w=
hat
 std.outbuffer and std.array.Appender do, they just need a bit of refinin=
g.
 The .length property of T[] would then become an rvalue only, not an
 lvalue, and ~=3D would no longer be allowed for T[].

 We both feel that this would simplify D, make it more flexible, and remo=
ve
 some awkward corner cases like the inability to say a.length++.

 What do you think?
At the risk of sounding like bearophile -- I've proposed 2 solutions in t=
he
 past for this that *don't* involve creating a T[new] type.

 1. Store the allocated length in the GC structure, then only allow append=
ing
 when the length of the array being appended matches the allocated length.

 2. Store the allocated length at the beginning of the array, and use a bi=
t
 in the array length to determine if it starts at the beginning of the blo=
ck.
 The first solution has space concerns, and the second has lots more
 concerns, but can help in the case of having to do a GC lookup to determi=
ne
 if a slice can be appended (you'd still have to lock the GC to do an actu=
al
 append or realloc). =A0I prefer the first solution over the second.

 I like the current behavior *except* for appending. =A0Most of the time i=
t
 does what you want, and the syntax is beautiful.

 In regards to disallowing x ~=3D y, I'd propose you at least make it
 equivalent to x =3D x ~ y instead of removing it.
If you're going to do ~=3D a lot then you should convert to the dynamic array type. If you're not going to do ~=3D a lot, then you can afford to write out x = =3D x ~ y. The bottom line is that it just doesn't make sense to append onto a "view" type. It's really a kind of constness. Having a view says the underlying memory locations you are looking at are fixed. It doesn't make sense to imply there's an operation that can change those memory locations (other than shrinking the window to view fewer of them). --bb
Oct 20 2009
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 20 Oct 2009 11:10:20 -0400, Bill Baxter <wbaxter gmail.com> wrote:

 On Tue, Oct 20, 2009 at 6:25 AM, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 On Sun, 18 Oct 2009 17:05:39 -0400, Walter Bright
 <newshound1 digitalmars.com> wrote:

 The purpose of T[new] was to solve the problems T[] had with passing  
 T[]
 to a function and then the function resizes the T[]. What happens with  
 the
 original?

 The solution we came up with was to create a third array type, T[new],
 which was a reference type.

 Andrei had the idea that T[new] could be dispensed with by making a
 "builder" library type to handle creating arrays by doing things like
 appending, and then delivering a finished T[] type. This is similar to  
 what
 std.outbuffer and std.array.Appender do, they just need a bit of  
 refining.

 The .length property of T[] would then become an rvalue only, not an
 lvalue, and ~= would no longer be allowed for T[].

 We both feel that this would simplify D, make it more flexible, and  
 remove
 some awkward corner cases like the inability to say a.length++.

 What do you think?
At the risk of sounding like bearophile -- I've proposed 2 solutions in the past for this that *don't* involve creating a T[new] type. 1. Store the allocated length in the GC structure, then only allow appending when the length of the array being appended matches the allocated length. 2. Store the allocated length at the beginning of the array, and use a bit in the array length to determine if it starts at the beginning of the block. The first solution has space concerns, and the second has lots more concerns, but can help in the case of having to do a GC lookup to determine if a slice can be appended (you'd still have to lock the GC to do an actual append or realloc). I prefer the first solution over the second. I like the current behavior *except* for appending. Most of the time it does what you want, and the syntax is beautiful. In regards to disallowing x ~= y, I'd propose you at least make it equivalent to x = x ~ y instead of removing it.
If you're going to do ~= a lot then you should convert to the dynamic array type. If you're not going to do ~= a lot, then you can afford to write out x = x ~ y. The bottom line is that it just doesn't make sense to append onto a "view" type. It's really a kind of constness. Having a view says the underlying memory locations you are looking at are fixed. It doesn't make sense to imply there's an operation that can change those memory locations (other than shrinking the window to view fewer of them).
Having the append operation extend into already allocated memory is an optimization. In this case, it's an optimization that can corrupt memory. If we can make append extend into already allocated memory *and* not cause corruption, I don't see the downside. And then there is one less array type to deal with (, create functions that handle, etc.). Besides, I think Andrei's LRU solution is better than mine (and pretty much in line with it). I still think having an Appender object or struct is a worthwhile thing, the "pre-allocate array then set length to zero" model is a hack at best. -Steve
Oct 20 2009
next sibling parent reply grauzone <none example.net> writes:
Steven Schveighoffer wrote:
 I still think having an Appender object or struct is a worthwhile thing, 
 the "pre-allocate array then set length to zero" model is a hack at best.
Would that work with Andrei's append cache at all? Setting the length to zero and then appending is like taking a slice of length 0 and then appending. Maybe introduce a write/readable .capacity property, that magically accesses the cache/GC?
 -Steve
Oct 20 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
grauzone wrote:
 Steven Schveighoffer wrote:
 I still think having an Appender object or struct is a worthwhile 
 thing, the "pre-allocate array then set length to zero" model is a 
 hack at best.
Would that work with Andrei's append cache at all? Setting the length to zero and then appending is like taking a slice of length 0 and then appending. Maybe introduce a write/readable .capacity property, that magically accesses the cache/GC?
For my money, I'd get rid of that trick: a.length = 1000; a.length = 0; for (...) a ~= x; Andrei
Oct 20 2009
next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Tue, 20 Oct 2009 21:01:17 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 grauzone wrote:
 Steven Schveighoffer wrote:
 I still think having an Appender object or struct is a worthwhile  
 thing, the "pre-allocate array then set length to zero" model is a  
 hack at best.
Would that work with Andrei's append cache at all? Setting the length to zero and then appending is like taking a slice of length 0 and then appending. Maybe introduce a write/readable .capacity property, that magically accesses the cache/GC?
For my money, I'd get rid of that trick: a.length = 1000; a.length = 0; for (...) a ~= x; Andrei
I agree it's ugly but that's the best we have in D, and it looks like things are getting even worse...
Oct 20 2009
prev sibling parent grauzone <none example.net> writes:
Andrei Alexandrescu wrote:
 grauzone wrote:
 Steven Schveighoffer wrote:
 I still think having an Appender object or struct is a worthwhile 
 thing, the "pre-allocate array then set length to zero" model is a 
 hack at best.
Would that work with Andrei's append cache at all? Setting the length to zero and then appending is like taking a slice of length 0 and then appending. Maybe introduce a write/readable .capacity property, that magically accesses the cache/GC?
For my money, I'd get rid of that trick: a.length = 1000; a.length = 0; for (...) a ~= x;
Yes, that's what we currently use for setting the capacity of an array. And it looks stupid and non-intuitive; someone new to D might think it's a no-op. Better way please?
 
 Andrei
Oct 20 2009
prev sibling parent reply Bill Baxter <wbaxter gmail.com> writes:
On Tue, Oct 20, 2009 at 8:50 AM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 On Tue, 20 Oct 2009 11:10:20 -0400, Bill Baxter <wbaxter gmail.com> wrote=
:
 On Tue, Oct 20, 2009 at 6:25 AM, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 On Sun, 18 Oct 2009 17:05:39 -0400, Walter Bright
 <newshound1 digitalmars.com> wrote:

 The purpose of T[new] was to solve the problems T[] had with passing T=
[]
 to a function and then the function resizes the T[]. What happens with
 the
 original?

 The solution we came up with was to create a third array type, T[new],
 which was a reference type.

 Andrei had the idea that T[new] could be dispensed with by making a
 "builder" library type to handle creating arrays by doing things like
 appending, and then delivering a finished T[] type. This is similar to
 what
 std.outbuffer and std.array.Appender do, they just need a bit of
 refining.

 The .length property of T[] would then become an rvalue only, not an
 lvalue, and ~=3D would no longer be allowed for T[].

 We both feel that this would simplify D, make it more flexible, and
 remove
 some awkward corner cases like the inability to say a.length++.

 What do you think?
At the risk of sounding like bearophile -- I've proposed 2 solutions in the past for this that *don't* involve creating a T[new] type. 1. Store the allocated length in the GC structure, then only allow appending when the length of the array being appended matches the allocated lengt=
h.
 2. Store the allocated length at the beginning of the array, and use a
 bit
 in the array length to determine if it starts at the beginning of the
 block.

 The first solution has space concerns, and the second has lots more
 concerns, but can help in the case of having to do a GC lookup to
 determine
 if a slice can be appended (you'd still have to lock the GC to do an
 actual
 append or realloc). =A0I prefer the first solution over the second.

 I like the current behavior *except* for appending. =A0Most of the time=
it
 does what you want, and the syntax is beautiful.

 In regards to disallowing x ~=3D y, I'd propose you at least make it
 equivalent to x =3D x ~ y instead of removing it.
If you're going to do ~=3D a lot then you should convert to the dynamic array type. If you're not going to do ~=3D a lot, then you can afford to write out x=
=3D x
 ~ y.

 The bottom line is that it just doesn't make sense to append onto a
 "view" type. =A0It's really a kind of constness. =A0Having a view says t=
he
 underlying memory locations you are looking at are fixed. =A0It doesn't
 make sense to imply there's an operation that can change those memory
 locations (other than shrinking the window to view fewer of them).
Having the append operation extend into already allocated memory is an optimization. =A0In this case, it's an optimization that can corrupt memo=
ry.
 If we can make append extend into already allocated memory *and* not caus=
e
 corruption, I don't see the downside. =A0And then there is one less array=
type
 to deal with (, create functions that handle, etc.).

 Besides, I think Andrei's LRU solution is better than mine (and pretty mu=
ch
 in line with it).

 I still think having an Appender object or struct is a worthwhile thing, =
the
 "pre-allocate array then set length to zero" model is a hack at best.
But you still have the problem Andrei posted. Code like this: void func(int[] x) { x ~=3D 3; x[0] =3D 42; } it'll compile and maybe run just fine, but there's no way to know if the caller will see the 42 or not. Unpredictable behavior like that is breeding grounds for subtle bugs. Perhaps that potential for bugs can be reduced by turning off the LRU stuff in debug builds, and just making ~=3D reallocate always there. Since, as you said, it's an optimization, makes sense to only turn it on in release or maybe optimized builds. To Andrei, do you really feel comfortable trying to explain this in your book? It seems like it will be difficult to explain that ~=3D is sometimes efficient for appending but not necessarily if you're working with a lot of arrays because it actually keeps this cache under the hood that may or may not remember the actual underlying capacity of the array you're appending to, so you should probably use ArrayBuilder if you can, despite the optimization. --bb
Oct 20 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 To Andrei, do you really feel comfortable trying to explain this in
 your book?  It seems like it will be difficult to explain that ~= is
 sometimes efficient for appending but not necessarily if you're
 working with a lot of arrays because it actually keeps this cache
 under the hood that may or may not remember the actual underlying
 capacity of the array you're appending to, so you should probably use
 ArrayBuilder if you can, despite the optimization.
I guess I'll try and let you all know. Andrei
Oct 20 2009
parent Bill Baxter <wbaxter gmail.com> writes:
On Tue, Oct 20, 2009 at 10:05 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 To Andrei, do you really feel comfortable trying to explain this in
 your book? =A0It seems like it will be difficult to explain that ~=3D is
 sometimes efficient for appending but not necessarily if you're
 working with a lot of arrays because it actually keeps this cache
 under the hood that may or may not remember the actual underlying
 capacity of the array you're appending to, so you should probably use
 ArrayBuilder if you can, despite the optimization.
I guess I'll try and let you all know.
I can also see this becoming an Effective D tip -- """ For common cases appending to slices is fast. However the performance depends on a hidden LRU cache to remember the capacities of the most recent N arrays. This works fine until you hit that N limit. Unfortunately as you compose code together it is easy to overflow that cache without realizing it, leading to sudden performance drops for no apparent reason. Thus we suggest you always use ArrayBuilder when appending to arrays rather than slices. """ Or not. This is one of those places where some data is really needed. It may be that 99.9% of code is only actively appending to 4 arrays or fewer. It just seems too tricky that this innocent-looking code: int[] i; foreach(k; 1..20_000) { i ~=3D some_function(k); } could hit a performance cliff based on how many arrays get used deep in the call chain of some_function(). Granted, cache issues can cause these kinds of cliffs for any kind of code, but I suspect this cliff would be particularly noticeable, given the slowness of allocations. --bb
Oct 20 2009
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 20 Oct 2009 12:30:57 -0400, Bill Baxter <wbaxter gmail.com> wrote:

 On Tue, Oct 20, 2009 at 8:50 AM, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 On Tue, 20 Oct 2009 11:10:20 -0400, Bill Baxter <wbaxter gmail.com>  
 wrote:

 On Tue, Oct 20, 2009 at 6:25 AM, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 On Sun, 18 Oct 2009 17:05:39 -0400, Walter Bright
 <newshound1 digitalmars.com> wrote:

 The purpose of T[new] was to solve the problems T[] had with passing  
 T[]
 to a function and then the function resizes the T[]. What happens  
 with
 the
 original?

 The solution we came up with was to create a third array type,  
 T[new],
 which was a reference type.

 Andrei had the idea that T[new] could be dispensed with by making a
 "builder" library type to handle creating arrays by doing things like
 appending, and then delivering a finished T[] type. This is similar  
 to
 what
 std.outbuffer and std.array.Appender do, they just need a bit of
 refining.

 The .length property of T[] would then become an rvalue only, not an
 lvalue, and ~= would no longer be allowed for T[].

 We both feel that this would simplify D, make it more flexible, and
 remove
 some awkward corner cases like the inability to say a.length++.

 What do you think?
At the risk of sounding like bearophile -- I've proposed 2 solutions in the past for this that *don't* involve creating a T[new] type. 1. Store the allocated length in the GC structure, then only allow appending when the length of the array being appended matches the allocated length. 2. Store the allocated length at the beginning of the array, and use a bit in the array length to determine if it starts at the beginning of the block. The first solution has space concerns, and the second has lots more concerns, but can help in the case of having to do a GC lookup to determine if a slice can be appended (you'd still have to lock the GC to do an actual append or realloc). I prefer the first solution over the second. I like the current behavior *except* for appending. Most of the time it does what you want, and the syntax is beautiful. In regards to disallowing x ~= y, I'd propose you at least make it equivalent to x = x ~ y instead of removing it.
If you're going to do ~= a lot then you should convert to the dynamic array type. If you're not going to do ~= a lot, then you can afford to write out x = x ~ y. The bottom line is that it just doesn't make sense to append onto a "view" type. It's really a kind of constness. Having a view says the underlying memory locations you are looking at are fixed. It doesn't make sense to imply there's an operation that can change those memory locations (other than shrinking the window to view fewer of them).
Having the append operation extend into already allocated memory is an optimization. In this case, it's an optimization that can corrupt memory. If we can make append extend into already allocated memory *and* not cause corruption, I don't see the downside. And then there is one less array type to deal with (, create functions that handle, etc.). Besides, I think Andrei's LRU solution is better than mine (and pretty much in line with it). I still think having an Appender object or struct is a worthwhile thing, the "pre-allocate array then set length to zero" model is a hack at best.
But you still have the problem Andrei posted. Code like this: void func(int[] x) { x ~= 3; x[0] = 42; }
depending on what you want, you then rewrite:
 void func(int[] x)
 {
      x[0] = 42;
      x ~= 3;
 }
or
 void func(int[] x)
 {
      x = x ~ 3;
      x[0] = 42;
 }
Generally when you are appending, you are not also changing the original data, so you don't care whether it's an optimization or not. Your code is obviously broken anyways, since *nobody* ever sees the 3.
 it'll compile and maybe run just fine, but there's no way to know if
 the caller will see the 42 or not.   Unpredictable behavior like that
 is breeding grounds for subtle bugs.
I'm sure we could spend days coming up with code that introduces subtle bugs. You can't fix all bugs that people may write. I don't think your scenario is very likely. More importantly, the problem with the current appending behavior is this: void foo(int[] x) { x ~= 3; ... } That may have just corrupted some data that you don't own, so defensively, you should write: void foo(int[] x) { x = x ~ 3; ... } But with Andrei's solution, you cannot possibly corrupt data with this line. Now, if you then go and set one of the values in the original array (like you did), then you may or may not change the original array. But as the function takes a mutable array, *you own the array* so it is a mistake to think when you pass in an array that's not const, you should expect it to remain unchanged. If your goal is to affect the original array, then you should accept a ref argument or not append to it. -Steve
Oct 20 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 If your goal is to affect the 
 original array, then you should accept a ref argument or not append to it.
I think that's an entirely reasonable (and easy to explain) stance. Andrei
Oct 20 2009
parent reply Bill Baxter <wbaxter gmail.com> writes:
On Tue, Oct 20, 2009 at 11:30 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Steven Schveighoffer wrote:
 If your goal is to affect the original array, then you should accept a ref
 argument or not append to it.
I think that's an entirely reasonable (and easy to explain) stance.
I've definitely spent time tracking down exactly such bugs, where I meant to make the argument a ref but didn't. If the above is to be the official stance, then I think it should be enforced by the compiler. Appending to non-ref slice args should be an error. --bb
Oct 20 2009
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 20 Oct 2009 14:43:16 -0400, Bill Baxter <wbaxter gmail.com> wrote:

 On Tue, Oct 20, 2009 at 11:30 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Steven Schveighoffer wrote:
 If your goal is to affect the original array, then you should accept a  
 ref
 argument or not append to it.
I think that's an entirely reasonable (and easy to explain) stance.
I've definitely spent time tracking down exactly such bugs, where I meant to make the argument a ref but didn't. If the above is to be the official stance, then I think it should be enforced by the compiler. Appending to non-ref slice args should be an error.
Except when you are passing ownership of an array. Basically, there are four modes: T[] x : a) you are passing ownership to the function, and the array might not be deterministically usable after the function returns, e.g. T[] padAndCapitalize(T[]). Usually such functions' return values are deterministic, and the focus of what you care about. -or- b) you are lending ownership to the function, only the array elements will be altered, e.g. replace(). ref T[] x : You are lending ownership of the array to the function, but you get ownership back, e.g. push(). Fully deterministic altering. const(T)[] x : You retain ownership of the array, the function cannot alter it, e.g. toUpper(). So it's not possible to flag on the signature between the first a) and b) modes, we must currently rely on documentation. But the "append and modify" routines are pretty rare. Essentially, what you want is a type where the length is const, but the data is mutable. I think creating such a type should be possible in the library. But fixing the append problem alone is a huge step forward, since all of these cases could result in corruption of totally unrelated data if the arrays were appended to. Especially the const version is bad. -Steve
Oct 20 2009
prev sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
I've just caught up on this thread.  Built-in dynamic arrays have been 
one of D's major features since the early days.  Now you're planning to 
remove this feature?

http://www.digitalmars.com/d/1.0/builtin.html
states that the C++ STL has many types that have been created to 
compensate for the limitations of the built-in array type, and the power 
of D's built-in arrays largely eliminates the need for these.  So now 
you're suggesting that we do away with this power, and create these 
library types that the point was to avoid?

Walter Bright wrote:
<snip>
 The .length property of T[] would then become an rvalue only, not an 
 lvalue, and ~= would no longer be allowed for T[].
I thought you were already moving that functionality into T[new]. I think T[new] versus T[] is actually a good design - it makes for a form of array length constancy as well as possibly getting rid of such nasties as bug 2093.
 We both feel that this would simplify D, make it more flexible, and 
 remove some awkward corner cases like the inability to say a.length++.
I must've missed the discussion - what's wrong with fixing such expressions to work in terms of property setters/getters in the natural way? Stewart.
Oct 30 2009