www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Enhanced array appending

reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
All,

I created a new patched druntime that prevents array stomping and at the  
same time increases append performance for thread-local array appending.

The patch is attached to bugzilla entry 3637. (  
http://d.puremagic.com/issues/show_bug.cgi?id=3637 )

For those of you unfamiliar with array stomping, here is code that  
demonstrates the problem:

int[] a = [1,2,3].dup;
int[] b = a[0..1];
b ~= 4;
assert(b == [1,4]);
assert(a == [1,2,3]); // fails with old code, passes with mine.

Note that my code still does not help with the problem of having a shared  
view of data when appending, as outlined in Bartosz's NG thread "D array  
expansion and non-deterministic re-allocation."  I don't think this  
problem can be solved without sacrificing performance of array appending.

For those of you unfamiliar with D's poor array append performance, please  
see  
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=103563

Please give it a try and see if you can find any issues with it.  I'm  
sorry that I don't have a binary distribution channel for this.  Maybe  
someone can build some binaries for others to download.

-Steve
Dec 21 2009
next sibling parent reply grauzone <none example.net> writes:
Steven Schveighoffer wrote:
 All,
 
 I created a new patched druntime that prevents array stomping and at the 
 same time increases append performance for thread-local array appending.
 
 The patch is attached to bugzilla entry 3637. ( 
 http://d.puremagic.com/issues/show_bug.cgi?id=3637 )

Nobody cares about it? (Except someone commenting in bugzilla, but where's Andrei and Walter?) Also, what again were the arguments against adding a "capacity" field in the slice struct to deal with the append problem? I think Go does this.
Dec 23 2009
next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
grauzone Wrote:

 Steven Schveighoffer wrote:
 All,
 
 I created a new patched druntime that prevents array stomping and at the 
 same time increases append performance for thread-local array appending.
 
 The patch is attached to bugzilla entry 3637. ( 
 http://d.puremagic.com/issues/show_bug.cgi?id=3637 )

Nobody cares about it? (Except someone commenting in bugzilla, but where's Andrei and Walter?)

I have already discussed this patch with Andrei and Walter (and the rest of the compiler/phobos crew), they are on board. I am a little disappointed with the response here, but I guess it could be attributed to the barrier to experimentation (you have to apply the patch and compile the runtime) and the likelihood that many are on vacation right now (as I will be as of tonight).
 Also, what again were the arguments against adding a "capacity" field in 
 the slice struct to deal with the append problem?

The problem is that adding a capacity field only works if the object is a reference type. A value type will make either choose to make copies of the capacity field, resulting in the same stomping problem (and maybe worse), or will zero out one of the copy's capacity field, resulting in an artificial limitation for appending. The only true way to make this work right is to have a capacity field shared among all the aliases to the same data, and the most logical place for that is in the allocated block. -Steve
Dec 23 2009
next sibling parent reply grauzone <none example.net> writes:
Steven Schveighoffer wrote:
 grauzone Wrote:
 
 Steven Schveighoffer wrote:
 All,

 I created a new patched druntime that prevents array stomping and at the 
 same time increases append performance for thread-local array appending.

 The patch is attached to bugzilla entry 3637. ( 
 http://d.puremagic.com/issues/show_bug.cgi?id=3637 )

where's Andrei and Walter?)

I have already discussed this patch with Andrei and Walter (and the rest of the compiler/phobos crew), they are on board. I am a little disappointed with the response here, but I guess it could be attributed to the barrier to experimentation (you have to apply the patch and compile the runtime) and the likelihood that many are on vacation right now (as I will be as of tonight).

Now Andrei replied and said everyone was "excited". I'm expecting most people to wait until there's a dmd release with the patch included (I'll do that).
 Also, what again were the arguments against adding a "capacity" field in 
 the slice struct to deal with the append problem?

The problem is that adding a capacity field only works if the object is a reference type. A value type will make either choose to make copies of the capacity field, resulting in the same stomping problem (and maybe worse), or will zero out one of the copy's capacity field, resulting in an artificial limitation for appending. The only true way to make this work right is to have a capacity field shared among all the aliases to the same data, and the most logical place for that is in the allocated block.

I see... Go makes slices value types (which references the array data), but I'm not sure how or if they handle this situation. Then, why use a cache to find the pointer to the capacity field inside the array memory block? You could just store a pointer to the capacity field in the slice. Something like this, I guess: struct slice { void* ptr; size_t length; array_info* array; } //can be located at the start of an array memory block struct array_info { size_t max_length; size_t capacity; } void slice_set_length(ref slice s, size_t newlen) { if (newlen < s.length) { s.length = newlen; return; } //was not allocated from the GC? if (!s.array) goto new_array; //danger of stomping? if (s.length != s.array.max_length) goto new_array; //memory block too small? if (newlen > s.array.capacity) goto new_array; s.length = newlen; s.array.max_length = max(s.array.max_length, s.length); return; new_array: ... allocate a memory block with array descriptor ... copy s[] into it ... set s to new array pointer, length, array descriptor } Would this work? (Except that I didn't include handling of array element default initialization.) Actually, I think that's the same idea (as Andrei's original post on the array append cache thing), just much shorter and no indeterminism due to a volatile cache.
 -Steve

Dec 23 2009
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
grauzone Wrote:

 Steven Schveighoffer wrote:
 The problem is that adding a capacity field only works if the object is a
reference type.  A value type will make either choose to make copies of the
capacity field, resulting in the same stomping problem (and maybe worse), or
will zero out one of the copy's capacity field, resulting in an artificial
limitation for appending.  The only true way to make this work right is to have
a capacity field shared among all the aliases to the same data, and the most
logical place for that is in the allocated block.

I see... Go makes slices value types (which references the array data), but I'm not sure how or if they handle this situation. Then, why use a cache to find the pointer to the capacity field inside the array memory block? You could just store a pointer to the capacity field in the slice. Something like this, I guess: struct slice { void* ptr; size_t length; array_info* array; } [snip] Would this work?

Yes, but the problem I see with it is you are passing around that array_info pointer with the slice for *all* slices, not just ones you intend to use appending on. There is plenty of code that uses arrays and slices without ever doing appending, so that extra pointer is wasted space, and also increases the cost of passing slices by value. Your idea is essentially the idea behind my patch, except you can get the array info without having an additional pointer. The cost of lookup for the block info is mitigated by the cache, making lookups cost very little. The patch is a compromise between performance for appending and performance for everything else. I think there will still be a need in specialized code for a fatter array type that allows appending without complicated GC lookups, but the patch provides safe appending for a reasonable performance hit without affecting other aspects of array usage. As with all compromises, there are always different ways of doing things, so time will tell if this patch is the best method. -Steve
Dec 24 2009
parent reply grauzone <none example.net> writes:
Steven Schveighoffer wrote:
 grauzone Wrote:
 
 Steven Schveighoffer wrote:
 The problem is that adding a capacity field only works if the object is a
reference type.  A value type will make either choose to make copies of the
capacity field, resulting in the same stomping problem (and maybe worse), or
will zero out one of the copy's capacity field, resulting in an artificial
limitation for appending.  The only true way to make this work right is to have
a capacity field shared among all the aliases to the same data, and the most
logical place for that is in the allocated block.

but I'm not sure how or if they handle this situation. Then, why use a cache to find the pointer to the capacity field inside the array memory block? You could just store a pointer to the capacity field in the slice. Something like this, I guess: struct slice { void* ptr; size_t length; array_info* array; } [snip] Would this work?

Yes, but the problem I see with it is you are passing around that array_info pointer with the slice for *all* slices, not just ones you intend to use appending on. There is plenty of code that uses arrays and slices without ever doing appending, so that extra pointer is wasted space, and also increases the cost of passing slices by value.

Suddenly having an additional field in the slice is the problem? Wasn't semantics the problem (see your first reply to me above in this thread). Is having an additional field in the slice really a problem? You could just provide a "light" slice type. This wouldn't have the additional helper field, and would consist only of a pointer and length field. You could make it implicitly convertible to normal slices. These light slices could be used by someone who needs the additional performance so badly. Actually, they could simply be implemented as library type. I see no problem with adding an additional field to usual slices. There's also some benefit: clearer semantics (if you make guaranteed performance part of the semantics) and a much simpler runtime.
 Your idea is essentially the idea behind my patch, except you can get the
array info without having an additional pointer.  The cost of lookup for the
block info is mitigated by the cache, making lookups cost very little.

But the cache is volatile, and a cache entry could be easily flushed by doing many array append operations. A single string processing function could replace the entire cache with entries for string temporaries, and cause nasty performance regressions in unrelated parts of the program (e.g. in places where you really don't want each append operation to reallocate the full array). The worst is, this behavior will be relatively indeterministic. It's also hard to explain to "normal" programmers. Also, isn't it completely useless? If the programmer wants to append to an array, he'll use an append struct just to be sure, and if he doesn't want to append, inefficient array slice appending wouldn't be a problem. You could avoid these issues just by adding a new field to the slice struct. Needless to say, T[new] would be the correct solution here. It also has cleaner semantics. Yes, I heard it sucked and it was considered a failure by Walter/Andrei, but we still don't know why.
 The patch is a compromise between performance for appending and performance
for everything else.  I think there will still be a need in specialized code
for a fatter array type that allows appending without complicated GC lookups,
but the patch provides safe appending for a reasonable performance hit without
affecting other aspects of array usage.  As with all compromises, there are
always different ways of doing things, so time will tell if this patch is the
best method.
 
 -Steve

Dec 24 2009
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
grauzone Wrote:

 Steven Schveighoffer wrote:
 Yes, but the problem I see with it is you are passing around that array_info
pointer with the slice for *all* slices, not just ones you intend to use
appending on.  There is plenty of code that uses arrays and slices without ever
doing appending, so that extra pointer is wasted space, and also increases the
cost of passing slices by value.

Suddenly having an additional field in the slice is the problem? Wasn't semantics the problem (see your first reply to me above in this thread).

In my original reply, I focused on the fact that your original solution *doesn't work*. Yes, it also suffered from the extra baggage, but that was a lesser problem.
 
 Is having an additional field in the slice really a problem?
 
 You could just provide a "light" slice type. This wouldn't have the 
 additional helper field, and would consist only of a pointer and length 
 field. You could make it implicitly convertible to normal slices.

Yeah, all this is possible. Whether it's better is subject to opinion or testing.
 I see no problem with adding an additional field to usual slices. 
 There's also some benefit: clearer semantics (if you make guaranteed 
 performance part of the semantics) and a much simpler runtime.

Simpler runtime is not any benefit IMO. Nobody cares how array appending works, they just know the semantics. The semantics wouldn't be any different with your type, you still have all the same semantics, so I don't really get that point.
 Your idea is essentially the idea behind my patch, except you can get the
array info without having an additional pointer.  The cost of lookup for the
block info is mitigated by the cache, making lookups cost very little.

But the cache is volatile, and a cache entry could be easily flushed by doing many array append operations. A single string processing function could replace the entire cache with entries for string temporaries, and cause nasty performance regressions in unrelated parts of the program (e.g. in places where you really don't want each append operation to reallocate the full array).

I'm not sure you understand the semantics. When an array's block info is in the cache, no lookup of the block info from the GC is required, so the lock isn't taken, and the lookup is as simple as a linear search through 8 elements. But when it is not in the cache, it is looked up via the GC, requiring a lock and a look through all the memory pools (I'm unsure of the performance, but it clearly is slower). However, a cache miss does not mean an automatic reallocation, it just means the lookup of the block info is slower. It can *still* be extended into the block. Basically, the algorithm performance of appending with a cache miss is the same as a cache hit, just with a larger constant.
 The worst is, this behavior will be 
 relatively indeterministic. It's also hard to explain to "normal" 
 programmers.
 
 Also, isn't it completely useless? If the programmer wants to append to 
 an array, he'll use an append struct just to be sure, and if he doesn't 
 want to append, inefficient array slice appending wouldn't be a problem.

Again, I think these points may be fostered by a misunderstanding of the patch. The performance is much better than current performance, and people have had no problem dealing with the current runtime except in special circumstances. -Steve
Dec 24 2009
parent reply grauzone <none example.net> writes:
Steven Schveighoffer wrote:
 I see no problem with adding an additional field to usual slices. 
 There's also some benefit: clearer semantics (if you make guaranteed 
 performance part of the semantics) and a much simpler runtime.

Simpler runtime is not any benefit IMO. Nobody cares how array appending works, they just know the semantics. The semantics wouldn't be any different with your type, you still have all the same semantics, so I don't really get that point.

Maybe...
 Your idea is essentially the idea behind my patch, except you can get the
array info without having an additional pointer.  The cost of lookup for the
block info is mitigated by the cache, making lookups cost very little.

doing many array append operations. A single string processing function could replace the entire cache with entries for string temporaries, and cause nasty performance regressions in unrelated parts of the program (e.g. in places where you really don't want each append operation to reallocate the full array).

I'm not sure you understand the semantics. When an array's block info is in the cache, no lookup of the block info from the GC is required, so the lock isn't taken, and the lookup is as simple as a linear search through 8 elements. But when it is not in the cache, it is looked up via the GC, requiring a lock and a look through all the memory pools (I'm unsure of the performance, but it clearly is slower). However, a cache miss does not mean an automatic reallocation, it just means the lookup of the block info is slower. It can *still* be extended into the block. Basically, the algorithm performance of appending with a cache miss is the same as a cache hit, just with a larger constant.

Sorry about that, I didn't have a close look at the patch. I guess I was more thinking about Andrei's original proposal (and how I thought it may be implemented). It seems you store the length field inside the array's memory block (instead of the cache, which just speeds up gc_query). That's awesome, but I'm going to complain again: now you have to keep a length field for *all* memory allocations, not only arrays! For most object allocations, this means 4 more bytes additional overhead. Also, if you use GC.malloc directly, and the user tries to append to slices to it, your code may break. GC.malloc doesn't seem to pad the memory block with a length field. I must say that I find your argumentation strange: didn't you say adding an additional field to the slice struct is too much overhead? Also, a solution which keeps the pointer to the array length in the slice struct would still be faster. The cache lookup cost is not zero. Anyway, now that semantics and performance are somewhat sane, none of these remaining issues are too important. Thanks Steve and Andrei!
 The worst is, this behavior will be 
 relatively indeterministic. It's also hard to explain to "normal" 
 programmers.

 Also, isn't it completely useless? If the programmer wants to append to 
 an array, he'll use an append struct just to be sure, and if he doesn't 
 want to append, inefficient array slice appending wouldn't be a problem.

Again, I think these points may be fostered by a misunderstanding of the patch. The performance is much better than current performance, and people have had no problem dealing with the current runtime except in special circumstances. -Steve

Dec 24 2009
parent grauzone <none example.net> writes:
Steven Schveighoffer wrote:
 Also, if you use GC.malloc directly, and the user tries to append to 
 slices to it, your code may break. GC.malloc doesn't seem to pad the 
 memory block with a length field.

Yes, this is a possible problem. However, using GC.malloc directly and then treating the result as a normal array is probably extremely rare. At least it is not valid for safe D. It probably should be explained as a danger in GC.malloc, but I don't think this will adversely affect most code. There will be functions that should obviate the need for calling GC.malloc to allocate an array (specifically, allocating uninitialized data).

Maybe the whole "used size of memory block" (aka array length) thing should be built directly into the GC. Leaving GC.malloc this way would be a bit "shaky", and asks only for strange Heisenbugs.
 I must say that I find your argumentation strange: didn't you say 
 adding an additional field to the slice struct is too much overhead?

Overhead when passing around, not overhead for storage. For example, pushing 12 bytes onto the stack instead of 8 when calling a function with a slice. If you want safe appends, you need to store the allocation length somewhere, there's no way around that.

But would it really matter outside of microbrenchmarks? In cases where it _really_ matters, even slices with two members may be overhead, and programmers will split the slice into separate variables anyway as they optimize their code.
 developing stuff for Tango (Tango uses slices to get every ounce of 
 performance!), I'm convinced that as-fast-as-possible slice semantics 
 for passing around data is essential.

I've often seen Tango code mess with pointers directly, where slices could have been used for clearer, more readable code. I assume this was to get more performance.
Dec 25 2009
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
Leandro Lucarella Wrote:

 This is making the half-value, half-reference semantics of arrays even
 worse, since now the capacity is "passed by reference" (at the end of the
 array data) but the length and ptr as values.

This is absolutely false. It does not make anything worse. It may not make it as good as you want, but it definitely is better than the current situation. The current situation is that the capacity is "invented" by the runtime to be exactly what you need if the slice is at the beginning of a block, or 0 if it's not. Such a scheme is totally unsafe. The proposed patch leaves all aspects of arrays and slices alone *except* for this one thing.
 
 I don't comment much about the patch (and the design) because I already
 did so in the several threads about it. I think is a very bad idea,
 I think dynamic arrays should be a proper reference type (with ptr, length
 and capacity) and slices should be a value type without appending (only
 with ptr and length).

I don't agree. I like that slices and arrays are the same type, it allows for much cleaner code, and much less function duplication. I see the "append extends into a block" as an optimization, one that currently is invalid, and with the patch is valid. Disallowing appending to slices means you must generate arrays whenever you want to do appending (which copies the entire slice), incurring a higher cost for little gain. I'd actually say no gain. Do you have an example where such a scheme is better than arrays with my patch? Note that nobody is stopping you from making such array types, you are free to add them. -Steve
Dec 24 2009
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 Leandro Lucarella Wrote:
 
 This is making the half-value, half-reference semantics of arrays
 even worse, since now the capacity is "passed by reference" (at the
 end of the array data) but the length and ptr as values.

This is absolutely false. It does not make anything worse. It may not make it as good as you want, but it definitely is better than the current situation. The current situation is that the capacity is "invented" by the runtime to be exactly what you need if the slice is at the beginning of a block, or 0 if it's not. Such a scheme is totally unsafe. The proposed patch leaves all aspects of arrays and slices alone *except* for this one thing.
 I don't comment much about the patch (and the design) because I
 already did so in the several threads about it. I think is a very
 bad idea, I think dynamic arrays should be a proper reference type
 (with ptr, length and capacity) and slices should be a value type
 without appending (only with ptr and length).

I don't agree. I like that slices and arrays are the same type, it allows for much cleaner code, and much less function duplication. I see the "append extends into a block" as an optimization, one that currently is invalid, and with the patch is valid. Disallowing appending to slices means you must generate arrays whenever you want to do appending (which copies the entire slice), incurring a higher cost for little gain. I'd actually say no gain. Do you have an example where such a scheme is better than arrays with my patch? Note that nobody is stopping you from making such array types, you are free to add them.

At the risk of restarting the debate around T[new], let me note that Steve's intuition corroborates with our experience when we tried to define T[new]. Andrei
Dec 24 2009
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
Leandro Lucarella Wrote:

 Steven Schveighoffer, el 24 de diciembre a las 10:15 me escribiste:
 This is absolutely false.  It does not make anything worse.  It may not
 make it as good as you want, but it definitely is better than the
 current situation.

I meant from the language semantics POV (you have more rules to remember about arrays), I agree it makes things better from a practical POV.

In fact, semantics are simpler. You can completely eliminate the explanation about stomping in the spec.
 I don't comment much about the patch (and the design) because I already
 did so in the several threads about it. I think is a very bad idea,
 I think dynamic arrays should be a proper reference type (with ptr, length
 and capacity) and slices should be a value type without appending (only
 with ptr and length).

I don't agree. I like that slices and arrays are the same type, it allows for much cleaner code, and much less function duplication. I see the "append extends into a block" as an optimization, one that currently is invalid, and with the patch is valid. Disallowing appending to slices means you must generate arrays whenever you want to do appending (which copies the entire slice), incurring a higher cost for little gain. I'd actually say no gain. Do you have an example where such a scheme is better than arrays with my patch?


No example?
 I said this several times but here it comes again: We agree that the
 current scheme is better in terms of performance than having a proper
 reference type dynamic array and a separated slice value type.
 
 The thing is, I think optimization is nice as long as it doesn't leak the
 abstractions to the user. The current scheme is clearly this scenario, to
 make an optimization (one that problably a lot of people wouldn't need),
 you are making things harder for the user and creating a monster half
 array, half slice that is harder to understand than 2 smaller and simpler
 types.

Really? You think splitting the features into 2 different types with several overlapping features is easier to explain? I think that is why Andrei nixed the T[new] idea, because when he tried to explain it in TDPL, it was very unnatural.
 You think this performance gain is more important than the added
 complexity, I think the other way. You think D should be *by default* as
 fast as possible, even when making the programmers life a little harder,
 and let the user create or use library types for easy use and I think it
 should be the exact opposite, make *default* things easier and not as
 fast, and let the user create/user library types if they need top
 performance/optimizations.

I think the whole basis of your argument is invalid -- the current array/slice combination type is easier to use and explain *and* performs better. -Steve
Dec 24 2009
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Leandro Lucarella wrote:
 Steven Schveighoffer, el 24 de diciembre a las 12:59 me escribiste:
 Leandro Lucarella Wrote:

 Steven Schveighoffer, el 24 de diciembre a las 10:15 me escribiste:
 This is absolutely false.  It does not make anything worse.  It may not
 make it as good as you want, but it definitely is better than the
 current situation.

about arrays), I agree it makes things better from a practical POV.

explanation about stomping in the spec.

What about the cache? You have to explain to the users that there is a possibility that their appends works blazing fast, except... well, sometimes if you fill the cache they will suck. As gruzone said, find this kind of weird behaviours will be very hard.

Fluctuations in the speed of an operation are commonplace and don't require explanation for the putative user. Caches exist in many places in today's computers, and I don't see how all of a sudden this one cache sticks like a sore thumb.
 You think splitting the features into 2 different types with
 several overlapping features is easier to explain?

Yes. And I think they have overlapping features just like any other 2 containers, hashes and arrays have overlapping features, but they are completely different beasts. Same goes for arrays and slices.

The problem with T[new] vs. T[] was that it was very difficult to explain succintly when and how each should be used. I had two tables in TDPL and they were for the most part identical - I had to highlight the subtle differences. It's a far cry from arrays vs. hashes.
 PHP have arrays and hashes combined in one type. You think that's a good
 idea because arrays and hashes have overlapping features? I really don't,
 and I do think arrays and slices are 2 completely different beasts that
 needs to be separated in the language. I think that's the natural good
 thing to do, not an optimization. I think D is broken by combining this
 2 types into one. You don't think that (and it looks like Andrei and
 Walter doesn't think that either) so there's no way we would agree on this
 topic.

I also think PHP's was not a brilliant idea, but I also think there's no comparison. There could be a way of agreeing if you came with a solid argument. For example in the thread about dropping binding "this" to an rvalue I had to acquiesce in wake of good counterexamples. Also, in the thread on "typedef" there was a good level of consensus that the feature doesn't pull its weight. Andrei
Dec 24 2009
prev sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
Leandro Lucarella Wrote:

 Steven Schveighoffer, el 24 de diciembre a las 12:59 me escribiste:
 In fact, semantics are simpler.  You can completely eliminate the
 explanation about stomping in the spec.

What about the cache? You have to explain to the users that there is a possibility that their appends works blazing fast, except... well, sometimes if you fill the cache they will suck. As gruzone said, find this kind of weird behaviours will be very hard.

I don't think you do. We don't explain currently in the spec that if you append to 2 arrays in lockstep, performance drops off a cliff, I don't see why we should have to explain rare corner cases for the new code either. BTW, after you start appending to several arrays in lockstep, the bottleneck seems to shift away from the append function and to something else -- no amount of cache helps. I think its because the array pages are all interleaving and so cannot be extended.
 No example?

No, I said that I think arrays/slices as they are have better performance (when you know exactly what you're doing and pre-allocate and do that kind of tricks). Unless you are asking for some other kind of examples?

I meant *any* kind of gain -- performance or otherwise. If you have no examples where separated slices perform better or are not as awkward as combined slices/arrays, then I don't see why you'd want them. What I'm looking for is: This code sucks with combined arrays/slices: .... Now, look how good it is with split arrays/slices: ....
 You think splitting the features into 2 different types with
 several overlapping features is easier to explain?

Yes. And I think they have overlapping features just like any other 2 containers, hashes and arrays have overlapping features, but they are completely different beasts. Same goes for arrays and slices. PHP have arrays and hashes combined in one type. You think that's a good idea because arrays and hashes have overlapping features?

I hate to say this, and it has nothing to do with this argument, but I'm actually pretty impressed with php's arrays. The rest of php, not so much. They perform like hashes, but are ordered and sortable. I tried to find out how they are implemented, but the code is so buried in the zend engine that I didn't know where to look.
 I really don't,
 and I do think arrays and slices are 2 completely different beasts that
 needs to be separated in the language. I think that's the natural good
 thing to do, not an optimization. I think D is broken by combining this
 2 types into one.

They are only different because you say they are. To me they are no different. In fact, I don't even think we need to distinguish between arrays and slices, they should just all be arrays. The result of slicing an array is -- another array. I don't see the issue with that.
 You don't think that (and it looks like Andrei and
 Walter doesn't think that either) so there's no way we would agree on this
 topic.

Probably not.
 I think that is why Andrei nixed the T[new] idea, because when he tried
 to explain it in TDPL, it was very unnatural.

Well, I don't know what Andrei tried to do, but I think Python does pretty well with it, for example.

I don't have any experience with python, so I can't really say anything about it.
 I think the whole basis of your argument is invalid -- the current
 array/slice combination type is easier to use and explain *and* performs
 better.

I can say exactly the same about you. That's why I said I don't argue anymore about this (grrr! I don't know why I ended up doing it in this thread). This is a matter of taste, you think array/slice combined are easier to use and I don't.

I'd love to see an example where you think this is the case. If it's simply a matter of taste, then you have no argument. If that's the case, it's just a personal choice, and at that point it's up to Walter.
 About performance, they only spot where
 splitting arrays and slices would perform worse than the current
 slice/arrays combined is in that you add an extra indirection for arrays
 because of the reference semantics (but you need to pass around a word
 less than with slices, so I don't know how the real impact of that would
 be).

If you were to make arrays a true reference type, I don't think you would have an allocated block of metadata pointing to a separate allocated block of data, I'd expect all to be in one block. In that case, you don't have 2 levels of indirection. I don't necessarily think that reference based arrays would perform much worse than the current scheme for normal usages, but the performance problem I see is when you want to append to a slice, you not only have the runtime reallocating when it may not need to (if the slice is at the end of an allocated block), but you have to spell out that you're changing the slice to an array by changing the type *before* appending. Then it's on you to make sure the reallocation is necessary. e.g. if you are appending one slice to another, and the slice being appended is 0 length at runtime, in order to avoid a costly reallocation, you have to check for this at runtime. I just don't see why you think it's easier. -Steve
Dec 25 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
grauzone wrote:
 Steven Schveighoffer wrote:
 All,

 I created a new patched druntime that prevents array stomping and at 
 the same time increases append performance for thread-local array 
 appending.

 The patch is attached to bugzilla entry 3637. ( 
 http://d.puremagic.com/issues/show_bug.cgi?id=3637 )

Nobody cares about it? (Except someone commenting in bugzilla, but where's Andrei and Walter?)

The entire Phobos team discussed this change very thoroughly. We are all very excited about it.
 Also, what again were the arguments against adding a "capacity" field in 
 the slice struct to deal with the append problem?
 
 I think Go does this.

But Go is unfinished. When they'll finish it, they'll remove the capacity field :o). Stop being a curmudgeon and give us some love. Andrei
Dec 23 2009
parent reply grauzone <none example.net> writes:
Andrei Alexandrescu wrote:
 grauzone wrote:
 Steven Schveighoffer wrote:
 All,

 I created a new patched druntime that prevents array stomping and at 
 the same time increases append performance for thread-local array 
 appending.

 The patch is attached to bugzilla entry 3637. ( 
 http://d.puremagic.com/issues/show_bug.cgi?id=3637 )

Nobody cares about it? (Except someone commenting in bugzilla, but where's Andrei and Walter?)

The entire Phobos team discussed this change very thoroughly. We are all very excited about it.

Did anyone try it and make initial tests?
 Also, what again were the arguments against adding a "capacity" field 
 in the slice struct to deal with the append problem?

 I think Go does this.

But Go is unfinished. When they'll finish it, they'll remove the capacity field :o).

I believe not even you can predict the future.
 Stop being a curmudgeon and give us some love.

Sorry. dmd compiler bugs made me this way. I mean no harm.
 
 Andrei

Dec 23 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Steven Schveighoffer:

The new code beats the old runtime in both tests<

Are you testing Phobos or Tango? Because I think Tango may have a performance bug there that's missing in Phobos (another even worse performance bug that I think is present in Tango is on the isIn_r of built in associative arrays). Bye, bearophile
Dec 24 2009
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
bearophile Wrote:

 Steven Schveighoffer:
 
The new code beats the old runtime in both tests<

Are you testing Phobos or Tango? Because I think Tango may have a performance bug there that's missing in Phobos (another even worse performance bug that I think is present in Tango is on the isIn_r of built in associative arrays).

D2, so phobos. -Steve
Dec 25 2009
prev sibling next sibling parent Leandro Lucarella <llucax gmail.com> writes:
Steven Schveighoffer, el 23 de diciembre a las 08:44 me escribiste:
 grauzone Wrote:
 
 Steven Schveighoffer wrote:
 All,
 
 I created a new patched druntime that prevents array stomping and at the 
 same time increases append performance for thread-local array appending.
 
 The patch is attached to bugzilla entry 3637. ( 
 http://d.puremagic.com/issues/show_bug.cgi?id=3637 )

Nobody cares about it? (Except someone commenting in bugzilla, but where's Andrei and Walter?)

I have already discussed this patch with Andrei and Walter (and the rest of the compiler/phobos crew), they are on board. I am a little disappointed with the response here, but I guess it could be attributed to the barrier to experimentation (you have to apply the patch and compile the runtime) and the likelihood that many are on vacation right now (as I will be as of tonight).
 Also, what again were the arguments against adding a "capacity" field in 
 the slice struct to deal with the append problem?

The problem is that adding a capacity field only works if the object is a reference type. A value type will make either choose to make copies of the capacity field, resulting in the same stomping problem (and maybe worse), or will zero out one of the copy's capacity field, resulting in an artificial limitation for appending. The only true way to make this work right is to have a capacity field shared among all the aliases to the same data, and the most logical place for that is in the allocated block.

This is making the half-value, half-reference semantics of arrays even worse, since now the capacity is "passed by reference" (at the end of the array data) but the length and ptr as values. I don't comment much about the patch (and the design) because I already did so in the several threads about it. I think is a very bad idea, I think dynamic arrays should be a proper reference type (with ptr, length and capacity) and slices should be a value type without appending (only with ptr and length). -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- You can do better than me. You could throw a dart out the window and hit someone better than me. I'm no good! -- George Constanza
Dec 23 2009
prev sibling next sibling parent Leandro Lucarella <llucax gmail.com> writes:
Steven Schveighoffer, el 24 de diciembre a las 10:15 me escribiste:
 Leandro Lucarella Wrote:
 
 This is making the half-value, half-reference semantics of arrays even
 worse, since now the capacity is "passed by reference" (at the end of the
 array data) but the length and ptr as values.

This is absolutely false. It does not make anything worse. It may not make it as good as you want, but it definitely is better than the current situation.

I meant from the language semantics POV (you have more rules to remember about arrays), I agree it makes things better from a practical POV.
 I don't comment much about the patch (and the design) because I already
 did so in the several threads about it. I think is a very bad idea,
 I think dynamic arrays should be a proper reference type (with ptr, length
 and capacity) and slices should be a value type without appending (only
 with ptr and length).

I don't agree. I like that slices and arrays are the same type, it allows for much cleaner code, and much less function duplication. I see the "append extends into a block" as an optimization, one that currently is invalid, and with the patch is valid. Disallowing appending to slices means you must generate arrays whenever you want to do appending (which copies the entire slice), incurring a higher cost for little gain. I'd actually say no gain. Do you have an example where such a scheme is better than arrays with my patch? Note that nobody is stopping you from making such array types, you are free to add them.

I said this several times but here it comes again: We agree that the current scheme is better in terms of performance than having a proper reference type dynamic array and a separated slice value type. The thing is, I think optimization is nice as long as it doesn't leak the abstractions to the user. The current scheme is clearly this scenario, to make an optimization (one that problably a lot of people wouldn't need), you are making things harder for the user and creating a monster half array, half slice that is harder to understand than 2 smaller and simpler types. You think this performance gain is more important than the added complexity, I think the other way. You think D should be *by default* as fast as possible, even when making the programmers life a little harder, and let the user create or use library types for easy use and I think it should be the exact opposite, make *default* things easier and not as fast, and let the user create/user library types if they need top performance/optimizations. -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- MATAN AL PERRO: DICEN QUE ESTABA POSEIDO POR EL DEMONIO... -- Crónica TV
Dec 24 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 24 Dec 2009 13:39:21 -0500, grauzone <none example.net> wrote:


 Sorry about that, I didn't have a close look at the patch. I guess I was  
 more thinking about Andrei's original proposal (and how I thought it may  
 be implemented).

No problem.
 It seems you store the length field inside the array's memory block  
 (instead of the cache, which just speeds up gc_query). That's awesome,  
 but I'm going to complain again: now you have to keep a length field for  
 *all* memory allocations, not only arrays! For most object allocations,  
 this means 4 more bytes additional overhead.

Interestingly enough, the storage overhead is zero except for memory blocks > 256 bytes. I'll explain: A probably not well-known piece of trivia is that D allocates 1 extra byte for arrays. Why would it do this you say? Because of the GC. Imagine that it does not do this, what happens when you do something like this: ubyte[] array = new ubyte[16]; // allocates a 16-byte block for the array array = array[$..$]; If you look at array's ptr member, it now no longer points to the allocated block, but the next block. Although it isn't important for the GC to keep around the allocated block any more, it now will keep the next block from being collected. In addition, if you tried appending to array, it might start using unallocated memory! So the byte at the end already was in use. I sort of commandeered it for length use. For blocks up to and including 256 bytes, I use the last byte of the block as the length storage. For blocks of 512 to a half page (2048 bytes), I use the last 2 bytes, so there is one extra overhead byte compared to the current implementation. Blocks larger than that follow different rules, they are not stored in bins, but just a whole page at a time. With those blocks, it is possible to extend the block by adding more pages if they are free, so it's not OK to store the length at the end of the block, since the end of the block may change. So I store at the beginning. I use up a full 8 bytes, and the reason is alignment, I don't know what you're putting in the array, so I must keep the data 8 byte-aligned. For classes, I allocate the required extra data as if the class were an array of the class data size, and then set the "ghost" length at the end of the block to 0. If a class data exceeds a half page, the ghost length is at the beginning, where the vtable ptr is, so it's extremely unlikely to accidentally match that length. Note that it doesn't make a huge difference in most cases because the block used for the class is a power of 2 anyways, so in most cases there's plenty of wasted space at the end. I found out during testing that allocating a new struct is equivalent to allocating a new array of that struct of size 1, and returning it's pointer, so that aspect is already covered.
 Also, if you use GC.malloc directly, and the user tries to append to  
 slices to it, your code may break. GC.malloc doesn't seem to pad the  
 memory block with a length field.

Yes, this is a possible problem. However, using GC.malloc directly and then treating the result as a normal array is probably extremely rare. At least it is not valid for safe D. It probably should be explained as a danger in GC.malloc, but I don't think this will adversely affect most code. There will be functions that should obviate the need for calling GC.malloc to allocate an array (specifically, allocating uninitialized data).
 I must say that I find your argumentation strange: didn't you say adding  
 an additional field to the slice struct is too much overhead?

Overhead when passing around, not overhead for storage. For example, pushing 12 bytes onto the stack instead of 8 when calling a function with a slice. If you want safe appends, you need to store the allocation length somewhere, there's no way around that.
 Also, a solution which keeps the pointer to the array length in the  
 slice struct would still be faster. The cache lookup cost is not zero.

This is the trade-off I chose between performance when passing around slices and using them and performance for appending. If you focus all your performance concerns on appending, then other areas suffer. I don't know if what I chose will be the best solution, but I can't think of any way to have *both* speedy slice usage and near-zero append overhead. If someone can think of a better solution, I'll be happy to incorporate it, but after using and understanding slices while developing stuff for Tango (Tango uses slices to get every ounce of performance!), I'm convinced that as-fast-as-possible slice semantics for passing around data is essential. This is where a custom type that focuses performance on appending at the expense of other functions is good to have. I think such applications where you need the absolute best append performance without any other functionality are pretty rare.
 Anyway, now that semantics and performance are somewhat sane, none of  
 these remaining issues are too important. Thanks Steve and Andrei!

Cool! -Steve
Dec 24 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 23 Dec 2009 09:54:20 -0500, grauzone <none example.net> wrote:

 Andrei Alexandrescu wrote:
 grauzone wrote:
 Steven Schveighoffer wrote:
 All,

 I created a new patched druntime that prevents array stomping and at  
 the same time increases append performance for thread-local array  
 appending.

 The patch is attached to bugzilla entry 3637. (  
 http://d.puremagic.com/issues/show_bug.cgi?id=3637 )

Nobody cares about it? (Except someone commenting in bugzilla, but where's Andrei and Walter?)

all very excited about it.

Did anyone try it and make initial tests?

I did (obviously). I tried both a simple "append to N arrays in a loop" test and also the affinity test posted by dsimcha in an earlier post. The new code beats the old runtime in both tests *and* performs better on the affinity test with multiple cores than it does with a single core. Tests were done on linux. I realize testing my own code doesn't mean much, but I think the patch is close to being assimilated, and everyone can try it out at that point. I'd be very interested to see if others can find holes in the design. I don't have the numbers in front of me right now, but next week, I'll try to post some. AFAIK, only Sean tried to get it working, but had trouble. I was mostly concerned with if the patch *worked* on Windows and OSX, since I didn't have a mac or a Windows box to try it out. Since then, I've successfully built and tested both OSX and Windows. -Steve
Dec 24 2009
prev sibling next sibling parent Leandro Lucarella <llucax gmail.com> writes:
Steven Schveighoffer, el 24 de diciembre a las 12:59 me escribiste:
 Leandro Lucarella Wrote:
 
 Steven Schveighoffer, el 24 de diciembre a las 10:15 me escribiste:
 This is absolutely false.  It does not make anything worse.  It may not
 make it as good as you want, but it definitely is better than the
 current situation.

I meant from the language semantics POV (you have more rules to remember about arrays), I agree it makes things better from a practical POV.

In fact, semantics are simpler. You can completely eliminate the explanation about stomping in the spec.

What about the cache? You have to explain to the users that there is a possibility that their appends works blazing fast, except... well, sometimes if you fill the cache they will suck. As gruzone said, find this kind of weird behaviours will be very hard.
 I don't comment much about the patch (and the design) because I already
 did so in the several threads about it. I think is a very bad idea,
 I think dynamic arrays should be a proper reference type (with ptr, length
 and capacity) and slices should be a value type without appending (only
 with ptr and length).

I don't agree. I like that slices and arrays are the same type, it allows for much cleaner code, and much less function duplication. I see the "append extends into a block" as an optimization, one that currently is invalid, and with the patch is valid. Disallowing appending to slices means you must generate arrays whenever you want to do appending (which copies the entire slice), incurring a higher cost for little gain. I'd actually say no gain. Do you have an example where such a scheme is better than arrays with my patch?


No example?

No, I said that I think arrays/slices as they are have better performance (when you know exactly what you're doing and pre-allocate and do that kind of tricks). Unless you are asking for some other kind of examples?
 I said this several times but here it comes again: We agree that the
 current scheme is better in terms of performance than having a proper
 reference type dynamic array and a separated slice value type.
 
 The thing is, I think optimization is nice as long as it doesn't leak the
 abstractions to the user. The current scheme is clearly this scenario, to
 make an optimization (one that problably a lot of people wouldn't need),
 you are making things harder for the user and creating a monster half
 array, half slice that is harder to understand than 2 smaller and simpler
 types.

Really?

Yes.
 You think splitting the features into 2 different types with
 several overlapping features is easier to explain?

Yes. And I think they have overlapping features just like any other 2 containers, hashes and arrays have overlapping features, but they are completely different beasts. Same goes for arrays and slices. PHP have arrays and hashes combined in one type. You think that's a good idea because arrays and hashes have overlapping features? I really don't, and I do think arrays and slices are 2 completely different beasts that needs to be separated in the language. I think that's the natural good thing to do, not an optimization. I think D is broken by combining this 2 types into one. You don't think that (and it looks like Andrei and Walter doesn't think that either) so there's no way we would agree on this topic.
 I think that is why Andrei nixed the T[new] idea, because when he tried
 to explain it in TDPL, it was very unnatural.

Well, I don't know what Andrei tried to do, but I think Python does pretty well with it, for example.
 You think this performance gain is more important than the added
 complexity, I think the other way. You think D should be *by default* as
 fast as possible, even when making the programmers life a little harder,
 and let the user create or use library types for easy use and I think it
 should be the exact opposite, make *default* things easier and not as
 fast, and let the user create/user library types if they need top
 performance/optimizations.

I think the whole basis of your argument is invalid -- the current array/slice combination type is easier to use and explain *and* performs better.

I can say exactly the same about you. That's why I said I don't argue anymore about this (grrr! I don't know why I ended up doing it in this thread). This is a matter of taste, you think array/slice combined are easier to use and I don't. About performance, they only spot where splitting arrays and slices would perform worse than the current slice/arrays combined is in that you add an extra indirection for arrays because of the reference semantics (but you need to pass around a word less than with slices, so I don't know how the real impact of that would be). Besides that, you can make both schemes as fast, I think. It all depends on how you use those types. -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- Y2K <Aztech_> hmm, nothing major has happend, what an anticlimax <CaPS> yeah <CaPS> really sucks <CaPS> I expected for Australia to sink into the sea or something <CaPS> but nnoooooooo
Dec 24 2009
prev sibling parent Leandro Lucarella <llucax gmail.com> writes:
Steven Schveighoffer, el 25 de diciembre a las 06:07 me escribiste:
 PHP have arrays and hashes combined in one type. You think that's a good
 idea because arrays and hashes have overlapping features?

I hate to say this, and it has nothing to do with this argument, but I'm actually pretty impressed with php's arrays. The rest of php, not so much.

OK, now is clearer than ever that we will never agree on this =)
 They perform like hashes, but are ordered and sortable.  I tried to find
 out how they are implemented, but the code is so buried in the zend
 engine that I didn't know where to look.

BTW, they are not "ordered", they preserve the "order of appending" (like lists or, well, arrays): $ php -a Interactive mode enabled <? $a = array(1,2); $a[9] = 9; $a[5] = 5; foreach ($a as $k => $v) printf("%s: %s\n", $k, $v); 0: 1 1: 2 9: 9 5: 5 I think they are a plain hash with a next pointer for the iteration, I don't think they have much more magic than that.
 I think the whole basis of your argument is invalid -- the current
 array/slice combination type is easier to use and explain *and* performs
 better.

I can say exactly the same about you. That's why I said I don't argue anymore about this (grrr! I don't know why I ended up doing it in this thread). This is a matter of taste, you think array/slice combined are easier to use and I don't.

I'd love to see an example where you think this is the case. If it's simply a matter of taste, then you have no argument. If that's the case, it's just a personal choice, and at that point it's up to Walter.

Exactly. That's what I'm saying about 3 mails ago. I just think it's easier to think in terms of arrays and slices separately (I know it is for me). And besides that, you can have simpler implementations, without making them very tightly coupled with the GC, but you already said too that for you is not a concern the runtime complexity.
 I just don't see why you think it's easier.

It's just how my mind work. I just like simpler concepts separated than a type that fit it all a la PHP. Maybe Python has something to do with it because it works like this. I learned Python after PHP and it seems much cleaner for me. -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- Le pedí que me enseñe a usar el mouse Pero solo quiere hablarme del Bauhaus Le pregunté si era chorra o rockera Me dijo "Gertrude Stein era re-tortillera" Me hizo mucho mal la cumbiera intelectual
Dec 25 2009