
digitalmars.D - RFC: naming for FrontTransversal and Transversal ranges

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Hello,


I'm defining two ranges that go vertically over a range of ranges. 
Consider for example an int[][]. That's a range that has a range as its 
element type.

FrontTransversal goes transversally over the range of ranges and returns 
the front of each element in turn:

     int[][] x = new int[][2];
     x[0] = [1, 2];
     x[1] = [3, 4];
     auto ror = frontTransversal(x);
     auto witness = [ 1, 3 ];
     uint i;
     foreach (e; ror) assert(e == witness[i++]);
     assert(i == 2);

Then Transversal does a similar thing, except that instead of going 
through the front of each range, it iterates over a column fixed at 
construction:

     int[][] x = new int[][2];
     x[0] = [1, 2];
     x[1] = [3, 4];
     auto ror = transversal(x, 1);
     auto witness = [ 2, 4 ];
     uint i;
     foreach (e; ror) assert(e == witness[i++]);
     assert(i == 2);

Question is, what would be two good names for these ranges?


Andrei
Apr 29 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 I'm defining two ranges that go vertically over a range of ranges. 
 Consider for example an int[][]. That's a range that has a range as its 
 element type.

Maybe I am not understanding the true purpose of this, but isn't a general 2D/ND slicer much better? Bye, bearophile
Apr 29 2009
prev sibling next sibling parent reply BLS <windevguy hotmail.de> writes:
Andrei Alexandrescu wrote:
 I'm defining two ranges that go vertically over a range of ranges. 
 Consider for example an int[][]. That's a range that has a range as its 
 element type.
 [...]
 Question is, what would be two good names for these ranges?

The question is, how does this fit into a collection/container package? So frankly, I don't worry about naming conventions. I am more concerned about ranges within, let's say, a dynamic deque. Lock-free ADTs are the final proof of product, no? Maybe I'm missing something, but the fact is: D2 (Phobos) still doesn't have (not even in its brand-new incarnation) support for simple humpty-dumpty collections. So where is the proof of product?
Apr 29 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
BLS wrote:
 The question is, how does this fit into a collection/container package?
 
 So frankly, I don't worry about naming conventions. I am more 
 concerned about ranges within, let's say, a dynamic deque.
 Lock-free ADTs are the final proof of product, no?
 
 Maybe I'm missing something.
 But the fact is: D2 (Phobos) still doesn't have (not even in its 
 brand-new incarnation) support for simple humpty-dumpty collections. 
 So where is the proof of product?

You're not missing anything, except for perhaps a little patience :o).

There is vivid discussion about container support in D2. I was actually thinking of posting an RFC sooner or later here. It is very exciting to realize that with minimal (and actually simplifying) changes to the language we can offer the option of garbage-collected vs. reference-counted vs. "unsafe" semi-automatically managed memory. Moreover, I realized that the global allocation model only affects the built-in literal types such as T[] and T[U], but still allows using the other models as library types. It has become increasingly clear we'll need to support arrays in addition to slices.

One big question mark for me is, should we define for arrays (and associative arrays and containers in general) value semantics, reference counted value semantics, or reference semantics? It is not clear to me yet what the best option is. Think of arrays as an archetypal example.

1. Value semantics (arrays are like int)
   + Simple to reason about
   + No long-distance sharing of state
   + Familiar to refugees coming from C++ with STL
   - Unfamiliar to Java/C# refugees
   + Relatively simple to implement
   + Good behavior as members inside structs and classes
   - Bad behavior as function parameters: if you forget to pass by reference, performance will be harmed

2. Value semantics with reference counting
   +- All tradeoffs of value semantics, except as noted below
   - Difficult to implement
   + Cheap to copy around; will "divide" only upon mutation attempts
   - Innocuous operations may throw or occasionally take a long time

3. Reference semantics
   - Hard to reason about: squirrel an array into an object, and way later and farther away it modifies the array and effects a change at long distance
   - Long-distance effects may well be some of the toughest bugs after memory errors
   + Cheap to copy
   + Familiar to Java/C# refugees
   - Less familiar to C++ refugees
   - Potentially inefficient, as people might start sprinkling copies in the hope of eliminating bugs or chances of bugs
   + The const system helps: if a function doesn't change something, just mark it const and you're home free
   + immutable also helps: immutable data can never change, so it can be shared, no problem. Not sure how interesting immutable containers are, though.
   - Implementing fast/unsafe semantics with reference semantics effectively forces reference counting to be part of containers, which complicates implementation

Any thoughts, just share! For the record, Walter has a bias in favor of... I'd better not bias you guys. One thing: it would be relatively easy to switch among semantics by defining a wrapper such as Value!(T) or Ref!(T). So whatever choice we make, it won't paint us into a corner.

Andrei
Apr 29 2009
next sibling parent Daniel Keep <daniel.keep.lists gmail.com> writes:
Andrei Alexandrescu wrote:
 A lot of interesting stuff.

POLS (Principle of Least Surprise) says they should be reference types. That's what C, C++ (T[] doesn't look like it's using STL), C#, Java, Lua, Python, etc., etc. programmers will expect. Jumping out from behind a bush a surprising them with value semantics will probably not impress them. And don't say you can just document it: we get enough people coming in who refuse to read the docs until we tell them to. :P That said, having a Value!T template [1] that implements value-semantics would be good. I'll be honest with you: I'm not sure I'd use it for fear of performance issues. Perhaps making it a copy-on-write type under the hood would help with that ("it only copies if you mutate and someone else has a reference; it's fast AND safe!"). -- Daniel [1] I'd be tempted to call it Block!T as in "here's a block of data".
Apr 29 2009
prev sibling next sibling parent BLS <windevguy hotmail.de> writes:
Andrei ,
many thanks, ( not only because you are taking  the time to  explain ), 
I think I have to  say  “Thanks Andrei” because your answer was very 
gentle, ... ignoring all my grammar and (more important) logical 
mistakes... let me remain with a : You are listening and this is in my 
world : very generous.

Still : Container RFC ?
Björn
Apr 29 2009
prev sibling next sibling parent reply Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu Wrote:

 BLS wrote:
 [...]
 
 One big question mark for me is, should we define for arrays (and 
 associative arrays and containers in general) value semantics, reference 
 counted value semantics, or reference semantics? [...]
 
 Andrei

I'm a C++ refugee and value semantics make no sense to me. I always declared classes (as part of function declaration) in C++ with a &... Either T& or const T&. There were plenty of bugs where I messed up and forgot the &. D is nice in that it tries to simplify that. You yourself had issues with remembering ref parameters when implementing ranges!
Apr 29 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jason House wrote:
 I'm a C++ refugee and value semantics make no sense to me. I always
 declared classes (as part of function declaration) in C++ with a &...
 Either T& or const T&. There were plenty of bugs where I messed up
 and forgot the &. D is nice in that it tries to simplify that. You
 yourself had issues with remembering ref parameters when implementing
 ranges!

Good point, but the issue boils down to this: do we define container members more often, or define container function parameters more often? Because inside a data structure value semantics are useful more often than not, and in a function parameter list reference semantics are useful more often than not. I do agree that D programmers have grown used to cheap-to-pass data and I don't want to cure anyone of hiccups :o). Andrei
Apr 29 2009
parent Georg Wrede <georg.wrede iki.fi> writes:
Andrei Alexandrescu wrote:
 Jason House wrote:
 I'm a C++ refugee and value semantics make no sense to me. I always
 declared classes (as part of function declaration) in C++ with a &...
 Either T& or const T&. There were plenty of bugs where I messed up
 and forgot the &. D is nice in that it tries to simplify that. You
 yourself had issues with remembering ref parameters when implementing
 ranges!

Good point, but the issue boils down to this: do we define container members more often, or define container function parameters more often? Because inside a data structure value semantics are useful more often than not, and in a function parameter list reference semantics are useful more often than not. I do agree that D programmers have grown used to cheap-to-pass data and I don't want to cure anyone of hiccups :o).

Will it be an incurable mess if you let the programmer choose each time? (Then, of course, we'd have this same discussion about which is the default. :-) ) But still, I can see situations where it is _obvious_ which should be used. Somebody into a more functional style would have different needs than the more imperative guy. Not to mention more specific situations.
Apr 30 2009
prev sibling next sibling parent reply Rainer Deyke <rainerd eldwood.com> writes:
Andrei Alexandrescu wrote:
 1. Value semantics (arrays are like int)

Don't forget: + Supports timely destruction of contents (i.e. RAII).
 2. Value semantics with reference counting

I like this optimization and use it all the time in my own code, but I'm not convinced that it should be the default. It's also problematic in multithreaded situations. I think a generic CopyOnWrite wrapper over arbitrary value types would be more useful. CopyOnWrite!(int[]).
 3. Reference semantics

I'm strongly opposed to this option. Either of the other options would be acceptable. -- Rainer Deyke - rainerd eldwood.com
Apr 29 2009
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2009-04-29 23:01:39 -0400, Rainer Deyke <rainerd eldwood.com> said:

 Andrei Alexandrescu wrote:
 2. Value semantics with reference counting

I like this optimization and use it all the time in my own code, but I'm not convinced that it should be the default. It's also problematic in multithreaded situations. I think a generic CopyOnWrite wrapper over arbitrary value types would be more useful. CopyOnWrite!(int[]).

But don't forget that D2 is going to have the shared keyword, possibly with thread-local heaps. So multithreading won't be much of a problem: if the variable is shared, the compiler should force locks as needed or use atomic operations; if the variable is local to a thread, locks should become no-ops and atomic operations regular ones, improving performance. Or at least that's what I'm getting from the hints we currently have.
 3. Reference semantics

I'm strongly opposed to this option. Either of the other options would be acceptable.

Andrei mentioned a couple of reasons against 3, which are yours? -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Apr 30 2009
parent reply Rainer Deyke <rainerd eldwood.com> writes:
Michel Fortin wrote:
 On 2009-04-29 23:01:39 -0400, Rainer Deyke <rainerd eldwood.com> said:
 Andrei Alexandrescu wrote:
 3. Reference semantics

be acceptable.

Andrei mentioned a couple of reasons against 3, which are yours?

I agree with all of Andrei's reasons. In addition, there's this: suppose you have a struct containing a (mutable) array. When you make a copy of that struct, you will almost always want to make a copy of the contained array. Therefore value semantics should be the default, because it simplifies the most common use case. -- Rainer Deyke - rainerd eldwood.com
Apr 30 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Thu, 30 Apr 2009 14:40:42 -0400, Rainer Deyke <rainerd eldwood.com> 
 wrote:
 In addition, there's this: suppose you have a struct containing a
 (mutable) array.  When you make a copy of that struct, you will almost
 always want to make a copy of the contained array.  Therefore value
 semantics should be the default, because it simplifies the most common
 use case.

That's what struct ctors/dtors/opAssign are for. Personally, I think letting the struct designer decide the behaviour is better.

The problem is what to do for the containers in Phobos. Andrei
Apr 30 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 Now, back on topic. I actually make heavy use of what amounts to structs 
 containing arrays and don't want a deep copy >90% of the time. In fact, 
 I haven't used any of the (my array-my array) deep copy functions and I 
 only do copying when I have to convert my arrays to D style arrays or 
 other container types. Also, D arrays are structs that contain C style 
 arrays and I almost never dup them in final code. When I debug/prototype 
 I do tend to use []= and dup more often. So I'd disagree that you'd 
 almost always want to make a copy. Second, I don't think we should 
 simplify the most common use case, but instead the majority of use 
 cases. This is partly semantics, but partly not: everybody uses 
 containers differently.

It must be a matter of style. Having used both semantics extensively, I personally think one never looks back after using value containers. When you fail to pass by reference there's a performance hit and a logic bug (changes are not effected). Both are local issues that manifest immediately, close to the place of the mistake, and are trivial to fix. But undue sharing is a global, long-distance issue - one much harder to tackle.

I agree that reference semantics is what people got used to in Java and other languages. For entity types, reference semantics make a ton of sense, but I think containers are not entity classes most of the time. But I also agree that through experience one can design around undue sharing. But it should be remembered that it took years to shake off security bugs in the JDK caused by undue sharing. (One that comes to mind was a sensitive linked list returned by a system API. User code could maliciously change it to mess with the system's security manager. Solution? They returned a compulsory copy of the list.)

So I'm not sure where this leaves us. I hear two good arguments:

* Some aren't bothered by reference semantics
* We are used to reference semantics from other languages

These are good arguments, but don't quite end the discussion.
 Also, Andrei, I would had preferred it if you had separated the 
 discussion on arrays from Phobos containers. Many languages have arrays 
 and a separate container library and I feel the decision of whether D's 
 arrays should behave like Phobos' containers would be best left to after 
 the best decision for Phobos is made.

*If* we add a new array built-in type, that will fundamentally affect Phobos, because Phobos will design all containers to be like the pair array-slice. If we don't add a new built-in type, then I only have Phobos to discuss.

Also something that wasn't discussed that much is the connection of whatever design we devise with the GC. I am mightily excited by the possibility to operate without GC or with tight reference counting, and I thought many around here would share that excitement. If we go for non-GC ways of memory management, that will probably affect container design.

Andrei
Apr 30 2009
parent reply Benji Smith <dlanguage benjismith.net> writes:
Andrei Alexandrescu wrote:
 Also something that wasn't discussed that much is the connection of 
 whatever design we devise, with the GC. I am mightily excited by the 
 possibility to operate without GC or with tight reference counting, and 
 I thought many around here would share that excitement. If we go for 
 non-gc ways of memory management, that will probably affect container 
 design.

Just out of curiosity, why do you like reference counting more than mark/sweep for containers? --benji
May 01 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Benji Smith wrote:
 Andrei Alexandrescu wrote:
 Also something that wasn't discussed that much is the connection of 
 whatever design we devise, with the GC. I am mightily excited by the 
 possibility to operate without GC or with tight reference counting, 
 and I thought many around here would share that excitement. If we go 
 for non-gc ways of memory management, that will probably affect 
 container design.

Just out of curiosity, why do you like reference counting more than mark/sweep for containers?

Who told you I like it? All I do is acknowledge that RC and MS have different tradeoffs that may form the basis of a preference. Andrei
May 01 2009
prev sibling next sibling parent reply Rainer Deyke <rainerd eldwood.com> writes:
Robert Jacques wrote:
 On Thu, 30 Apr 2009 14:40:42 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 In addition, there's this: suppose you have a struct containing a
 (mutable) array.  When you make a copy of that struct, you will almost
 always want to make a copy of the contained array.  Therefore value
 semantics should be the default, because it simplifies the most common
 use case.

That's what struct ctors/dtors/opAssign are for. Personally, I think letting the struct designer decide the behaviour is better.

I hope you're not suggesting writing ctor/dtor/opAssign for every single struct that uses arrays as value types. It's possible to write a reference wrapper around a value type. It's also possible to write a value wrapper around a reference type. However, the former is both easier and more efficient than the latter. -- Rainer Deyke - rainerd eldwood.com
Apr 30 2009
next sibling parent reply Rainer Deyke <rainerd eldwood.com> writes:
Robert Jacques wrote:
 On Thu, 30 Apr 2009 17:23:19 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 It's possible to write a reference wrapper around a value type.  It's
 also possible to write a value wrapper around a reference type.
 However, the former is both easier and more efficient than the latter.

Yes and no. Yes, the deep copy mixin is more complex to write the first time, but after that it's easy to use. And second, it's just as efficient as the latter, since the deep copy mixin generates the same code as a value type.

A simple deep-copy struct wrapper around a class type is horribly inefficient compared to a direct struct type:
- It uses an extra heap allocation per instance.
- It uses an extra level of indirection.
- It uses virtual function calls.
- It allocates an extra reference to a virtual function table per instance.

I suppose a very clever wrapper metaprogram or a really clever optimizing compiler might eliminate some of these inefficiencies. -- Rainer Deyke - rainerd eldwood.com
Apr 30 2009
parent reply Rainer Deyke <rainerd eldwood.com> writes:
Robert Jacques wrote:
 On Thu, 30 Apr 2009 23:18:13 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
   - It uses an extra heap allocation per instance.

And during a deep copy, you're often making a lot of heap copies. (N vs N+1 when N>>1)

In the case of dynamic arrays, N = 1 (or possibly 0 if the small array optimization is used). In the case of static arrays, N = 0. Either way, the extra level of indirection is significant.
   - It allocates an extra reference to a virtual function table per
 instance.

Huh? Isn't that part of the extra heap allocation?

Extra allocations: 1.
Extra memory usage: 2 words (virtual function table pointer, local pointer) plus heap overhead for one allocation.
 Also, a really important question is: is it possible to implement are
 shared, lock-free containers as value types?

Is it possible to implement them as reference types? Why would the issues different between reference types and value types? A reference type is just a reference wrapper around a value type. -- Rainer Deyke - rainerd eldwood.com
May 01 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 Yes. See java.util.concurrent for examples. In fact, all of the 
 algorithms I've read explicitly don't support copying, which is why I'm 
 asking.
 
 Why would the
 issues different between reference types and value types?

A value type has to support copying. Reference types don't. Perhaps I should have said value semantics / reference semantics instead.

I thought lock-free data structures are the quintessential COW type. You might as well just have provided the perfect argument against your case. Andrei
May 01 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Fri, 01 May 2009 15:07:17 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 Robert Jacques wrote:
 Yes. See java.util.concurrent for examples. In fact, all of the 
 algorithms I've read explicitly don't support copying, which is why 
 I'm asking.

 Why would the
 issues different between reference types and value types?

I should have said value semantics / reference semantics instead.

I thought lock-free data structures are the quintessential COW type. You might as well just have provided the perfect argument against your case. Andrei

Andrei, I've never seen a COW lock-free algorithm.

Maybe we have some terms confused. Yes, the lock-free singly-linked lists and stacks (and some dictionaries) defined in the literature are meant to be shared and use minimal, coherent CAS-based updates. This is quite a feat, because copying is the easy way to go and doesn't quite deserve a peer-reviewed conference paper. However, Maged Michael and I co-wrote a few magazine articles on the topic. The general structure of a COW lock-free object is:

     struct Something { ... }

     struct LockFree
     {
         Something* s;
         void makeSomeChange()
         {
             Something* copy;
             do
             {
                 copy = s.deepDup;     // private copy of the current state
                 copy.makeSomeChange;  // apply the mutation to the copy
             } while (!cas(copy, s));  // publish only if s hasn't changed
         }
     }

For example, if Something is some array and you want to insert elements in it atomically, you make a copy of the array, insert stuff, and then substitute the copy for the original array, with care such that the original array stayed in place meanwhile (otherwise your insertions aren't doing what you initially thought).
 Andrei, you might be thinking about STM, which follows a transactional 
 model: lazy copy, mutate copy, publish-retry changes.

My understanding is that STM is a generalized, multi-object, transparent, and systematic way of doing CAS-based work (please correct me if I'm wrong, I'm not an expert in STM). Andrei
May 01 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 Do you have a link to 
 your article? 

http://tinyurl.com/dac82a Andrei
May 01 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Fri, 01 May 2009 19:25:35 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Robert Jacques wrote:
 Do you have a link to your article?

http://tinyurl.com/dac82a Andrei

Should've seen that one coming. :) Anyways, I'm not sure how you can call the technique lock-free, since you're doing (possibly several) allocations inside the inner CAS loop.

No, repeated allocations are trivial to eliminate. I didn't even bother to explain that in the article. The loop must only refill the allocated object from the object that needs to be replaced.
 (I guess a good, parallel allocator might negate this) It's probably not 
 an issue for the article's use case, where reads vastly dominated 
 updates, but for things like stacks or queues, it wouldn't work. And 
 poor performance leads people to create their own fast, possibly buggy 
 versions, thus defeating the point of a standard library.

Incidentally Maged Michael himself wrote the first lock-free malloc(). Andrei
May 01 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 No, repeated allocations are trivial to eliminate. I didn't even 
 bother to explain that in the article. The loop must only refill the 
 allocated object from the object that needs to be replaced.

 putting the test in the pseudo code for this would've helped my 
 understanding:
 
     if (copy is null) copy = s.deepDup;
     else s.deepDupTo(copy);

copy = cast(T*) GC.malloc(T.sizeof);
do
{
    overwrite(s, copy);
    copy.modify;
} while (!cas(copy, s));
 But Andrei, I really don't care about the existence of a lock-free 
 algorithm with value semantics. I only care about a practical lock-free 
 algorithm with value semantics. This approach, while elegant in its 
 simplicity and good for some situations, is very poor at high update / 
 contention objects, like stacks and queues. Maybe, locked algorithms are 
 the best value semantics can do in some situations and this might result 
 in separate fast reference containers. Which is a drawback to value 
 semantics and should be placed into the big pros/cons pile.

I agree. Andrei
May 02 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Sat, 02 May 2009 03:35:51 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Robert Jacques wrote:
 No, repeated allocations are trivial to eliminate. I didn't even 
 bother to explain that in the article. The loop must only refill the 
 allocated object from the object that needs to be replaced.

 putting the test in the pseudo code for this would've helped my 
 understanding:
 
     if (copy is null) copy = s.deepDup;
     else s.deepDupTo(copy);

copy = cast(T*) GC.malloc(T.sizeof);
do
{
    overwrite(s, copy);
    copy.modify;
} while (!cas(copy, s));

I'm confused. The article talks about copying the entire data structure, not just a node. And trees, etc., tend to have variable sizes.

You can reuse memory even if it comprehends more complex patterns than one single allocation. Andrei
May 02 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Denis Koroskin wrote:
 You can reuse memory even if it comprehends more complex patterns
 than one single allocation.
 
 Andrei

I *highly* doubt it is worth the trouble. Most likely, this container won't be lock-free and scalable anymore. Performance will also degrade dramatically.

How did you get to "most likely"? There's so many factors in the equation. What is the alternative? Why wouldn't the code be lock-free? Why would it not be scalable?
 Besides, the more I think about thread-local/shared separation that
 is going to come to D2, the more I see that there is absolutely no
 need to make Phobos containers thread-safe. Most containers will be
 thread-local and won't need any synchronization at all.

I agree. But you also don't want to preclude using a shared container either. Andrei
May 02 2009
prev sibling parent reply Rainer Deyke <rainerd eldwood.com> writes:
Robert Jacques wrote:
 Precisely, so it's always passed by reference and doesn't need to
 support copying.

All containers need to support copying. -- Rainer Deyke - rainerd eldwood.com
May 01 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Rainer Deyke wrote:
 Robert Jacques wrote:
 Precisely, so it's always passed by reference and doesn't need to
 support copying.

All containers need to support copying.

Speaking of which, I'm thinking of a field like scientific computing (even of the simplest, most basic kind that we all dabble in) or applied math. People who use for example matrices would NOT expect them to have reference semantics. They'd find it very confusing if a = b would not make matrix "a" a copy of matrix "b", and so on. (That + no operator overloading = R.I.P. my excitement for doing numeric work in Java.)

It would work to ask people who actually use Matrix!double to wrap it in a Value!(Matrix!double). But say someone asks, whatever, but if Value!(Matrix!double) is a matrix, then what is Matrix? Well, I'd reply, Matrix is actually a reference to a matrix. Then they'll ask, why don't you call what you call today "Matrix" RefMatrix or Ref!Matrix or whatever, and call a Matrix a matrix? Um, I don't know. That's what the buzz was on the newsgroup when we were thinking of it. Some said that's what people coming from Java expect.

I guess at that point the would-be D user would be entitled to make me a lavaliere out of my Matrix library and move on.

Andrei
May 01 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Fri, May 1, 2009 at 4:23 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 I guess at that point the would-be D user would be entitled to make me a
 lavaliere out of my Matrix library and move on.

Python, and therefore NumPy, are reference based.

In a language that's reference based, things can't be helped. They can be helped in D though. I wouldn't want to later cringe that I missed the opportunity.
 So far I haven't seen any scientists posting on the list about how
 having their arrays and matrices be references is driving them crazy.
 It may be surprising to some of them at first, but even non-hard-core
 coders seem to be able to handle it.

To me this sounds more like an opportunity to do the right thing.
 It helps that dummy assignments
 like a = b are rare.   More often you have things like a = b + c, and
 that creates a new matrix.  Or  a += b, which pretty clearly mutates
 a.

Oh rly. How about this then?

    Matrix a, b, c;
    ...
    c = a;
    a += b;

Does the last operation mutate c as well?

Andrei
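A minimal D sketch of the scenario, for the record (the Matrix class and its opOpAssign are hypothetical, reduced to just enough to make the aliasing under reference semantics visible):

```d
// Hypothetical reference-semantics Matrix: a class that wraps its storage.
class Matrix
{
    double[] data;
    this(double[] d) { data = d.dup; }

    // "a += b" updates the elements in place.
    void opOpAssign(string op : "+")(Matrix rhs)
    {
        foreach (i, ref e; data) e += rhs.data[i];
    }
}

void main()
{
    auto a = new Matrix([1.0, 2.0]);
    auto b = new Matrix([10.0, 10.0]);
    Matrix c = a;   // copies the reference, not the elements
    a += b;
    assert(c.data == [11.0, 12.0]); // yes: c was mutated along with a
}
```

With a struct-based (value) Matrix, the same c = a would have copied the elements and c would still read [1.0, 2.0].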
May 01 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Fri, May 1, 2009 at 6:25 PM, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Fri, May 1, 2009 at 4:23 PM, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 I guess at that point the would-be D user would be entitled to
 make me a lavaliere out of my Matrix library and move on.


In a language that's reference based, things can't be helped. They can be helped in D though. I wouldn't want to later cringe that I missed the opportunity.
 So far I haven't seen any scientists posting on the list about
 how having their arrays and matrices be references is driving
 them crazy. It may be surprising to some of them at first, but
 even non-hard-core coders seem to be able to handle it.


Could be. I'm not arguing that refs are best, just giving some evidence from the NumPy community to counter your assertion that scientists would be confused by reference-based array and matrix types.
 It helps that dummy assignments like a = b are rare.   More often
 you have things like a = b + c, and that creates a new matrix.
 Or  a += b, which pretty clearly mutates a.

Matrix a, b, c; ... c = a; a += b; Does the last operation mutate c as well?

I said "assignments like a = b are rare" and you put one of those in your example.

You said it and I don't buy it one femtosecond. Code does need to copy values around. It's a fundamental operation!
 Yes, when you have an a=b anywhere you've got to pay attention and
 make sure you didn't mean a=b.dup.

That is terrible. Anywhere == a portion of the code far, far away.
 And the other thing is that when you pass a matrix to a function, you
  need to be aware of whether the function mutates that matrix or not.
  That's really the biggest pain, and source of hardest to find bugs.

I bet. Things get even more unpleasant when you have object that have reference members. Then the default constructor does The Wrong Thing(tm) - many C++ pundits and authors have eaten many good lunches with the money they got discussing the problem.
 Matlab goes the other extreme and makes everything value semantics. 
 It does a lot of COW optimizations under the hood to make it
 tolerably efficient even when tossing around big matrices.  But it's
 annoying if you want to write a function to make a small update to an
 existing matrix.  You have no choice but to return a copy and hope
 that Matlab's optimizer is smart enough to eliminate the unnecessary
 copy.

Yah. In D, there's ref. Andrei
May 02 2009
next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-05-02 10:18:41 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Yes, when you have an a=b anywhere you've got to pay attention and
 make sure you didn't mean a=b.dup.

That is terrible. Anywhere == a portion of the code far, far away.

It's funny that you mention ==, since anywhere you write = you also got to pay attention that you didn't really mean ==. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
May 02 2009
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Sat, 02 May 2009 10:18:41 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Fri, May 1, 2009 at 6:25 PM, Andrei Alexandrescu
  Matrix a, b, c; ... c = a; a += b;
  Does the last operation mutate c as well?

I said "assignments like a = b are rare" and you put one of those in your example.

You said it and I don't buy it one femtosecond. Code does need to copy values around. It's a fundamental operation!

Andrei, he said that explicit assignments/copies of arrays/containers are rare in numeric computing, which I agree with. Just because it's a fundamental operation doesn't mean it gets used much in his (or I guess NumPy's actually) specific applications. Also, disregarding/disbelieving a use case is not helpful to this discussion.

He said something. That's about as much proof as was seen. I didn't buy it so I replied without assuming the same as him. Then he repeated "but I said that!" which upped the ante from supposition to presupposition. I think presuppositions are particularly pernicious so I felt the need to explicitly say that I don't believe it just because he said it. It's not a use case. It's just something that was said. If some sort of evidence is given, that would be great. Don't put the onus on me to disprove what was said.
 Yes, when you have an a=b anywhere you've got to pay attention and
 make sure you didn't mean a=b.dup.

That is terrible. Anywhere == a portion of the code far, far away.

No, he was talking about a local decision. i.e. Did I or didn't I mean to make a logical copy here?

The local decision has effects that may go undetected until much later. Andrei
May 02 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 I do scientific computing. Generally, I find it breaks down into two 
 parts: things under 4x4, for which value types are probably better, and 
 everything else, for which value types are to be avoided like the 
 plague. I'll often work with 100's mb of data with algorithms that take 
 minutes to hours to complete. So an unexpected copy is both hard to find 
 (am I slow/crashing because of my algorithm, or because of a typo?) and 
 rather harmful, because its big.

I don't buy this. Undue copying is an issue that manifests itself locally, reproducibly, and debuggably. Contrast with long-distance coupling, which is bound to be hard to debug. You change a matrix here, and all of a sudden a matrix miles away has been messed up. Also, efficiency can be fixed with COW, whereas there is nothing you can do to fix the coupling aside from relentless and patient user education.

Walter gave me a good argument (little did he know he was making a point destroying his.) Consider the progress we made when replacing char[] with string. Why? Because with char[] long-distance dependencies crop up easily and fast. With string you know there's never going to be a long-distance dependency. Why? Because unlike char[], content immutability makes string as good as a value.

I remember the nightmare. I'd define a little structure:

    struct Sentence
    {
        uint id;
        char[] data;
    }

Above my desk I have a big red bulb along with an audible alarm. As soon as I add the member "data", the bulb and the alarm go off. Sentence is now an invalid struct - I need to add at least a constructor and a postblit. In the constructor I need to call .dup on the incoming data, and in the postblit I need to do something similar (or something more complicated if I want to be efficient). This is a clear example of code that is short and natural, yet does precisely the wrong thing.

This is simply a ton of trouble, as experience with C++ has shown. I'm not even getting into calling functions that take a char[] and keeping fingers crossed ("I hope they won't mess with it") or .dup-ing prior to the call to eliminate any doubt (even though the function may anyway call .dup internally). string has marked huge progress towards people considering D seriously.
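The boilerplate being described looks roughly like this in D2 (a sketch; the constructor and postblit bodies are the minimal version, without the more efficient variants Andrei mentions):

```d
// With a mutable char[] member, Sentence must dup in the constructor and
// again in the postblit to keep behaving like a value.
struct Sentence
{
    uint id;
    char[] data;

    this(uint id, char[] data)
    {
        this.id = id;
        this.data = data.dup; // take a private copy of the incoming data
    }

    this(this) // postblit: runs after every bitwise copy
    {
        data = data.dup; // un-share the array
    }
}

void main()
{
    auto s = Sentence(1, "hello".dup);
    auto t = s;          // postblit fires; t gets its own array
    t.data[0] = 'H';
    assert(s.data == "hello"); // s unaffected, as a value should be
}
```

Replace char[] with string and both the constructor and the postblit become unnecessary, which is exactly the point.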
 But I've generally worked on making 
 something else fast so more data can be crunched, etc. Actual prototype 
 work (for array/matrix based stuff at least) is often done in Matlab, 
 which I think uses COW under-the-hood to provide value semantics. So I 
 think anyone turning to D to do scientific computing will know reference 
 semantics, since they'd already be familiar with them from C/C++, etc 
 (Fortran?). Although successfully attracting algorithm prototypes from 
 Matlab/python/mathmatica/R/etc is probably bigger issue than just the 
 container types, growing the pie was why the Wii won the last console wars.

Fortran uses pass by reference, but sort of gets away with it by assuming and promoting no aliasing throughout. Any two named values in Fortran can be assumed to refer to distinct memory. Also, unless I am wrong, A = B in Fortran does the right thing (copies B's content into A). Please confirm/infirm.

For all I know, Matlab does the closest to "the real thing". Also, C++ numeric/scientific libraries invariably use value semantics in conjunction with expression templates meant to effect loop fusion. Why? Because value semantics is the right thing and C++ is able to express it. I should note, however, that Perl Data Language uses reference semantics (http://tinyurl.com/derlrh).

There's also a definite stench when one realizes that a = b; on one side, and a = b * 1; or a = b + 0; on the other, do completely different things.

So what we're looking at is: languages that had the option chose value semantics. Languages that didn't, well, they did what they could. I started rather neutral in this discussion but the more time goes by, the more things tilt towards value semantics.

Andrei
May 02 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
About the issue of matrices as reference or value semantics.

As a program I think references are more efficient, they are used everywhere in
many languages like Python/Numpy, Sage, Ruby, etc, and I don't like the idea of
putting "ref" everywhere. So I like references more. Also all objects in D are
reference-based, so a D programmer is already very used to the concept of
reference-based data (the problems in D1 don't come from the fact that arrays
are references, but from the fact that their pointer/length is passed by value,
so for example if you change the length inside a function, such change can't be
seen outside. So by default for dynamic arrays I'd like a full reference
semantic).

On the other hand the value semantics is the most natural. Python uses
reference semantics as an optimization mean, and not because references are
more natural for programming newbies.
To prove this, every day in the python newsgroup people don't understand why:
if the following builds a list (array) of 10 integers:

 >>> v = [0] * 10
 >>> v
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
 >>> v[1] = 5
 >>> v
 [0, 5, 0, 0, 0, 0, 0, 0, 0, 0]

The following doesn't build a 3*4 matrix:

 >>> m = [[0] * 3] * 4
 >>> m[1][2] = 6
 >>> m
 [[0, 0, 6], [0, 0, 6], [0, 0, 6], [0, 0, 6]]

The answer comes from the reference semantics of lists in Python, plus the immutability of integers. Such really common errors in newbies show that reference semantics isn't natural.

A future Python-like language that wants to be higher-level and less error-prone and more natural than Python will probably use value semantics everywhere (with COW, etc). And I agree with Andrei Alexandrescu that reference semantics may produce bugs far away from their originating point.

The future of programming is more toward immutable values and functional programming, like the immutable strings of D2 (and some very intelligent people have invented clever purely functional data structures that minimize data copying). In a purely functional language there's no point in giving the programmer a way to tell references apart from values. They can be seen as the same thing.

Bye,
bearophile
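The same pitfall transliterated to D, for comparison (a sketch: replicating a row reference aliases every row, while new int[][](4, 3) allocates distinct rows):

```d
// Replicating a row *reference* makes four names for one array.
void main()
{
    auto row = new int[3];
    int[][] m = [row, row, row, row]; // four references, one row
    m[1][2] = 6;
    assert(m[0][2] == 6);             // all "rows" changed together

    auto m2 = new int[][](4, 3);      // four distinct rows
    m2[1][2] = 6;
    assert(m2[0][2] == 0);            // only row 1 changed
}
```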
May 02 2009
prev sibling next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 Consider the progress we made when replacing char[] 
 with string. Why? Because with char[] long-distance dependencies crop up 
 easy and fast.

For big arrays people will put "ref" everywhere, so those bugs will not decrease. Bye, bearophile
May 02 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 Also, in a value semantics world, refs are third class citizens, but in 
 a reference semantic world, value semantics get their own assignment 
 operator ( []= ), and by convention, their own property ( .dup )

The major problem is not assignment. That can be taken care of. The problem is:

1. Passing an object into a function
2. Making the object a member of another object
3. Yes, assigning to the object (which ought to be congruent with 1 and 2).

I have perused some more searches and documentation, and things don't bode well for references. Consider the NumPy library. I have searched "numpy pass by reference" and found some interesting links. The first reveals differences between numpy and matlab (notably reference semantics). The second is a discussion entitled "beginner confused with numpy". Guess what the confusion was about? Congratulations, you have won a 40'' LCD TV. Reference semantics! Third hit: https://www-old.cae.wisc.edu/pipermail/octave-maintainers/2009-March/011509.html says:
 Octave's pass-by-value mechanism (with lazy copy-on-write) is
 something that is *far* more simple to grasp than NumPy's inherited
 everything-is-a-reference semantics. I do regard myself as moderately
 experienced Python programmer, yet every now and then I get shot in
 the foot by the reference semantics in Python.


I swear I didn't pay that guy.

Also, getting back to Perl Data Language (http://tinyurl.com/derlrh) I see the mention of references is clearly a WARNING, not an INTERESTING AND DESIRABLE FEATURE. "It is important to keep the ``reference nature'' of piddles in mind when passing piddles into subroutines. If you modify the input pdls you modify the original argument, not a copy of it. This is different from some other array processing languages but makes for very efficient passing of piddles between subroutines."

So what I'm seeing is that reference semantics is not desirable and not natural but was chosen for efficiency reasons. But you know, I don't want to "keep in mind" extra stuff when coding, I have enough to worry about. If I can get away with ref and/or refcounting, then I have taken care of the efficiency issue and I don't need to keep in mind a weird quirk.

============

About undue copying of data: maybe that could be avoided by having functions manipulate ranges (which are cheap to copy), not the containers that use them. In C++ you need to pass the container more often than not mostly because passing two iterators is amazingly burdensome. In D we can pass ranges easily, so I think much D code that wants to do stuff will just take ranges.

Andrei
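A sketch of that range-not-container style (the sum function and its names are made up for illustration; isInputRange and ElementType come from std.range):

```d
// The function takes a range, never the container; a slice is two words.
import std.range : ElementType, isInputRange;

auto sum(Range)(Range r) if (isInputRange!Range)
{
    ElementType!Range total = 0;
    for (; !r.empty; r.popFront())
        total += r.front;
    return total;
}

void main()
{
    int[] data = [1, 2, 3, 4];
    assert(sum(data) == 10);
    assert(sum(data[1 .. 3]) == 5); // pass a view; no elements are copied
}
```

Copying the range by value inside sum is cheap and leaves the caller's container untouched, which is the point Andrei is making.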
May 02 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 Although, I'm starting to see an interesting story: Here are containers. 
 They have value semantics and are simple to use/prototype with. When 
 you're done, you can move to ranges in the performance critical sections 
 in order to boost performance. And some things, like high performance 
 lock-free queues and stacks, might only exist as ranges.

I'm off the phone with Walter. He made a golden point: matrices are not general containers! They are mathematical entities and probably are indeed best off with value semantics (if efficiency issues can be taken care of). But matrix semantics are not necessarily generalizable to generic container semantics. I think that's a good insight. So probably worrying about matrices when discussing containers is a red herring. Andrei
May 02 2009
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Fri, 01 May 2009 15:37:56 -0400, Rainer Deyke <rainerd eldwood.com> 
 wrote:
 Robert Jacques wrote:
 Precisely, so it's always passed by reference and doesn't need to
 support copying.

All containers need to support copying.

No they don't. To be clear, I'm talking about copying the container, not the values inside them, in a non-destructive manner. My data structures book and wikipedia make no mention of a .dup like method as being part of a container. I do agree .dup is useful and quite practical for serial containers, but it is by no means required. But you haven't answered the question of a value semantic lock-free container.

I have. Andrei
May 01 2009
prev sibling parent reply Rainer Deyke <rainerd eldwood.com> writes:
Robert Jacques wrote:
 On Fri, 01 May 2009 15:37:56 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 All containers need to support copying.

No they don't.

There is no conceptual problem with a non-copyable struct. However, in order to be a considered a container, the struct must at least support read-only iteration over its contents. If iteration is possible, then so is copying. You can have non-copyable value types, but they can't be containers. -- Rainer Deyke - rainerd eldwood.com
May 02 2009
parent reply Rainer Deyke <rainerd eldwood.com> writes:
Robert Jacques wrote:
 On Sat, 02 May 2009 03:39:27 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 You can have non-copyable value types, but they can't be containers.

No they don't. Iteration can often be destructive. If I iterate over a stack or a queue, I'm left with an empty stack/queue. For value semantics to work, non-destructive copying is required.

OK. I grant that there are non-copyable types that can reasonably be called containers. Simple solution: make them non-copyable structs. You still get most of the benefits of value types: - One less layer of indirection. - No long distance dependencies. - RAII. -- Rainer Deyke - rainerd eldwood.com
May 02 2009
parent reply Rainer Deyke <rainerd eldwood.com> writes:
Robert Jacques wrote:
 Again, D array's are structs with reference semantics. This isn't a
 pro/con either way.

The D1 dynamic array type does not have reference semantics, nor does it have value semantics.

    void f(int[] a) { a.length = 1; }

    int[] a;
    f(a);
    assert(a.length == 0);
   - No long distance dependencies.

Well, if I can't copy it, then I have to use ref everywhere, which is functionally equivalent to reference semantics. I think you've just proved the counter-point.

Given a value type 'T', you have the guarantee that no two variables of type 'T' can alias each other. This guarantee is preserved when the type 'T' is non-copyable. An argument of type 'ref T' can obviously alias a variable of type 'T'.
   - RAII.

Can be done with structs or classes. Also see point 1. So, this isn't a pro/con either way.

The D1 dynamic array type does not support RAII. -- Rainer Deyke - rainerd eldwood.com
May 02 2009
parent reply Rainer Deyke <rainerd eldwood.com> writes:
Robert Jacques wrote:
 On Sat, 02 May 2009 19:11:11 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 Given a value type 'T', you have the guarantee that no two variables of
 type 'T' can alias each other.  This guarantee is preserved when the
 type 'T' is non-copyable.

 An argument of type 'ref T' can obviously alias a variable of type 'T'.

Okay, if T is not copyable, then I _must_ pass it as ref T, everywhere. Which is reference semantics.

When passing arguments, (possibly const) ref is a reasonable default. I don't care about how arguments are passed. I care about aliasing between variables, especially member variables. With reference semantics, two variables of type T can reference each other. With non-copyable types, they cannot.
   - RAII.

Can be done with structs or classes. Also see point 1. So, this isn't a pro/con either way.

The D1 dynamic array type does not support RAII.

There are two parts to D's arrays. One is a struct two words long, the other is a chunk of RAM. The first part is RAII; the second part is not, since D doesn't allow dynamically sized memory allocation on the stack.

It's meaningless to talk about RAII in the context of the "struct" part of a D1 dynamic array, since it doesn't manage any resources. If I place a variable of a RAII type in a D1 dynamic array, it is not properly destroyed when the array goes out of scope. Therefore D1 dynamic arrays do not support RAII. Stack versus heap allocation is an orthogonal issue. -- Rainer Deyke - rainerd eldwood.com
May 02 2009
parent reply Rainer Deyke <rainerd eldwood.com> writes:
Robert Jacques wrote:
 RAII is all about stack allocation over heap allocation (or so I
 thought). Ah, wikipedia has set me straight. Anyways, now for the simple
 answer: you can't create D1 arrays with RAII types, I think. (Anyone
 tried scope Foo[] foo?) Anyways, in D2, if I remember correctly there's
 a bug where struct finalizers don't run if they're allocated on the
 heap. But if you're using classes for RAII like you should, the GC will
 run their finalizers just fine after the array dies. But this seems
 to be an issue about the elements/values inside the containers,
 not the container itself. So I'm lost.

A RAII variable is "destroyed" when it goes out of scope, where "destroyed" means that a destructor is called. RAII is a transitive feature. When a RAII variable is destroyed, its members are also destroyed. When a RAII container is destroyed, all of its contents are destroyed.

References in D are not RAII types, because when a reference goes out of scope, the "contents" of that reference are not destroyed until the garbage collector decides to collect them, at which point it is too late to perform clean-up.

When an array dies, its contents are destroyed. The issue is when the array dies. If the array is a value type, the array dies when it goes out of scope, so RAII is possible. If the array is a reference type, the array dies when the garbage collector decides to run sometime after all live references to the array have died, so RAII is not possible.

-- Rainer Deyke - rainerd eldwood.com
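A small D illustration of that transitivity (the names are invented for the demo; the module-level counter just makes the destructor call observable):

```d
// A value-type Holder destroys its Resource member right at scope exit.
int destroyed; // incremented by the destructor, for observability

struct Resource
{
    ~this() { destroyed++; }
}

struct Holder // value type: destruction is transitive
{
    Resource r;
}

void main()
{
    {
        Holder h;
    } // h.r's destructor has already run by this point
    assert(destroyed == 1);
}
```

Had Holder been a class, the Resource member would only be finalized whenever the GC got around to collecting the instance.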
May 02 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Rainer Deyke wrote:
 Robert Jacques wrote:
 RAII is all about stack allocation over heap allocation (or so I
 thought). Ah, wikipedia has set me straight. Anyways, now for the simple
 answer: you can't create D1 arrays with RAII types, I think. (Anyone
 tried scope Foo[] foo?) Anyways, in D2, if I remember correctly there's
 a bug where struct finalizers don't run if they're allocated on the
 heap. But if you're using classes for RAII like you should, the GC will
 run their finalizers just fine after the array dies. But this seems
 to be an issue about the elements/values inside the containers,
 not the container itself. So I'm lost.

A RAII variable is "destroyed" when it goes out of scope, where "destroyed" means that a destructor is called. RAII is a transitive feature. When a RAII variable is destroyed, its members are also destroyed. When a RAII container is destroyed, all of its contents are destroyed. References in D are not RAII types, because when a reference goes out of scope, the "contents" of that reference are not destroyed until the garbage collector decides to collect them, at which point it is too late to perform clean-up. When an array dies, its contents are destroyed. The issue is when the array dies. If the array is a value type, the array dies when it goes out of scope, so RAII is possible. If the array is a reference type, the array dies when the garbage collector decides to run sometime after all live references to the array have died, so RAII is not possible.

RAII can be implemented even with reference semantics. The mechanics would involve reference counting. Andrei
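A sketch of what Andrei suggests, using std.typecons.RefCounted (the Handle struct and the counter are invented for the demo; the point is that the payload's destructor runs deterministically when the last reference goes out of scope):

```d
// RefCounted gives reference semantics plus deterministic destruction.
import std.typecons : RefCounted;

int closed; // observable side effect of the destructor, for the demo

struct Handle
{
    int fd = -1;
    ~this() { if (fd != -1) closed++; }
}

void main()
{
    {
        auto h = RefCounted!Handle(42);
        auto h2 = h;           // copying bumps the count; payload is shared
        assert(h.fd == h2.fd); // both views see the same Handle
    } // last reference gone: the count hits zero and ~Handle runs here
    assert(closed == 1);
}
```

As Leandro points out downthread, this scheme still has to contend with cycles, which plain reference counting cannot reclaim.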
May 03 2009
next sibling parent Rainer Deyke <rainerd eldwood.com> writes:
Andrei Alexandrescu wrote:
 Rainer Deyke wrote:
 If the array is a reference type, the array dies when the garbage
 collector decides to run sometime after all live references to the array
 have died, so RAII is not possible.

RAII can be implemented even with reference semantics. The mechanics would involve reference counting.

Granted, although reference counting comes with its own set of issues. -- Rainer Deyke - rainerd eldwood.com
May 03 2009
prev sibling parent Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, el  3 de mayo a las 11:55 me escribiste:
If the array is a reference type, the array dies when the garbage
collector decides to run sometime after all live references to the array
have died, so RAII is not possible.

RAII can be implemented even with reference semantics. The mechanics would involve reference counting.

I was not following this discussion, so I might be speaking out of ignorance, but I think you are underestimating the problems of reference counting again. Circular references break it, and when it comes to RAII, you probably lose the ordering guarantees it needs if you implement some kind of algorithm to deal with cycles. Another option is to trust the programmer to be smart enough not to create cycles (weak references can help a lot here), but if the RC is hidden in the language it can be a little hard to remember this restriction. -- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/ ---------------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ----------------------------------------------------------------------------
May 04 2009
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 10:23:13 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Denis Koroskin wrote:
 You can reuse memory even if it comprehends more complex patterns
 than one single allocation.
  Andrei

I *highly* doubt it is worth the trouble. Most likely, this container won't be lock-free and scalable anymore. Performance will also degrade dramatically.

How did you get to "most likely"? There's so many factors in the equation. What is the alternative? Why wouldn't the code be lock-free? Why would it not be scalable?

Because it dies a horrible death under contention. It's lock-free, but it isn't concurrent. Worse, the inner CAS loop is actually doing a lot of work, resulting in a lot of wasted CPU cycles. So as long as the write-rarely, read-many paradigm holds, it works. But try this with a stack or a queue and a good locking algorithm will beat you every time. Also, there's an issue with stale values for readers, but that's a separate thing.
 Besides, the more I think about thread-local/shared separation that
 is going to come to D2, the more I see that there is absolutely no
 need to make Phobos containers thread-safe. Most containers will be
 thread-local and won't need any synchronization at all.

I agree. But you also don't want to preclude using a shared container either.

I think Phobos containers should come in two forms: shared and local. i.e. shared Queue!T sq; and Queue!T q; They don't need to be there at day one, but I think if they aren't able to be supported (or slow as molasses), we'll end up with two container sets and lots of code duplication.
May 02 2009
prev sibling next sibling parent Jason House <jason.james.house gmail.com> writes:
Rainer Deyke Wrote:

 Michel Fortin wrote:
 On 2009-04-29 23:01:39 -0400, Rainer Deyke <rainerd eldwood.com> said:
 Andrei Alexandrescu wrote:
 3. Reference semantics

be acceptable.

Andrei mentioned a couple of reasons against 3, which are yours?

I agree with all of Andrei's reasons. In addition, there's this: suppose you have a struct containing a (mutable) array. When you make a copy of that struct, you will almost always want to make a copy of the contained array. Therefore value semantics should be the default, because it simplifies the most common use case.

I don't think copying an array member in a struct is as automatic as you imply. We really should figure out use cases that would or would not benefit from a copy. I'm sure we can find lazy range examples where copying would be a performance killer. Also, how deeply should copying go? Does it make sense to do a shallow copy of an array of reference types? What about passing an array into a function?
Apr 30 2009
prev sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Sat, 02 May 2009 18:08:30 +0400, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Sat, 02 May 2009 03:35:51 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 No, repeated allocations are trivial to eliminate. I didn't even  
 bother to explain that in the article. The loop must only refill the  
 allocated object from the object that needs to be replaced.

understanding.

    if (copy is null) copy = s.deepDup;
    else s.deepDupTo(copy);

copy = cast(T*) GC.malloc(T.sizeof);
do {
    overwrite(s, copy);
    copy.modify;
} while (!cas(copy, s));

I'm confused. The article talks about copying the entire data structure, not just a node/etc. And trees, etc tend to have variable sizes, etc.

You can reuse memory even if it comprehends more complex patterns than one single allocation. Andrei

I *highly* doubt it is worth the trouble. Most likely, this container won't be lock-free and scalable anymore. Performance will also degrade dramatically. Besides, the more I think about thread-local/shared separation that is going to come to D2, the more I see that there is absolutely no need to make Phobos containers thread-safe. Most containers will be thread-local and won't need any synchronization at all.
May 02 2009
prev sibling next sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
Rainer Deyke wrote:
 Andrei Alexandrescu wrote:
 1. Value semantics (arrays are like int)

Don't forget: + Supports timely destruction of contents (i.e. RAII).

These are collections. RAII doesn't matter when you have a garbage collector and the only resource you have is memory, unless you're quite strapped for memory.
Apr 30 2009
parent reply Rainer Deyke <rainerd eldwood.com> writes:
Christopher Wright wrote:
 Rainer Deyke wrote:
 Don't forget:
  + Supports timely destruction of contents (i.e. RAII).

These are collections. RAII doesn't matter when you have a garbage collector and the only resource you have is memory, unless you're quite strapped for memory.

It also doesn't matter if the resource deallocation pixie deallocates all my non-memory resources the moment they are no longer being used, which is about as likely as only having memory resources in arrays. -- Rainer Deyke - rainerd eldwood.com
Apr 30 2009
parent Christopher Wright <dhasenan gmail.com> writes:
Rainer Deyke wrote:
 Christopher Wright wrote:
 Rainer Deyke wrote:
 Don't forget:
  + Supports timely destruction of contents (i.e. RAII).

collector and the only resource you have is memory, unless you're quite strapped for memory.

It also doesn't matter if the resource deallocation pixie deallocates all my non-memory resources the moment they are no longer being used, which is about as likely as only having memory resources in arrays.

Which requires the container to know to delete everything that it references when it dies. If it is dealing with reference types, that is not the correct behavior. It seems awkward to me to deal with value types with destructors, but it makes sense for reference counting.
Apr 30 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 03:35:51 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 No, repeated allocations are trivial to eliminate. I didn't even  
 bother to explain that in the article. The loop must only refill the  
 allocated object from the object that needs to be replaced.

understanding. if(copy is null) copy = s.deepDup; else s.deepDupTo(copy);

copy = cast(T*) GC.malloc(T.sizeof);
do {
    overwrite(s, copy);
    copy.modify;
} while (!cas(copy, s));

I'm confused. The article talks about copying the entire data structure, not just a node/etc. And trees, etc tend to have variable sizes, etc.
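For what it's worth, the copy-then-CAS loop under discussion can be sketched roughly as below. This is a hedged illustration only: `Payload`, `head`, and the increment are made-up names, and a real scheme must also handle reclamation of the displaced object and variable-sized structures.

```d
import core.atomic;

// Illustrative payload; the real structure would be whatever the
// container's root points to.
struct Payload { int value; }

shared Payload* head;

void update()
{
    auto copy = new Payload;                  // allocate once, outside the loop
    shared(Payload)* old;
    do
    {
        old = atomicLoad(head);
        *copy = *cast(Payload*) old;          // refill rather than reallocate
        copy.value += 1;                      // apply the modification
    }
    while (!cas(&head, old, cast(shared(Payload)*) copy));
}
```

The point being debated is exactly the `*copy = *old` line: the retry loop reuses one allocation, so repeated allocations disappear, but only for fixed-size roots.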
 But Andrei, I really don't care about the existence of a lock-free  
 algorithm with value semantics. I only care about a practical lock-free  
 algorithm with value semantics. This approach, while elegant in its  
 simplicity and good for some situations, is very poor at high update /  
 contention objects, like stacks and queues. Maybe, locked algorithms  
 are the best value semantics can do in some situations and this might  
 result in separate fast reference containers. Which is a drawback to  
 value semantics and should be placed into the big pros/cons pile.

I agree. Andrei

May 02 2009
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 22:14:59 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:

 Robert Jacques wrote:
 On Sat, 02 May 2009 19:11:11 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 Given a value type 'T', you have the guarantee that no two variables of
 type 'T' can alias each other.  This guarantee is preserved when the
 type 'T' is non-copyable.

 An argument of type 'ref T' can obviously alias a variable of type 'T'.

Okay, if T is not copyable, then I _must_ pass it as ref T, everywhere. Which is reference semantics.

When passing arguments, (possibly const) ref is a reasonable default. I don't care about how arguments are passed. I care about aliasing between variables, especially member variables. With reference semantics, two variables of type T can reference each other. With non-copyable types, they cannot.
   - RAII.

Can be done with structs or classes. Also see point 1. So, this isn't a pro/con either way.

The D1 dynamic array type does not support RAII.

There are two parts to D's arrays. One a struct 2 words long, the other is a chunk of ram. The first part is RAII, the second part is not possible, since D doesn't allow dynamically sized memory allocation on the stack.

It's meaningless to talk about RAII in the context of the "struct" part of a D1 dynamic array, since it doesn't manage any resources. If I place a variable of a RAII type in a D1 dynamic array, it is not properly destroyed when the array goes out of scope. Therefore D1 dynamic arrays do not support RAII. Stack versus heap allocation is an orthogonal issue.

RAII is all about stack allocation over heap allocation (or so I thought). Ah, wikipedia has set me straight. Anyways, now for the simple answer: you can't create D1 arrays with RAII types, I think. (Anyone tried scope Foo[] foo?) Anyways, in D2, if I remember correctly there's a bug where struct finalizers don't run if they're allocated on the heap. But if you're using classes for RAII like you should, the GC will run their finalizers just fine after the array dies. But this seems to be an issue about the elements/values inside the containers, not the container itself. So I'm lost.
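The two cases being debated can be sketched like this (a minimal illustration; `Resource` is a made-up class):

```d
// A scope class is destroyed deterministically when its scope ends;
// references sitting in a plain dynamic array are not destroyed when
// the array goes out of scope - their finalization awaits the GC.
class Resource
{
    ~this() { /* release a non-memory resource here */ }
}

void demo()
{
    scope r = new Resource();   // RAII: destructor runs when demo returns
    auto a = new Resource[3];   // array of references: nothing destroys
                                // the referents when `a` dies
}
```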
May 02 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Wed, 29 Apr 2009 20:45:23 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 It has become increasingly clear we'll need to support arrays in
 addition to slices.

No, Andrei it hasn't. A detailed paragraph (or more) explaining why you think so should be included in the full RFC.

There are several converging pieces of evidence. Each can be debated in separation, yet together they make for a rather strong case.

People have complained about the incredibly bad, unsafe, and slow behavior of appending. Even before the considerable recent slowdown of ~=, there were speed concerns. Worse than that, the completely weird behavior of slices that grow to stomp on other slices cannot be defended. The idiom:

    array.length = n;
    array.length = 0;
    // append to array up to n...

is rather disingenuous and looks odd to the casual reader. As predicted by the "range theory", which holds that a range will never grow, only shrink, slices should always shrink. Growing is a foreign operation for a slice. This is a big problem that goes beyond a need for purity or cleanliness - it's just terrible to have a slice stomping around, and then uncontrollably being rebound elsewhere and so on. People pass slices into functions, functions modify them and also append to them, and depending on the phase of the moon, some updates are being seen in the caller. So people have defined various types (such as ArrayCreator or ArrayAppender or whatnot) that take care of that aspect. But really what we're looking at is an Array type.

Another issue is that we can get (partly) away with disowned slices because of the garbage collector: whenever a slice is to be rebound, whatever chunk it was bound to is left behind and will be duly collected. If, however, we want to support other programming disciplines that exercise stricter control over memory, the "parent" arrays must become an explicit type that code can keep around and manipulate directly.

A design that has one container type and several slice types that crawl it in various ways is very compelling, and I think it will be a huge selling point for D. It would be odd, not natural, to have arrays as a singularly distinct container that is in fact its own range. We get away with container == range for arrays partly because arrays are a simple structure, but that only blurs thinking and understanding of more complex containers. In fact, ranges do predict that any attempt to grow a range will cause topological issues in the owning container; that's why SList wisely (IMHO :o)) includes a template parameter that tells whether its topology is fixed or flexible.

(Last but not least, .NET has arrays and slices. D's slices that actually have an array testis mesh real bad with .NET because in .NET you can't have a disowned slice.)

Andrei
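The ArrayAppender-style workaround mentioned above can be sketched roughly like this (a minimal illustration with invented names, not Phobos' actual implementation):

```d
// An appender that owns its buffer, so appends never stomp on foreign
// slices: growth happens only inside storage this struct allocated.
struct Appender(T)
{
    private T[] buf;     // owned storage
    private size_t len;  // used portion

    void put(T item)
    {
        if (len == buf.length)
            buf.length = buf.length ? buf.length * 2 : 4; // grow owned buffer
        buf[len++] = item;
    }

    T[] data() { return buf[0 .. len]; } // hand out a shrink-only view
}
```

The design point: callers receive slices that can only shrink, while the one place allowed to grow is the container the code explicitly holds.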
Apr 30 2009
next sibling parent reply "Joel C. Salomon" <joelcsalomon gmail.com> writes:
Andrei Alexandrescu wrote:
 A design that has one container type and several slice types that crawl
 it in various ways is very compelling, and I think it will be a huge
 selling point for D. It would be odd, not natural, to have arrays as a
 singularly distinct container that is in fact its own range. We get away
 with container == range for arrays partly because arrays are a simple
 structure, but that only blurs thinking and understanding of more
 complex containers. In fact, ranges do predict that any attempt to grow
 a range will cause topological issues in the owning container; that's
 why SList wisely (IMHO :o)) includes a template parameter that tells
 whether its topology is fixed or flexible.

It sounds like you’re suggesting that arrays become like all containers, and that slices become a kind of range over the array. Functions that now take arrays (which are possibly slices) and write to them but don’t shrink or append to them, should therefore be rewritten to take the range that covers the array. B.T.W.: What’s the syntax for a range that accesses the entire container in index order? —Joel Salomon
Apr 30 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Joel C. Salomon wrote:
 Andrei Alexandrescu wrote:
 A design that has one container type and several slice types that crawl
 it in various ways is very compelling, and I think it will be a huge
 selling point for D. It would be odd, not natural, to have arrays as a
 singularly distinct container that is in fact its own range. We get away
 with container == range for arrays partly because arrays are a simple
 structure, but that only blurs thinking and understanding of more
 complex containers. In fact, ranges do predict that any attempt to grow
 a range will cause topological issues in the owning container; that's
 why SList wisely (IMHO :o)) includes a template parameter that tells
 whether its topology is fixed or flexible.

It sounds like you’re suggesting that arrays become like all containers, and that slices become a kind of range over the array.

Yes. I think that that would be a good design.
 Functions that now take arrays (which are possibly slices) and write to
 them but don’t shrink or append to them, should therefore be rewritten
 to take the range that covers the array.

To clarify: T[] is a slice. It has one array testis in the form of the infamous ~=. And it has the problems that I described because of that gender ambiguity. Most functions manipulate ranges (and therefore slices), not full array objects. So the addition of arrays would not be disruptive.
 B.T.W.:  What’s the syntax for a range that accesses the entire
 container in index order?

To get the "default" range for a container c, you write c[]. foreach automatically calls it. For a T[], asking a = b[] is an identity function. For a T[5], it happens to do the right thing.

Allow me to bring the std.range.SListRange example again, because it's a very good example outside of arrays. An approach used e.g. in LISP is to conflate lists as containers with lists as iterators in containers. So when you pass a list into a function it could not only look at and change the elements that the list contains, but it could rewire the nodes in the list any way it wants. The root of the list itself is a bit special because it is usually passed "by value" so the caller will hold the root even if the callee would want to change even that. This is not a huge trouble for LISP as lists are the focal data structure, but the behavior comes off as a bit odd for someone who'd like to manipulate lists and other structures in uniform ways.

So there comes the notion that the container is concerned with topology: how slots "sit" in memory, how they are arranged with respect to each other etc. The exact ways to manipulate topology and the tradeoffs involved are container-specific (for example it's cheap to prepend to a list but not to an array). Ranges, on the other hand, do not allow topological changes - they only have uniform primitives for accessing the elements in containers, whereas they prevent any topological changes.

So a SListRange!(int, Topology.flexible) would be a container and offer topological changes such as insertAfter. When such a list is asked for a range (via c[]) it will return a SListRange!(int, Topology.fixed). That guy will know how to move about the host list, but would be unable to exact any changes to its topology. As such, SListRange!(int, Topology.fixed) is as usable as any other forward range.

Does any of this make sense? :o)

Andrei
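The Topology idea can be sketched like this (illustrative only; std.range.SListRange's real interface may differ, and insertFront stands in for the insertAfter mentioned above):

```d
enum Topology { fixed, flexible }

// Nodes live outside the range type so both instantiations share them.
struct Node(T) { T value; Node!T* next; }

struct SListRange(T, Topology topo)
{
    Node!T* head;

    // Uniform range primitives: available at either topology.
    bool empty() const { return head is null; }
    ref T front() { return head.value; }
    void popFront() { head = head.next; }

    static if (topo == Topology.flexible)
    {
        // Topological change: only the flexible (container) variant has it.
        void insertFront(T value) { head = new Node!T(value, head); }

        // c[] yields a crawl-only range over the same nodes.
        SListRange!(T, Topology.fixed) opSlice()
        {
            return SListRange!(T, Topology.fixed)(head);
        }
    }
}
```

The fixed instantiation simply lacks the mutating member, so the type system - not runtime checks - prevents a range from rewiring its host.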
Apr 30 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 Does any of this make sense? :o)

Yes. I think problems aren't finished yet and more thinking is expected, but you are getting somewhere (I can tell it also because I am starting to understand. I am not bright. When even I understand it often means that the design is meaningful). Regarding the foreach on the items I don't know/understand the syntactic need for the [] to scan. Bye, bearophile
Apr 30 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 Does any of this make sense? :o)

Yes. I think problems aren't finished yet and more thinking is expected, but you are getting somewhere (I can tell it also because I am starting to understand. I am not bright.

Bright is only one :o).
 When even I understand it
 often means that the design is meaningful). Regarding the foreach on
 the items I don't know/understand the syntactic need for the [] to
 scan.

Oh, that's simple. A while ago people asked, how do I iterate a container (assuming the container is distinct from range). So I said, here's how I think:

    Container!int c;
    ...
    foreach (element; c.all) { ...use element... }

People said, that sucks! With opApply I don't need to append the .all thingie! So I told Walter to automatically call [] against the container. So people can now write:

    foreach (element; c) { ...use element... }

All the container has to do is define opSlice() to return its "all" range.

Andrei
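A minimal sketch of that convention (`Container` here is a made-up type, and the explicit `c[]` form works whether or not the automatic lowering described above is in place):

```d
// The container's opSlice() returns its "all" range; per the proposal,
// foreach (e; c) lowers to foreach (e; c[]).
struct Container(T)
{
    private T[] items;

    this(T[] initial) { items = initial; }

    T[] opSlice() { return items; }   // the "all" range
}

void main()
{
    auto c = Container!int([1, 2, 3]);
    foreach (e; c[])                  // explicit form of the lowering
    {
        // ...use e...
    }
}
```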
Apr 30 2009
prev sibling parent superdan <super dan.org> writes:
Andrei Alexandrescu Wrote:

 Joel C. Salomon wrote:
 Andrei Alexandrescu wrote:
 A design that has one container type and several slice types that crawl
 it in various ways is very compelling, and I think it will be a huge
 selling point for D. It would be odd, not natural, to have arrays as a
 singularly distinct container that is in fact its own range. We get away
 with container == range for arrays partly because arrays are a simple
 structure, but that only blurs thinking and understanding of more
 complex containers. In fact, ranges do predict that any attempt to grow
 a range will cause topological issues in the owning container; that's
 why SList wisely (IMHO :o)) includes a template parameter that tells
 whether its topology is fixed or flexible.

It sounds like you’re suggesting that arrays become like all containers, and that slices become a kind of range over the array.

Yes. I think that that would be a good design.
 Functions that now take arrays (which are possibly slices) and write to
 them but don’t shrink or append to them, should therefore be rewritten
 to take the range that covers the array.

To clarify: T[] is a slice. It has one array testis in the form of the infamous ~=. And it has the problems that I described because of that gender ambiguity. Most functions manipulate ranges (and therefore slices), not full array objects. So the addition of arrays would not be disruptive.
 B.T.W.:  What’s the syntax for a range that accesses the entire
 container in index order?

To get the "default" range for a container c, you write c[]. foreach automatically calls it. For a T[], asking a = b[] is an identity function. For a T[5], it happens to do the right thing. Allow me to bring the std.range.SListRange example again, because it's a very good example outside of arrays. An approach used e.g. in LISP is to conflate lists as containers with lists as iterators in containers. So when you pass a list into a function it could not only look and change at the elements that the list contains, but it could rewire the nodes in the list any way it wants. The root of the list itself is a bit special because it is usually passed "by value" so the caller will hold the root even if the callee would want to even change that. This is not a huge trouble for LISP as lists are the focal data structure, but the behavior comes off as a bit odd for someone who'd like to manipulate lists and other structures in uniform ways. So there comes the notion that the container is concerned with topology: how slots "sit" in memory, how they are arranged with respect to each other etc. The exact ways to manipulate topology and the tradeoffs involved are container-specific (for example it's cheap to prepend to a list but not an array). Ranges, on the other hand, do not allow topological changes - they only have uniform primitives for accessing the elements in containers, whereas they prevent any topological changes. So a SListRange!(int, Topology.flexible) would be a container and offer topological changes such as insertAfter. When such a list is asked for a range (via c[]) it will return a SListRange!(int, Topology.fixed). That guy will know how to move about the host list, but would be unable to exact any changes to its topology. As such, SListRange!(int, Topology.fixed) is as useable as any other forward range. Does any of this make sense? :o) Andrei

fuck man u dance around da shit but never step in it. the clear example is a file. messin' with a file is diff from messin' with ranges that read from da file. forget the lists n shit. file is a real example coz u also have to be sure of lifetime n shit. i mean u wanna close it eh.
Apr 30 2009
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Thu, 30 Apr 2009 15:57:17 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 People have complained about the incredibly bad, unsafe, and slow 
 behavior of appending. Even before the considerable recent slowdown of 
 ~=, there were speed concerns. Worse than that, the completely weird 
 behavior of slices that grow to stomp on other slices cannot be defended.

Actually, this is a problem due to an optimization in the current GC. Fixing it is rather trivial: simply store the maximum requested array end at either the beginning or end of the memory block.
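The fix being proposed might look something like this sketch (names and layout are illustrative, not the actual druntime scheme):

```d
import core.memory;

// Record the array's high-water mark inside the memory block itself,
// so an in-place grow can tell whether a given slice really ends at
// the block's used extent.
struct BlockHeader { size_t used; }   // stored at the block's start

bool canExtendInPlace(T)(T[] slice)
{
    auto info = GC.query(cast(void*) slice.ptr);
    if (info.base is null)
        return false;                 // not a GC block we can reason about
    auto hdr = cast(BlockHeader*) info.base;
    // Only the slice reaching the recorded mark may grow in place; any
    // other slice must relocate instead of stomping on its neighbors.
    auto sliceEnd = cast(ubyte*) (slice.ptr + slice.length);
    return sliceEnd == cast(ubyte*) info.base + BlockHeader.sizeof + hdr.used;
}
```

Under this scheme an interior slice never passes the check, so `~=` on it relocates rather than overwriting a sibling slice's data.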

How does an arbitrary slice get to the beginning/end of the memory block? I guess that's not a fat pointer anymore, it's an obese pointer.
But the bigger issue is that (x ~ y) may or may not create a new array.

x ~ y always creates a new array.
 I'm a little sad to hear you say this, as both Daniel Keep and I posted 
 sample pseudo code on how to do slice types properly in both reference 
 counted and manual managed memory systems.

No need to be sad. The code is obvious. What I was saying is that people actually need to get their hand on the container so they can control its scope.
 A design that has one container type and several slice types that 
 crawl it in various ways is very compelling, and I think it will be a 
 huge selling point for D. It would be odd, not natural, to have arrays 
 as a singularly distinct container that is in fact its own range. We 
 get away with container == range for arrays partly because arrays are 
 a simple structure, but that only blurs thinking and understanding of 
 more complex containers. In fact, ranges do predict that any attempt 
 to grow a range will cause topological issues in the owning container; 
 that's why SList wisely (IMHO :o)) includes a template parameter that 
 tells whether its topology is fixed or flexible.

I'd disagree with regard to a selling point/compelling. My lab maintains an open source C++ array/slice type pair as part of a larger robotics package. It's laid out with an ArrayBase containing all the implementation. The Array type simply extends ArrayBase and adds memory management, and Slice extends ArrayBase, only adding binding capabilities. Save for memory allocation, they are identical, i.e. the container and range are interchangeable.

How about a hashtable? A tree? Are the ranges crawling them identical with the containers? Not at all. Arrays are probably the worst example to use as an archetype for a design because they are very simple. Pretty much any design looks a bit baroque when only applied to arrays.
 (Last but not least, .NET has arrays and slices. D's slices that 
 actually have an array testis mesh real bad with .NET because in .NET 
 you can't have a disowned slice.)

I've been following the blog. I was surprised that Cristian didn't just make all D arrays .NET ArraySegments and be done with it, but there's probably issues I don't know about. I'd like to hear more on what exactly the issue was.

We discussed at length. Without being an expert in .NET, I can't tell what the problem is, but I do know that if there was a simpler solution, Cristi would have probably found it. Andrei
Apr 30 2009
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-04-30 20:52:05 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 (Last but not least, .NET has arrays and slices. D's slices that 
 actually have an array testis mesh real bad with .NET because in .NET 
 you can't have a disowned slice.)

I've been following the blog. I was surprised that Cristian didn't just make all D arrays .NET ArraySegments and be done with it, but there's probably issues I don't know about. I'd like to hear more on what exactly the issue was.

We discussed at length. Without being an expert in .NET, I can't tell what the problem is, but I do know that if there was a simpler solution, Cristi would have probably found it.

As I understand it, the issue is that the .NET API generally doesn't expect ArraySegments, but Arrays, so Cristi wants T[] to be an array, not an array segment, in order to make the API usable without having to duplicate ArraySegments into Arrays every time.

I'm doubtful having a standard container type will help that much, as T[] (the slice) still won't be accepted by the .NET API and the programmer would simply have to always use the container. That the .NET framework wants to use containers everywhere instead of slices is not a reason for D to bend to .NET's concept of arrays and slices.

But... That makes me think that perhaps my concept of "unique T[]" as the container (see my previous post) might be interesting. Cristi could make the compiler keep "unique T[]" in an Array and implicitly convert it to an ArraySegment as soon as it becomes non-unique. That would mean the .NET API could accept regular D dynamic arrays, as long as they're unique.

-- Michel Fortin michel.fortin michelf.com http://michelf.com/
May 01 2009
prev sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2009-04-30 15:57:17 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 As predicted by the "range theory" which predicates that a range will 
 never grow, only shrink, slices should always shrink. Growing is a 
 foreign operation for a slice. This is a big problem that goes beyond a 
 need for purity or cleanliness - it's just terrible to have a slice 
 stomping around, and then uncontrollably being rebound elsewhere and so 
 on. People pass slices into functions, functions modify them and also 
 append to them, and depending on the phase of the moon, some updates 
 are being seen in the caller.

I think D slices are a good concept for pointers to a fixed-size array. If you write new int[4], you're creating a fixed-size array of four ints, and getting a slice to the whole. If you concatenate it with some other array, it creates a new one.

As you like to point out, the fun starts when you have arrays of mutable elements and want to resize them. If you have an array of four elements and write array.length = 5, then you're perhaps creating a new one, disconnected from the previous one, and perhaps not. This poses no problem if your slice is the only one pointing to the memory block, otherwise the results are rather unpredictable.

Well, containers are no better at that. If you have some slices of a container and you grow the container you may or may not disconnect all your slices (unless you're creating an indirect slice as {container*, int start, int end} of course).

Which makes me think that perhaps we could introduce a unique type attribute and just forbid growing dynamic arrays unless they are flagged as unique. That'd make "unique T[]" behave as the container and "T[]" as the slice. And perhaps we could build that on top of the concept of uniqueness Bartosz wants to be expressible in D for the concurrency model, as he said in his latest post:
 In traditional type systems there is no way to express the requirement 
 that a message passed by reference must either be a monitor itself or 
 behave like a C++ unique_ptr (an object that leaves no aliases behind). 
 This requirement should be expressible in D, allowing the compiler to 
 check for its violations.

<http://bartoszmilewski.wordpress.com/2009/04/26/multithreading-with-d/> -- Michel Fortin michel.fortin michelf.com http://michelf.com/
May 01 2009
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 Lastly, what you're trying to fix is a bug in the implementation of 
 arrays (see my posts in this thread) and not a bug in the design.

I think your fix to the array-slice conflation is simply to make slices arrays by adding extra information. This approach is reasonable, but (1) will make slices bigger and slower, and (2) will port very poorly to most other containers.

Code using ad-hoc arrays in C often had to pass two words around (usually pointer+length). Similarly, code using iterators in C++ must pass two pointers/iterators around (begin+end). With D, we have a 1-1 correspondence of that situation with slices, which makes for a great argument on why a slice that contains two boundaries is an awesome way to encapsulate a concept at no efficiency cost. Adding a third word to the slice would mark a step backwards in terms of data being trafficked around, and a questionable increase in expressiveness.

An obese three-word slice would avoid stomping over other slices upon append, but would still exhibit a rather unpredictable append behavior: a function taking a slice and appending to it may or may not publish its changes to its caller. A good design is to have containers and ranges, not container_ranges.

Andrei
May 01 2009
prev sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-05-01 13:44:58 -0400, "Robert Jacques" <sandford jhu.edu> said:

 I think you might have misunderstood the concept of a unique/mobile type. 
 If you take a slice or a reference of a unique, you null the source. 
 Which makes using unique T logically invalid as a container.

Take note: I'm not expecting it to copy automatically, "unique T[]" isn't a value-type container. I'm solving the problem of slices detaching from one another when you grow them, which I don't think is a problem addressed by value-type containers either. If you grow the slice or container, previously-taken slices of it may or may not be left dangling. Allowing grow only on "unique T[]" makes sure there are no slice to be invalidated when you grow. Also, generally type systems featuring unique also have that "lent" flag allowing you to pass the unique reference to functions which are required to not keep it anywhere after the function returns. So you could pass lent slices of your unique T[] to functions without breaking uniqueness of the reference.
 Also, shared unique  T can't be converted to shared T. This 
 incompatibility is the whole point  of having a unique type in the 
 first place, and negates your concept of  using unique T as containers.

Why couldn't it? You just lose (forever) the uniqueness of the reference when you copy it to a non-unique one.
 And generally, unique T implies shared  storage, and can not be used 
 for thread-local arrays.

Generally, but not necessarily. Is there any problem with making unique applicable to thread-local arrays? -- Michel Fortin michel.fortin michelf.com http://michelf.com/
May 02 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Wed, 29 Apr 2009 20:45:23 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 It has become increasingly clear we'll need to support arrays in
 addition to slices.

No, Andrei it hasn't. A detailed paragraph (or more) explaining why you think so should be included in the full RFC.
 One big question mark for me is, should we define
 for arrays (and associative arrays and containers in general) value
 semantics, reference counted value semantics, or reference semantics? It
 is not clear to me yet what the best option is. Think of arrays as an
 archetypal example.

Here are some additional pros/cons:

1. Value semantics (arrays are like int)
 + can return static arrays from functions
 - can be emulated with slice assign and .dup, where needed
 - requires arrays and slices to be separate types
 +- requires opImplicitCast from container to slice type
 +- requires template argument deduction to be opImplicitCast/alias this aware
 - If you forget to pass by reference (when you meant to), logical bugs are produced (i.e. my mutation doesn't get published).
 - Dis-allows inheritance from all containers (i.e. implemented as structs)

I think clarifying 2 as being copy-on-write is useful.

2. Copy-on-write (value semantics with reference counting)
 +- The added trade-offs of value semantics
 - arrays may no longer be passed in registers (performance reduction)

3. Reference semantics
 ++ Allows arrays and slices to be the same type
 + How D's arrays/slices work today
 + Familiar to C/C++ refugees (similar to plain old arrays)
 + Familiar to most OO language refugees: containers are objects
 + Allows inheritance from most containers (i.e. implemented as objects, except for arrays)
 - Hard to reason about: squirrel an array into an object, and way later
 and farther it modifies the array and effects a change at long distance
 - Long-distance effects may as well be some of the toughest bugs after
 memory errors
 - Potentially inefficient as people might start sprinkling copies in the
 hope the eliminate bugs or chances of bugs
 + const system helps: if a function doesn't change something just mark
 it as const and you're home free
 + immutable also helps: immutable data can never change so it can be
 shared no problem. Not sure how interesting immutable containers are  
 though.

All these issues also exist with objects, which people are familiar with.
 - Implementing fast/unsafe semantics with reference semantics
 effectively forces reference counting to be part of containers, which
 complicates implementation.

I'm drawing a blank on this. My thinking is this:

1) garbage-collected: reference counting isn't needed
2) reference counted: reference counting is expected and built into objects in general
3) "unsafe" management: safety is explicitly not expected, so reference counting isn't needed

My vote is for reference semantics. It's fast, familiar, easy to reason about (i.e. like objects) and doesn't require separate array and slice types.
Apr 30 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Thu, 30 Apr 2009 14:40:42 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:
 In addition, there's this: suppose you have a struct containing a
 (mutable) array.  When you make a copy of that struct, you will almost
 always want to make a copy of the contained array.  Therefore value
 semantics should be the default, because it simplifies the most common
 use case.

That's what struct ctors/dtors/opAssign are for. Personally, I think letting the struct designer decide the behaviour is better.
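A sketch of that point (`Owner` is a made-up struct): the designer opts into deep copying via a postblit, instead of the language forcing value semantics on every contained array.

```d
struct Owner
{
    int[] data;
    this(this) { data = data.dup; }  // deep-copy the contained array on copy
}
```

A struct that wants shallow, reference-style sharing simply omits the postblit.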
Apr 30 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 30 Apr 2009 23:57:17 +0400, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Wed, 29 Apr 2009 20:45:23 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:
 It has become increasingly clear we'll need to support arrays in
 addition to slices.

No, Andrei it hasn't. A detailed paragraph (or more) explaining why you think so should be included in the full RFC.

There are several converging pieces of evidence. Each can be debated in separation, yet together they make for a rather strong case. People have complained about the incredibly bad, unsafe, and slow behavior of appending. Even before the considerable recent slowdown of ~=, there were speed concerns. Worse than that, the completely weird behavior of slices that grow to stomp on other slices cannot be defended. The idiom: array.length = n; array.length = 0; // append to array up to n... is rather disingenuous and looks odd to the casual reader. As predicted by the "range theory" which predicates that a range will never grow, only shrink, slices should always shrink. Growing is a foreign operation for a slice. This is a big problem that goes beyond a need for purity or cleanliness - it's just terrible to have a slice stomping around, and then uncontrollably being rebound elsewhere and so on. People pass slices into functions, functions modify them and also append to them, and depending on the phase of the moon, some updates are being seen in the caller. So people have defined various types (such as ArrayCreator or ArrayAppender or whatnot) that take care of that aspect. But really what we're looking at is an Array type. Another issue is that we can get (partly) away with disowned slices because of the garbage collector: whenever a slice is to be rebound, whatever chunk it was bound to is left behind and will be duly collected. If, however, we want to support other programming disciplines that exercise stricter control over memory, the "parent" arrays must become an explicit type that code can keep around and manipulate directly. A design that has one container type and several slice types that crawl it in various ways is very compelling, and I think it will be a huge selling point for D. It would be odd, not natural, to have arrays as a singularly distinct container that is in fact its own range. 
We get away with container == range for arrays partly because arrays are a simple structure, but that only blurs thinking and understanding of more complex containers. In fact, ranges do predict that any attempt to grow a range will cause topological issues in the owning container; that's why SList wisely (IMHO :o)) includes a template parameter that tells whether its topology is fixed or flexible.

(Last but not least, .NET has arrays and slices. D's slices that actually have an array testis mesh real bad with .NET because in .NET you can't have a disowned slice.)

Andrei

I'm all for introducing an Array type, but what would using it look like? Is it a plain Array!(T), or is there any shortcut planned? One of the selling points of slices is their ease of use. T[] is freaking awesome!

I also would like to see what Phobos would look like after introducing Arrays. A good thing is that it's quite easy to start implementing Array right now - all you need is to disallow ~ and ~= operations on slices. Then, go through Phobos code and see what got broken and start fixing it using the newly written Array template (the compiler doesn't have to know anything about it). I believe it will become immediately clear whether D Arrays should be a reference or value type.
Apr 30 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Thu, 30 Apr 2009 15:21:52 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Robert Jacques wrote:
 On Thu, 30 Apr 2009 14:40:42 -0400, Rainer Deyke <rainerd eldwood.com>  
 wrote:
 In addition, there's this: suppose you have a struct containing a
 (mutable) array.  When you make a copy of that struct, you will almost
 always want to make a copy of the contained array.  Therefore value
 semantics should be the default, because it simplifies the most common
 use case.

That's what struct ctors/dtors/opAssign are for. Personally, I think letting the struct designer decide the behaviour is better.

The problem is what to do for the containers in Phobos. Andrei

Yes. I agree. Sorry, looking back, I interpreted what Rainer was talking about as applying to all structs and not as a supporting argument. It would have helped my understanding if the argument was concrete, i.e. about Rainer's actual experiences, and not a broad over-generalization.

Now, back on topic. I actually make heavy use of what amounts to structs containing arrays and don't want a deep copy >90% of the time. In fact, I haven't used any of the (my array-my array) deep copy functions, and I only do copying when I have to convert my arrays to D-style arrays or other container types. Also, D arrays are structs that contain C-style arrays, and I almost never dup them in final code. When I debug/prototype I do tend to use []= and dup more often. So I'd disagree that you'd almost always want to make a copy.

Second, I don't think we should simplify the most common use case, but instead the majority of use cases. This is partly semantics, but partly not: everybody uses containers differently.

Also, Andrei, I would have preferred it if you had separated the discussion on arrays from Phobos containers. Many languages have arrays and a separate container library, and I feel the decision of whether D's arrays should behave like Phobos' containers would be best left to after the best decision for Phobos is made.
Apr 30 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Thu, 30 Apr 2009 17:23:19 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:
 Robert Jacques wrote:
 On Thu, 30 Apr 2009 14:40:42 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 In addition, there's this: suppose you have a struct containing a
 (mutable) array.  When you make a copy of that struct, you will almost
 always want to make a copy of the contained array.  Therefore value
 semantics should be the default, because it simplifies the most common
 use case.

That's what struct ctors/dtors/opAssign are for. Personally, I think letting the struct designer decide the behaviour is better.

I hope you're not suggesting writing ctor/dtor/opAssign for every single struct that uses arrays as value types.

I wasn't _suggesting_ anything. This is how D currently works, and is how any D library you're using that has structs that use arrays as value types works. Being surprised at my response gives me the feeling that you've never needed to code what you're talking about. If you have, your concrete example would be helpful as a use case in this discussion.
 It's possible to write a reference wrapper around a value type.  It's
 also possible to write a value wrapper around a reference type.
 However, the former is both easier and more efficient than the latter.

Yes and no. Yes, the deep copy mixin is more complex to write the first time, but after that it's easy to use. And second, it's just as efficient as the latter, since the deep copy mixin generates the same code as a value type.
Apr 30 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Thu, 30 Apr 2009 15:57:17 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Robert Jacques wrote:
 On Wed, 29 Apr 2009 20:45:23 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:
 It has become increasingly clear we'll need to support arrays in
 addition to slices.

think so should be included in the full RFC.

There are several converging pieces of evidence. Each can be debated in separation, yet together they make for a rather strong case. People have complained about the incredibly bad, unsafe, and slow behavior of appending. Even before the considerable recent slowdown of ~=, there were speed concerns. Worse than that, the completely weird behavior of slices that grow to stomp on other slices cannot be defended.

Actually, this is a problem due to an optimization in the current GC. Fixing it is rather trivial: simply store the maximum requested array end at either the beginning or end of the memory block.
 The idiom:

 array.length = n;
 array.length = 0;
 // append to array up to n...

 is rather disingenuous and looks odd to the casual reader.

My view is that this is an argument for a reserve function:

array.reserve = n;
// append to array up to n...
 As predicted by the "range theory" which predicates that a range will  
 never grow, only shrink, slices should always shrink. Growing is a  
 foreign operation for a slice. This is a big problem that goes beyond a  
 need for purity or cleanliness - it's just terrible to have a slice  
 stomping around, and then uncontrollably being rebound elsewhere and so  
 on. People pass slices into functions, functions modify them and also  
 append to them, and depending on the phase of the moon, some updates are  
 being seen in the caller.

I agree this is a problem, but the bugs are bugs in the implementation, not the design. (see above) But the bigger issue [ (x ~ y) may or may not create a new array ], resulting in either unseen intended updates or seen unintended updates, affects both array and slice types. Note that with the bugs fixed, this would only occur when concatenating to the logically complete array, which is why I say it affects array and slice types equally. Value semantics might help though. I understand where you're coming from with "range theory", though I don't personally see that slices are impure: I view (x ~ y) as creating a logically new array and returning a view of it.
 So people have defined various types (such as ArrayCreator or  
 ArrayAppender or whatnot) that take care of that aspect. But really what  
 we're looking at is an Array type.

An Array type doesn't address the needs of ArrayCreator/ArrayAppender. The reason ArrayAppender exists is to optimize finding the array capacity. While this can be added to an array type, it can just as easily be added to a slice type. The main reason this enhancement got shot down long ago was that the D community thought making arrays 3 words long, instead of 2, would result in a net performance loss. Personally, I'd love to see some real numbers proving or disproving this.
 Another issue is that we can get (partly) away with disowned slices  
 because of the garbage collector: whenever a slice is to be rebound,  
 whatever chunk it was bound to is left behind and will be duly  
 collected. If, however, we want to support other programming disciplines  
 that exercise stricter control over memory, the "parent" arrays must  
 become an explicit type that code can keep around and manipulate  
 directly.

I'm a little sad to hear you say this, as both Daniel Keep and I posted sample pseudo code on how to do slice types properly in both reference counted and manually managed memory systems. To reiterate:

struct T[] {
    T* ptr;
    size_t length;
    MemInfo* memInfo;
}

struct MemInfo {
    size_t refCount; // not included if not reference counting
    size_t capacity;
}

With MemInfo being located at the start of the array's memory block. (i.e. COM style)
 A design that has one container type and several slice types that crawl  
 it in various ways is very compelling, and I think it will be a huge  
 selling point for D. It would be odd, not natural, to have arrays as a  
 singularly distinct container that is in fact its own range. We get away  
 with container == range for arrays partly because arrays are a simple  
 structure, but that only blurs thinking and understanding of more  
 complex containers. In fact, ranges do predict that any attempt to grow  
 a range will cause topological issues in the owning container; that's  
 why SList wisely (IMHO :o)) includes a template parameter that tells  
 whether its topology is fixed or flexible.

I'd disagree with regard to a selling point/compelling. My lab maintains an open source C++ array/slice type pair as part of a larger robotics package. It's laid out with an ArrayBase containing all the implementation. The Array type simply extends ArrayBase and adds memory management. And Slice extends ArrayBase, only adding binding capabilities. Save for memory allocation, they are identical, i.e. the container and range are interchangeable.
 (Last but not least, .NET has arrays and slices. D's slices that  
 actually have an array testis mesh real bad with .NET because in .NET  
 you can't have a disowned slice.)

I've been following the blog. I was surprised that Cristian didn't just make all D arrays .NET ArraySegments and be done with it, but there are probably issues I don't know about. I'd like to hear more on what exactly the issue was.
Apr 30 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Thu, 30 Apr 2009 20:52:05 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Robert Jacques wrote:
 On Thu, 30 Apr 2009 15:57:17 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:
 People have complained about the incredibly bad, unsafe, and slow  
 behavior of appending. Even before the considerable recent slowdown of  
 ~=, there were speed concerns. Worse than that, the completely weird  
 behavior of slices that grow to stomp on other slices cannot be  
 defended.

Fixing it is rather trivial: simply store the maximum requested array end at either the beginning or end of the memory block.

How does an arbitrary slice get to the beginning/end of the memory block? I guess that's not a fat pointer anymore, it's an obese pointer.

Okay, so you have:

struct T[] {
    T* ptr;
    size_t length;
    MemInfo* memInfo;
}

struct MemInfo {
    size_t refCount; // not included if not reference counting
    size_t capacity;
}

And MemInfo is located in the first 2 words of the memory block. Then the start of the memory block == memInfo. The end of the memory block is (cast(byte*) memInfo) + memInfo.capacity. T* and T* + length both point to locations between the memory start and end (i.e. between memInfo and memInfo + capacity/(2*size_t)).

In terms of size, it keeps the pointer down to 3 words, which is only slightly chubbier than the current arrays. And given that at minimum you have to store a pointer to the ref count, it's no larger than it has to be. (Exception: manual memory management with separate array types could strip off this extra word.)
 But the bigger [ (x ~ y) may or may not create a new array ]

x ~ y always creates a new array.

Oops, sorry. x ~= y.
 I'm a little sad to hear you say this, as both Daniel Keep and I posted  
 sample pseudo code on how to do slice types properly in both reference  
 counted and manual managed memory systems.

No need to be sad. The code is obvious. What I was saying is that people actually need to get their hand on the container so they can control its scope.

Could you clarify something for me: is this a low-level implementation argument or a high-level semantic argument you're making?
 A design that has one container type and several slice types that  
 crawl it in various ways is very compelling, and I think it will be a  
 huge selling point for D. It would be odd, not natural, to have arrays  
 as a singularly distinct container that is in fact its own range. We  
 get away with container == range for arrays partly because arrays are  
 a simple structure, but that only blurs thinking and understanding of  
 more complex containers. In fact, ranges do predict that any attempt  
 to grow a range will cause topological issues in the owning container;  
 that's why SList wisely (IMHO :o)) includes a template parameter that  
 tells whether its topology is fixed or flexible.

I'd disagree with regard to a selling point/compelling. My lab maintains an open source C++ array/slice type pair as part of a larger robotics package. It's laid out with an ArrayBase containing all the implementation. The Array type simply extends ArrayBase and adds memory management. And Slice extends ArrayBase, only adding binding capabilities. Save for memory allocation, they are identical, i.e. the container and range are interchangeable.

How about a hashtable? A tree? Are the ranges crawling them identical with the containers? Not at all. Arrays are probably the worst example to use as an archetype for a design because they are very simple. Pretty much any design looks a bit baroque when only applied to arrays.

My disagreement was limited to the "selling point/compelling" claim. Otherwise, I agree.

hmm... thinking out loud: Slices of lists could be lists, but it's a recipe for leaks and other issues. Slices of trees could be trees, although these 'slices' couldn't actually walk the tree (i.e. they aren't ranges), and re-balancing causes logical bugs. But hash tables, queues, dequeues, stacks, etc. don't have nice logical slices.
 (Last but not least, .NET has arrays and slices. D's slices that  
 actually have an array testis mesh real bad with .NET because in .NET  
 you can't have a disowned slice.)

I've been following the blog. I was surprised that Cristian didn't just make all D arrays .NET ArraySegments and be done with it, but there are probably issues I don't know about. I'd like to hear more on what exactly the issue was.

We discussed it at length. Without being an expert in .NET, I can't tell what the problem is, but I do know that if there was a simpler solution, Cristi would have probably found it. Andrei

Apr 30 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
Andrei, how do you do shared, lock-free containers as value types?
Apr 30 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Thu, 30 Apr 2009 23:18:13 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:
 Robert Jacques wrote:
 On Thu, 30 Apr 2009 17:23:19 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 It's possible to write a reference wrapper around a value type.  It's
 also possible to write a value wrapper around a reference type.
 However, the former is both easier and more efficient than the latter.

Yes and no. Yes, the deep copy mixin is more complex to write the first time, but after that it's easy to use. And second, it's just as efficient as the latter, since the deep copy mixin generates the same code as a value type.

A simple deep-copy struct wrapper around a class type is horribly inefficient compared to a direct struct type:

Reference semantics don't require classes. D arrays have reference semantics, but are implemented as structs. Phobos could do something similar.
   - It uses an extra heap allocation per instance.

And during a deep copy, you're often making a lot of heap copies. (N vs N+1 when N>>1)
   - It uses an extra level of indirection.
   - It uses virtual function calls.

Not if the class is final.
   - It allocates an extra reference to a virtual function table per
 instance.

Huh? Isn't that part of the extra heap allocation? Also, a really important question is: is it possible to implement shared, lock-free containers as value types?
Apr 30 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 01 May 2009 06:26:19 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2009-04-30 15:57:17 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> said:

 As predicted by the "range theory" which predicates that a range will  
 never grow, only shrink, slices should always shrink. Growing is a  
 foreign operation for a slice. This is a big problem that goes beyond a  
 need for purity or cleanliness - it's just terrible to have a slice  
 stomping around, and then uncontrollably being rebound elsewhere and so  
 on. People pass slices into functions, functions modify them and also  
 append to them, and depending on the phase of the moon, some updates  
 are being seen in the caller.

I think D slices are a good concept for pointers to a fixed-size array. If you write new int[4], you're creating a fixed-size array of four ints, and getting a slice to the whole. If you concatenate it with some other array, it creates a new one.

As you like to point out, the fun starts when you have arrays of mutable elements and want to resize them. If you have an array of four elements and write array.length = 5, then you're perhaps creating a new one, disconnected from the previous one, and perhaps not. This poses no problem if your slice is the only one pointing to the memory block; otherwise the results are rather unpredictable.

Well, containers are no better at that. If you have some slices of a container and you grow the container, you may or may not disconnect all your slices (unless you're creating an indirect slice as {container*, int start, int end} of course).

Which makes me think that perhaps we could introduce a unique type attribute and just forbid growing dynamic arrays unless they are flagged as unique. That'd make "unique T[]" behave as the container and "T[]" as the slice. And perhaps we could build that on top of the concept of uniqueness Bartosz wants to be expressible in D for the concurrency model, as he said in his latest post:
 In traditional type systems there is no way to express the requirement  
 that a message passed by reference must either be a monitor itself or  
 behave like a C++ unique_ptr (an object that leaves no aliases behind).  
 This requirement should be expressible in D, allowing the compiler to  
 check for its violations.

<http://bartoszmilewski.wordpress.com/2009/04/26/multithreading-with-d/>

I think you might have misunderstood the concept of a unique/mobile type. If you take a slice or a reference of a unique, you null the source, which makes using unique T logically invalid as a container. Also, shared unique T can't be converted to shared T. This incompatibility is the whole point of having a unique type in the first place, and it negates your concept of using unique T as a container. And generally, unique T implies shared storage, and cannot be used for thread-local arrays. Lastly, what you're trying to fix is a bug in the implementation of arrays (see my posts in this thread) and not a bug in the design.
May 01 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 01 May 2009 06:45:20 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2009-04-30 20:52:05 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> said:

 (Last but not least, .NET has arrays and slices. D's slices that  
 actually have an array testis mesh real bad with .NET because in .NET  
 you can't have a disowned slice.)

just make all D arrays .NET ArraySegments and be done with it, but there's probably issues I don't know about. I'd like to hear more on what exactly the issue was.

what the problem is, but I do know that if there was a simpler solution, Cristi would have probably found it.

As I understand it, the issue is that the .NET API generally doesn't expect ArraySegments, but Array, so Cristi wants T[] to be an array, not an array segment, in order to make the API usable without having to duplicate ArraySegments into Arrays every time. I'm doubtful having a standard container type will help that much, as T[] (the slice) still won't be accepted by the .NET API and the programmer would simply have to always use the container. That the .NET framework wants to use containers everywhere instead of slices is not a reason for D to bend to .NET concept of arrays and slices.

I agree. D shouldn't go borrowing bad design from other languages.
May 01 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 01 May 2009 14:00:35 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Robert Jacques wrote:
 Lastly, what you're trying to fix is a bug in the implementation of  
 arrays (see my posts in this thread) and not a bug in the design.

I think your fix to the array-slice conflation is simply to make slices arrays by adding extra information. This approach is reasonable, but (1) will make slices bigger and slower, and

The extra information is on the heap, not on the stack, so slices are no bigger or slower than normal (for GC or reference counted memory; fully manual memory management could have a 3 word array type and a 2 word slice type)
 (2) will port very poorly to most other containers.

Well, these bugs don't affect other containers.
 Code using ad-hoc arrays in C often had to pass two words around  
 (usually pointers+length). Similarly, code using iterators in C++ must  
 pass two pointers/iterators around (begin+end). With D, we have a 1-1  
 correspondence of that situation with slices, which makes for a great  
 argument on why a slice that contains two boundaries is an awesome way  
 to encapsulate a concept at no efficiency cost. Adding a third word to  
 the slice would mark a step backwards in terms of data being trafficked  
 around, and a questionable increase in expressiveness. An obese  
 three-words slice would avoid stomping over other slices upon append,  
 but would still exhibit a rather unpredictable append behavior: a  
 function taking a slice and appending to it may or may not publish its  
 changes to its caller.

I have not suggested obese 3-slices (in general). Let's look at the actual costs of your previous array + slice proposal and my array/slice proposal:

GC:
    Array       2 words (begin*, end*)
    Slice       2 words (begin*, end*)
    Array/Slice 2 words (ptr*, length)

Reference counting:
    Array       4 words (begin*, end*, end-of-store*, refcount*)
    Slice       3 words (begin*, end*, owner*)
    Array/Slice 3 words (begin*, end*, memInfo*)

Manual memory management:
    Array       3 words (begin*, end*, end-of-store*)
    Slice       2 words (begin*, end*)
    Array/Slice 3 words (ptr*, length, memInfo*)

And you know something interesting: a 4-word slice is actually faster than a 2-word slice, _if_ you use MMX or SSE instructions (even unaligned, though aligned is even faster).
 A good design is to have containers and ranges, not container_ranges.

I agree in general. But containers don't talk the same language; ranges do. So array slices having a few extra features (i.e. ~=) doesn't bother my abstractions: it's still a range.
May 01 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 01 May 2009 14:03:44 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:

 Robert Jacques wrote:
 On Thu, 30 Apr 2009 23:18:13 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
   - It uses an extra heap allocation per instance.

And during a deep copy, you're often making a lot of heap copies. (N vs N+1 when N>>1)

In the case of dynamic arrays, N = 1 (or possibly 0 if the small array optimization is used). In the case of static arrays, N = 0. Either way, the extra level of indirection is significant.

In the case of dynamic arrays, it's a struct. So no extra heap allocation is done. Same goes for heaps. Your argument only applies to object-based containers, like trees, which generally contain lots of child objects, i.e. N heap allocations.
   - It allocates an extra reference to a virtual function table per
 instance.

Huh? Isn't that part of the extra heap allocation?

Extra allocations: 1. Extra memory usage: 2 words (virtual function table pointer, local pointer) plus heap overhead for one allocation.

Nope. Most of the time, the extra memory usage is 0. Remember, the GC uses large blocks with power-of-2 sizes. Unless you happen to push across a boundary, the extra 2 words don't matter.
 Also, a really important question is: is it possible to implement are
 shared, lock-free containers as value types?

Is it possible to implement them as reference types?

Yes. See java.util.concurrent for examples. In fact, all of the algorithms I've read explicitly don't support copying, which is why I'm asking.
 Why would the
 issues different between reference types and value types?

A value type has to support copying. Reference types don't. Perhaps I should have said value semantics / reference semantics instead.
 A reference
 type is just a reference wrapper around a value type.

Precisely, so it's always passed by reference and doesn't need to support copying.
May 01 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 01 May 2009 15:07:17 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Robert Jacques wrote:
 Yes. See java.util.concurrent for examples. In fact, all of the  
 algorithms I've read explicitly don't support copying, which is why I'm  
 asking.

 Why would the
 issues different between reference types and value types?

should have said value semantics / reference semantics instead.

I thought lock-free data structures are the quintessential COW type. You might as well just have provided the perfect argument against your case. Andrei

Andrei, I've never seen a COW lock-free algorithm. In fact, to the best of my knowledge, you can only copy lock-free hash tables and arrays. Everything else, even stacks or queues implemented on an array, doesn't work. For instance, the issue with walkers on singly linked lists is a well-known problem. I've read several papers on lock-free queues and lock-free stacks, and none have mentioned copying. They simply ignore it. (The hash table talk implied it, but didn't mention it explicitly.) So if you have a paper on how to do a lock-free stack, queue, etc. that supports copying, I'd love to read it. (Which is one reason why I'm asking.)

Andrei, you might be thinking about STM, which follows a transactional model: lazy copy, mutate copy, publish-retry changes. (Technically, under the hood the publish takes a lock. It's just that by taking all the required locks at once in a known manner, deadlocks are avoided. And the locks are taken for a much shorter time period. (Caveat: there are other ways of doing STM, and I don't know what D's specific plans are.)) This is generally considered to be different from specific lock-free containers.

(I'm pretty sure that simple STM containers would have worse performance than a locking version, let alone a lock-free version, but I haven't seen data for STM used with simple containers, like a stack, queue or tree, so I don't know for sure. Usually you see it with unstructured objects.)

I should probably have used the term Concurrent Collections as opposed to lock-free collections.
May 01 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 01 May 2009 18:09:09 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Fri, 01 May 2009 15:07:17 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:
 Robert Jacques wrote:
 Yes. See java.util.concurrent for examples. In fact, all of the  
 algorithms I've read explicitly don't support copying, which is why  
 I'm asking.

 Why would the
 issues different between reference types and value types?

I should have said value semantics / reference semantics instead.

I thought lock-free data structures are the quintessential COW type. You might as well just have provided the perfect argument against your case. Andrei


Maybe we have some terms confused. Yes, lock-free singly-linked lists and stacks (and some dictionaries) that are defined in the literature are meant to be shared and use minimal and coherent CAS-based updates. This is quite a feat because copying is the easy way to go and doesn't quite deserve a peer-reviewed conference paper. However, Maged Michael and I co-wrote a few magazine articles on the topic. The general structure of a COW lock-free object is:

struct Something { ... }
struct LockFree {
    Something * s;
    void makeSomeChange() {
        do {
            Something * copy = s.deepDup;
            copy.makeSomeChange;
        } while (!cas(copy, s));
    }
}

For example if Something is some array and you want to insert elements in it atomically, you make a copy of the array, insert stuff, and then substitute the copy for the original array with care such that the array stayed in place (otherwise your insertions aren't doing what you initially thought).

Ah, thank you. I would call this copy-on-mutate, rather than copy-on-write (though this is semantics). The COW that I normally think about has value semantics; in copy-on-mutate, the copying is done under the hood in order to provide lock-free properties.

Do you have a link to your article? I'd like to know the relative performance. Though somehow changing O(1) algorithms that rarely touch the GC for O(n*t) ones that abuse the GC provokes a certain gut reaction. This style of programming seems to need STM, so you can group a bunch of mutations into a single transaction; however, a lot of the queue/stack use cases are inherently single-mutate transactions. What really worries me is that lock-free queues and stacks are the building blocks of a lot of concurrency code, so making them fast is very important.
 Andrei, you might be thinking about STM, which follows a transactional  
 model: lazy copy, mutate copy, publish-retry changes.

My understanding is that STM is a generalized, multi-object, transparent, and systematic way of doing CAS-based work (please correct me if I'm wrong, I'm not an expert in STM).

May 01 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 01 May 2009 15:37:56 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:
 Robert Jacques wrote:
 Precisely, so it's always passed by reference and doesn't need to
 support copying.

All containers need to support copying.

No they don't. To be clear, I'm talking about copying the container, not the values inside it, in a non-destructive manner. My data structures book and Wikipedia make no mention of a .dup-like method as being part of a container. I do agree .dup is useful and quite practical for serial containers, but it is by no means required. But you haven't answered the question of a value-semantic lock-free container.
May 01 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 01 May 2009 19:23:45 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Rainer Deyke wrote:
 Robert Jacques wrote:
 Precisely, so it's always passed by reference and doesn't need to
 support copying.


Speaking of which, I'm thinking of a field like scientific computing (even of the simplest, most basic kind that we all dabble in) or applied math. People who use for example matrices would NOT expect them to have reference semantics. They'd find it very confusing if a = b would not make matrix "a" a copy of matrix "b" and so on. (That + no operator overloading = R.I.P. my excitement for doing numeric work in Java.)

It would work to ask people who actually use Matrix!double to wrap it in a Value!(Matrix!double). But say someone asks, whatever, but if Value!(Matrix!double) is a matrix, then what is Matrix? Well, I'd reply, Matrix is actually a reference to a matrix. Then they'll ask, why don't you call what you call today "Matrix", RefMatrix or Ref!Matrix or whatever, and call a Matrix a matrix? Um, I don't know. That's what the buzz was on the newsgroup when we were thinking of it. Some said that's what people coming from Java expect.

I guess at that point the would-be D user would be entitled to make me a lavaliere out of my Matrix library and move on.

Andrei

I do scientific computing. Generally, I find it breaks down into two parts: things under 4x4, for which value types are probably better, and everything else, for which value types are to be avoided like the plague. I'll often work with 100s of MB of data with algorithms that take minutes to hours to complete. So an unexpected copy is both hard to find (am I slow/crashing because of my algorithm, or because of a typo?) and rather harmful, because it's big. But I've generally worked on making something else fast so more data can be crunched, etc. Actual prototype work (for array/matrix based stuff at least) is often done in Matlab, which I think uses COW under-the-hood to provide value semantics. So I think anyone turning to D to do scientific computing will know reference semantics, since they'd already be familiar with them from C/C++, etc (Fortran?). Although successfully attracting algorithm prototypes from Matlab/Python/Mathematica/R/etc is probably a bigger issue than just the container types; growing the pie was why the Wii won the last console wars.
May 01 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 01 May 2009 19:25:35 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 Do you have a link to your article?

http://tinyurl.com/dac82a Andrei

Should've seen that one coming. :) Anyways, I'm not sure how you can call the technique lock-free, since you're doing (possibly several) allocations inside the inner CAS loop. (I guess a good, parallel allocator might negate this) It's probably not an issue for the article's use case, where reads vastly dominated updates, but for things like stacks or queues, it wouldn't work. And poor performance leads people to create their own fast, possibly buggy versions, thus defeating the point of a standard library.
May 01 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 02:34:57 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Fri, 01 May 2009 19:25:35 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 Do you have a link to your article?

http://tinyurl.com/dac82a Andrei

Anyways, I'm not sure how you can call the technique lock-free, since you're doing (possibly several) allocations inside the inner CAS loop.

No, repeated allocations are trivial to eliminate. I didn't even bother to explain that in the article. The loop must only refill the allocated object from the object that needs to be replaced.

Putting the test in the pseudo code for this would've helped my understanding.

if (copy is null)
    copy = s.deepDup;
else
    s.deepDupTo(copy);
 (I guess a good, parallel allocator might negate this) It's probably  
 not an issue for the article's use case, where reads vastly dominated  
 updates, but for things like stacks or queues, it wouldn't work. And  
 poor performance leads people to create their own fast, possibly buggy  
 versions, thus defeating the point of a standard library.

Incidentally Maged Michael himself wrote the first lock-free malloc().

Cool.
 Andrei

But Andrei, I really don't care about the existence of a lock-free algorithm with value semantics. I only care about a practical lock-free algorithm with value semantics. This approach, while elegant in its simplicity and good for some situations, is very poor at high update / contention objects, like stacks and queues. Maybe, locked algorithms are the best value semantics can do in some situations and this might result in separate fast reference containers. Which is a drawback to value semantics and should be placed into the big pros/cons pile. Of course, I could be missing something (again). P.S. A general thank you all your posts and explanations.
May 02 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 03:39:27 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:

 Robert Jacques wrote:
 On Fri, 01 May 2009 15:37:56 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 All containers need to support copying.

No they don't.

There is no conceptual problem with a non-copyable struct. However, in order to be a considered a container, the struct must at least support read-only iteration over its contents. If iteration is possible, then so is copying. You can have non-copyable value types, but they can't be containers.

No they don't. Iteration can often be destructive. If I iterate over a stack or a queue, I'm left with an empty stack/queue. For value semantics to work, non-destructive copying is required.
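As a minimal sketch of that point (the `Stack` type below is hypothetical, not from any library): iterating by popping leaves the container empty, so supporting value semantics means providing a separate, non-destructive copy operation.

```d
// Hypothetical minimal stack; the name and API are illustrative only.
struct Stack(T)
{
    T[] data;
    void push(T x) { data ~= x; }
    T pop() { auto x = data[$ - 1]; data = data[0 .. $ - 1]; return x; }
    bool empty() const { return data.length == 0; }
    // A value copy must be made *before* iterating destructively.
    Stack dup() { return Stack(data.dup); }
}

unittest
{
    Stack!int s;
    s.push(1); s.push(2);
    auto backup = s.dup;        // non-destructive copy
    while (!s.empty) s.pop();   // destructive iteration
    assert(s.empty && !backup.empty);
}
```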
May 02 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 06:23:46 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2009-05-01 13:44:58 -0400, "Robert Jacques" <sandford jhu.edu> said:

 I think you might have misunderstood the concept of a unique/mobile type.  
 If you take a slice or a reference of a unique, you null the source,  
 which makes using unique T logically invalid as a container.

Take note: I'm not expecting it to copy automatically, "unique T[]" isn't a value-type container. I'm solving the problem of slices detaching from one another when you grow them, which I don't think is a problem addressed by value-type containers either. If you grow the slice or container, previously-taken slices of it may or may not be left dangling. Allowing grow only on "unique T[]" makes sure there are no slice to be invalidated when you grow. Also, generally type systems featuring unique also have that "lent" flag allowing you to pass the unique reference to functions which are required to not keep it anywhere after the function returns. So you could pass lent slices of your unique T[] to functions without breaking uniqueness of the reference.

By the way, the canonical system with a unique/mobile type (CSP) doesn't have a lent flag. It lets you circumvent uniqueness instead. Scope/lent's definitely better :) Hmm... lent functions are a fairly restrictive subset, if you consider unique T[] to be long lived. However, in the more common string builder style of programming, it works just fine. (Though people will still want a capacity field). So maybe an array type with unique semantics and a capacity field is what's warranted (either language or library type). Would you also allow T[] ~=? i.e. x~=y; => x = x~y; which is less efficient but still valid.
 Also, shared unique  T can't be converted to shared T. This  
 incompatibility is the whole point  of having a unique type in the  
 first place, and negates your concept of  using unique T as containers.

Why couldn't it? You just lose (forever) the uniqueness of the reference when you copy it to a non-unique one.

The whole point of unique is that it lacks any sharing protection mechanisms. Instead the object is protected by the fact that only one thread has access to it at a time. You could design the system so that uniques could be cast to local objects (I don't think it's a good idea, but you could), but casting them to shared objects is a recipe for disaster, due to races, etc.
 And generally, unique T implies shared  storage, and can not be used  
 for thread-local arrays.

Generally, but not necessarily. Any problem for unique be applicable to thread-local arrays?

Nope, the implication comes from the standard use case of cheaply passing data between threads.
May 02 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 10:58:08 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 I do scientific computing. Generally, I find it breaks down into two  
 parts: things under 4x4, for which value types are probably better, and  
 everything else, for which value types are to be avoided like the  
 plague. I'll often work with 100's mb of data with algorithms that take  
 minutes to hours to complete. So an unexpected copy is both hard to  
 find (am I slow/crashing because of my algorithm, or because of a  
 typo?) and rather harmful, because it's big.

I don't buy this. Undue copying is an issue that manifests itself locally, reproducibly, and debuggably. Contrast with long-distance coupling which is bound to hard to debug. You change a matrix here, and all of a sudden a matrix miles away has been messed up. Also, efficiency can be fixed with COW, whereas there is nothing you can do to fix the coupling aside from relentless and patient user education. Walter gave me a good argument (little did he know he was making a point destroying his.) Consider the progress we made when replacing char[] with string. Why? Because with char[] long-distance dependencies crop up easy and fast. With string you know there's never going to be a long-distance dependency. Why? Because unlike char[], content immutability makes string as good as a value. I remember the nightmare. I'd define a little structure: struct Sentence { uint id; char[] data; } Above my desk I have a big red bulb along with an audible alarm. As soon as I add the member "data", the bulb and the alarm go off. Sentence is now an invalid struct - I need to add at least constructor and a postblit. In the constructor I need to call .dup on the incoming data, and in the postblit I need to do something similar (or something more complicated if I want to be efficient). This is a clear example of code that is short and natural, yet does precisely the wrong thing. This is simply a ton of trouble, as experience with C++ has shown. I'm not even getting into calling functions that take a char[] and keeping fingers crossed ("I hope they won't mess with it") or .dup-ing prior to the call to eliminate any doubt (even though the function may anyway call .dup internally). string has marked huge progress towards people considering D seriously.

Andrei, you're perfectly right about strings. They've been a godsend. But strings are small, cheap to copy and tend to roam a lot. Scientific computing (at least the kind I do) is very local, and uses large datasets which are expensive to copy. I load data and set up the problem parameters, then pass it through a series of functions and write out the answer. When I first learned Matlab, my prof gave me a great rule of thumb: if it's more than 10 lines long, you're probably doing it wrong. My code bases tend to be small and focused, and use either my own code or libraries I understand since they're logically pure (Ah, math functions (I don't mess with rounding)). And while finding bugs is relatively easy (go small, personal code bases) tracking down a performance issue if you don't even know it exists is a lot harder/slower.
 But I've generally worked on making something else fast so more data  
 can be crunched, etc. Actual prototype work (for array/matrix based  
 stuff at least) is often done in Matlab, which I think uses COW  
 under-the-hood to provide value semantics. So I think anyone turning to  
 D to do scientific computing will know reference semantics, since  
 they'd already be familiar with them from C/C++, etc (Fortran?).  
 Although successfully attracting algorithm prototypes from  
 Matlab/python/mathmatica/R/etc is probably bigger issue than just the  
 container types, growing the pie was why the Wii won the last console  
 wars.

Fortran uses pass by reference, but sort of gets away with it by assuming and promoting no aliasing throughout. Any two named values in Fortran can be assumed to refer to distinct memory. Also unless I am wrong A = B in Fortran does the right thing (copies B's content into A). Please confirm/infirm. For all I know, Matlab does the closest to "the real thing". Also, C++ numeric/scientific libraries invariably use value semantics in conjunction with expression templates meant to effect loop fusion. Why? Because value semantics is the right thing and C++ is able to express it. I should note, however, that Perl Data Language uses reference semantics (http://tinyurl.com/derlrh).

Actually, all the big, high performance libraries I know use BLAS/Linpack (i.e. reference semantics). Though there are a good number of middle-range libraries that use expression templates. Hmm... just went googling and found some neat work on mixing expression templates with BLAS APIs, including using the GPU. Having your cake and eating it too is nice. Though personally I like the current array op syntax, which is reference based and is just enough to make me aware of doing expensive operations, without being annoying:

a[] = b[] + c[];
 There's also a definite stench when one realizes that

 a = b;

 on one side, and

 a = b * 1;

 or

 a = b + 0;

 on the other, do completely different things.

Again, I'd use the []= operator:

a = b + 0;   // Error
a[] = b + 0; // Okay

So I don't see this as an issue.
 So what we're looking at is: languages that had the option chose value  
 semantics. Languages that didn't, well, they did what they could.

Well, that's one thing I love about D. We have a value assignment operator: []= in addition to a reference semantics assignment operator = (well, for reference types at least). I actually use it to do format copy/conversions in my own array class.
 I started rather neutral in this discussion but the more time goes by,  
 the more things tilt towards value semantics.

Well, for me, the more this goes on the more I get interested in trying value semantics. But if I'm honest, I'd just go on using array slices and forget about the array containers. And the lack of good, fast lock-free value semantics containers (even the concept of shared data without reference semantics) is a border-line deal breaker, depending on who you are (It's bad PR regardless). Also, in a value semantics world, refs are third class citizens, but in a reference semantic world, value semantics get their own assignment operator ( []= ), and by convention, their own property ( .dup )
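For the record, a minimal sketch of that convention with plain D arrays: `=` rebinds the slice (reference assignment), `[]=` copies contents (value assignment), and `.dup` makes an independent copy.

```d
void main()
{
    int[] a = [1, 2];

    int[] r = a;            // reference assignment: r aliases a
    r[0] = 99;
    assert(a[0] == 99);

    auto v = new int[a.length];
    v[] = a;                // value assignment: contents copied element-wise
    v[0] = 0;
    assert(a[0] == 99);     // a is unaffected

    auto d = a.dup;         // the conventional explicit copy
    d[1] = 7;
    assert(a[1] == 2);
}
```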
May 02 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 15:17:04 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:

 Robert Jacques wrote:
 On Sat, 02 May 2009 03:39:27 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 You can have non-copyable value types, but they can't be containers.

No they don't. Iteration can often be destructive. If I iterate over a stack or a queue, I'm left with an empty stack/queue. For value semantics to work non-destructive copying is required.

OK. I grant that there are non-copyable types that can reasonably be called containers. Simple solution: make them non-copyable structs. You still get most of the benefits of value types: - One less layer of indirection.

Again, D arrays are structs with reference semantics. This isn't a pro/con either way.
   - No long distance dependencies.

Well, if I can't copy it, then I have to use ref everywhere, which is functionally equivalent to reference semantics. I think you've just proved the counter-point.
   - RAII.

Can be done with structs or classes. Also see point 1. So, this isn't a pro/con either way.
May 02 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 17:59:53 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Robert Jacques wrote:
 Also, in a value semantics world, refs are third class citizens, but in  
 a reference semantic world, value semantics get their own assignment  
 operator ( []= ), and by convention, their own property ( .dup )

The major problem is not assignment. That can be taken care of. The problem is:

1. Passing an object into a function
2. Making the object a member of another object
3. Yes, assigning to the object (which ought to be congruent with 1 and 2).

I have perused some more searches and documentation, and things don't bode well for references. Consider the NumPy library. I have searched numpy pass by reference and found some interesting links. The first reveals differences between numpy and matlab (notably reference semantics). The second is a discussion entitled "beginner confused with numpy". Guess what the confusion was about? Congratulations, you have won a 40'' LCD TV. Reference semantics! Third hit: https://www-old.cae.wisc.edu/pipermail/octave-maintainers/2009-March/011509.html says:

>> Octave's pass-by-value mechanism (with lazy copy-on-write) is
>> something that is *far* more simple to grasp than NumPy's inherited
>> everything-is-a-reference semantics. I do regard myself as moderately
>> experienced Python programmer, yet every now and then I get shot in
>> the foot by the reference semantics in Python.

I swear I didn't pay that guy. Also, getting back to Perl's Data Language (http://tinyurl.com/derlrh) I see the mention of references is clearly a WARNING, not an INTERESTING AND DESIRABLE FEATURE:

"It is important to keep the ``reference nature'' of piddles in mind when passing piddles into subroutines. If you modify the input pdls you modify the original argument, not a copy of it. This is different from some other array processing languages but makes for very efficient passing of piddles between subroutines."

So what I'm seeing is that reference semantics is not desirable and not natural but was chosen for efficiency reasons. But you know, I don't want to "keep in mind" extra stuff when coding, I have enough to worry about. 
If I can get away with ref and/or refcounting, then I have taken care of the efficiency issue and I don't need to keep in mind a weird quirk.

I've said it before and I'll say it again. In high-level interpreted languages, with interactive consoles etc, you're generally doing prototyping work and value semantics are better for that. But D is a systems language. That doesn't make the arguments any less valid, but you might want to down-weight them a bit.
 ============

 About undue copying of data: maybe that could be avoided by having  
 functions manipulate ranges (which are cheap to copy), not the  
 containers that use them. In C++ you need to pass the container more  
 often than not mostly because passing two iterators is amazingly  
 burdensome. In D we can pass ranges easily, so I think much D code that  
 wants to do stuff will just take ranges.

And ranges _are_ reference semantics. If as you say most of D's code would use ranges, doesn't that prove the counter point? Although, I'm starting to see an interesting story: Here are containers. They have value semantics and are simple to use/prototype with. When you're done, you can move to ranges in the performance critical sections in order to boost performance. And some things, like high performance lock-free queues and stacks, might only exist as ranges.
May 02 2009
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 19:11:11 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:

 Robert Jacques wrote:
 Again, D arrays are structs with reference semantics. This isn't a
 pro/con either way.

The D1 dynamic array type does not have reference semantics, nor does it have value semantics.

void f(int[] a) { a.length = 1; }
auto a = [];
f(a);
assert(a.length == 0);
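The hybrid shows up clearly when the two kinds of mutation are put side by side (a small sketch, assuming D2 slice behavior): element writes through the parameter are seen by the caller, while a length change is not.

```d
void setFirst(int[] a) { a[0] = 42; }    // writes through the shared data
void shrink(int[] a)   { a.length = 1; } // only rebinds the local slice

void main()
{
    int[] x = [1, 2, 3];
    setFirst(x);
    assert(x[0] == 42);    // reference-like: contents are shared
    shrink(x);
    assert(x.length == 3); // value-like: the caller's length is untouched
}
```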
   - No long distance dependencies.

Well, if I can't copy it, then I have to use ref everywhere, which is functionally equivalent to reference semantics. I think you've just proved the counter-point.

Given a value type 'T', you have the guarantee that no two variables of type 'T' can alias each other. This guarantee is preserved when the type 'T' is non-copyable. An argument of type 'ref T' can obviously alias a variable of type 'T'.

Okay, if T is not copyable, then I _must_ pass it as ref T, everywhere. Which is reference semantics.
   - RAII.

Can be done with structs or classes. Also see point 1. So, this isn't a pro/con either way.

The D1 dynamic array type does not support RAII.

There are two parts to D's arrays. One is a struct 2 words long, the other is a chunk of RAM. The first part is RAII; the second part is not possible, since D doesn't allow dynamically sized memory allocation on the stack. And basically all dynamic data structures have to do some heap allocation. I had thought you were talking about the difference between having the managing head be a struct or a class. (See scope classes)
May 02 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Fri, May 1, 2009 at 4:23 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Rainer Deyke wrote:
 Robert Jacques wrote:
 Precisely, so it's always passed by reference and doesn't need to
 support copying.

All containers need to support copying.

Speaking of which, I'm thinking of a field like scientific computing (even of the simplest, most basic kind that we all dabble into) or applied math. People who use for example matrices would NOT expect them to have reference semantics. They'd find it very confusing if a = b would not make matrix "a" a copy of matrix "b" and so on. (That + no operator overloading = R.I.P. my excitement for doing numeric work in Java.) It would work to ask people who actually use Matrix!double to wrap it in a Value!(Matrix!double). But say someone asks, whatever, but if Value!(Matrix!double) is a matrix, then what is Matrix? Well, I'd reply, Matrix is actually a reference to a matrix. Then, they'll ask, why don't you call what you call today "Matrix", RefMatrix or Ref!Matrix or whatever, and call a Matrix a matrix? Um, I don't know. That's what the buzz was on the newsgroup when we were thinking of it. Some said that's what people coming from Java expect. I guess at that point the would-be D user would be entitled to make me a lavaliere out of my Matrix library and move on.

Python, and therefore NumPy, are reference based. So far I haven't seen any scientists posting on the list about how having their arrays and matrices be references is driving them crazy. It may be surprising to some of them at first, but even non-hard-core coders seem to be able to handle it. It helps that dummy assignments like a = b are rare. More often you have things like a = b + c, and that creates a new matrix. Or a += b, which pretty clearly mutates a. Much more often the discussion on the numpy list takes the form of "how do I make this loop faster" because loops are slow in Python, so you have to come up with clever transformations to turn your loop into array ops. This is thankfully a problem that D array libs do not have. If you think of it as a loop, go ahead and implement it as a loop. That's why I prefer to do numerical work in D rather than NumPy these days. --bb
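A tiny illustration of that freedom (the `axpy` name is just a conventional BLAS-style label, not a library call): in D the loop can simply stay a loop.

```d
// y[i] += a * x[i] written as a plain loop: no need to contort it
// into whole-array operations for an interpreter's sake.
void axpy(double a, const(double)[] x, double[] y)
{
    foreach (i; 0 .. y.length)
        y[i] += a * x[i];
}

unittest
{
    double[] x = [1.0, 2.0];
    double[] y = [10.0, 20.0];
    axpy(2.0, x, y);
    assert(y == [12.0, 24.0]);
}
```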
May 01 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Fri, May 1, 2009 at 6:25 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Fri, May 1, 2009 at 4:23 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 I guess at that point the would-be D user would be entitled to make me a lavaliere out of my Matrix library and move on.

Python, and therefore NumPy, are reference based.

In a language that's reference based, things can't be helped. They can be helped in D though. I wouldn't want to later cringe that I missed the opportunity.
 So far I haven't seen any scientists posting on the list about how
 having their arrays and matrices be references is driving them crazy.
 It may be surprising to some of them at first, but even non-hard-core
 coders seem to be able to handle it.

To me this sounds more like an opportunity to do the right thing.

Could be. I'm not arguing that refs are best, just giving some evidence from the NumPy community to counter your assertion that scientists would be confused by reference-based array and matrix types.
 It helps that dummy assignments like a = b are rare. More often you have things like a = b + c, and that creates a new matrix. Or a += b, which pretty clearly mutates a.

Oh rly. How about this then?

Matrix a, b, c;
...
c = a;
a += b;

Does the last operation mutate c as well?

I said "assignments like a = b are rare" and you put one of those in your example. Yes, when you have an a=b anywhere you've got to pay attention and make sure you didn't mean a=b.dup. And the other thing is that when you pass a matrix to a function, you need to be aware of whether the function mutates that matrix or not. That's really the biggest pain, and source of hardest to find bugs. Matlab goes the other extreme and makes everything value semantics. It does a lot of COW optimizations under the hood to make it tolerably efficient even when tossing around big matrices. But it's annoying if you want to write a function to make a small update to an existing matrix. You have no choice but to return a copy and hope that Matlab's optimizer is smart enough to eliminate the unnecessary copy. I agree with the poster below that what I usually want is value semantics with small values and matrices, reference semantics with big matrices. --bb
May 02 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Sat, 02 May 2009 18:17:16 +0400, Denis Koroskin <2korden gmail.com> wrote:

 On Sat, 02 May 2009 18:08:30 +0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Sat, 02 May 2009 03:35:51 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 No, repeated allocations are trivial to eliminate. I didn't even
 bother to explain that in the article. The loop must only refill the
 allocated object from the object that needs to be replaced.

understanding.

if (copy is null)
    copy = s.deepDup;
else
    s.deepDupTo(copy);

copy = cast(T*) GC.malloc(T.sizeof);
do {
    overwrite(s, copy);
    copy.modify;
} while (!cas(copy, s));
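A slightly fuller, hedged sketch of the reuse idea (`Payload`, `update`, and the single-word `head` are illustrative names; `core.atomic.cas(&dest, ifThis, writeThis)` is the Phobos signature assumed here): the replacement object is allocated once, outside the loop, and only refilled on each CAS retry.

```d
import core.atomic : cas;

struct Payload { int value; }

shared Payload* head;  // the published object, replaced wholesale on update

void update(int delta)
{
    // Allocate the replacement once, hoisted out of the retry loop.
    Payload* copy = new Payload;
    Payload* snapshot;
    do
    {
        snapshot = cast(Payload*) head;
        *copy = *snapshot;       // refill the reused buffer from the snapshot
        copy.value += delta;     // then apply the modification
    } while (!cas(&head, cast(shared Payload*) snapshot,
                  cast(shared Payload*) copy)); // publish if head is unchanged
}
```

On a failed CAS another thread has already published a new object, so the loop refills the same buffer from the fresh snapshot rather than allocating again.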

structure, not just a node/etc. And trees, etc tend to have variable sizes, etc.

You can reuse memory even if it comprehends more complex patterns than one single allocation. Andrei

I *highly* doubt it is worth the trouble. Most likely, this container won't be lock-free and scalable anymore. Performance will also degrade dramatically. Besides, the more I think about thread-local/shared separation that is going to come to D2, the more I see that there is absolutely no need to make Phobos containers thread-safe. Most containers will be thread-local and won't need any synchronization at all.

I meant, most essential ones (Array, Set, Map, HashMap, SList, DList, IntrusiveSList, IntrusiveDList etc). Lock-free containers are needed, too, but I believe they should be designed after thread-unsafe ones.
May 02 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 10:18:41 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Fri, May 1, 2009 at 6:25 PM, Andrei Alexandrescu
  Matrix a, b, c; ... c = a; a += b;
  Does the last operation mutate c as well?

your example.

You said it and I don't buy it one femtosecond. Code does need to copy values around. It's a fundamental operation!

Andrei, he said that explicit assignments/copies of arrays/containers are rare in numeric computing, which I agree with. Just because it's a fundamental operation doesn't mean it gets used much in his (or I guess NumPy's actually) specific applications. Also, disregarding/disbelieving a use case is not helpful to this discussion.
 Yes, when you have an a=b anywhere you've got to pay attention and
 make sure you didn't mean a=b.dup.

That is terrible. Anywhere == a portion of the code far, far away.

No, he was talking about a local decision. i.e. Did I or didn't I mean to make a logical copy here?
May 02 2009
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 17:45:16 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Sat, 02 May 2009 10:18:41 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Fri, May 1, 2009 at 6:25 PM, Andrei Alexandrescu
  Matrix a, b, c; ... c = a; a += b;
  Does the last operation mutate c as well?

your example.

You said it and I don't buy it one femtosecond. Code does need to copy values around. It's a fundamental operation!

are rare in numeric computing, which I agree with. Just because it's a fundamental operation doesn't mean it gets used much in his (or I guess NumPy's actually) specific applications. Also, disregarding/disbelieving a use case is not helpful to this discussion.

He said something. That's about as much proof as was seen. I didn't buy it so I replied without assuming the same as him. Then he repeated "but I said that!" which upped the ante from supposition to presupposition. I think presuppositions are particularly pernicious so I felt the need to explicitly say that I don't believe it just because he said it. It's not a use case. It's just something that was said. If some sort of evidence is given, that would be great. Don't put the onus on me to disprove what was said.

I don't expect you to disprove what was said. I would hope that when you see someone over-generalizing their own or someone else's experience, you take it for what it was: a use case. That being said, I think there was some miscommunication. Bill seems to be talking about no-op assignments like a = c; while what you've (probably correctly) hit on is a = foo(b); which results in an assignment and happens everywhere.
May 02 2009