
digitalmars.D - RFC: naming for FrontTransversal and Transversal ranges

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Hello,


I'm defining two ranges that go vertically over a range of ranges. 
Consider for example an int[][]. That's a range that has a range as its 
element type.

FrontTransversal goes transversally over the range of ranges and returns 
the front of each element in turn:

     int[][] x = new int[][2];
     x[0] = [1, 2];
     x[1] = [3, 4];
     auto ror = frontTransversal(x);
     auto witness = [ 1, 3 ];
     uint i;
     foreach (e; ror) assert(e == witness[i++]);
     assert(i == 2);

Then Transversal does a similar thing, except that instead of going 
through the front of each range, it iterates over a column fixed at 
construction:

     int[][] x = new int[][2];
     x[0] = [1, 2];
     x[1] = [3, 4];
     auto ror = transversal(x, 1);
     auto witness = [ 2, 4 ];
     uint i;
     foreach (e; ror) assert(e == witness[i++]);
     assert(i == 2);

Question is, what would be two good names for these ranges?


Andrei
Apr 29 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 I'm defining two ranges that go vertically over a range of ranges. 
 Consider for example an int[][]. That's a range that has a range as its 
 element type.

Maybe I am not understanding the true purpose of this, but isn't a general 2D/ND slicer much better? Bye, bearophile
Apr 29 2009
prev sibling next sibling parent reply BLS <windevguy hotmail.de> writes:
Andrei Alexandrescu wrote:
 I'm defining two ranges that go vertically over a range of ranges. 
 Consider for example an int[][]. That's a range that has a range as its 
 element type.
 [...]
 Question is, what would be two good names for these ranges?

The question is, how does this fit into a collection/container package? So frankly, I don't worry about naming conventions. I am more concerned about ranges within, let's say, a dynamic deque. Lock-free ADTs are the final proof of product, no? Maybe I'm missing something, but the fact is: D2 (Phobos) still doesn't have (not even in its brand-new incarnation) support for simple humpty-dumpty collections. So where is the proof of product?
Apr 29 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
BLS wrote:
 The question is, how does this fit into a collection/container package?
 
 So frankly, I don't worry about naming conventions. I am more 
 concerned about ranges within, let's say, a dynamic deque.
 Lock-free ADTs are the final proof of product, no?
 
 Maybe I'm missing something.
 But the fact is: D2 (Phobos) still doesn't have (not even in its 
 brand-new incarnation) support for simple humpty-dumpty collections. 
 So where is the proof of product?

You're not missing anything, except for perhaps a little patience :o).

There is vivid discussion about container support in D2. I was actually thinking of posting an RFC sooner or later here. It is very exciting to realize that with minimal (and actually simplifying) changes to the language we can offer the option of garbage-collected vs. reference-counted vs. "unsafe" semi-automatically managed memory. Moreover, I realized that the global allocation model only affects the built-in literal types such as T[] and T[U], but still allows using the other models as library types. It has become increasingly clear we'll need to support arrays in addition to slices.

One big question mark for me is, should we define for arrays (and associative arrays and containers in general) value semantics, reference counted value semantics, or reference semantics? It is not clear to me yet what the best option is. Think of arrays as an archetypal example.

1. Value semantics (arrays are like int)
   + Simple to reason about
   + No long-distance sharing of state
   + Familiar to refugees coming from C++ with STL
   - Unfamiliar to Java/C# refugees
   + Relatively simple to implement
   + Good behavior as members inside structs and classes
   - Bad behavior as function parameters: if you forget to pass by reference, performance will be harmed

2. Value semantics with reference counting
   +- All tradeoffs of value semantics, except as noted below
   - Difficult to implement
   + Cheap to copy around; will "divide" only upon mutation attempts
   - Innocuous operations may throw or occasionally take a long time

3. Reference semantics
   - Hard to reason about: squirrel an array into an object, and way later and farther away it modifies the array and effects a change at long distance
   - Long-distance effects may well be some of the toughest bugs after memory errors
   + Cheap to copy
   + Familiar to Java/C# refugees
   - Less familiar to C++ refugees
   - Potentially inefficient, as people might start sprinkling copies in the hope of eliminating bugs or chances of bugs
   + The const system helps: if a function doesn't change something, just mark it const and you're home free
   + immutable also helps: immutable data can never change, so it can be shared, no problem. Not sure how interesting immutable containers are, though.
   - Implementing fast/unsafe semantics with reference semantics effectively forces reference counting to be part of containers, which complicates implementation

Any thoughts, just share! For the record, Walter has a bias in favor of... I'd better not bias you guys. One thing: it would be relatively easy to switch among semantics by defining a wrapper such as Value!(T) or Ref!(T). So whatever choice we make, it won't paint us into a corner.

Andrei
Apr 29 2009
next sibling parent Daniel Keep <daniel.keep.lists gmail.com> writes:
Andrei Alexandrescu wrote:
 A lot of interesting stuff.

POLS (Principle of Least Surprise) says they should be reference types. That's what C, C++ (T[] doesn't look like it's using STL), C#, Java, Lua, Python, etc., etc. programmers will expect. Jumping out from behind a bush a surprising them with value semantics will probably not impress them. And don't say you can just document it: we get enough people coming in who refuse to read the docs until we tell them to. :P That said, having a Value!T template [1] that implements value-semantics would be good. I'll be honest with you: I'm not sure I'd use it for fear of performance issues. Perhaps making it a copy-on-write type under the hood would help with that ("it only copies if you mutate and someone else has a reference; it's fast AND safe!"). -- Daniel [1] I'd be tempted to call it Block!T as in "here's a block of data".
Apr 29 2009
prev sibling next sibling parent BLS <windevguy hotmail.de> writes:
Andrei ,
many thanks, ( not only because you are taking  the time to  explain ), 
I think I have to  say  “Thanks Andrei” because your answer was very 
gentle, ... ignoring all my grammar and (more important) logical 
mistakes... let me remain with a : You are listening and this is in my 
world : very generous.

Still : Container RFC ?
Björn
Apr 29 2009
prev sibling next sibling parent reply Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu Wrote:

 BLS wrote:
 [...]
 
 One big question mark for me is, should we define for arrays (and 
 associative arrays and containers in general) value semantics, reference 
 counted value semantics, or reference semantics? [...]
 
 Andrei

I'm a C++ refugee and value semantics make no sense to me. I always declared classes (as part of function declaration) in C++ with a &... Either T& or const T&. There were plenty of bugs where I messed up and forgot the &. D is nice in that it tries to simplify that. You yourself had issues with remembering ref parameters when implementing ranges!
Apr 29 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jason House wrote:
 I'm a C++ refugee and value semantics make no sense to me. I always
 declared classes (as part of function declaration) in C++ with a &...
 Either T& or const T&. There were plenty of bugs where I messed up
 and forgot the &. D is nice in that it tries to simplify that. You
 yourself had issues with remembering ref parameters when implementing
 ranges!

Good point, but the issue boils down to this: do we define container members more often, or define container function parameters more often? Because inside a data structure value semantics are useful more often than not, and in a function parameter list reference semantics are useful more often than not. I do agree that D programmers have grown used to cheap-to-pass data and I don't want to cure anyone of hiccups :o). Andrei
Apr 29 2009
parent Georg Wrede <georg.wrede iki.fi> writes:
Andrei Alexandrescu wrote:
 Jason House wrote:
 I'm a C++ refugee and value semantics make no sense to me. I always
 declared classes (as part of function declaration) in C++ with a &...
 Either T& or const T&. There were plenty of bugs where I messed up
 and forgot the &. D is nice in that it tries to simplify that. You
 yourself had issues with remembering ref parameters when implementing
 ranges!

Good point, but the issue boils down to this: do we define container members more often, or define container function parameters more often? Because inside a data structure value semantics are useful more often than not, and in a function parameter list reference semantics are useful more often than not. I do agree that D programmers have grown used to cheap-to-pass data and I don't want to cure anyone of hiccups :o).

Will it be an incurable mess if you let the programmer choose each time? (Then, of course, we'd have this same discussion about which is the default. :-) ) But still, I can see situations where it is _obvious_ which should be used. Somebody into a more functional style would have different needs than the more imperative guy. Not to mention more specific situations.
Apr 30 2009
prev sibling next sibling parent reply Rainer Deyke <rainerd eldwood.com> writes:
Andrei Alexandrescu wrote:
 1. Value semantics (arrays are like int)

Don't forget: + Supports timely destruction of contents (i.e. RAII).
 2. Value semantics with reference counting

I like this optimization and use it all the time in my own code, but I'm not convinced that it should be the default. It's also problematic in multithreaded situations. I think a generic CopyOnWrite wrapper over arbitrary value types would be more useful. CopyOnWrite!(int[]).
 3. Reference semantics

I'm strongly opposed to this option. Either of the other options would be acceptable. -- Rainer Deyke - rainerd eldwood.com
Apr 29 2009
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2009-04-29 23:01:39 -0400, Rainer Deyke <rainerd eldwood.com> said:

 Andrei Alexandrescu wrote:
 2. Value semantics with reference counting

I like this optimization and use it all the time in my own code, but I'm not convinced that it should be the default. It's also problematic in multithreaded situations. I think a generic CopyOnWrite wrapper over arbitrary value types would be more useful. CopyOnWrite!(int[]).

But don't forget that D2 is going to have the shared keyword, possibly with thread-local heaps. So multithreading won't be much of a problem: if the variable is shared, the compiler should force locks as needed or use atomic operations; if the variable is local to a thread, locks should become no-ops and atomic operations regular ones, improving performance. Or at least that's what I'm getting from the hints we currently have.
 3. Reference semantics

I'm strongly opposed to this option. Either of the other options would be acceptable.

Andrei mentioned a couple of reasons against 3, which are yours? -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Apr 30 2009
parent reply Rainer Deyke <rainerd eldwood.com> writes:
Michel Fortin wrote:
 On 2009-04-29 23:01:39 -0400, Rainer Deyke <rainerd eldwood.com> said:
 Andrei Alexandrescu wrote:
 3. Reference semantics

be acceptable.

Andrei mentioned a couple of reasons against 3, which are yours?

I agree with all of Andrei's reasons. In addition, there's this: suppose you have a struct containing a (mutable) array. When you make a copy of that struct, you will almost always want to make a copy of the contained array. Therefore value semantics should be the default, because it simplifies the most common use case. -- Rainer Deyke - rainerd eldwood.com
Apr 30 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Thu, 30 Apr 2009 14:40:42 -0400, Rainer Deyke <rainerd eldwood.com> 
 wrote:
 In addition, there's this: suppose you have a struct containing a
 (mutable) array.  When you make a copy of that struct, you will almost
 always want to make a copy of the contained array.  Therefore value
 semantics should be the default, because it simplifies the most common
 use case.

That's what struct ctors/dtors/opAssign are for. Personally, I think letting the struct designer decide the behaviour is better.

The problem is what to do for the containers in Phobos. Andrei
Apr 30 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 Now, back on topic. I actually make heavy use of what amounts to structs 
 containing arrays and don't want a deep copy >90% of the time. In fact, 
 I haven't used any of the (my array-my array) deep copy functions and I 
 only do copying when I have to convert my arrays to D style arrays or 
 other container types. Also, D arrays are structs that contain C style 
 arrays and I almost never dup them in final code. When I debug/prototype 
 I do tend to use []= and dup more often. So I'd disagree that you'd 
 almost always want to make a copy. Second, I don't think we should 
 simplify the most common use case, but instead the majority of use 
 cases. This is partly semantics, but partly not: everybody uses 
 containers differently.

It must be a matter of style. Having used both semantics extensively, I personally think one never looks back after using value containers. When you fail to pass by reference there's a performance hit and a logic bug (changes are not effected). Both are local issues that manifest immediately, close to the place of the mistake, and are trivial to fix. But undue sharing is a global, long-distance issue - one much harder to tackle.

I agree that reference semantics is what people got used to in Java and other languages. For entity types, reference semantics make a ton of sense, but I think containers are not entity classes most of the time. But I also agree that through experience one can design around undue sharing. But it should be remembered that it took years to shake off security bugs in the JDK caused by undue sharing. (One that comes to mind was a sensitive linked list returned by a system API. User code could maliciously change it to mess with the system's security manager. Solution? They returned a compulsory copy of the list.)

So I'm not sure where this leaves us. I hear two good arguments:

* Some aren't bothered by reference semantics
* We are used to reference semantics from other languages

These are good arguments, but don't quite end the discussion.
 Also, Andrei, I would had preferred it if you had separated the 
 discussion on arrays from Phobos containers. Many languages have arrays 
 and a separate container library and I feel the decision of whether D's 
 arrays should behave like Phobos' containers would be best left to after 
 the best decision for Phobos is made.

*If* we add a new array built-in type, that will fundamentally affect Phobos, because Phobos will design all containers to be like the pair array-slice. If we don't add a new built-in type, then I only have Phobos to discuss.

Also something that wasn't discussed that much is the connection of whatever design we devise with the GC. I am mightily excited by the possibility to operate without GC or with tight reference counting, and I thought many around here would share that excitement. If we go for non-GC ways of memory management, that will probably affect container design.

Andrei
Apr 30 2009
parent reply Benji Smith <dlanguage benjismith.net> writes:
Andrei Alexandrescu wrote:
 Also something that wasn't discussed that much is the connection of 
 whatever design we devise, with the GC. I am mightily excited by the 
 possibility to operate without GC or with tight reference counting, and 
 I thought many around here would share that excitement. If we go for 
 non-gc ways of memory management, that will probably affect container 
 design.

Just out of curiosity, why do you like reference counting more than mark/sweep for containers? --benji
May 01 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Benji Smith wrote:
 Andrei Alexandrescu wrote:
 Also something that wasn't discussed that much is the connection of 
 whatever design we devise, with the GC. I am mightily excited by the 
 possibility to operate without GC or with tight reference counting, 
 and I thought many around here would share that excitement. If we go 
 for non-gc ways of memory management, that will probably affect 
 container design.

Just out of curiosity, why do you like reference counting more than mark/sweep for containers?

Who told you I like it? All I do is acknowledge that RC and MS have different tradeoffs that may form the basis of a preference. Andrei
May 01 2009
prev sibling next sibling parent reply Rainer Deyke <rainerd eldwood.com> writes:
Robert Jacques wrote:
 On Thu, 30 Apr 2009 14:40:42 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 In addition, there's this: suppose you have a struct containing a
 (mutable) array.  When you make a copy of that struct, you will almost
 always want to make a copy of the contained array.  Therefore value
 semantics should be the default, because it simplifies the most common
 use case.

That's what struct ctors/dtors/opAssign are for. Personally, I think letting the struct designer decide the behaviour is better.

I hope you're not suggesting writing ctor/dtor/opAssign for every single struct that uses arrays as value types. It's possible to write a reference wrapper around a value type. It's also possible to write a value wrapper around a reference type. However, the former is both easier and more efficient than the latter. -- Rainer Deyke - rainerd eldwood.com
Apr 30 2009
next sibling parent reply Rainer Deyke <rainerd eldwood.com> writes:
Robert Jacques wrote:
 On Thu, 30 Apr 2009 17:23:19 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 It's possible to write a reference wrapper around a value type.  It's
 also possible to write a value wrapper around a reference type.
 However, the former is both easier and more efficient than the latter.

Yes and no. Yes, the deep copy mixin is more complex to write the first time, but after that it's easy to use. And second, it's just as efficient as the latter, since the deep copy mixin generates the same code as a value type.

A simple deep-copy struct wrapper around a class type is horribly inefficient compared to a direct struct type:
- It uses an extra heap allocation per instance.
- It uses an extra level of indirection.
- It uses virtual function calls.
- It allocates an extra reference to a virtual function table per instance.

I suppose a very clever wrapper metaprogram or a really clever optimizing compiler might eliminate some of these inefficiencies. -- Rainer Deyke - rainerd eldwood.com
Apr 30 2009
parent reply Rainer Deyke <rainerd eldwood.com> writes:
Robert Jacques wrote:
 On Thu, 30 Apr 2009 23:18:13 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
   - It uses an extra heap allocation per instance.

And during a deep copy, you're often making a lot of heap copies. (N vs N+1 when N>>1)

In the case of dynamic arrays, N = 1 (or possibly 0 if the small array optimization is used). In the case of static arrays, N = 0. Either way, the extra level of indirection is significant.
   - It allocates an extra reference to a virtual function table per
 instance.

Huh? Isn't that part of the extra heap allocation?

Extra allocations: 1.
Extra memory usage: 2 words (virtual function table pointer, local pointer) plus heap overhead for one allocation.
 Also, a really important question is: is it possible to implement are
 shared, lock-free containers as value types?

Is it possible to implement them as reference types? Why would the issues different between reference types and value types? A reference type is just a reference wrapper around a value type. -- Rainer Deyke - rainerd eldwood.com
May 01 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 Yes. See java.util.concurrent for examples. In fact, all of the 
 algorithms I've read explicitly don't support copying, which is why I'm 
 asking.
 
 Why would the
 issues different between reference types and value types?

A value type has to support copying. Reference types don't. Perhaps I should have said value semantics / reference semantics instead.

I thought lock-free data structures are the quintessential COW type. You might as well just have provided the perfect argument against your case. Andrei
May 01 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Fri, 01 May 2009 15:07:17 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 Robert Jacques wrote:
 Yes. See java.util.concurrent for examples. In fact, all of the 
 algorithms I've read explicitly don't support copying, which is why 
 I'm asking.

 Why would the
 issues different between reference types and value types?

I should have said value semantics / reference semantics instead.

I thought lock-free data structures are the quintessential COW type. You might as well just have provided the perfect argument against your case. Andrei

Andrei, I've never seen a COW lock-free algorithm.

Maybe we have some terms confused. Yes, the lock-free singly-linked lists and stacks (and some dictionaries) defined in the literature are meant to be shared and use minimal, coherent CAS-based updates. This is quite a feat, because copying is the easy way to go and doesn't quite deserve a peer-reviewed conference paper. However, Maged Michael and I co-wrote a few magazine articles on the topic. The general structure of a COW lock-free object is:

     struct Something { ... }

     struct LockFree
     {
         Something* s;
         void makeSomeChange()
         {
             Something* copy;
             do
             {
                 copy = s.deepDup;     // private copy of the current state
                 copy.makeSomeChange;  // apply the mutation to the copy
             } while (!cas(copy, s));  // publish only if s hasn't changed
         }
     }

For example, if Something is some array and you want to insert elements in it atomically, you make a copy of the array, insert stuff, and then substitute the copy for the original array, with care such that the original array stayed in place meanwhile (otherwise your insertions aren't doing what you initially thought).
 Andrei, you might be thinking about STM, which follows a transactional 
 model: lazy copy, mutate copy, publish-retry changes.

My understanding is that STM is a generalized, multi-object, transparent, and systematic way of doing CAS-based work (please correct me if I'm wrong, I'm not an expert in STM). Andrei
May 01 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 Do you have a link to 
 your article? 

http://tinyurl.com/dac82a Andrei
May 01 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Fri, 01 May 2009 19:25:35 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Robert Jacques wrote:
 Do you have a link to your article?

http://tinyurl.com/dac82a Andrei

Should've seen that one coming. :) Anyways, I'm not sure how you can call the technique lock-free, since you're doing (possibly several) allocations inside the inner CAS loop.

No, repeated allocations are trivial to eliminate. I didn't even bother to explain that in the article. The loop must only refill the allocated object from the object that needs to be replaced.
 (I guess a good, parallel allocator might negate this) It's probably not 
 an issue for the article's use case, where reads vastly dominated 
 updates, but for things like stacks or queues, it wouldn't work. And 
 poor performance leads people to create their own fast, possibly buggy 
 versions, thus defeating the point of a standard library.

Incidentally Maged Michael himself wrote the first lock-free malloc(). Andrei
May 01 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 No, repeated allocations are trivial to eliminate. I didn't even 
 bother to explain that in the article. The loop must only refill the 
 allocated object from the object that needs to be replaced.

 putting the test in the pseudo code for this would've helped my 
 understanding:
 
     if (copy is null) copy = s.deepDup;
     else s.deepDupTo(copy);

copy = cast(T*) GC.malloc(T.sizeof);
do
{
    overwrite(s, copy);
    copy.modify;
} while (!cas(copy, s));
 But Andrei, I really don't care about the existence of a lock-free 
 algorithm with value semantics. I only care about a practical lock-free 
 algorithm with value semantics. This approach, while elegant in its 
 simplicity and good for some situations, is very poor at high update / 
 contention objects, like stacks and queues. Maybe, locked algorithms are 
 the best value semantics can do in some situations and this might result 
 in separate fast reference containers. Which is a drawback to value 
 semantics and should be placed into the big pros/cons pile.

I agree. Andrei
May 02 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Sat, 02 May 2009 03:35:51 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Robert Jacques wrote:
 No, repeated allocations are trivial to eliminate. I didn't even 
 bother to explain that in the article. The loop must only refill the 
 allocated object from the object that needs to be replaced.

 putting the test in the pseudo code for this would've helped my 
 understanding:
 
     if (copy is null) copy = s.deepDup;
     else s.deepDupTo(copy);

copy = cast(T*) GC.malloc(T.sizeof);
do
{
    overwrite(s, copy);
    copy.modify;
} while (!cas(copy, s));

I'm confused. The article talks about copying the entire data structure, not just a node. And trees, etc., tend to have variable sizes.

You can reuse memory even if it comprehends more complex patterns than one single allocation. Andrei
May 02 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Denis Koroskin wrote:
 You can reuse memory even if it comprehends more complex patterns
 than one single allocation.
 
 Andrei

I *highly* doubt it is worth the trouble. Most likely, this container won't be lock-free and scalable anymore. Performance will also degrade dramatically.

How did you get to "most likely"? There's so many factors in the equation. What is the alternative? Why wouldn't the code be lock-free? Why would it not be scalable?
 Besides, the more I think about thread-local/shared separation that
 is going to come to D2, the more I see that there is absolutely no
 need to make Phobos containers thread-safe. Most containers will be
 thread-local and won't need any synchronization at all.

I agree. But you also don't want to preclude using a shared container either. Andrei
May 02 2009
prev sibling parent reply Rainer Deyke <rainerd eldwood.com> writes:
Robert Jacques wrote:
 Precisely, so it's always passed by reference and doesn't need to
 support copying.

All containers need to support copying. -- Rainer Deyke - rainerd eldwood.com
May 01 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Rainer Deyke wrote:
 Robert Jacques wrote:
 Precisely, so it's always passed by reference and doesn't need to
 support copying.

All containers need to support copying.

Speaking of which, I'm thinking of a field like scientific computing (even of the simplest, most basic kind that we all dabble in) or applied math. People who use for example matrices would NOT expect them to have reference semantics. They'd find it very confusing if a = b would not make matrix "a" a copy of matrix "b", and so on. (That + no operator overloading = R.I.P. my excitement for doing numeric work in Java.)

It would work to ask people who actually use Matrix!double to wrap it in a Value!(Matrix!double). But say someone asks, whatever, but if Value!(Matrix!double) is a matrix, then what is Matrix? Well, I'd reply, Matrix is actually a reference to a matrix. Then they'll ask, why don't you call what you call today "Matrix" RefMatrix or Ref!Matrix or whatever, and call a Matrix a matrix? Um, I don't know. That's what the buzz was on the newsgroup when we were thinking of it. Some said that's what people coming from Java expect.

I guess at that point the would-be D user would be entitled to make me a lavaliere out of my Matrix library and move on.

Andrei
May 01 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Fri, May 1, 2009 at 4:23 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 I guess at that point the would-be D user would be entitled to make me a
 lavaliere out of my Matrix library and move on.

Python, and therefore NumPy, are reference based.

In a language that's reference based, things can't be helped. They can be helped in D though. I wouldn't want to later cringe that I missed the opportunity.
 So far I haven't seen any scientists posting on the list about how
 having their arrays and matrices be references is driving them crazy.
 It may be surprising to some of them at first, but even non-hard-core
 coders seem to be able to handle it.

To me this sounds more like an opportunity to do the right thing.
 It helps that dummy assignments
 like a = b are rare.   More often you have things like a = b + c, and
 that creates a new matrix.  Or  a += b, which pretty clearly mutates
 a.

Oh rly. How about this then?

    Matrix a, b, c;
    ...
    c = a;
    a += b;

Does the last operation mutate c as well?

Andrei
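A minimal D sketch of the scenario, for the record (the Matrix class and its opOpAssign are hypothetical, reduced to just enough to make the aliasing under reference semantics visible):

```d
// Hypothetical reference-semantics Matrix: a class that wraps its storage.
class Matrix
{
    double[] data;
    this(double[] d) { data = d.dup; }

    // "a += b" updates the elements in place.
    void opOpAssign(string op : "+")(Matrix rhs)
    {
        foreach (i, ref e; data) e += rhs.data[i];
    }
}

void main()
{
    auto a = new Matrix([1.0, 2.0]);
    auto b = new Matrix([10.0, 10.0]);
    Matrix c = a;   // copies the reference, not the elements
    a += b;
    assert(c.data == [11.0, 12.0]); // yes: c was mutated along with a
}
```

With a struct-based (value) Matrix, the same c = a would have copied the elements and c would still read [1.0, 2.0].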
May 01 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Fri, May 1, 2009 at 6:25 PM, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Fri, May 1, 2009 at 4:23 PM, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 I guess at that point the would-be D user would be entitled to
 make me a lavaliere out of my Matrix library and move on.


In a language that's reference based, things can't be helped. They can be helped in D though. I wouldn't want to later cringe that I missed the opportunity.
 So far I haven't seen any scientists posting on the list about
 how having their arrays and matrices be references is driving
 them crazy. It may be surprising to some of them at first, but
 even non-hard-core coders seem to be able to handle it.


Could be. I'm not arguing that refs are best, just giving some evidence from the NumPy community to counter your assertion that scientists would be confused by reference-based array and matrix types.
 It helps that dummy assignments like a = b are rare.   More often
 you have things like a = b + c, and that creates a new matrix.
 Or  a += b, which pretty clearly mutates a.

Matrix a, b, c; ... c = a; a += b; Does the last operation mutate c as well?

I said "assignments like a = b are rare" and you put one of those in your example.

You said it and I don't buy it one femtosecond. Code does need to copy values around. It's a fundamental operation!
 Yes, when you have an a=b anywhere you've got to pay attention and
 make sure you didn't mean a=b.dup.

That is terrible. Anywhere == a portion of the code far, far away.
 And the other thing is that when you pass a matrix to a function, you
  need to be aware of whether the function mutates that matrix or not.
  That's really the biggest pain, and source of hardest to find bugs.

I bet. Things get even more unpleasant when you have object that have reference members. Then the default constructor does The Wrong Thing(tm) - many C++ pundits and authors have eaten many good lunches with the money they got discussing the problem.
 Matlab goes the other extreme and makes everything value semantics. 
 It does a lot of COW optimizations under the hood to make it
 tolerably efficient even when tossing around big matrices.  But it's
 annoying if you want to write a function to make a small update to an
 existing matrix.  You have no choice but to return a copy and hope
 that Matlab's optimizer is smart enough to eliminate the unnecessary
 copy.

Yah. In D, there's ref. Andrei
May 02 2009
next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-05-02 10:18:41 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Yes, when you have an a=b anywhere you've got to pay attention and
 make sure you didn't mean a=b.dup.

That is terrible. Anywhere == a portion of the code far, far away.

It's funny that you mention ==, since anywhere you write = you also got to pay attention that you didn't really mean ==. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
May 02 2009
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Sat, 02 May 2009 10:18:41 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Fri, May 1, 2009 at 6:25 PM, Andrei Alexandrescu
  Matrix a, b, c; ... c = a; a += b;
  Does the last operation mutate c as well?

I said "assignments like a = b are rare" and you put one of those in your example.

You said it and I don't buy it one femtosecond. Code does need to copy values around. It's a fundamental operation!

Andrei, he said that explicit assignments/copies of arrays/containers are rare in numeric computing, which I agree with. Just because it's a fundamental operation doesn't mean it gets used much in his (or I guess NumPy's actually) specific applications. Also, disregarding/disbelieving a use case is not helpful to this discussion.

He said something. That's about as much proof as was seen. I didn't buy it so I replied without assuming the same as him. Then he repeated "but I said that!" which upped the ante from supposition to presupposition. I think presuppositions are particularly pernicious so I felt the need to explicitly say that I don't believe it just because he said it. It's not a use case. It's just something that was said. If some sort of evidence is given, that would be great. Don't put the onus on me to disprove what was said.
 Yes, when you have an a=b anywhere you've got to pay attention and
 make sure you didn't mean a=b.dup.

That is terrible. Anywhere == a portion of the code far, far away.

No, he was talking about a local decision. i.e. Did I or didn't I mean to make a logical copy here?

The local decision has effects that may go undetected until much later. Andrei
May 02 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 I do scientific computing. Generally, I find it breaks down into two 
 parts: things under 4x4, for which value types are probably better, and 
 everything else, for which value types are to be avoided like the 
 plague. I'll often work with 100's mb of data with algorithms that take 
 minutes to hours to complete. So an unexpected copy is both hard to find 
 (am I slow/crashing because of my algorithm, or because of a typo?) and 
 rather harmful, because its big.

I don't buy this. Undue copying is an issue that manifests itself locally, reproducibly, and debuggably. Contrast with long-distance coupling, which is bound to be hard to debug. You change a matrix here, and all of a sudden a matrix miles away has been messed up. Also, efficiency can be fixed with COW, whereas there is nothing you can do to fix the coupling aside from relentless and patient user education.

Walter gave me a good argument (little did he know he was making a point destroying his.) Consider the progress we made when replacing char[] with string. Why? Because with char[] long-distance dependencies crop up easily and fast. With string you know there's never going to be a long-distance dependency. Why? Because unlike char[], content immutability makes string as good as a value.

I remember the nightmare. I'd define a little structure:

    struct Sentence
    {
        uint id;
        char[] data;
    }

Above my desk I have a big red bulb along with an audible alarm. As soon as I add the member "data", the bulb and the alarm go off. Sentence is now an invalid struct - I need to add at least a constructor and a postblit. In the constructor I need to call .dup on the incoming data, and in the postblit I need to do something similar (or something more complicated if I want to be efficient). This is a clear example of code that is short and natural, yet does precisely the wrong thing.

This is simply a ton of trouble, as experience with C++ has shown. I'm not even getting into calling functions that take a char[] and keeping fingers crossed ("I hope they won't mess with it") or .dup-ing prior to the call to eliminate any doubt (even though the function may anyway call .dup internally). string has marked huge progress towards people considering D seriously.
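The boilerplate being described looks roughly like this in D2 (a sketch; the constructor and postblit bodies are the minimal version, without the more efficient variants Andrei mentions):

```d
// With a mutable char[] member, Sentence must dup in the constructor and
// again in the postblit to keep behaving like a value.
struct Sentence
{
    uint id;
    char[] data;

    this(uint id, char[] data)
    {
        this.id = id;
        this.data = data.dup; // take a private copy of the incoming data
    }

    this(this) // postblit: runs after every bitwise copy
    {
        data = data.dup; // un-share the array
    }
}

void main()
{
    auto s = Sentence(1, "hello".dup);
    auto t = s;          // postblit fires; t gets its own array
    t.data[0] = 'H';
    assert(s.data == "hello"); // s unaffected, as a value should be
}
```

Replace char[] with string and both the constructor and the postblit become unnecessary, which is exactly the point.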
 But I've generally worked on making 
 something else fast so more data can be crunched, etc. Actual prototype 
 work (for array/matrix based stuff at least) is often done in Matlab, 
 which I think uses COW under-the-hood to provide value semantics. So I 
 think anyone turning to D to do scientific computing will know reference 
 semantics, since they'd already be familiar with them from C/C++, etc 
 (Fortran?). Although successfully attracting algorithm prototypes from 
 Matlab/python/mathmatica/R/etc is probably bigger issue than just the 
 container types, growing the pie was why the Wii won the last console wars.

Fortran uses pass by reference, but sort of gets away with it by assuming and promoting no aliasing throughout. Any two named values in Fortran can be assumed to refer to distinct memory. Also, unless I am wrong, A = B in Fortran does the right thing (copies B's content into A). Please confirm/infirm.

For all I know, Matlab does the closest to "the real thing". Also, C++ numeric/scientific libraries invariably use value semantics in conjunction with expression templates meant to effect loop fusion. Why? Because value semantics is the right thing and C++ is able to express it. I should note, however, that Perl Data Language uses reference semantics (http://tinyurl.com/derlrh).

There's also a definite stench when one realizes that a = b; on one side, and a = b * 1; or a = b + 0; on the other, do completely different things.

So what we're looking at is: languages that had the option chose value semantics. Languages that didn't, well, they did what they could. I started rather neutral in this discussion but the more time goes by, the more things tilt towards value semantics.

Andrei
May 02 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
About the issue of matrices as reference or value semantics.

As a program I think references are more efficient, they are used everywhere in
many languages like Python/Numpy, Sage, Ruby, etc, and I don't like the idea of
putting "ref" everywhere. So I like references more. Also all objects in D are
reference-based, so a D programmer is already very used to the concept of
reference-based data (the problems in D1 don't come from the fact that arrays
are references, but from the fact that their pointer/length is passed by value,
so for example if you change the length inside a function, such change can't be
seen outside. So by default for dynamic arrays I'd like a full reference
semantic).

On the other hand the value semantics is the most natural. Python uses
reference semantics as an optimization mean, and not because references are
more natural for programming newbies.
To prove this, every day in the python newsgroup people don't understand why:
if the following builds a list (array) of 10 integers:

 >>> v = [0] * 10
 >>> v
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
 >>> v[1] = 5
 >>> v
 [0, 5, 0, 0, 0, 0, 0, 0, 0, 0]

The following doesn't build a 3*4 matrix:

 >>> m = [[0] * 3] * 4
 >>> m[1][2] = 6
 >>> m
 [[0, 0, 6], [0, 0, 6], [0, 0, 6], [0, 0, 6]]

The answer comes from the reference semantics of lists in Python, plus the immutability of integers. Such really common errors in newbies show that reference semantics isn't natural.

A future Python-like language that wants to be higher-level and less error-prone and more natural than Python will probably use value semantics everywhere (with COW, etc). And I agree with Andrei Alexandrescu that reference semantics may produce bugs far away from their originating point.

The future of programming is more toward immutable values and functional programming, like the immutable strings of D2 (and some very intelligent people have invented clever purely functional data structures that minimize data copying). In a purely functional language there's no point in giving the programmer a way to tell references apart from values. They can be seen as the same thing.

Bye,
bearophile
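The same pitfall transliterated to D, for comparison (a sketch: replicating a row reference aliases every row, while new int[][](4, 3) allocates distinct rows):

```d
// Replicating a row *reference* makes four names for one array.
void main()
{
    auto row = new int[3];
    int[][] m = [row, row, row, row]; // four references, one row
    m[1][2] = 6;
    assert(m[0][2] == 6);             // all "rows" changed together

    auto m2 = new int[][](4, 3);      // four distinct rows
    m2[1][2] = 6;
    assert(m2[0][2] == 0);            // only row 1 changed
}
```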
May 02 2009
prev sibling next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 Consider the progress we made when replacing char[] 
 with string. Why? Because with char[] long-distance dependencies crop up 
 easy and fast.

For big arrays people will put "ref" everywhere, so those bugs will not decrease. Bye, bearophile
May 02 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 Also, in a value semantics world, refs are third class citizens, but in 
 a reference semantic world, value semantics get their own assignment 
 operator ( []= ), and by convention, their own property ( .dup )

The major problem is not assignment. That can be taken care of. The problem is:

1. Passing an object into a function
2. Making the object a member of another object
3. Yes, assigning to the object (which ought to be congruent with 1 and 2).

I have perused some more searches and documentation, and things don't bode well for references. Consider the NumPy library. I have searched "numpy pass by reference" and found some interesting links. The first reveals differences between numpy and matlab (notably reference semantics). The second is a discussion entitled "beginner confused with numpy". Guess what the confusion was about? Congratulations, you have won a 40'' LCD TV. Reference semantics! Third hit: https://www-old.cae.wisc.edu/pipermail/octave-maintainers/2009-March/011509.html says:
 Octave's pass-by-value mechanism (with lazy copy-on-write) is
 something that is *far* more simple to grasp than NumPy's inherited
 everything-is-a-reference semantics. I do regard myself as moderately
 experienced Python programmer, yet every now and then I get shot in
 the foot by the reference semantics in Python.


I swear I didn't pay that guy.

Also, getting back to Perl Data Language (http://tinyurl.com/derlrh) I see the mention of references is clearly a WARNING, not an INTERESTING AND DESIRABLE FEATURE. "It is important to keep the ``reference nature'' of piddles in mind when passing piddles into subroutines. If you modify the input pdls you modify the original argument, not a copy of it. This is different from some other array processing languages but makes for very efficient passing of piddles between subroutines."

So what I'm seeing is that reference semantics is not desirable and not natural but was chosen for efficiency reasons. But you know, I don't want to "keep in mind" extra stuff when coding, I have enough to worry about. If I can get away with ref and/or refcounting, then I have taken care of the efficiency issue and I don't need to keep in mind a weird quirk.

============

About undue copying of data: maybe that could be avoided by having functions manipulate ranges (which are cheap to copy), not the containers that use them. In C++ you need to pass the container more often than not mostly because passing two iterators is amazingly burdensome. In D we can pass ranges easily, so I think much D code that wants to do stuff will just take ranges.

Andrei
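A sketch of that range-not-container style (the sum function and its names are made up for illustration; isInputRange and ElementType come from std.range):

```d
// The function takes a range, never the container; a slice is two words.
import std.range : ElementType, isInputRange;

auto sum(Range)(Range r) if (isInputRange!Range)
{
    ElementType!Range total = 0;
    for (; !r.empty; r.popFront())
        total += r.front;
    return total;
}

void main()
{
    int[] data = [1, 2, 3, 4];
    assert(sum(data) == 10);
    assert(sum(data[1 .. 3]) == 5); // pass a view; no elements are copied
}
```

Copying the range by value inside sum is cheap and leaves the caller's container untouched, which is the point Andrei is making.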
May 02 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 Although, I'm starting to see an interesting story: Here are containers. 
 They have value semantics and are simple to use/prototype with. When 
 you're done, you can move to ranges in the performance critical sections 
 in order to boost performance. And some things, like high performance 
 lock-free queues and stacks, might only exist as ranges.

I'm off the phone with Walter. He made a golden point: matrices are not general containers! They are mathematical entities and probably are indeed best off with value semantics (if efficiency issues can be taken care of). But matrix semantics are not necessarily generalizable to generic container semantics. I think that's a good insight. So probably worrying about matrices when discussing containers is a red herring. Andrei
May 02 2009
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Fri, 01 May 2009 15:37:56 -0400, Rainer Deyke <rainerd eldwood.com> 
 wrote:
 Robert Jacques wrote:
 Precisely, so it's always passed by reference and doesn't need to
 support copying.

All containers need to support copying.

No they don't. To be clear, I'm talking about copying the container, not the values inside them, in a non-destructive manner. My data structures book and wikipedia make no mention of a .dup like method as being part of a container. I do agree .dup is useful and quite practical for serial containers, but it is by no means required. But you haven't answered the question of a value semantic lock-free container.

I have. Andrei
May 01 2009
prev sibling parent reply Rainer Deyke <rainerd eldwood.com> writes:
Robert Jacques wrote:
 On Fri, 01 May 2009 15:37:56 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 All containers need to support copying.

No they don't.

There is no conceptual problem with a non-copyable struct. However, in order to be a considered a container, the struct must at least support read-only iteration over its contents. If iteration is possible, then so is copying. You can have non-copyable value types, but they can't be containers. -- Rainer Deyke - rainerd eldwood.com
May 02 2009
parent reply Rainer Deyke <rainerd eldwood.com> writes:
Robert Jacques wrote:
 On Sat, 02 May 2009 03:39:27 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 You can have non-copyable value types, but they can't be containers.

No they don't. Iteration can often be destructive. If I iterate over a stack or a queue, I'm left with an empty stack/queue. For value semantics to work, non-destructive copying is required.

OK. I grant that there are non-copyable types that can reasonably be called containers. Simple solution: make them non-copyable structs. You still get most of the benefits of value types: - One less layer of indirection. - No long distance dependencies. - RAII. -- Rainer Deyke - rainerd eldwood.com
May 02 2009
parent reply Rainer Deyke <rainerd eldwood.com> writes:
Robert Jacques wrote:
 Again, D array's are structs with reference semantics. This isn't a
 pro/con either way.

The D1 dynamic array type does not have reference semantics, nor does it have value semantics.

    void f(int[] a) { a.length = 1; }

    int[] a;
    f(a);
    assert(a.length == 0);
   - No long distance dependencies.

Well, if I can't copy it, then I have to use ref everywhere, which is functionally equivalent to reference semantics. I think you've just proved the counter-point.

Given a value type 'T', you have the guarantee that no two variables of type 'T' can alias each other. This guarantee is preserved when the type 'T' is non-copyable. An argument of type 'ref T' can obviously alias a variable of type 'T'.
   - RAII.

Can be done with structs or classes. Also see point 1. So, this isn't a pro/con either way.

The D1 dynamic array type does not support RAII. -- Rainer Deyke - rainerd eldwood.com
May 02 2009
parent reply Rainer Deyke <rainerd eldwood.com> writes:
Robert Jacques wrote:
 On Sat, 02 May 2009 19:11:11 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 Given a value type 'T', you have the guarantee that no two variables of
 type 'T' can alias each other.  This guarantee is preserved when the
 type 'T' is non-copyable.

 An argument of type 'ref T' can obviously alias a variable of type 'T'.

Okay, if T is not copyable, then I _must_ pass it as ref T, everywhere. Which is reference semantics.

When passing arguments, (possibly const) ref is a reasonable default. I don't care about how arguments are passed. I care about aliasing between variables, especially member variables. With reference semantics, two variables of type T can reference each other. With non-copyable types, they cannot.
   - RAII.

Can be done with structs or classes. Also see point 1. So, this isn't a pro/con either way.

The D1 dynamic array type does not support RAII.

There are two parts to D's arrays. One is a struct two words long, the other is a chunk of RAM. The first part is RAII; the second part is not, since D doesn't allow dynamically sized memory allocation on the stack.

It's meaningless to talk about RAII in the context of the "struct" part of a D1 dynamic array, since it doesn't manage any resources. If I place a variable of a RAII type in a D1 dynamic array, it is not properly destroyed when the array goes out of scope. Therefore D1 dynamic arrays do not support RAII. Stack versus heap allocation is an orthogonal issue. -- Rainer Deyke - rainerd eldwood.com
May 02 2009
parent reply Rainer Deyke <rainerd eldwood.com> writes:
Robert Jacques wrote:
 RAII is all about stack allocation over heap allocation (or so I
 thought). Ah, wikipedia has set me straight. Anyways, now for the simple
 answer: you can't create D1 arrays with RAII types, I think. (Anyone
 tried scope Foo[] foo?) Anyways, in D2, if I remember correctly there's
 a bug where struct finalizers don't run if they're allocated on the
 heap. But if you're using classes for RAII like you should, the GC will
 run their finalizers just fine after the array dies. But this seems
 to be an issue about the elements/values inside the containers,
 not the container itself. So I'm lost.

A RAII variable is "destroyed" when it goes out of scope, where "destroyed" means that a destructor is called. RAII is a transitive feature. When a RAII variable is destroyed, its members are also destroyed. When a RAII container is destroyed, all of its contents are destroyed.

References in D are not RAII types, because when a reference goes out of scope, the "contents" of that reference are not destroyed until the garbage collector decides to collect them, at which point it is too late to perform clean-up.

When an array dies, its contents are destroyed. The issue is when the array dies. If the array is a value type, the array dies when it goes out of scope, so RAII is possible. If the array is a reference type, the array dies when the garbage collector decides to run sometime after all live references to the array have died, so RAII is not possible.

-- Rainer Deyke - rainerd eldwood.com
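A small D illustration of that transitivity (the names are invented for the demo; the module-level counter just makes the destructor call observable):

```d
// A value-type Holder destroys its Resource member right at scope exit.
int destroyed; // incremented by the destructor, for observability

struct Resource
{
    ~this() { destroyed++; }
}

struct Holder // value type: destruction is transitive
{
    Resource r;
}

void main()
{
    {
        Holder h;
    } // h.r's destructor has already run by this point
    assert(destroyed == 1);
}
```

Had Holder been a class, the Resource member would only be finalized whenever the GC got around to collecting the instance.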
May 02 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Rainer Deyke wrote:
 Robert Jacques wrote:
 RAII is all about stack allocation over heap allocation (or so I
 thought). Ah, wikipedia has set me straight. Anyways, now for the simple
 answer: you can't create D1 arrays with RAII types, I think. (Anyone
 tried scope Foo[] foo?) Anyways, in D2, if I remember correctly there's
 a bug where struct finalizers don't run if they're allocated on the
 heap. But if you're using classes for RAII like you should, the GC will
 run their finalizers just fine after the array dies. But this seems
 to be an issue about the elements/values inside the containers,
 not the container itself. So I'm lost.

A RAII variable is "destroyed" when it goes out of scope, where "destroyed" means that a destructor is called. RAII is a transitive feature. When a RAII variable is destroyed, its members are also destroyed. When a RAII container is destroyed, all of its contents are destroyed. References in D are not RAII types, because when a reference goes out of scope, the "contents" of that reference are not destroyed until the garbage collector decides to collect them, at which point it is too late to perform clean-up. When an array dies, its contents are destroyed. The issue is when the array dies. If the array is a value type, the array dies when it goes out of scope, so RAII is possible. If the array is a reference type, the array dies when the garbage collector decides to run sometime after all live references to the array have died, so RAII is not possible.

RAII can be implemented even with reference semantics. The mechanics would involve reference counting. Andrei
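A sketch of what Andrei suggests, using std.typecons.RefCounted (the Handle struct and the counter are invented for the demo; the point is that the payload's destructor runs deterministically when the last reference goes out of scope):

```d
// RefCounted gives reference semantics plus deterministic destruction.
import std.typecons : RefCounted;

int closed; // observable side effect of the destructor, for the demo

struct Handle
{
    int fd = -1;
    ~this() { if (fd != -1) closed++; }
}

void main()
{
    {
        auto h = RefCounted!Handle(42);
        auto h2 = h;           // copying bumps the count; payload is shared
        assert(h.fd == h2.fd); // both views see the same Handle
    } // last reference gone: the count hits zero and ~Handle runs here
    assert(closed == 1);
}
```

As Leandro points out downthread, this scheme still has to contend with cycles, which plain reference counting cannot reclaim.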
May 03 2009
next sibling parent Rainer Deyke <rainerd eldwood.com> writes:
Andrei Alexandrescu wrote:
 Rainer Deyke wrote:
 If the array is a reference type, the array dies when the garbage
 collector decides to run sometime after all live references to the array
 have died, so RAII is not possible.

RAII can be implemented even with reference semantics. The mechanics would involve reference counting.

Granted, although reference counting comes with its own set of issues. -- Rainer Deyke - rainerd eldwood.com
May 03 2009
prev sibling parent Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, el  3 de mayo a las 11:55 me escribiste:
If the array is a reference type, the array dies when the garbage
collector decides to run sometime after all live references to the array
have died, so RAII is not possible.

RAII can be implemented even with reference semantics. The mechanics would involve reference counting.

I was not following this discussion, so I might be speaking out of ignorance, but I think you are underestimating the problems of reference counting again. Circular references break it, and when it comes to RAII, you probably lose the ordering guarantees it needs if you implement some kind of algorithm to deal with cycles. Another option is to trust the programmer to be smart enough not to create cycles (weak references can help a lot here), but if the RC is hidden in the language it can be a little hard to remember this restriction. -- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/ ---------------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ----------------------------------------------------------------------------
May 04 2009
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 10:23:13 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Denis Koroskin wrote:
 You can reuse memory even if it comprehends more complex patterns
 than one single allocation.
  Andrei

I *highly* doubt it is worth the trouble. Most likely, this container won't be lock-free and scalable anymore. Performance will also degrade dramatically.

How did you get to "most likely"? There's so many factors in the equation. What is the alternative? Why wouldn't the code be lock-free? Why would it not be scalable?

Because it dies a horrible death under contention. It's lock-free, but it isn't concurrent. Worse, the inner CAS loop is actually doing a lot of work, resulting in a lot of wasted CPU cycles. So as long as the write-rarely, read-many paradigm holds, it works. But try this with a stack or a queue and a good locking algorithm will beat you every time. Also, there's an issue with stale values for readers, but that's a separate thing.
 Besides, the more I think about thread-local/shared separation that
 is going to come to D2, the more I see that there is absolutely no
 need to make Phobos containers thread-safe. Most containers will be
 thread-local and won't need any synchronization at all.

I agree. But you also don't want to preclude using a shared container either.

I think Phobos containers should come in two forms: shared and local. i.e. shared Queue!T sq; and Queue!T q; They don't need to be there at day one, but I think if they aren't able to be supported (or slow as molasses), we'll end up with two container sets and lots of code duplication.
May 02 2009
prev sibling next sibling parent Jason House <jason.james.house gmail.com> writes:
Rainer Deyke Wrote:

 Michel Fortin wrote:
 On 2009-04-29 23:01:39 -0400, Rainer Deyke <rainerd eldwood.com> said:
 Andrei Alexandrescu wrote:
 3. Reference semantics

be acceptable.

Andrei mentioned a couple of reasons against 3, which are yours?

I agree with all of Andrei's reasons. In addition, there's this: suppose you have a struct containing a (mutable) array. When you make a copy of that struct, you will almost always want to make a copy of the contained array. Therefore value semantics should be the default, because it simplifies the most common use case.

I don't think copying an array member in a struct is as automatic as you imply. We really should figure out use cases that would or would not benefit from a copy. I'm sure we can find lazy range examples where copying would be a performance killer. Also, how deeply should copying go? Does it make sense to do a shallow copy of an array of reference types? What about passing an array into a function?
Apr 30 2009
prev sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Sat, 02 May 2009 18:08:30 +0400, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Sat, 02 May 2009 03:35:51 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 No, repeated allocations are trivial to eliminate. I didn't even  
 bother to explain that in the article. The loop must only refill the  
 allocated object from the object that needs to be replaced.

understanding.

    if (copy is null) copy = s.deepDup;
    else s.deepDupTo(copy);

copy = cast(T*) GC.malloc(T.sizeof);
do {
    overwrite(s, copy);
    copy.modify;
} while (!cas(copy, s));

I'm confused. The article talks about copying the entire data structure, not just a node/etc. And trees, etc tend to have variable sizes, etc.

You can reuse memory even if it comprehends more complex patterns than one single allocation. Andrei

I *highly* doubt it is worth the trouble. Most likely, this container won't be lock-free and scalable anymore. Performance will also degrade dramatically. Besides, the more I think about thread-local/shared separation that is going to come to D2, the more I see that there is absolutely no need to make Phobos containers thread-safe. Most containers will be thread-local and won't need any synchronization at all.
May 02 2009
prev sibling next sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
Rainer Deyke wrote:
 Andrei Alexandrescu wrote:
 1. Value semantics (arrays are like int)

Don't forget: + Supports timely destruction of contents (i.e. RAII).

These are collections. RAII doesn't matter when you have a garbage collector and the only resource you have is memory, unless you're quite strapped for memory.
Apr 30 2009
parent reply Rainer Deyke <rainerd eldwood.com> writes:
Christopher Wright wrote:
 Rainer Deyke wrote:
 Don't forget:
  + Supports timely destruction of contents (i.e. RAII).

These are collections. RAII doesn't matter when you have a garbage collector and the only resource you have is memory, unless you're quite strapped for memory.

It also doesn't matter if the resource deallocation pixie deallocates all my non-memory resources the moment they are no longer being used, which is about as likely as only having memory resources in arrays. -- Rainer Deyke - rainerd eldwood.com
Apr 30 2009
parent Christopher Wright <dhasenan gmail.com> writes:
Rainer Deyke wrote:
 Christopher Wright wrote:
 Rainer Deyke wrote:
 Don't forget:
  + Supports timely destruction of contents (i.e. RAII).

collector and the only resource you have is memory, unless you're quite strapped for memory.

It also doesn't matter if the resource deallocation pixie deallocates all my non-memory resources the moment they are no longer being used, which is about as likely as only having memory resources in arrays.

Which requires the container to know to delete everything that it references when it dies. If it is dealing with reference types, that is not the correct behavior. It seems awkward to me to deal with value types with destructors, but it makes sense for reference counting.
Apr 30 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 03:35:51 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 No, repeated allocations are trivial to eliminate. I didn't even  
 bother to explain that in the article. The loop must only refill the  
 allocated object from the object that needs to be replaced.

understanding. if(copy is null) copy = s.deepDup; else s.deepDupTo(copy);

copy = cast(T*) GC.malloc(T.sizeof);
do {
    overwrite(s, copy);
    copy.modify;
} while (!cas(copy, s));

I'm confused. The article talks about copying the entire data structure, not just a node/etc. And trees, etc tend to have variable sizes, etc.
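For what it's worth, the copy-then-CAS loop under discussion can be sketched roughly as below. This is a hedged illustration only: `Payload`, `head`, and the increment are made-up names, and a real scheme must also handle reclamation of the displaced object and variable-sized structures.

```d
import core.atomic;

// Illustrative payload; the real structure would be whatever the
// container's root points to.
struct Payload { int value; }

shared Payload* head;

void update()
{
    auto copy = new Payload;                  // allocate once, outside the loop
    shared(Payload)* old;
    do
    {
        old = atomicLoad(head);
        *copy = *cast(Payload*) old;          // refill rather than reallocate
        copy.value += 1;                      // apply the modification
    }
    while (!cas(&head, old, cast(shared(Payload)*) copy));
}
```

The point being debated is exactly the `*copy = *old` line: the retry loop reuses one allocation, so repeated allocations disappear, but only for fixed-size roots.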
 But Andrei, I really don't care about the existence of a lock-free  
 algorithm with value semantics. I only care about a practical lock-free  
 algorithm with value semantics. This approach, while elegant in its  
 simplicity and good for some situations, is very poor at high update /  
 contention objects, like stacks and queues. Maybe, locked algorithms  
 are the best value semantics can do in some situations and this might  
 result in separate fast reference containers. Which is a drawback to  
 value semantics and should be placed into the big pros/cons pile.

I agree. Andrei

May 02 2009
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 22:14:59 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:

 Robert Jacques wrote:
 On Sat, 02 May 2009 19:11:11 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 Given a value type 'T', you have the guarantee that no two variables of
 type 'T' can alias each other.  This guarantee is preserved when the
 type 'T' is non-copyable.

 An argument of type 'ref T' can obviously alias a variable of type 'T'.

Okay, if T is not copyable, then I _must_ pass it as ref T, everywhere. Which is reference semantics.

When passing arguments, (possibly const) ref is a reasonable default. I don't care about how arguments are passed. I care about aliasing between variables, especially member variables. With reference semantics, two variables of type T can reference each other. With non-copyable types, they cannot.
   - RAII.

Can be done with structs or classes. Also see point 1. So, this isn't a pro/con either way.

The D1 dynamic array type does not support RAII.

There are two parts to D's arrays. One a struct 2 words long, the other is a chunk of ram. The first part is RAII, the second part is not possible, since D doesn't allow dynamically sized memory allocation on the stack.

It's meaningless to talk about RAII in the context of the "struct" part of a D1 dynamic array, since it doesn't manage any resources. If I place a variable of a RAII type in a D1 dynamic array, it is not properly destroyed when the array goes out of scope. Therefore D1 dynamic arrays do not support RAII. Stack versus heap allocation is an orthogonal issue.

RAII is all about stack allocation over heap allocation (or so I thought). Ah, wikipedia has set me straight. Anyways, now for the simple answer: you can't create D1 arrays with RAII types, I think. (Anyone tried scope Foo[] foo?) Anyways, in D2, if I remember correctly there's a bug where struct finalizers don't run if they're allocated on the heap. But if you're using classes for RAII like you should, the GC will run their finalizers just fine after the array dies. But this seems to be an issue about the elements/values inside the containers, not the container itself. So I'm lost.
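The two cases being debated can be sketched like this (a minimal illustration; `Resource` is a made-up class):

```d
// A scope class is destroyed deterministically when its scope ends;
// references sitting in a plain dynamic array are not destroyed when
// the array goes out of scope - their finalization awaits the GC.
class Resource
{
    ~this() { /* release a non-memory resource here */ }
}

void demo()
{
    scope r = new Resource();   // RAII: destructor runs when demo returns
    auto a = new Resource[3];   // array of references: nothing destroys
                                // the referents when `a` dies
}
```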
May 02 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Wed, 29 Apr 2009 20:45:23 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 It has become increasingly clear we'll need to support arrays in
 addition to slices.

No, Andrei it hasn't. A detailed paragraph (or more) explaining why you think so should be included in the full RFC.

There are several converging pieces of evidence. Each can be debated in separation, yet together they make for a rather strong case.

People have complained about the incredibly bad, unsafe, and slow behavior of appending. Even before the considerable recent slowdown of ~=, there were speed concerns. Worse than that, the completely weird behavior of slices that grow to stomp on other slices cannot be defended. The idiom:

    array.length = n;
    array.length = 0;
    // append to array up to n...

is rather disingenuous and looks odd to the casual reader. As predicted by the "range theory", which holds that a range will never grow, only shrink, slices should always shrink. Growing is a foreign operation for a slice. This is a big problem that goes beyond a need for purity or cleanliness - it's just terrible to have a slice stomping around, and then uncontrollably being rebound elsewhere and so on. People pass slices into functions, functions modify them and also append to them, and depending on the phase of the moon, some updates are being seen in the caller. So people have defined various types (such as ArrayCreator or ArrayAppender or whatnot) that take care of that aspect. But really what we're looking at is an Array type.

Another issue is that we can get (partly) away with disowned slices because of the garbage collector: whenever a slice is to be rebound, whatever chunk it was bound to is left behind and will be duly collected. If, however, we want to support other programming disciplines that exercise stricter control over memory, the "parent" arrays must become an explicit type that code can keep around and manipulate directly.

A design that has one container type and several slice types that crawl it in various ways is very compelling, and I think it will be a huge selling point for D. It would be odd, not natural, to have arrays as a singularly distinct container that is in fact its own range. We get away with container == range for arrays partly because arrays are a simple structure, but that only blurs thinking and understanding of more complex containers. In fact, ranges do predict that any attempt to grow a range will cause topological issues in the owning container; that's why SList wisely (IMHO :o)) includes a template parameter that tells whether its topology is fixed or flexible.

(Last but not least, .NET has arrays and slices. D's slices that actually have an array testis mesh real bad with .NET because in .NET you can't have a disowned slice.)

Andrei
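The ArrayAppender-style workaround mentioned above can be sketched roughly like this (a minimal illustration with invented names, not Phobos' actual implementation):

```d
// An appender that owns its buffer, so appends never stomp on foreign
// slices: growth happens only inside storage this struct allocated.
struct Appender(T)
{
    private T[] buf;     // owned storage
    private size_t len;  // used portion

    void put(T item)
    {
        if (len == buf.length)
            buf.length = buf.length ? buf.length * 2 : 4; // grow owned buffer
        buf[len++] = item;
    }

    T[] data() { return buf[0 .. len]; } // hand out a shrink-only view
}
```

The design point: callers receive slices that can only shrink, while the one place allowed to grow is the container the code explicitly holds.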
Apr 30 2009
next sibling parent reply "Joel C. Salomon" <joelcsalomon gmail.com> writes:
Andrei Alexandrescu wrote:
 A design that has one container type and several slice types that crawl
 it in various ways is very compelling, and I think it will be a huge
 selling point for D. It would be odd, not natural, to have arrays as a
 singularly distinct container that is in fact its own range. We get away
 with container == range for arrays partly because arrays are a simple
 structure, but that only blurs thinking and understanding of more
 complex containers. In fact, ranges do predict that any attempt to grow
 a range will cause topological issues in the owning container; that's
 why SList wisely (IMHO :o)) includes a template parameter that tells
 whether its topology is fixed or flexible.

It sounds like you’re suggesting that arrays become like all containers, and that slices become a kind of range over the array. Functions that now take arrays (which are possibly slices) and write to them but don’t shrink or append to them, should therefore be rewritten to take the range that covers the array. B.T.W.: What’s the syntax for a range that accesses the entire container in index order? —Joel Salomon
Apr 30 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Joel C. Salomon wrote:
 Andrei Alexandrescu wrote:
 A design that has one container type and several slice types that crawl
 it in various ways is very compelling, and I think it will be a huge
 selling point for D. It would be odd, not natural, to have arrays as a
 singularly distinct container that is in fact its own range. We get away
 with container == range for arrays partly because arrays are a simple
 structure, but that only blurs thinking and understanding of more
 complex containers. In fact, ranges do predict that any attempt to grow
 a range will cause topological issues in the owning container; that's
 why SList wisely (IMHO :o)) includes a template parameter that tells
 whether its topology is fixed or flexible.

It sounds like you’re suggesting that arrays become like all containers, and that slices become a kind of range over the array.

Yes. I think that that would be a good design.
 Functions that now take arrays (which are possibly slices) and write to
 them but don’t shrink or append to them, should therefore be rewritten
 to take the range that covers the array.

To clarify: T[] is a slice. It has one array testis in the form of the infamous ~=. And it has the problems that I described because of that gender ambiguity. Most functions manipulate ranges (and therefore slices), not full array objects. So the addition of arrays would not be disruptive.
 B.T.W.:  What’s the syntax for a range that accesses the entire
 container in index order?

To get the "default" range for a container c, you write c[]. foreach automatically calls it. For a T[], asking a = b[] is an identity function. For a T[5], it happens to do the right thing.

Allow me to bring the std.range.SListRange example again, because it's a very good example outside of arrays. An approach used e.g. in LISP is to conflate lists as containers with lists as iterators in containers. So when you pass a list into a function it could not only look at and change the elements that the list contains, but it could rewire the nodes in the list any way it wants. The root of the list itself is a bit special because it is usually passed "by value" so the caller will hold the root even if the callee would want to change even that. This is not a huge trouble for LISP as lists are the focal data structure, but the behavior comes off as a bit odd for someone who'd like to manipulate lists and other structures in uniform ways.

So there comes the notion that the container is concerned with topology: how slots "sit" in memory, how they are arranged with respect to each other etc. The exact ways to manipulate topology and the tradeoffs involved are container-specific (for example it's cheap to prepend to a list but not to an array). Ranges, on the other hand, do not allow topological changes - they only have uniform primitives for accessing the elements in containers, whereas they prevent any topological changes.

So a SListRange!(int, Topology.flexible) would be a container and offer topological changes such as insertAfter. When such a list is asked for a range (via c[]) it will return a SListRange!(int, Topology.fixed). That guy will know how to move about the host list, but would be unable to exact any changes to its topology. As such, SListRange!(int, Topology.fixed) is as usable as any other forward range.

Does any of this make sense? :o)

Andrei
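The Topology idea can be sketched like this (illustrative only; std.range.SListRange's real interface may differ, and insertFront stands in for the insertAfter mentioned above):

```d
enum Topology { fixed, flexible }

// Nodes live outside the range type so both instantiations share them.
struct Node(T) { T value; Node!T* next; }

struct SListRange(T, Topology topo)
{
    Node!T* head;

    // Uniform range primitives: available at either topology.
    bool empty() const { return head is null; }
    ref T front() { return head.value; }
    void popFront() { head = head.next; }

    static if (topo == Topology.flexible)
    {
        // Topological change: only the flexible (container) variant has it.
        void insertFront(T value) { head = new Node!T(value, head); }

        // c[] yields a crawl-only range over the same nodes.
        SListRange!(T, Topology.fixed) opSlice()
        {
            return SListRange!(T, Topology.fixed)(head);
        }
    }
}
```

The fixed instantiation simply lacks the mutating member, so the type system - not runtime checks - prevents a range from rewiring its host.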
Apr 30 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 Does any of this make sense? :o)

Yes. I think problems aren't finished yet and more thinking is expected, but you are getting somewhere (I can tell it also because I am starting to understand. I am not bright. When even I understand it often means that the design is meaningful). Regarding the foreach on the items I don't know/understand the syntactic need for the [] to scan. Bye, bearophile
Apr 30 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 Does any of this make sense? :o)

Yes. I think problems aren't finished yet and more thinking is expected, but you are getting somewhere (I can tell it also because I am starting to understand. I am not bright.

Bright is only one :o).
 When even I understand it
 often means that the design is meaningful). Regarding the foreach on
 the items I don't know/understand the syntactic need for the [] to
 scan.

Oh, that's simple. A while ago people asked, how do I iterate a container (assuming the container is distinct from range). So I said, here's how I think:

    Container!int c;
    ...
    foreach (element; c.all) { ...use element... }

People said, that sucks! With opApply I don't need to append the .all thingie! So I told Walter to automatically call [] against the container. So people can now write:

    foreach (element; c) { ...use element... }

All the container has to do is define opSlice() to return its "all" range.

Andrei
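A minimal sketch of that convention (`Container` here is a made-up type, and the explicit `c[]` form works whether or not the automatic lowering described above is in place):

```d
// The container's opSlice() returns its "all" range; per the proposal,
// foreach (e; c) lowers to foreach (e; c[]).
struct Container(T)
{
    private T[] items;

    this(T[] initial) { items = initial; }

    T[] opSlice() { return items; }   // the "all" range
}

void main()
{
    auto c = Container!int([1, 2, 3]);
    foreach (e; c[])                  // explicit form of the lowering
    {
        // ...use e...
    }
}
```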
Apr 30 2009
prev sibling parent superdan <super dan.org> writes:
Andrei Alexandrescu Wrote:

 Joel C. Salomon wrote:
 Andrei Alexandrescu wrote:
 A design that has one container type and several slice types that crawl
 it in various ways is very compelling, and I think it will be a huge
 selling point for D. It would be odd, not natural, to have arrays as a
 singularly distinct container that is in fact its own range. We get away
 with container == range for arrays partly because arrays are a simple
 structure, but that only blurs thinking and understanding of more
 complex containers. In fact, ranges do predict that any attempt to grow
 a range will cause topological issues in the owning container; that's
 why SList wisely (IMHO :o)) includes a template parameter that tells
 whether its topology is fixed or flexible.

It sounds like you’re suggesting that arrays become like all containers, and that slices become a kind of range over the array.

Yes. I think that that would be a good design.
 Functions that now take arrays (which are possibly slices) and write to
 them but don’t shrink or append to them, should therefore be rewritten
 to take the range that covers the array.

To clarify: T[] is a slice. It has one array testis in the form of the infamous ~=. And it has the problems that I described because of that gender ambiguity. Most functions manipulate ranges (and therefore slices), not full array objects. So the addition of arrays would not be disruptive.
 B.T.W.:  What’s the syntax for a range that accesses the entire
 container in index order?

To get the "default" range for a container c, you write c[]. foreach automatically calls it. For a T[], asking a = b[] is an identity function. For a T[5], it happens to do the right thing. Allow me to bring the std.range.SListRange example again, because it's a very good example outside of arrays. An approach used e.g. in LISP is to conflate lists as containers with lists as iterators in containers. So when you pass a list into a function it could not only look and change at the elements that the list contains, but it could rewire the nodes in the list any way it wants. The root of the list itself is a bit special because it is usually passed "by value" so the caller will hold the root even if the callee would want to even change that. This is not a huge trouble for LISP as lists are the focal data structure, but the behavior comes off as a bit odd for someone who'd like to manipulate lists and other structures in uniform ways. So there comes the notion that the container is concerned with topology: how slots "sit" in memory, how they are arranged with respect to each other etc. The exact ways to manipulate topology and the tradeoffs involved are container-specific (for example it's cheap to prepend to a list but not an array). Ranges, on the other hand, do not allow topological changes - they only have uniform primitives for accessing the elements in containers, whereas they prevent any topological changes. So a SListRange!(int, Topology.flexible) would be a container and offer topological changes such as insertAfter. When such a list is asked for a range (via c[]) it will return a SListRange!(int, Topology.fixed). That guy will know how to move about the host list, but would be unable to exact any changes to its topology. As such, SListRange!(int, Topology.fixed) is as useable as any other forward range. Does any of this make sense? :o) Andrei

fuck man u dance around da shit but never step in it. the clear example is a file. messin' with a file is diff from messin' with ranges that read from da file. forget the lists n shit. file is a real example coz u also have to be sure of lifetime n shit. i mean u wanna close it eh.
Apr 30 2009
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Thu, 30 Apr 2009 15:57:17 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 People have complained about the incredibly bad, unsafe, and slow 
 behavior of appending. Even before the considerable recent slowdown of 
 ~=, there were speed concerns. Worse than that, the completely weird 
 behavior of slices that grow to stomp on other slices cannot be defended.

Actually, this is a problem due to an optimization in the current GC. Fixing it is rather trivial: simply store the maximum requested array end at either the beginning or end of the memory block.
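The fix being proposed might look something like this sketch (names and layout are illustrative, not the actual druntime scheme):

```d
import core.memory;

// Record the array's high-water mark inside the memory block itself,
// so an in-place grow can tell whether a given slice really ends at
// the block's used extent.
struct BlockHeader { size_t used; }   // stored at the block's start

bool canExtendInPlace(T)(T[] slice)
{
    auto info = GC.query(cast(void*) slice.ptr);
    if (info.base is null)
        return false;                 // not a GC block we can reason about
    auto hdr = cast(BlockHeader*) info.base;
    // Only the slice reaching the recorded mark may grow in place; any
    // other slice must relocate instead of stomping on its neighbors.
    auto sliceEnd = cast(ubyte*) (slice.ptr + slice.length);
    return sliceEnd == cast(ubyte*) info.base + BlockHeader.sizeof + hdr.used;
}
```

Under this scheme an interior slice never passes the check, so `~=` on it relocates rather than overwriting a sibling slice's data.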

How does an arbitrary slice get to the beginning/end of the memory block? I guess that's not a fat pointer anymore, it's an obese pointer.
But the bigger issue is that (x ~ y) may or may not create a new array.

x ~ y always creates a new array.
 I'm a little sad to hear you say this, as both Daniel Keep and I posted 
 sample pseudo code on how to do slice types properly in both reference 
 counted and manual managed memory systems.

No need to be sad. The code is obvious. What I was saying is that people actually need to get their hand on the container so they can control its scope.
 A design that has one container type and several slice types that 
 crawl it in various ways is very compelling, and I think it will be a 
 huge selling point for D. It would be odd, not natural, to have arrays 
 as a singularly distinct container that is in fact its own range. We 
 get away with container == range for arrays partly because arrays are 
 a simple structure, but that only blurs thinking and understanding of 
 more complex containers. In fact, ranges do predict that any attempt 
 to grow a range will cause topological issues in the owning container; 
 that's why SList wisely (IMHO :o)) includes a template parameter that 
 tells whether its topology is fixed or flexible.

I'd disagree with regard to a selling point/compelling. My lab maintains an open source C++ array/slice type pair as part of a larger robotics package. It's laid out with an ArrayBase containing all the implementation. The Array type simply extends ArrayBase and adds memory management, and Slice extends ArrayBase, only adding binding capabilities. Save for memory allocation, they are identical, i.e. the container and range are interchangeable.

How about a hashtable? A tree? Are the ranges crawling them identical with the containers? Not at all. Arrays are probably the worst example to use as an archetype for a design because they are very simple. Pretty much any design looks a bit baroque when only applied to arrays.
 (Last but not least, .NET has arrays and slices. D's slices that 
 actually have an array testis mesh real bad with .NET because in .NET 
 you can't have a disowned slice.)

I've been following the blog. I was surprised that Cristian didn't just make all D arrays .NET ArraySegments and be done with it, but there's probably issues I don't know about. I'd like to hear more on what exactly the issue was.

We discussed at length. Without being an expert in .NET, I can't tell what the problem is, but I do know that if there was a simpler solution, Cristi would have probably found it. Andrei
Apr 30 2009
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-04-30 20:52:05 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 (Last but not least, .NET has arrays and slices. D's slices that 
 actually have an array testis mesh real bad with .NET because in .NET 
 you can't have a disowned slice.)

I've been following the blog. I was surprised that Cristian didn't just make all D arrays .NET ArraySegments and be done with it, but there's probably issues I don't know about. I'd like to hear more on what exactly the issue was.

We discussed at length. Without being an expert in .NET, I can't tell what the problem is, but I do know that if there was a simpler solution, Cristi would have probably found it.

As I understand it, the issue is that the .NET API generally doesn't expect ArraySegments, but Arrays, so Cristi wants T[] to be an array, not an array segment, in order to make the API usable without having to duplicate ArraySegments into Arrays every time.

I'm doubtful having a standard container type will help that much, as T[] (the slice) still won't be accepted by the .NET API and the programmer would simply have to always use the container. That the .NET framework wants to use containers everywhere instead of slices is not a reason for D to bend to .NET's concept of arrays and slices.

But... That makes me think that perhaps my concept of "unique T[]" as the container (see my previous post) might be interesting. Cristi could make the compiler keep "unique T[]" in an Array and implicitly convert it to an ArraySegment as soon as it becomes non-unique. That would mean the .NET API could accept regular D dynamic arrays, as long as they're unique.

-- Michel Fortin michel.fortin michelf.com http://michelf.com/
May 01 2009
prev sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2009-04-30 15:57:17 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 As predicted by the "range theory" which predicates that a range will 
 never grow, only shrink, slices should always shrink. Growing is a 
 foreign operation for a slice. This is a big problem that goes beyond a 
 need for purity or cleanliness - it's just terrible to have a slice 
 stomping around, and then uncontrollably being rebound elsewhere and so 
 on. People pass slices into functions, functions modify them and also 
 append to them, and depending on the phase of the moon, some updates 
 are being seen in the caller.

I think D slices are a good concept for pointers to a fixed-size array. If you write new int[4], you're creating a fixed-size array of four ints, and getting a slice to the whole. If you concatenate it with some other array, it creates a new one.

As you like to point out, the fun starts when you have arrays of mutable elements and want to resize them. If you have an array of four elements and write array.length = 5, then you're perhaps creating a new one, disconnected from the previous one, and perhaps not. This poses no problem if your slice is the only one pointing to the memory block, otherwise the results are rather unpredictable.

Well, containers are no better at that. If you have some slices of a container and you grow the container you may or may not disconnect all your slices (unless you're creating an indirect slice as {container*, int start, int end} of course).

Which makes me think that perhaps we could introduce a unique type attribute and just forbid growing dynamic arrays unless they are flagged as unique. That'd make "unique T[]" behave as the container and "T[]" as the slice. And perhaps we could build that on top of the concept of uniqueness Bartosz wants to be expressible in D for the concurrency model, as he said in his latest post:
 In traditional type systems there is no way to express the requirement 
 that a message passed by reference must either be a monitor itself or 
 behave like a C++ unique_ptr (an object that leaves no aliases behind). 
 This requirement should be expressible in D, allowing the compiler to 
 check for its violations.

<http://bartoszmilewski.wordpress.com/2009/04/26/multithreading-with-d/> -- Michel Fortin michel.fortin michelf.com http://michelf.com/
May 01 2009
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 Lastly, what you're trying to fix is a bug in the implementation of 
 arrays (see my posts in this thread) and not a bug in the design.

I think your fix to the array-slice conflation is simply to make slices arrays by adding extra information. This approach is reasonable, but (1) will make slices bigger and slower, and (2) will port very poorly to most other containers.

Code using ad-hoc arrays in C often had to pass two words around (usually pointer+length). Similarly, code using iterators in C++ must pass two pointers/iterators around (begin+end). With D, we have a 1-1 correspondence of that situation with slices, which makes for a great argument on why a slice that contains two boundaries is an awesome way to encapsulate a concept at no efficiency cost. Adding a third word to the slice would mark a step backwards in terms of data being trafficked around, and a questionable increase in expressiveness.

An obese three-word slice would avoid stomping over other slices upon append, but would still exhibit a rather unpredictable append behavior: a function taking a slice and appending to it may or may not publish its changes to its caller. A good design is to have containers and ranges, not container_ranges.

Andrei
May 01 2009
prev sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-05-01 13:44:58 -0400, "Robert Jacques" <sandford jhu.edu> said:

 I think you might have misunderstood the concept of a unique/mobile type. 
 If you take a slice or a reference of a unique, you null the source. 
 Which makes using unique T logically invalid as a container.

Take note: I'm not expecting it to copy automatically, "unique T[]" isn't a value-type container. I'm solving the problem of slices detaching from one another when you grow them, which I don't think is a problem addressed by value-type containers either. If you grow the slice or container, previously-taken slices of it may or may not be left dangling. Allowing grow only on "unique T[]" makes sure there are no slice to be invalidated when you grow. Also, generally type systems featuring unique also have that "lent" flag allowing you to pass the unique reference to functions which are required to not keep it anywhere after the function returns. So you could pass lent slices of your unique T[] to functions without breaking uniqueness of the reference.
 Also, shared unique  T can't be converted to shared T. This 
 incompatibility is the whole point  of having a unique type in the 
 first place, and negates your concept of  using unique T as containers.

Why couldn't it? You just lose (forever) the uniqueness of the reference when you copy it to a non-unique one.
 And generally, unique T implies shared  storage, and can not be used 
 for thread-local arrays.

Generally, but not necessarily. Is there any problem with making unique applicable to thread-local arrays? -- Michel Fortin michel.fortin michelf.com http://michelf.com/
May 02 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Wed, 29 Apr 2009 20:45:23 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 It has become increasingly clear we'll need to support arrays in
 addition to slices.

No, Andrei it hasn't. A detailed paragraph (or more) explaining why you think so should be included in the full RFC.
 One big question mark for me is, should we define
 for arrays (and associative arrays and containers in general) value
 semantics, reference counted value semantics, or reference semantics? It
 is not clear to me yet what the best option is. Think of arrays as an
 archetypal example.

Here are some additional pros/cons:

1. Value semantics (arrays are like int)
 + can return static arrays from functions
 - can be emulated with slice assign and .dup, where needed
 - requires arrays and slices to be separate types
 +- requires opImplicitCast from container to slice type
 +- requires template argument deduction to be opImplicitCast/alias this aware
 - If you forget to pass by reference (when you meant to), logical bugs are produced (i.e. my mutation doesn't get published).
 - Dis-allows inheritance from all containers (i.e. implemented as structs)

I think clarifying 2 as being copy-on-write is useful.

2. Copy-on-write (value semantics with reference counting)
 +- The added trade-offs of value semantics
 - arrays may no longer be passed in registers (performance reduction)

3. Reference semantics
 ++ Allows arrays and slices to be the same type
 + How D's arrays/slices work today
 + Familiar to C/C++ refugees (similar to plain old arrays)
 + Familiar to most OO language refugees: containers are objects
 + Allows inheritance from most containers (i.e. implemented as objects, except for arrays)
 - Hard to reason about: squirrel an array into an object, and way later
 and farther it modifies the array and effects a change at long distance
 - Long-distance effects may as well be some of the toughest bugs after
 memory errors
 - Potentially inefficient as people might start sprinkling copies in the
 hope the eliminate bugs or chances of bugs
 + const system helps: if a function doesn't change something just mark
 it as const and you're home free
 + immutable also helps: immutable data can never change so it can be
 shared no problem. Not sure how interesting immutable containers are  
 though.

All these issues also exist with objects, which people are familiar with.
 - Implementing fast/unsafe semantics with reference semantics
 effectively forces reference counting to be part of containers, which
 complicates implementation.

I'm drawing a blank on this. My thinking is this:

1) garbage-collected: reference counting isn't needed
2) reference counted: reference counting is expected and built into objects in general
3) "unsafe" management: safety is explicitly not expected, so reference counting isn't needed

My vote is for reference semantics. It's fast, familiar, easy to reason about (i.e. like objects) and doesn't require separate array and slice types.
Apr 30 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Thu, 30 Apr 2009 14:40:42 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:
 In addition, there's this: suppose you have a struct containing a
 (mutable) array.  When you make a copy of that struct, you will almost
 always want to make a copy of the contained array.  Therefore value
 semantics should be the default, because it simplifies the most common
 use case.

That's what struct ctors/dtors/opAssign are for. Personally, I think letting the struct designer decide the behaviour is better.
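A sketch of that point (`Owner` is a made-up struct): the designer opts into deep copying via a postblit, instead of the language forcing value semantics on every contained array.

```d
struct Owner
{
    int[] data;
    this(this) { data = data.dup; }  // deep-copy the contained array on copy
}
```

A struct that wants shallow, reference-style sharing simply omits the postblit.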
Apr 30 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 30 Apr 2009 23:57:17 +0400, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Wed, 29 Apr 2009 20:45:23 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:
 It has become increasingly clear we'll need to support arrays in
 addition to slices.

No, Andrei it hasn't. A detailed paragraph (or more) explaining why you think so should be included in the full RFC.

There are several converging pieces of evidence. Each can be debated in separation, yet together they make for a rather strong case. People have complained about the incredibly bad, unsafe, and slow behavior of appending. Even before the considerable recent slowdown of ~=, there were speed concerns. Worse than that, the completely weird behavior of slices that grow to stomp on other slices cannot be defended. The idiom: array.length = n; array.length = 0; // append to array up to n... is rather disingenuous and looks odd to the casual reader. As predicted by the "range theory" which predicates that a range will never grow, only shrink, slices should always shrink. Growing is a foreign operation for a slice. This is a big problem that goes beyond a need for purity or cleanliness - it's just terrible to have a slice stomping around, and then uncontrollably being rebound elsewhere and so on. People pass slices into functions, functions modify them and also append to them, and depending on the phase of the moon, some updates are being seen in the caller. So people have defined various types (such as ArrayCreator or ArrayAppender or whatnot) that take care of that aspect. But really what we're looking at is an Array type. Another issue is that we can get (partly) away with disowned slices because of the garbage collector: whenever a slice is to be rebound, whatever chunk it was bound to is left behind and will be duly collected. If, however, we want to support other programming disciplines that exercise stricter control over memory, the "parent" arrays must become an explicit type that code can keep around and manipulate directly. A design that has one container type and several slice types that crawl it in various ways is very compelling, and I think it will be a huge selling point for D. It would be odd, not natural, to have arrays as a singularly distinct container that is in fact its own range. 
We get away with container == range for arrays partly because arrays are a simple structure, but that only blurs thinking and understanding of more complex containers. In fact, ranges do predict that any attempt to grow a range will cause topological issues in the owning container; that's why SList wisely (IMHO :o)) includes a template parameter that tells whether its topology is fixed or flexible.

(Last but not least, .NET has arrays and slices. D's slices that actually have an array testis mesh real bad with .NET because in .NET you can't have a disowned slice.)

Andrei

I'm all for introducing an Array type, but what would using it look like? Is it a plain Array!(T), or is there any shortcut planned? One of the selling points of slices is their ease of use. T[] is freaking awesome!

I also would like to see what Phobos would look like after introducing Arrays. A good thing is that it's quite easy to start implementing Array right now - all you need is to disallow ~ and ~= operations on slices. Then, go through Phobos code and see what got broken and start fixing it using the newly written Array template (the compiler doesn't have to know anything about it). I believe it will become immediately clear whether D Arrays should be a reference or value type.
Apr 30 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Thu, 30 Apr 2009 15:21:52 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Robert Jacques wrote:
 On Thu, 30 Apr 2009 14:40:42 -0400, Rainer Deyke <rainerd eldwood.com>  
 wrote:
 In addition, there's this: suppose you have a struct containing a
 (mutable) array.  When you make a copy of that struct, you will almost
 always want to make a copy of the contained array.  Therefore value
 semantics should be the default, because it simplifies the most common
 use case.

That's what struct ctors/dtors/opAssign are for. Personally, I think letting the struct designer decide the behaviour is better.

The problem is what to do for the containers in Phobos. Andrei

Yes. I agree. Sorry, looking back, I interpreted what Rainer was talking about as applying to all structs and not as a supporting argument. It would have helped my understanding if the argument was concrete, i.e. about Rainer's actual experiences, and not a broad over-generalization.

Now, back on topic. I actually make heavy use of what amounts to structs containing arrays and don't want a deep copy >90% of the time. In fact, I haven't used any of the (my array-my array) deep copy functions, and I only do copying when I have to convert my arrays to D-style arrays or other container types. Also, D arrays are structs that contain C-style arrays, and I almost never dup them in final code. When I debug/prototype I do tend to use []= and dup more often. So I'd disagree that you'd almost always want to make a copy.

Second, I don't think we should simplify the most common use case, but instead the majority of use cases. This is partly semantics, but partly not: everybody uses containers differently.

Also, Andrei, I would have preferred it if you had separated the discussion on arrays from Phobos containers. Many languages have arrays and a separate container library, and I feel the decision of whether D's arrays should behave like Phobos' containers would be best left to after the best decision for Phobos is made.
Apr 30 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Thu, 30 Apr 2009 17:23:19 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:
 Robert Jacques wrote:
 On Thu, 30 Apr 2009 14:40:42 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 In addition, there's this: suppose you have a struct containing a
 (mutable) array.  When you make a copy of that struct, you will almost
 always want to make a copy of the contained array.  Therefore value
 semantics should be the default, because it simplifies the most common
 use case.

That's what struct ctors/dtors/opAssign are for. Personally, I think letting the struct designer decide the behaviour is better.

I hope you're not suggesting writing ctor/dtor/opAssign for every single struct that uses arrays as value types.

I wasn't _suggesting_ anything. This is how D currently works, and is how any D library you're using that has structs that use arrays as value types works. Being surprised at my response gives me the feeling that you've never needed to code what you're talking about. If you have, your concrete example would be helpful as a use case in this discussion.
 It's possible to write a reference wrapper around a value type.  It's
 also possible to write a value wrapper around a reference type.
 However, the former is both easier and more efficient than the latter.

Yes and no. Yes, the deep copy mixin is more complex to write the first time, but after that it's easy to use. And second, it's just as efficient as the latter, since the deep copy mixin generates the same code as a value type.
Apr 30 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Thu, 30 Apr 2009 15:57:17 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Robert Jacques wrote:
 On Wed, 29 Apr 2009 20:45:23 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:
 It has become increasingly clear we'll need to support arrays in
 addition to slices.

think so should be included in the full RFC.

There are several converging pieces of evidence. Each can be debated in separation, yet together they make for a rather strong case. People have complained about the incredibly bad, unsafe, and slow behavior of appending. Even before the considerable recent slowdown of ~=, there were speed concerns. Worse than that, the completely weird behavior of slices that grow to stomp on other slices cannot be defended.

Actually, this is a problem due to an optimization in the current GC. Fixing it is rather trivial: simply store the maximum requested array end at either the beginning or end of the memory block.
 The idiom:

 array.length = n;
 array.length = 0;
 // append to array up to n...

 is rather disingenuous and looks odd to the casual reader.

My view is that this is an argument for a reserve function:

array.reserve = n;
// append to array up to n...
 As predicted by the "range theory" which predicates that a range will  
 never grow, only shrink, slices should always shrink. Growing is a  
 foreign operation for a slice. This is a big problem that goes beyond a  
 need for purity or cleanliness - it's just terrible to have a slice  
 stomping around, and then uncontrollably being rebound elsewhere and so  
 on. People pass slices into functions, functions modify them and also  
 append to them, and depending on the phase of the moon, some updates are  
 being seen in the caller.

I agree this is a problem, but the bugs are bugs in the implementation, not the design. (see above) But the bigger issue [ (x ~ y) may or may not create a new array ], resulting in either unseen intended updates or seen unintended updates, affects both array and slice types. Note that with the bugs fixed, this would only occur when concatenating to the logically complete array, which is why I say it affects array and slice types equally. Value semantics might help though. I understand where you're coming from with "range theory", though I don't personally see that slices are impure: I view (x ~ y) as creating a logically new array and returning a view of it.
 So people have defined various types (such as ArrayCreator or  
 ArrayAppender or whatnot) that take care of that aspect. But really what  
 we're looking at is an Array type.

An Array type doesn't address the needs of ArrayCreator/ArrayAppender. The reason ArrayAppender exists is to optimize finding the array capacity. While this can be added to an array type, it can just as easily be added to a slice type. The main reason this enhancement got shot down long ago was that the D community thought making arrays 3 words long, instead of 2, would result in a net performance loss. Personally, I'd love to see some real numbers proving or disproving this.
 Another issue is that we can get (partly) away with disowned slices  
 because of the garbage collector: whenever a slice is to be rebound,  
 whatever chunk it was bound to is left behind and will be duly  
 collected. If, however, we want to support other programming disciplines  
 that exercise stricter control over memory, the "parent" arrays must  
 become an explicit type that code can keep around and manipulate  
 directly.

I'm a little sad to hear you say this, as both Daniel Keep and I posted sample pseudo code on how to do slice types properly in both reference counted and manually managed memory systems. To reiterate:

struct T[] {
    T* ptr;
    size_t length;
    MemInfo* memInfo;
}

struct MemInfo {
    size_t refCount; // not included if not reference counting
    size_t capacity;
}

With MemInfo being located at the start of the array's memory block. (i.e. COM style)
 A design that has one container type and several slice types that crawl  
 it in various ways is very compelling, and I think it will be a huge  
 selling point for D. It would be odd, not natural, to have arrays as a  
 singularly distinct container that is in fact its own range. We get away  
 with container == range for arrays partly because arrays are a simple  
 structure, but that only blurs thinking and understanding of more  
 complex containers. In fact, ranges do predict that any attempt to grow  
 a range will cause topological issues in the owning container; that's  
 why SList wisely (IMHO :o)) includes a template parameter that tells  
 whether its topology is fixed or flexible.

I'd disagree with regard to a selling point/compelling. My lab maintains an open source C++ array/slice type pair as part of a larger robotics package. It's laid out with an ArrayBase containing all the implementation. The Array type simply extends ArrayBase and adds memory management. And Slice extends ArrayBase, only adding binding capabilities. Save for memory allocation, they are identical, i.e. the container and range are interchangeable.
 (Last but not least, .NET has arrays and slices. D's slices that  
 actually have an array testis mesh real bad with .NET because in .NET  
 you can't have a disowned slice.)

I've been following the blog. I was surprised that Cristian didn't just make all D arrays .NET ArraySegments and be done with it, but there are probably issues I don't know about. I'd like to hear more on what exactly the issue was.
Apr 30 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Thu, 30 Apr 2009 20:52:05 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Robert Jacques wrote:
 On Thu, 30 Apr 2009 15:57:17 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:
 People have complained about the incredibly bad, unsafe, and slow  
 behavior of appending. Even before the considerable recent slowdown of  
 ~=, there were speed concerns. Worse than that, the completely weird  
 behavior of slices that grow to stomp on other slices cannot be  
 defended.

Fixing it is rather trivial: simply store the maximum requested array end at either the beginning or end of the memory block.

How does an arbitrary slice get to the beginning/end of the memory block? I guess that's not a fat pointer anymore, it's an obese pointer.

Okay, so you have:

struct T[] {
    T* ptr;
    size_t length;
    MemInfo* memInfo;
}

struct MemInfo {
    size_t refCount; // not included if not reference counting
    size_t capacity;
}

And MemInfo is located in the first 2 words of the memory block. Then the start of the memory block == memInfo. The end of the memory block is (cast(byte*) memInfo) + memInfo.capacity. T* and T* + length both point to locations between the memory start and end (i.e. between memInfo and memInfo + capacity/(2*size_t)).

In terms of size, it keeps the pointer down to 3 words, which is only slightly chubbier than the current arrays. And given that at minimum you have to store a pointer to the ref count, it's no larger than it has to be. (Exception: manual memory management with separate array types could strip off this extra word.)
 But the bigger [ (x ~ y) may or may not create a new array ]

x ~ y always creates a new array.

Oops, sorry. x ~= y.
 I'm a little sad to hear you say this, as both Daniel Keep and I posted  
 sample pseudo code on how to do slice types properly in both reference  
 counted and manual managed memory systems.

No need to be sad. The code is obvious. What I was saying is that people actually need to get their hand on the container so they can control its scope.

Could you clarify something for me: is this a low-level implementation argument or a high-level semantic argument you're making?
 A design that has one container type and several slice types that  
 crawl it in various ways is very compelling, and I think it will be a  
 huge selling point for D. It would be odd, not natural, to have arrays  
 as a singularly distinct container that is in fact its own range. We  
 get away with container == range for arrays partly because arrays are  
 a simple structure, but that only blurs thinking and understanding of  
 more complex containers. In fact, ranges do predict that any attempt  
 to grow a range will cause topological issues in the owning container;  
 that's why SList wisely (IMHO :o)) includes a template parameter that  
 tells whether its topology is fixed or flexible.

I'd disagree with regard to a selling point/compelling. My lab maintains an open source C++ array/slice type pair as part of a larger robotics package. It's laid out with an ArrayBase containing all the implementation. The Array type simply extends ArrayBase and adds memory management. And Slice extends ArrayBase, only adding binding capabilities. Save for memory allocation, they are identical, i.e. the container and range are interchangeable.

How about a hashtable? A tree? Are the ranges crawling them identical with the containers? Not at all. Arrays are probably the worst example to use as an archetype for a design because they are very simple. Pretty much any design looks a bit baroque when only applied to arrays.

My disagreement was limited to the "selling point/compelling" claim. Otherwise, I agree.

hmm... thinking out loud: Slices of lists could be lists, but it's a recipe for leaks and other issues. Slices of trees could be trees, although these 'slices' couldn't actually walk the tree (i.e. they aren't ranges), and re-balancing causes logical bugs. But hash tables, queues, dequeues, stacks, etc. don't have nice logical slices.
 (Last but not least, .NET has arrays and slices. D's slices that  
 actually have an array testis mesh real bad with .NET because in .NET  
 you can't have a disowned slice.)

I've been following the blog. I was surprised that Cristian didn't just make all D arrays .NET ArraySegments and be done with it, but there are probably issues I don't know about. I'd like to hear more on what exactly the issue was.

We discussed it at length. Without being an expert in .NET, I can't tell what the problem is, but I do know that if there was a simpler solution, Cristi would have probably found it. Andrei

Apr 30 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
Andrei, how do you do shared, lock-free containers as value types?
Apr 30 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Thu, 30 Apr 2009 23:18:13 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:
 Robert Jacques wrote:
 On Thu, 30 Apr 2009 17:23:19 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 It's possible to write a reference wrapper around a value type.  It's
 also possible to write a value wrapper around a reference type.
 However, the former is both easier and more efficient than the latter.

Yes and no. Yes, the deep copy mixin is more complex to write the first time, but after that it's easy to use. And second, it's just as efficient as the latter, since the deep copy mixin generates the same code as a value type.

A simple deep-copy struct wrapper around a class type is horribly inefficient compared to a direct struct type:

Reference semantics don't require classes. D arrays have reference semantics, but are implemented as structs. Phobos could do something similar.
   - It uses an extra heap allocation per instance.

And during a deep copy, you're often making a lot of heap copies. (N vs N+1 when N>>1)
   - It uses an extra level of indirection.
   - It uses virtual function calls.

Not if the class is final.
   - It allocates an extra reference to a virtual function table per
 instance.

Huh? Isn't that part of the extra heap allocation? Also, a really important question is: is it possible to implement shared, lock-free containers as value types?
Apr 30 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 01 May 2009 06:26:19 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2009-04-30 15:57:17 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> said:

 As predicted by the "range theory" which predicates that a range will  
 never grow, only shrink, slices should always shrink. Growing is a  
 foreign operation for a slice. This is a big problem that goes beyond a  
 need for purity or cleanliness - it's just terrible to have a slice  
 stomping around, and then uncontrollably being rebound elsewhere and so  
 on. People pass slices into functions, functions modify them and also  
 append to them, and depending on the phase of the moon, some updates  
 are being seen in the caller.

I think D slices are a good concept for pointers to a fixed-size array. If you write new int[4], you're creating a fixed-size array of four ints, and getting a slice to the whole. If you concatenate it with some other array, it creates a new one.

As you like to point out, the fun starts when you have arrays of mutable elements and want to resize them. If you have an array of four elements and write array.length = 5, then you're perhaps creating a new one, disconnected from the previous one, and perhaps not. This poses no problem if your slice is the only one pointing to the memory block; otherwise the results are rather unpredictable.

Well, containers are no better at that. If you have some slices of a container and you grow the container, you may or may not disconnect all your slices (unless you're creating an indirect slice as {container*, int start, int end} of course).

Which makes me think that perhaps we could introduce a unique type attribute and just forbid growing dynamic arrays unless they are flagged as unique. That'd make "unique T[]" behave as the container and "T[]" as the slice. And perhaps we could build that on top of the concept of uniqueness Bartosz wants to be expressible in D for the concurrency model, as he said in his latest post:
 In traditional type systems there is no way to express the requirement  
 that a message passed by reference must either be a monitor itself or  
 behave like a C++ unique_ptr (an object that leaves no aliases behind).  
 This requirement should be expressible in D, allowing the compiler to  
 check for its violations.

<http://bartoszmilewski.wordpress.com/2009/04/26/multithreading-with-d/>

I think you might have misunderstood the concept of a unique/mobile type. If you take a slice or a reference of a unique, you null the source, which makes using unique T logically invalid as a container. Also, shared unique T can't be converted to shared T. This incompatibility is the whole point of having a unique type in the first place, and it negates your concept of using unique T as a container. And generally, unique T implies shared storage, and cannot be used for thread-local arrays. Lastly, what you're trying to fix is a bug in the implementation of arrays (see my posts in this thread) and not a bug in the design.
May 01 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 01 May 2009 06:45:20 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2009-04-30 20:52:05 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> said:

 (Last but not least, .NET has arrays and slices. D's slices that  
 actually have an array testis mesh real bad with .NET because in .NET  
 you can't have a disowned slice.)

just make all D arrays .NET ArraySegments and be done with it, but there's probably issues I don't know about. I'd like to hear more on what exactly the issue was.

what the problem is, but I do know that if there was a simpler solution, Cristi would have probably found it.

As I understand it, the issue is that the .NET API generally doesn't expect ArraySegments, but Array, so Cristi wants T[] to be an array, not an array segment, in order to make the API usable without having to duplicate ArraySegments into Arrays every time. I'm doubtful having a standard container type will help that much, as T[] (the slice) still won't be accepted by the .NET API and the programmer would simply have to always use the container. That the .NET framework wants to use containers everywhere instead of slices is not a reason for D to bend to .NET concept of arrays and slices.

I agree. D shouldn't go borrowing bad design from other languages.
May 01 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 01 May 2009 14:00:35 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Robert Jacques wrote:
 Lastly, what you're trying to fix is a bug in the implementation of  
 arrays (see my posts in this thread) and not a bug in the design.

I think your fix to the array-slice conflation is simply to make slices arrays by adding extra information. This approach is reasonable, but (1) will make slices bigger and slower, and

The extra information is on the heap, not on the stack, so slices are no bigger or slower than normal (for GC or reference counted memory; fully manual memory management could have a 3 word array type and a 2 word slice type)
 (2) will port very poorly to most other containers.

Well, these bugs don't affect other containers.
 Code using ad-hoc arrays in C often had to pass two words around  
 (usually pointers+length). Similarly, code using iterators in C++ must  
 pass two pointers/iterators around (begin+end). With D, we have a 1-1  
 correspondence of that situation with slices, which makes for a great  
 argument on why a slice that contains two boundaries is an awesome way  
 to encapsulate a concept at no efficiency cost. Adding a third word to  
 the slice would mark a step backwards in terms of data being trafficked  
 around, and a questionable increase in expressiveness. An obese  
 three-words slice would avoid stomping over other slices upon append,  
 but would still exhibit a rather unpredictable append behavior: a  
 function taking a slice and appending to it may or may not publish its  
 changes to its caller.

I have not suggested obese 3-slices (in general). Let's look at the actual costs of your previous array + slice proposal and my array/slice proposal:

GC:
    Array       2 words (begin*, end*)
    Slice       2 words (begin*, end*)
    Array/Slice 2 words (ptr*, length)

Reference counting:
    Array       4 words (begin*, end*, end-of-store*, refcount*)
    Slice       3 words (begin*, end*, owner*)
    Array/Slice 3 words (begin*, end*, memInfo*)

Manual memory management:
    Array       3 words (begin*, end*, end-of-store*)
    Slice       2 words (begin*, end*)
    Array/Slice 3 words (ptr*, length, memInfo*)

And you know something interesting: a 4-word slice is actually faster than a 2-word slice, _if_ you use MMX or SSE instructions (even unaligned, though aligned is even faster).
 A good design is to have containers and ranges, not container_ranges.

I agree in general. But containers don't talk the same language; ranges do. So array slices having a few extra features (i.e. ~=) doesn't bother my abstractions: it's still a range.
May 01 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 01 May 2009 14:03:44 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:

 Robert Jacques wrote:
 On Thu, 30 Apr 2009 23:18:13 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
   - It uses an extra heap allocation per instance.

And during a deep copy, you're often making a lot of heap copies. (N vs N+1 when N>>1)

In the case of dynamic arrays, N = 1 (or possibly 0 if the small array optimization is used). In the case of static arrays, N = 0. Either way, the extra level of indirection is significant.

In the case of dynamic arrays, it's a struct. So no extra heap allocation is done. Same goes for heaps. Your argument only applies to object-based containers, like trees, which generally contain lots of child objects, i.e. N heap allocations.
   - It allocates an extra reference to a virtual function table per
 instance.

Huh? Isn't that part of the extra heap allocation?

Extra allocations: 1. Extra memory usage: 2 words (virtual function table pointer, local pointer) plus heap overhead for one allocation.

Nope. Most of the time, the extra memory usage is 0. Remember, the GC uses large blocks with power-of-2 sizes. Unless you happen to push across a boundary, the extra 2 words don't matter.
 Also, a really important question is: is it possible to implement are
 shared, lock-free containers as value types?

Is it possible to implement them as reference types?

Yes. See java.util.concurrent for examples. In fact, all of the algorithms I've read explicitly don't support copying, which is why I'm asking.
 Why would the
 issues different between reference types and value types?

A value type has to support copying. Reference types don't. Perhaps I should have said value semantics / reference semantics instead.
 A reference
 type is just a reference wrapper around a value type.

Precisely, so it's always passed by reference and doesn't need to support copying.
May 01 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 01 May 2009 15:07:17 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Robert Jacques wrote:
 Yes. See java.util.concurrent for examples. In fact, all of the  
 algorithms I've read explicitly don't support copying, which is why I'm  
 asking.

 Why would the
 issues different between reference types and value types?

should have said value semantics / reference semantics instead.

I thought lock-free data structures are the quintessential COW type. You might as well just have provided the perfect argument against your case. Andrei

Andrei, I've never seen a COW lock-free algorithm. In fact, to the best of my knowledge, you can only copy lock-free hash tables and arrays. Everything else, even stacks or queues implemented on an array, doesn't work. For instance, the issue with walkers on singly linked lists is a well-known problem. I've read several papers on lock-free queues and lock-free stacks, and none have mentioned copying. They simply ignore it. (The hash table talk implied it, but didn't mention it explicitly.) So if you have a paper on how to do a lock-free stack, queue, etc. that supports copying, I'd love to read it. (Which is one reason why I'm asking.)

Andrei, you might be thinking about STM, which follows a transactional model: lazy copy, mutate copy, publish-retry changes. (Technically, under the hood the publish takes a lock. It's just that by taking all the required locks at once in a known manner, deadlocks are avoided. And the locks are taken for a much shorter time period. (Caveat: there are other ways of doing STM, and I don't know what D's specific plans are.)) This is generally considered to be different from specific lock-free containers.

(I'm pretty sure that simple STM containers would have worse performance than a locking version, let alone a lock-free version, but I haven't seen data for STM used with simple containers, like a stack, queue or tree, so I don't know for sure. Usually you see it with unstructured objects.)

I should probably have used the term Concurrent Collections as opposed to lock-free collections.
May 01 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 01 May 2009 18:09:09 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Fri, 01 May 2009 15:07:17 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:
 Robert Jacques wrote:
 Yes. See java.util.concurrent for examples. In fact, all of the  
 algorithms I've read explicitly don't support copying, which is why  
 I'm asking.

 Why would the
 issues different between reference types and value types?

I should have said value semantics / reference semantics instead.

I thought lock-free data structures are the quintessential COW type. You might as well just have provided the perfect argument against your case. Andrei


Maybe we have some terms confused. Yes, lock-free singly-linked lists and stacks (and some dictionaries) that are defined in the literature are meant to be shared and use minimal and coherent CAS-based updates. This is quite a feat because copying is the easy way to go and doesn't quite deserve a peer-reviewed conference paper. However, Maged Michael and I co-wrote a few magazine articles on the topic. The general structure of a COW lock-free object is:

struct Something { ... }
struct LockFree {
    Something * s;
    void makeSomeChange() {
        do {
            Something * copy = s.deepDup;
            copy.makeSomeChange;
        } while (!cas(copy, s));
    }
}

For example if Something is some array and you want to insert elements in it atomically, you make a copy of the array, insert stuff, and then substitute the copy for the original array with care such that the array stayed in place (otherwise your insertions aren't doing what you initially thought).

Ah, thank you. I would call this copy-on-mutate, rather than copy-on-write (though this is semantics). The COW that I normally think about has value semantics; in copy-on-mutate, the copying is done under the hood in order to provide lock-free properties.

Do you have a link to your article? I'd like to know the relative performance. Though somehow changing O(1) algorithms that rarely touch the GC for O(n*t) ones that abuse the GC provokes a certain gut reaction. This style of programming seems to need STM, so you can group a bunch of mutations into a single transaction; however, a lot of the queue/stack use cases are inherently single-mutate transactions. What really worries me is that lock-free queues and stacks are the building blocks of a lot of concurrency code, so making them fast is very important.
 Andrei, you might be thinking about STM, which follows a transactional  
 model: lazy copy, mutate copy, publish-retry changes.

My understanding is that STM is a generalized, multi-object, transparent, and systematic way of doing CAS-based work (please correct me if I'm wrong, I'm not an expert in STM).

May 01 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 01 May 2009 15:37:56 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:
 Robert Jacques wrote:
 Precisely, so it's always passed by reference and doesn't need to
 support copying.

All containers need to support copying.

No they don't. To be clear, I'm talking about copying the container, not the values inside it, in a non-destructive manner. My data structures book and Wikipedia make no mention of a .dup-like method as being part of a container. I do agree .dup is useful and quite practical for serial containers, but it is by no means required. But you haven't answered the question of a value-semantic lock-free container.
May 01 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 01 May 2009 19:23:45 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Rainer Deyke wrote:
 Robert Jacques wrote:
 Precisely, so it's always passed by reference and doesn't need to
 support copying.


Speaking of which, I'm thinking of a field like scientific computing (even of the simplest, most basic kind that we all dabble in) or applied math. People who use for example matrices would NOT expect them to have reference semantics. They'd find it very confusing if a = b would not make matrix "a" a copy of matrix "b" and so on. (That + no operator overloading = R.I.P. my excitement for doing numeric work in Java.)

It would work to ask people who actually use Matrix!double to wrap it in a Value!(Matrix!double). But say someone asks, whatever, but if Value!(Matrix!double) is a matrix, then what is Matrix? Well, I'd reply, Matrix is actually a reference to a matrix. Then they'll ask, why don't you call what you call today "Matrix", RefMatrix or Ref!Matrix or whatever, and call a Matrix a matrix? Um, I don't know. That's what the buzz was on the newsgroup when we were thinking of it. Some said that's what people coming from Java expect.

I guess at that point the would-be D user would be entitled to make me a lavaliere out of my Matrix library and move on.

Andrei

I do scientific computing. Generally, I find it breaks down into two parts: things under 4x4, for which value types are probably better, and everything else, for which value types are to be avoided like the plague. I'll often work with 100s of MB of data with algorithms that take minutes to hours to complete. So an unexpected copy is both hard to find (am I slow/crashing because of my algorithm, or because of a typo?) and rather harmful, because it's big. But I've generally worked on making something else fast so more data can be crunched, etc. Actual prototype work (for array/matrix based stuff at least) is often done in Matlab, which I think uses COW under-the-hood to provide value semantics. So I think anyone turning to D to do scientific computing will know reference semantics, since they'd already be familiar with them from C/C++, etc (Fortran?). Although successfully attracting algorithm prototypes from Matlab/Python/Mathematica/R/etc is probably a bigger issue than just the container types; growing the pie was why the Wii won the last console wars.
May 01 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 01 May 2009 19:25:35 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 Do you have a link to your article?

http://tinyurl.com/dac82a Andrei

Should've seen that one coming. :) Anyways, I'm not sure how you can call the technique lock-free, since you're doing (possibly several) allocations inside the inner CAS loop. (I guess a good, parallel allocator might negate this) It's probably not an issue for the article's use case, where reads vastly dominated updates, but for things like stacks or queues, it wouldn't work. And poor performance leads people to create their own fast, possibly buggy versions, thus defeating the point of a standard library.
May 01 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 02:34:57 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Fri, 01 May 2009 19:25:35 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 Do you have a link to your article?

http://tinyurl.com/dac82a Andrei

Anyways, I'm not sure how you can call the technique lock-free, since you're doing (possibly several) allocations inside the inner CAS loop.

No, repeated allocations are trivial to eliminate. I didn't even bother to explain that in the article. The loop must only refill the allocated object from the object that needs to be replaced.

Putting the test in the pseudo code for this would've helped my understanding.

if (copy is null)
    copy = s.deepDup;
else
    s.deepDupTo(copy);
 (I guess a good, parallel allocator might negate this) It's probably  
 not an issue for the article's use case, where reads vastly dominated  
 updates, but for things like stacks or queues, it wouldn't work. And  
 poor performance leads people to create their own fast, possibly buggy  
 versions, thus defeating the point of a standard library.

Incidentally Maged Michael himself wrote the first lock-free malloc().

Cool.
 Andrei

But Andrei, I really don't care about the existence of a lock-free algorithm with value semantics. I only care about a practical lock-free algorithm with value semantics. This approach, while elegant in its simplicity and good for some situations, is very poor at high update / contention objects, like stacks and queues. Maybe, locked algorithms are the best value semantics can do in some situations and this might result in separate fast reference containers. Which is a drawback to value semantics and should be placed into the big pros/cons pile. Of course, I could be missing something (again). P.S. A general thank you all your posts and explanations.
May 02 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 03:39:27 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:

 Robert Jacques wrote:
 On Fri, 01 May 2009 15:37:56 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 All containers need to support copying.

No they don't.

There is no conceptual problem with a non-copyable struct. However, in order to be a considered a container, the struct must at least support read-only iteration over its contents. If iteration is possible, then so is copying. You can have non-copyable value types, but they can't be containers.

No they don't. Iteration can often be destructive. If I iterate over a stack or a queue, I'm left with an empty stack/queue. For value semantics to work, non-destructive copying is required.
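As a minimal sketch of that point (the `Stack` type below is hypothetical, not from any library): iterating by popping leaves the container empty, so supporting value semantics means providing a separate, non-destructive copy operation.

```d
// Hypothetical minimal stack; the name and API are illustrative only.
struct Stack(T)
{
    T[] data;
    void push(T x) { data ~= x; }
    T pop() { auto x = data[$ - 1]; data = data[0 .. $ - 1]; return x; }
    bool empty() const { return data.length == 0; }
    // A value copy must be made *before* iterating destructively.
    Stack dup() { return Stack(data.dup); }
}

unittest
{
    Stack!int s;
    s.push(1); s.push(2);
    auto backup = s.dup;        // non-destructive copy
    while (!s.empty) s.pop();   // destructive iteration
    assert(s.empty && !backup.empty);
}
```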
May 02 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 06:23:46 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2009-05-01 13:44:58 -0400, "Robert Jacques" <sandford jhu.edu> said:

 I think you might have misunderstood the concept of a unique/mobile type.  
 If you take a slice or a reference of a unique, you null the source,  
 which makes using unique T logically invalid as a container.

Take note: I'm not expecting it to copy automatically, "unique T[]" isn't a value-type container. I'm solving the problem of slices detaching from one another when you grow them, which I don't think is a problem addressed by value-type containers either. If you grow the slice or container, previously-taken slices of it may or may not be left dangling. Allowing grow only on "unique T[]" makes sure there are no slice to be invalidated when you grow. Also, generally type systems featuring unique also have that "lent" flag allowing you to pass the unique reference to functions which are required to not keep it anywhere after the function returns. So you could pass lent slices of your unique T[] to functions without breaking uniqueness of the reference.

By the way, the canonical system with a unique/mobile type (CSP) doesn't have a lent flag. It lets you circumvent uniqueness instead. Scope/lent's definitely better :) Hmm... lent functions are a fairly restrictive subset, if you consider unique T[] to be long lived. However, in the more common string builder style of programming, it works just fine. (Though people will still want a capacity field). So maybe an array type with unique semantics and a capacity field is what's warranted (either language or library type). Would you also allow T[] ~=? i.e. x~=y; => x = x~y; which is less efficient but still valid.
 Also, shared unique  T can't be converted to shared T. This  
 incompatibility is the whole point  of having a unique type in the  
 first place, and negates your concept of  using unique T as containers.

Why couldn't it? You just lose (forever) the uniqueness of the reference when you copy it to a non-unique one.

The whole point of unique is that it lacks any sharing protection mechanisms. Instead the object is protected by the fact that only one thread has access to it at a time. You could design the system so that uniques could be cast to local objects (I don't think it's a good idea, but you could), but casting them to shared objects is a recipe for disaster, due to races, etc.
 And generally, unique T implies shared  storage, and can not be used  
 for thread-local arrays.

Generally, but not necessarily. Any problem for unique be applicable to thread-local arrays?

Nope, the implication comes from the standard use case of cheaply passing data between threads.
May 02 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 10:58:08 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 I do scientific computing. Generally, I find it breaks down into two  
 parts: things under 4x4, for which value types are probably better, and  
 everything else, for which value types are to be avoided like the  
 plague. I'll often work with 100's mb of data with algorithms that take  
 minutes to hours to complete. So an unexpected copy is both hard to  
 find (am I slow/crashing because of my algorithm, or because of a  
 typo?) and rather harmful, because it's big.

I don't buy this. Undue copying is an issue that manifests itself locally, reproducibly, and debuggably. Contrast with long-distance coupling which is bound to hard to debug. You change a matrix here, and all of a sudden a matrix miles away has been messed up. Also, efficiency can be fixed with COW, whereas there is nothing you can do to fix the coupling aside from relentless and patient user education. Walter gave me a good argument (little did he know he was making a point destroying his.) Consider the progress we made when replacing char[] with string. Why? Because with char[] long-distance dependencies crop up easy and fast. With string you know there's never going to be a long-distance dependency. Why? Because unlike char[], content immutability makes string as good as a value. I remember the nightmare. I'd define a little structure: struct Sentence { uint id; char[] data; } Above my desk I have a big red bulb along with an audible alarm. As soon as I add the member "data", the bulb and the alarm go off. Sentence is now an invalid struct - I need to add at least constructor and a postblit. In the constructor I need to call .dup on the incoming data, and in the postblit I need to do something similar (or something more complicated if I want to be efficient). This is a clear example of code that is short and natural, yet does precisely the wrong thing. This is simply a ton of trouble, as experience with C++ has shown. I'm not even getting into calling functions that take a char[] and keeping fingers crossed ("I hope they won't mess with it") or .dup-ing prior to the call to eliminate any doubt (even though the function may anyway call .dup internally). string has marked huge progress towards people considering D seriously.

Andrei, you're perfectly right about strings. They've been a godsend. But strings are small, cheap to copy and tend to roam a lot. Scientific computing (at least the kind I do) is very local, and uses large datasets which are expensive to copy. I load data and set up the problem parameters, then pass it through a series of functions and write out the answer. When I first learned Matlab, my prof gave me a great rule of thumb: if it's more than 10 lines long, you're probably doing it wrong. My code bases tend to be small and focused, and use either my own code or libraries I understand since they're logically pure (Ah, math functions (I don't mess with rounding)). And while finding bugs is relatively easy (go small, personal code bases) tracking down a performance issue if you don't even know it exists is a lot harder/slower.
 But I've generally worked on making something else fast so more data  
 can be crunched, etc. Actual prototype work (for array/matrix based  
 stuff at least) is often done in Matlab, which I think uses COW  
 under-the-hood to provide value semantics. So I think anyone turning to  
 D to do scientific computing will know reference semantics, since  
 they'd already be familiar with them from C/C++, etc (Fortran?).  
 Although successfully attracting algorithm prototypes from  
 Matlab/python/mathmatica/R/etc is probably bigger issue than just the  
 container types, growing the pie was why the Wii won the last console  
 wars.

Fortran uses pass by reference, but sort of gets away with it by assuming and promoting no aliasing throughout. Any two named values in Fortran can be assumed to refer to distinct memory. Also unless I am wrong A = B in Fortran does the right thing (copies B's content into A). Please confirm/infirm. For all I know, Matlab does the closest to "the real thing". Also, C++ numeric/scientific libraries invariably use value semantics in conjunction with expression templates meant to effect loop fusion. Why? Because value semantics is the right thing and C++ is able to express it. I should note, however, that Perl Data Language uses reference semantics (http://tinyurl.com/derlrh).

Actually, all the big, high performance libraries I know use BLAS/Linpack (i.e. reference semantics). Though there are a good number of middle-range libraries that use expression templates. Hmm... just went googling and found some neat work on mixing expression templates with BLAS APIs, including using the GPU. Having your cake and eating it too is nice. Though personally I like the current array op syntax, which is reference based and is just enough to make me aware of doing expensive operations, without being annoying:

a[] = b[] + c[];
 There's also a definite stench when one realizes that

 a = b;

 on one side, and

 a = b * 1;

 or

 a = b + 0;

 on the other, do completely different things.

Again, I'd use the []= operator:

a = b + 0;   // Error
a[] = b + 0; // Okay

So I don't see this as an issue.
 So what we're looking at is: languages that had the option chose value  
 semantics. Languages that didn't, well, they did what they could.

Well, that's one thing I love about D. We have a value assignment operator: []= in addition to a reference semantics assignment operator = (well, for reference types at least). I actually use it to do format copy/conversions in my own array class.
 I started rather neutral in this discussion but the more time goes by,  
 the more things tilt towards value semantics.

Well, for me, the more this goes on the more I get interested in trying value semantics. But if I'm honest, I'd just go on using array slices and forget about the array containers. And the lack of good, fast lock-free value semantics containers (even the concept of shared data without reference semantics) is a border-line deal breaker, depending on who you are (It's bad PR regardless). Also, in a value semantics world, refs are third class citizens, but in a reference semantic world, value semantics get their own assignment operator ( []= ), and by convention, their own property ( .dup )
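For the record, a minimal sketch of that convention with plain D arrays: `=` rebinds the slice (reference assignment), `[]=` copies contents (value assignment), and `.dup` makes an independent copy.

```d
void main()
{
    int[] a = [1, 2];

    int[] r = a;            // reference assignment: r aliases a
    r[0] = 99;
    assert(a[0] == 99);

    auto v = new int[a.length];
    v[] = a;                // value assignment: contents copied element-wise
    v[0] = 0;
    assert(a[0] == 99);     // a is unaffected

    auto d = a.dup;         // the conventional explicit copy
    d[1] = 7;
    assert(a[1] == 2);
}
```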
May 02 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 15:17:04 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:

 Robert Jacques wrote:
 On Sat, 02 May 2009 03:39:27 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 You can have non-copyable value types, but they can't be containers.

No they don't. Iteration can often be destructive. If I iterate over a stack or a queue, I'm left with an empty stack/queue. For value semantics to work non-destructive copying is required.

OK. I grant that there are non-copyable types that can reasonably be called containers. Simple solution: make them non-copyable structs. You still get most of the benefits of value types: - One less layer of indirection.

Again, D arrays are structs with reference semantics. This isn't a pro/con either way.
   - No long distance dependencies.

Well, if I can't copy it, then I have to use ref everywhere, which is functionally equivalent to reference semantics. I think you've just proved the counter-point.
   - RAII.

Can be done with structs or classes. Also see point 1. So, this isn't a pro/con either way.
May 02 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 17:59:53 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Robert Jacques wrote:
 Also, in a value semantics world, refs are third class citizens, but in  
 a reference semantic world, value semantics get their own assignment  
 operator ( []= ), and by convention, their own property ( .dup )

The major problem is not assignment. That can be taken care of. The problem is:

1. Passing an object into a function
2. Making the object a member of another object
3. Yes, assigning to the object (which ought to be congruent with 1 and 2).

I have perused some more searches and documentation, and things don't bode well for references. Consider the NumPy library. I have searched numpy pass by reference and found some interesting links. The first reveals differences between numpy and matlab (notably reference semantics). The second is a discussion entitled "beginner confused with numpy". Guess what the confusion was about? Congratulations, you have won a 40'' LCD TV. Reference semantics! Third hit: https://www-old.cae.wisc.edu/pipermail/octave-maintainers/2009-March/011509.html says:

>> Octave's pass-by-value mechanism (with lazy copy-on-write) is
>> something that is *far* more simple to grasp than NumPy's inherited
>> everything-is-a-reference semantics. I do regard myself as moderately
>> experienced Python programmer, yet every now and then I get shot in
>> the foot by the reference semantics in Python.

I swear I didn't pay that guy. Also, getting back to Perl's Data Language (http://tinyurl.com/derlrh) I see the mention of references is clearly a WARNING, not an INTERESTING AND DESIRABLE FEATURE:

"It is important to keep the ``reference nature'' of piddles in mind when passing piddles into subroutines. If you modify the input pdls you modify the original argument, not a copy of it. This is different from some other array processing languages but makes for very efficient passing of piddles between subroutines."

So what I'm seeing is that reference semantics is not desirable and not natural but was chosen for efficiency reasons. But you know, I don't want to "keep in mind" extra stuff when coding, I have enough to worry about. 
If I can get away with ref and/or refcounting, then I have taken care of the efficiency issue and I don't need to keep in mind a weird quirk.

I've said it before and I'll say it again. In high-level interpreted languages, with interactive consoles etc, you're generally doing prototyping work and value semantics are better for that. But D is a systems language. That doesn't make the arguments any less valid, but you might want to down-weight them a bit.
 ============

 About undue copying of data: maybe that could be avoided by having  
 functions manipulate ranges (which are cheap to copy), not the  
 containers that use them. In C++ you need to pass the container more  
 often than not mostly because passing two iterators is amazingly  
 burdensome. In D we can pass ranges easily, so I think much D code that  
 wants to do stuff will just take ranges.

And ranges _are_ reference semantics. If as you say most of D's code would use ranges, doesn't that prove the counter point? Although, I'm starting to see an interesting story: Here are containers. They have value semantics and are simple to use/prototype with. When you're done, you can move to ranges in the performance critical sections in order to boost performance. And some things, like high performance lock-free queues and stacks, might only exist as ranges.
May 02 2009
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 19:11:11 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:

 Robert Jacques wrote:
 Again, D arrays are structs with reference semantics. This isn't a
 pro/con either way.

The D1 dynamic array type does not have reference semantics, nor does it have value semantics.

void f(int[] a) { a.length = 1; }
auto a = [];
f(a);
assert(a.length == 0);
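The hybrid shows up clearly when the two kinds of mutation are put side by side (a small sketch, assuming D2 slice behavior): element writes through the parameter are seen by the caller, while a length change is not.

```d
void setFirst(int[] a) { a[0] = 42; }    // writes through the shared data
void shrink(int[] a)   { a.length = 1; } // only rebinds the local slice

void main()
{
    int[] x = [1, 2, 3];
    setFirst(x);
    assert(x[0] == 42);    // reference-like: contents are shared
    shrink(x);
    assert(x.length == 3); // value-like: the caller's length is untouched
}
```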
   - No long distance dependencies.

Well, if I can't copy it, then I have to use ref everywhere, which is functionally equivalent to reference semantics. I think you've just proved the counter-point.

Given a value type 'T', you have the guarantee that no two variables of type 'T' can alias each other. This guarantee is preserved when the type 'T' is non-copyable. An argument of type 'ref T' can obviously alias a variable of type 'T'.

Okay, if T is not copyable, then I _must_ pass it as ref T, everywhere. Which is reference semantics.
   - RAII.

Can be done with structs or classes. Also see point 1. So, this isn't a pro/con either way.

The D1 dynamic array type does not support RAII.

There are two parts to D's arrays. One is a struct 2 words long, the other is a chunk of RAM. The first part is RAII; the second part is not possible, since D doesn't allow dynamically sized memory allocation on the stack. And basically all dynamic data structures have to do some heap allocation. I had thought you were talking about the difference between having the managing head be a struct or a class. (See scope classes)
May 02 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Fri, May 1, 2009 at 4:23 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Rainer Deyke wrote:
 Robert Jacques wrote:
 Precisely, so it's always passed by reference and doesn't need to
 support copying.

All containers need to support copying.

Speaking of which, I'm thinking of a field like scientific computing (even of the simplest, most basic kind that we all dabble into) or applied math. People who use for example matrices would NOT expect them to have reference semantics. They'd find it very confusing if a = b would not make matrix "a" a copy of matrix "b" and so on. (That + no operator overloading = R.I.P. my excitement for doing numeric work in Java.) It would work to ask people who actually use Matrix!double to wrap it in a Value!(Matrix!double). But say someone asks, whatever, but if Value!(Matrix!double) is a matrix, then what is Matrix? Well, I'd reply, Matrix is actually a reference to a matrix. Then, they'll ask, why don't you call what you call today "Matrix", RefMatrix or Ref!Matrix or whatever, and call a Matrix a matrix? Um, I don't know. That's what the buzz was on the newsgroup when we were thinking of it. Some said that's what people coming from Java expect. I guess at that point the would-be D user would be entitled to make me a lavaliere out of my Matrix library and move on.

Python, and therefore NumPy, are reference based. So far I haven't seen any scientists posting on the list about how having their arrays and matrices be references is driving them crazy. It may be surprising to some of them at first, but even non-hard-core coders seem to be able to handle it. It helps that dummy assignments like a = b are rare. More often you have things like a = b + c, and that creates a new matrix. Or a += b, which pretty clearly mutates a. Much more often the discussion on the numpy list takes the form of "how do I make this loop faster" because loops are slow in Python, so you have to come up with clever transformations to turn your loop into array ops. This is thankfully a problem that D array libs do not have. If you think of it as a loop, go ahead and implement it as a loop. That's why I prefer to do numerical work in D rather than NumPy these days. --bb
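A tiny illustration of that freedom (the `axpy` name is just a conventional BLAS-style label, not a library call): in D the loop can simply stay a loop.

```d
// y[i] += a * x[i] written as a plain loop: no need to contort it
// into whole-array operations for an interpreter's sake.
void axpy(double a, const(double)[] x, double[] y)
{
    foreach (i; 0 .. y.length)
        y[i] += a * x[i];
}

unittest
{
    double[] x = [1.0, 2.0];
    double[] y = [10.0, 20.0];
    axpy(2.0, x, y);
    assert(y == [12.0, 24.0]);
}
```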
May 01 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Fri, May 1, 2009 at 6:25 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Fri, May 1, 2009 at 4:23 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 I guess at that point the would-be D user would be entitled to make me a lavaliere out of my Matrix library and move on.

Python, and therefore NumPy, are reference based.

In a language that's reference based, things can't be helped. They can be helped in D though. I wouldn't want to later cringe that I missed the opportunity.
 So far I haven't seen any scientists posting on the list about how
 having their arrays and matrices be references is driving them crazy.
 It may be surprising to some of them at first, but even non-hard-core
 coders seem to be able to handle it.

To me this sounds more like an opportunity to do the right thing.

Could be. I'm not arguing that refs are best, just giving some evidence from the NumPy community to counter your assertion that scientists would be confused by reference-based array and matrix types.
 It helps that dummy assignments like a = b are rare. More often you have things like a = b + c, and that creates a new matrix. Or a += b, which pretty clearly mutates a.

Oh rly. How about this then?

Matrix a, b, c;
...
c = a;
a += b;

Does the last operation mutate c as well?

I said "assignments like a = b are rare" and you put one of those in your example. Yes, when you have an a=b anywhere you've got to pay attention and make sure you didn't mean a=b.dup. And the other thing is that when you pass a matrix to a function, you need to be aware of whether the function mutates that matrix or not. That's really the biggest pain, and source of hardest to find bugs. Matlab goes the other extreme and makes everything value semantics. It does a lot of COW optimizations under the hood to make it tolerably efficient even when tossing around big matrices. But it's annoying if you want to write a function to make a small update to an existing matrix. You have no choice but to return a copy and hope that Matlab's optimizer is smart enough to eliminate the unnecessary copy. I agree with the poster below that what I usually want is value semantics with small values and matrices, reference semantics with big matrices. --bb
May 02 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Sat, 02 May 2009 18:17:16 +0400, Denis Koroskin <2korden gmail.com> wrote:

 On Sat, 02 May 2009 18:08:30 +0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Sat, 02 May 2009 03:35:51 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 No, repeated allocations are trivial to eliminate. I didn't even
 bother to explain that in the article. The loop must only refill the
 allocated object from the object that needs to be replaced.

understanding.

if (copy is null)
    copy = s.deepDup;
else
    s.deepDupTo(copy);

copy = cast(T*) GC.malloc(T.sizeof);
do {
    overwrite(s, copy);
    copy.modify;
} while (!cas(copy, s));
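A slightly fuller, hedged sketch of the reuse idea (`Payload`, `update`, and the single-word `head` are illustrative names; `core.atomic.cas(&dest, ifThis, writeThis)` is the Phobos signature assumed here): the replacement object is allocated once, outside the loop, and only refilled on each CAS retry.

```d
import core.atomic : cas;

struct Payload { int value; }

shared Payload* head;  // the published object, replaced wholesale on update

void update(int delta)
{
    // Allocate the replacement once, hoisted out of the retry loop.
    Payload* copy = new Payload;
    Payload* snapshot;
    do
    {
        snapshot = cast(Payload*) head;
        *copy = *snapshot;       // refill the reused buffer from the snapshot
        copy.value += delta;     // then apply the modification
    } while (!cas(&head, cast(shared Payload*) snapshot,
                  cast(shared Payload*) copy)); // publish if head is unchanged
}
```

On a failed CAS another thread has already published a new object, so the loop refills the same buffer from the fresh snapshot rather than allocating again.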

structure, not just a node/etc. And trees, etc tend to have variable sizes, etc.

You can reuse memory even if it comprehends more complex patterns than one single allocation. Andrei

I *highly* doubt it is worth the trouble. Most likely, this container won't be lock-free and scalable anymore. Performance will also degrade dramatically. Besides, the more I think about thread-local/shared separation that is going to come to D2, the more I see that there is absolutely no need to make Phobos containers thread-safe. Most containers will be thread-local and won't need any synchronization at all.

I meant, most essential ones (Array, Set, Map, HashMap, SList, DList, IntrusiveSList, IntrusiveDList etc). Lock-free containers are needed, too, but I believe they should be designed after thread-unsafe ones.
May 02 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 10:18:41 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Fri, May 1, 2009 at 6:25 PM, Andrei Alexandrescu
  Matrix a, b, c; ... c = a; a += b;
  Does the last operation mutate c as well?

your example.

You said it and I don't buy it one femtosecond. Code does need to copy values around. It's a fundamental operation!

Andrei, he said that explicit assignments/copies of arrays/containers are rare in numeric computing, which I agree with. Just because it's a fundamental operation doesn't mean it gets used much in his (or I guess NumPy's actually) specific applications. Also, disregarding/disbelieving a use case is not helpful to this discussion.
 Yes, when you have an a=b anywhere you've got to pay attention and
 make sure you didn't mean a=b.dup.

That is terrible. Anywhere == a portion of the code far, far away.

No, he was talking about a local decision. i.e. Did I or didn't I mean to make a logical copy here?
May 02 2009
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 17:45:16 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Sat, 02 May 2009 10:18:41 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Fri, May 1, 2009 at 6:25 PM, Andrei Alexandrescu
  Matrix a, b, c; ... c = a; a += b;
  Does the last operation mutate c as well?

your example.

You said it and I don't buy it one femtosecond. Code does need to copy values around. It's a fundamental operation!

are rare in numeric computing, which I agree with. Just because it's a fundamental operation doesn't mean it gets used much in his (or I guess NumPy's actually) specific applications. Also, disregarding/disbelieving a use case is not helpful to this discussion.

He said something. That's about as much proof as was seen. I didn't buy it so I replied without assuming the same as him. Then he repeated "but I said that!" which upped the ante from supposition to presupposition. I think presuppositions are particularly pernicious so I felt the need to explicitly say that I don't believe it just because he said it. It's not a use case. It's just something that was said. If some sort of evidence is given, that would be great. Don't put the onus on me to disprove what was said.

I don't expect you to disprove what was said. I would hope that when you see someone over-generalizing their own or someone else's experience, you take it for what it was: a use case. That being said, I think there was some miscommunication. Bill seems to be talking about no-op assignments like a = c; while what you've (probably correctly) hit on is a = foo(b); which results in an assignment and happens everywhere.
May 02 2009