digitalmars.D - what to do with postblit on the heap?

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

I have submitted a fix for bug 5272,  
http://d.puremagic.com/issues/show_bug.cgi?id=5272 "Postblit not called on  
copying due to array append"

However, I am starting to realize that one of the major reasons for  
postblit is to match it with an equivalent dtor.

This works well when the struct is on the stack -- the postblit, for  
instance, increments a reference counter, then the dtor decrements the ref  
counter.
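
A minimal sketch of the stack case described above, for illustration only. The `Handle` type and `stackRoundTrip` function are hypothetical names, not taken from Phobos or from the patch under discussion; the point is just that each postblit on the stack is matched by a destructor call when the copy goes out of scope.

```d
// Hypothetical ref-counted handle, for illustration only.
struct Handle
{
    private int* count;   // shared reference count; null for an empty handle

    static Handle create()
    {
        Handle h;
        h.count = new int;
        *h.count = 1;
        return h;
    }

    this(this)            // postblit: runs on the new copy after the bitwise blit
    {
        if (count) ++*count;
    }

    ~this()               // destructor: balances one earlier create() or postblit
    {
        if (count) --*count;
    }

    int refs() const { return count ? *count : 0; }
}

// Copies a handle on the stack and reports the count while the copy is
// alive and again after the copy has been destroyed.
int[2] stackRoundTrip()
{
    auto a = Handle.create();
    int during;
    {
        auto b = a;        // postblit increments the count
        during = a.refs;
    }                      // b's destructor decrements it again
    return [during, a.refs];
}
```

Under these assumptions the count is balanced only because `b`'s destructor runs at the end of its scope; a copy stored in a GC-allocated array has no such guaranteed exit point.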

But when the data is on the heap, the destructor is *not* called.  So what  
happens to any ref-counted data that is on the heap?  It's never  
decremented.  Currently though, it might still work, because postblit  
isn't called when the data is on the heap!  So no increment, no decrement.

I think this is an artificial "success".  However, if the pull request I  
initiated is accepted, then postblit *will* be called on heap allocation,  
for instance if you append data.  This will further highlight the fact  
that the destructor is not being called.

So is it worth adding calls to postblit, knowing that the complement  
destructor is not going to be called?  I can see in some cases where it  
would be expected, and I can see other cases where it will be difficult to  
deal with.  IMO, the difficult cases are already broken anyways, but it  
just seems like they are not.

The other part of this puzzle that is missing is array assignment, for  
example a[] = b[] does not call postblits.  I cannot fix this because  
_d_arraycopy does not give me the typeinfo.

Anyone else have any thoughts?  I'm mixed as to whether this patch should  
be accepted without more comprehensive GC/compiler reform.  I feel it's a  
step in the right direction, but that it will upset the balance in a few  
places (particularly ref-counting).

-Steve

Jun 20 2011

bearophile <bearophileHUGS lycos.com> writes:

Steven Schveighoffer:

 The other part of this puzzle that is missing is array assignment, for  
 example a[] = b[] does not call postblits.  I cannot fix this because  
 _d_arraycopy does not give me the typeinfo.

This seems fixable. Is it possible to rewrite _d_arraycopy?


 Anyone else have any thoughts?

I think the current situation is not acceptable. This is a considerably worse problem
than _d_arraycopy because here some information is missing. Isn't this the
same problem as with struct destructors?

A solution is to add this information at runtime, a type tag to structs that
have a postblit and/or destructor. But then structs aren't PODs any more. There
are other places to store this information, like in some kind of associative
array.

Another solution is to forbid what the compiler can't guarantee. If a struct is
going to be used only where its type is known, then it's allowed to have a
postblit and a destructor. Is it possible to enforce this? I think it is. Here an
annotation would be useful to better manage this contract between programmer and
compiler.

Bye,
bearophile

Jun 20 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 20 Jun 2011 11:03:27 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 Steven Schveighoffer:

 The other part of this puzzle that is missing is array assignment, for
 example a[] = b[] does not call postblits.  I cannot fix this because
 _d_arraycopy does not give me the typeinfo.

 This seems fixable. Is it possible to rewrite _d_arraycopy?

The compiler is the one passing the parameters to _d_arraycopy, so even if  
I change _d_arraycopy to accept a TypeInfo, the compiler needs to be fixed  
to send the TypeInfo.

I think this is really a no-brainer, because currently what is passed is  
the element size, which is contained within the TypeInfo.  I will be  
filing a bug on that.  But currently, I can't fix it.

 Anyone else have any thoughts?

 I think the current situation is not acceptable. This is a problem quite  
 worse than _d_arraycopy because here some information is missing. Isn't  
 this is the same problem with struct destructors?

This is an easy fix -- the typeinfo contains the information about whether  
and how to run the postblit.  The larger problem is the GC not calling the  
destructor.
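
As a rough illustration of how a runtime routine that only sees untyped memory could still run postblits once it is handed a TypeInfo, here is a sketch built on druntime's `TypeInfo.tsize` and `TypeInfo.postblit`. The names `Counted`, `blitAndPostblit` and `demoPostblitCount` are hypothetical, and this is not the actual `_d_arraycopy` or append implementation.

```d
import core.stdc.string : memcpy;

// Hypothetical element type that counts how often its postblit runs.
struct Counted
{
    static int copies;
    int payload;
    this(this) { ++copies; }
}

// Bitwise-copy n elements, then run the postblit on each destination
// element through the TypeInfo, without knowing the static type.
void blitAndPostblit(const TypeInfo ti, void* dst, const(void)* src, size_t n)
{
    immutable size = ti.tsize;
    memcpy(dst, src, n * size);
    foreach (i; 0 .. n)
        ti.postblit(dst + i * size);
}

// Copies two stack elements into GC memory and returns the postblit count.
// The heap copies are never destroyed here, mirroring the problem discussed.
int demoPostblitCount()
{
    Counted[2] src;
    auto raw = new ubyte[](2 * Counted.sizeof);
    Counted.copies = 0;
    blitAndPostblit(typeid(Counted), raw.ptr, src.ptr, src.length);
    return Counted.copies;
}
```

The same TypeInfo also exposes `destroy(void*)`, which is the piece a GC finalizer hook would need in order to run the matching destructors.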

But my immediate question is -- is it better to half-fix the problem by  
committing my changes, or leave the issue alone?

 A solution is to add this information at runtime, a type tag to structs  
 that have a postblit and/or destructor. But then structs aren't PODs any  
 more. There are other places to store this information, like in some  
 kind of associative array.

Any solution that fixes the GC problem will have to store the typeinfo  
somehow associated with the block.  I think we may have more traction for  
this problem with a precise GC.

I don't think the right route is to store type info inside the struct  
itself.  This added overhead is not necessary for when the struct is  
stored on the stack.

 Another solution is to forbid what the compiler can't guarantee. If a  
 struct is going to be used only where its type is known, then it's  
 allowed to have postblit and destructor. Is it possible to enforce this?  
 I think it is. Here an  annotation is useful to better manage this  
 contract between programmer and compiler.

This is a possibility, making a struct only usable if it's inside another  
such struct or inside a class, or on the stack.

-Steve

Jun 20 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On 2011-06-20 11:56, Jose Armando Garcia wrote:
 On Mon, Jun 20, 2011 at 12:03 PM, bearophile <bearophileHUGS lycos.com> wrote:
 Steven Schveighoffer:
 A solution is to add this information at runtime, a type tag to structs
 that have a postblit and/or destructor. But then structs aren't PODs any
 more. There are other places to store this information, like in some
 kind of associative array.

 
 What are PODs?

Plain Old Datatype. It's a user-defined data type with member variables but no 
functions. It just holds data.

- Jonathan M Davis

Jun 20 2011

bearophile <bearophileHUGS lycos.com> writes:

Steven Schveighoffer:

 But my immediate question is -- is it better to half-fix the problem by  
 committing my changes, or leave the issue alone?

I suggest leaving the issue alone.


 Any solution that fixes the GC problem will have to store the typeinfo  
 somehow associated with the block.  I think we may have more traction for  
 this problem with a precise GC.
 
 I don't think the right route is to store type info inside the struct  
 itself.  This added overhead is not necessary for when the struct is  
 stored on the stack.

 This is a possibility, making a struct only usable if it's inside another  
 such struct or inside a class, or on the stack.

Given that D is a systems language, and given the general usefulness and ubiquity of
structs, a third possibility is to do both: add an attribute to help
enforce what can't be done on PODs, or add more runtime info _on request_
where the programmer wants more flexible structs. This solves the situation,
but has the disadvantage of increasing D's complexity a little.

Bye,
bearophile

Jun 20 2011

Michel Fortin <michel.fortin michelf.com> writes:

On 2011-06-20 10:34:14 -0400, "Steven Schveighoffer" 
<schveiguy yahoo.com> said:

 I have submitted a fix for bug 5272,  
 http://d.puremagic.com/issues/show_bug.cgi?id=5272 "Postblit not called 
 on  copying due to array append"
 
 However, I am starting to realize that one of the major reasons for  
 postblit is to match it with an equivalent dtor.
 
 This works well when the struct is on the stack -- the postblit for  
 instance increments a reference counter, then the dtor decrements the 
 ref  counter.
 
 But when the data is on the heap, the destructor is *not* called.  So 
 what  happens to any ref-counted data that is on the heap?  It's never  
 decremented.  Currently though, it might still work, because postblit  
 isn't called when the data is on the heap!  So no increment, no 
 decrement.
 
 I think this is an artificial "success".  However, if the pull request 
 I  initiated is accepted, then postblit *will* be called on heap 
 allocation,  for instance if you append data.  This will further 
 highlight the fact  that the destructor is not being called.
 
 So is it worth adding calls to postblit, knowing that the complement  
 destructor is not going to be called?  I can see in some cases where it 
  would be expected, and I can see other cases where it will be 
 difficult to  deal with.  IMO, the difficult cases are already broken 
 anyways, but it  just seems like they are not.
 
 The other part of this puzzle that is missing is array assignment, for  
 example a[] = b[] does not call postblits.  I cannot fix this because  
 _d_arraycopy does not give me the typeinfo.
 
 Anyone else have any thoughts?  I'm mixed as to whether this patch 
 should  be accepted without more comprehensive GC/compiler reform.  I 
 feel it's a  step in the right direction, but that it will upset the 
 balance in a few  places (particularly ref-counting).

My feeling is that array appending and array assignment should be 
considered a compiler issue first and foremost. The compiler needs to 
be fixed, and once that's done the runtime will need to be updated 
anyway to match the changes in the compiler. Your proposed fix for 
array assignment is a good start for when the compiler will provide the 
necessary info to the runtime, but applying it at this time will just 
fix some cases by breaking a few others: net improvement zero.

As for the issue that destructors aren't called for arrays on the heap, 
it's a serious problem. But it's also a separate problem that concerns 
purely the runtime, as far as I am aware. Is there someone working 
on it?

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jun 20 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 20 Jun 2011 16:45:44 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2011-06-20 10:34:14 -0400, "Steven Schveighoffer"  
 <schveiguy yahoo.com> said:

 I have submitted a fix for bug 5272,   
 http://d.puremagic.com/issues/show_bug.cgi?id=5272 "Postblit not called  
 on  copying due to array append"
  However, I am starting to realize that one of the major reasons for   
 postblit is to match it with an equivalent dtor.
  This works well when the struct is on the stack -- the postblit for   
 instance increments a reference counter, then the dtor decrements the  
 ref  counter.
  But when the data is on the heap, the destructor is *not* called.  So  
 what  happens to any ref-counted data that is on the heap?  It's never   
 decremented.  Currently though, it might still work, because postblit   
 isn't called when the data is on the heap!  So no increment, no  
 decrement.
  I think this is an artificial "success".  However, if the pull request  
 I  initiated is accepted, then postblit *will* be called on heap  
 allocation,  for instance if you append data.  This will further  
 highlight the fact  that the destructor is not being called.
  So is it worth adding calls to postblit, knowing that the complement   
 destructor is not going to be called?  I can see in some cases where it  
  would be expected, and I can see other cases where it will be  
 difficult to  deal with.  IMO, the difficult cases are already broken  
 anyways, but it  just seems like they are not.
  The other part of this puzzle that is missing is array assignment,  
 for  example a[] = b[] does not call postblits.  I cannot fix this  
 because  _d_arraycopy does not give me the typeinfo.
  Anyone else have any thoughts?  I'm mixed as to whether this patch  
 should  be accepted without more comprehensive GC/compiler reform.  I  
 feel it's a  step in the right direction, but that it will upset the  
 balance in a few  places (particularly ref-counting).

 My feeling is that array appending and array assignment should be  
 considered a compiler issue first and foremost. The compiler needs to be  
 fixed, and once that's done the runtime will need to be updated anyway  
 to match the changes in the compiler. Your proposed fix for array  
 assignment is a good start for when the compiler will provide the  
 necessary info to the runtime, but applying it at this time will just  
 fix some cases by breaking a few others: net improvement zero.

BTW, I now feel that your request to make a distinction between move and  
copy is not required.  The compiler currently calls the destructor of  
temporaries, so it should also call postblit.  I don't think it can make  
the distinction between array appending and simply calling some other  
function.

If the issue of array assignment is fixed, do you think it's worth putting  
the change in, and then filing a bug against the GC?  I still think the  
current cases that "work" are fundamentally broken anyways.

For instance, in a ref-counted struct, if you appended it to an array,  
then removed all the stack-based references, the ref count goes to zero,  
even though the array still has a reference (I think someone filed a bug  
against std.stdio.File for this).

 As for the issue that destructors aren't called for arrays on the heap,  
 it's a serious problem. But it's also a separate problem that concerns  
 purely the runtime, as far as I am aware of. Is there someone working on  
 it?

I think we need precise scanning to get a complete solution.  Another  
option is to increase the information the array runtime stores in the  
memory block (currently it only stores the "used" length) and then hook  
the GC to call the dtors.  This might be a quick fix that doesn't require  
precise scanning, yet still covers the most common cases: allocating a  
single struct or an array of structs on the heap.

-Steve

Jun 20 2011

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On 2011-06-20 15:12, Steven Schveighoffer wrote:
 On Mon, 20 Jun 2011 16:45:44 -0400, Michel Fortin
 
 <michel.fortin michelf.com> wrote:
 On 2011-06-20 10:34:14 -0400, "Steven Schveighoffer"
 
 <schveiguy yahoo.com> said:
 I have submitted a fix for bug 5272,
 http://d.puremagic.com/issues/show_bug.cgi?id=5272 "Postblit not called
 on copying due to array append"
 
 However, I am starting to realize that one of the major reasons for
 
 postblit is to match it with an equivalent dtor.
 
 This works well when the struct is on the stack -- the postblit for
 
 instance increments a reference counter, then the dtor decrements the
 ref counter.
 
 But when the data is on the heap, the destructor is *not* called. So
 
 what happens to any ref-counted data that is on the heap? It's never
 decremented. Currently though, it might still work, because postblit
 isn't called when the data is on the heap! So no increment, no
 decrement.
 
 I think this is an artificial "success". However, if the pull request
 
 I initiated is accepted, then postblit *will* be called on heap
 allocation, for instance if you append data. This will further
 highlight the fact that the destructor is not being called.
 
 So is it worth adding calls to postblit, knowing that the complement
 
 destructor is not going to be called? I can see in some cases where it
 
 would be expected, and I can see other cases where it will be
 
 difficult to deal with. IMO, the difficult cases are already broken
 anyways, but it just seems like they are not.
 
 The other part of this puzzle that is missing is array assignment,
 
 for example a[] = b[] does not call postblits. I cannot fix this
 because _d_arraycopy does not give me the typeinfo.
 
 Anyone else have any thoughts? I'm mixed as to whether this patch
 
 should be accepted without more comprehensive GC/compiler reform. I
 feel it's a step in the right direction, but that it will upset the
 balance in a few places (particularly ref-counting).

 
 My feeling is that array appending and array assignment should be
 considered a compiler issue first and foremost. The compiler needs to be
 fixed, and once that's done the runtime will need to be updated anyway
 to match the changes in the compiler. Your proposed fix for array
 assignment is a good start for when the compiler will provide the
 necessary info to the runtime, but applying it at this time will just
 fix some cases by breaking a few others: net improvement zero.

 
 BTW, I now feel that your request to make a distinction between move and
 copy is not required. The compiler currently calls the destructor of
 temporaries, so it should also call postblit. I don't think it can make
 the distinction between array appending and simply calling some other
 function.

If an object is moved, neither the postblit nor the destructor should be 
called. The object is moved, not copied and destroyed. I believe that TDPL is 
very specific on that.

- Jonathan M Davis

Jun 20 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 20 Jun 2011 18:43:30 -0400, Jonathan M Davis <jmdavisProg gmx.com>
wrote:

 On 2011-06-20 15:12, Steven Schveighoffer wrote:
 BTW, I now feel that your request to make a distinction between move and
 copy is not required. The compiler currently calls the destructor of
 temporaries, so it should also call postblit. I don't think it can make
 the distinction between array appending and simply calling some other
 function.

 If an object is moved, neither the postblit nor the destructor should be
 called. The object is moved, not copied and destroyed. I believe that  
 TDPL is
 very specific on that.

Well, I think in this case it is being copied.  It's put on the stack, and
then copied to the heap inside the runtime function.  The runtime could be
passed a flag indicating the append is really a move, but I'm not sure
it's a good choice.  To me, not calling the postblit and dtor on a moved
struct is an optimization, no?  And you can't re-implement these semantics
for a normal function.  The one case I can think of is when an rvalue is
allowed to be passed by reference (which is exactly what's happening here).

Is there anything a postblit is allowed to do that would break a struct if
you disabled the postblit in this case?  I'm pretty sure internal pointers
are not supported, especially if move semantics do not call the postblit.

-Steve

Jun 20 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On 2011-06-20 16:07, Steven Schveighoffer wrote:
 On Mon, 20 Jun 2011 18:43:30 -0400, Jonathan M Davis <jmdavisProg gmx.com>
 
 wrote:
 On 2011-06-20 15:12, Steven Schveighoffer wrote:
 BTW, I now feel that your request to make a distinction between move and
 copy is not required. The compiler currently calls the destructor of
 temporaries, so it should also call postblit. I don't think it can make
 the distinction between array appending and simply calling some other
 function.

 
 If an object is moved, neither the postblit nor the destructor should be
 called. The object is moved, not copied and destroyed. I believe that
 TDPL is
 very specific on that.

 
 Well, I think in this case it is being copied.  It's put on the stack, and
 then copied to the heap inside the runtime function.  The runtime could be
 passed a flag indicating the append is really a move, but I'm not sure
 it's a good choice.  To me, not calling the postblit and dtor on a moved
 struct is an optimization, no?  And you can't re-implement these semantics
 for a normal function.  The one case I can think of is when an rvalue is
 allowed to be passed by reference (which is exactly what's happening here).

Well, going from the stack to the heap probably is a copy. But moves shouldn't 
be calling the postblit or the destructor, and you seemed to be saying that 
they should. The main place I can think of where a move would occur is 
when returning a value from a function, which is very different. And I don't 
think that avoiding the postblit is necessarily just an optimization. If the 
postblit really is skipped, then it's probably possible to return an object 
which cannot legally be copied (presumably due to some combination of 
reference or pointer member variables and const or immutable), though that 
wouldn't exactly be a typical situation, even if it actually is possible. It 
_is_ primarily an optimization to move rather than copy and destroy, but I'm 
not sure that it's _just_ an optimization.
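
To make the point about returning non-copyable objects concrete, here is a small sketch. `NoCopy`, `makeNoCopy` and `useNoCopy` are hypothetical names introduced only for illustration. It assumes the behaviour described in this thread, namely that returning a non-static local moves it rather than copying it.

```d
// Hypothetical struct whose postblit is disabled, so it cannot be copied.
struct NoCopy
{
    int value;
    @disable this(this);
}

// Returning the local is a move (or in-place construction), not a copy,
// so this compiles even though NoCopy has no usable postblit.
NoCopy makeNoCopy(int v)
{
    NoCopy local;
    local.value = v;
    return local;
}

int useNoCopy()
{
    auto n = makeNoCopy(7);   // initialised from an rvalue: no postblit needed

    // A genuine copy from an lvalue is rejected at compile time.
    static assert(!__traits(compiles, { NoCopy a; NoCopy b = a; }));

    return n.value;
}
```

If the return were implemented as copy-then-destroy, `makeNoCopy` could not compile at all, which is why skipping the postblit here is more than an optimization.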

 Is there anything a postblit is allowed to do that would break a struct if
 you disabled the postblit in this case?  I'm pretty sure internal pointers
 are not supported, especially if move semantics do not call the postblit.

If the struct had a pointer to a local member variable which the postblit 
would have deep-copied, then sure, not calling the postblit would screw with 
the struct. But that would screw with a struct which was returned from a 
function as well, and that's the prime place for the move semantics. That sort 
of struct is just plain badly designed, so I don't think that it's really 
something to worry about. I can't think of any other cases where it would be a 
problem though. Structs don't usually care where they live (aside from the 
issue of structs being designed to live on the stack and then not getting 
their destructor called because they're on the heap).

- Jonathan M Davis

Jun 20 2011

Michel Fortin <michel.fortin michelf.com> writes:

On 2011-06-20 18:12:11 -0400, "Steven Schveighoffer" 
<schveiguy yahoo.com> said:

 On Mon, 20 Jun 2011 16:45:44 -0400, Michel Fortin  
 <michel.fortin michelf.com> wrote:
 
 My feeling is that array appending and array assignment should be  
 considered a compiler issue first and foremost. The compiler needs to 
 be  fixed, and once that's done the runtime will need to be updated 
 anyway  to match the changes in the compiler. Your proposed fix for 
 array  assignment is a good start for when the compiler will provide 
 the  necessary info to the runtime, but applying it at this time will 
 just  fix some cases by breaking a few others: net improvement zero.

 
 BTW, I now feel that your request to make a distinction between move 
 and  copy is not required.  The compiler currently calls the destructor 
 of  temporaries, so it should also call postblit.  I don't think it can 
 make  the distinction between array appending and simply calling some 
 other  function.

Well, if

	a ~= S();

does result in a temporary which gets copied and then destroyed, why 
have move semantics at all? Move semantics are not just an 
optimization, they actually change the semantics. If you have a struct 
with a @disabled postblit, should it still be appendable?


 If the issue of array assignment is fixed, do you think it's worth 
 putting  the change in, and then filing a bug against the GC?  I still 
 think the  current cases that "work" are fundamentally broken anyways.

That depends. I'm not too sure currently whether the S destructor is 
called for this code:

	a ~= S();

If the compiler currently calls the destructor on the temporary S 
struct, then your patch is actually a fix because it balances 
constructors and destructors correctly for the appending part (the bug 
is then that compiler should use move semantics but is using copy 
instead). If it doesn't call the destructor then your patch does 
introduce a bug for this case.

All in all, I don't think it's important enough to justify wasting 
hours debating the order in which we should fix those bugs. Do what you 
think is right. If it becomes a problem or it introduces a bug here or 
there, we'll adjust; at worst that means a revert of your commit.


 As for the issue that destructors aren't called for arrays on the heap, 
  it's a serious problem. But it's also a separate problem that concerns 
  purely the runtime, as far as I am aware of. Is there someone working 
 on  it?

 
 I think we need precise scanning to get a complete solution.  Another  
 option is to increase the information the array runtime stores in the  
 memory block (currently it only stores the "used" length) and then hook 
  the GC to call the dtors.  This might be a quick fix that doesn't 
 require  precise scanning, but it also fixes the most common case of 
 allocating a  single struct or an array of structs on the heap.

The GC calling the destructor doesn't require precise scanning. 
Although it's true that both problems require adding type information 
to memory blocks, beyond that requirement they're both independent. 
It'd be really nice if struct destructors were called correctly.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jun 20 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On 2011-06-20 18:59, Michel Fortin wrote:
 On 2011-06-20 18:12:11 -0400, "Steven Schveighoffer"
 
 <schveiguy yahoo.com> said:
 On Mon, 20 Jun 2011 16:45:44 -0400, Michel Fortin
 
 <michel.fortin michelf.com> wrote:
 My feeling is that array appending and array assignment should be
 considered a compiler issue first and foremost. The compiler needs to
 be  fixed, and once that's done the runtime will need to be updated
 anyway  to match the changes in the compiler. Your proposed fix for
 array  assignment is a good start for when the compiler will provide
 the  necessary info to the runtime, but applying it at this time will
 just  fix some cases by breaking a few others: net improvement zero.

 
 BTW, I now feel that your request to make a distinction between move
 and  copy is not required.  The compiler currently calls the destructor
 of  temporaries, so it should also call postblit.  I don't think it can
 make  the distinction between array appending and simply calling some
 other  function.

 
 Well, if
 
 	a ~= S();
 
 does result in a temporary which get copied and then destroyed, why
 have move semantics at all? Move semantics are not just an
 optimization, they actually change the semantics. If you have a struct
 with a  disabled postblit, should it still be appendable?

I would expect that to have move semantics. There's no need to create and 
destroy a temporary. It's completely wasteful. A copy should only be happening 
when a copy _needs_ to happen. It doesn't need to happen here. Now, depending 
on what ~= did internally (assuming that it were an overloaded operator), then 
a copy may end up occurring inside of the function, but that shouldn't happen 
for the built-in ~= operator, and a well-written overloaded ~= should avoid 
the need to copy as well.

- Jonathan M Davis

Jun 20 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 20 Jun 2011 21:59:49 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2011-06-20 18:12:11 -0400, "Steven Schveighoffer"  
 <schveiguy yahoo.com> said:

 On Mon, 20 Jun 2011 16:45:44 -0400, Michel Fortin   
 <michel.fortin michelf.com> wrote:

 My feeling is that array appending and array assignment should be   
 considered a compiler issue first and foremost. The compiler needs to  
 be  fixed, and once that's done the runtime will need to be updated  
 anyway  to match the changes in the compiler. Your proposed fix for  
 array  assignment is a good start for when the compiler will provide  
 the  necessary info to the runtime, but applying it at this time will  
 just  fix some cases by breaking a few others: net improvement zero.

  BTW, I now feel that your request to make a distinction between move  
 and  copy is not required.  The compiler currently calls the destructor  
 of  temporaries, so it should also call postblit.  I don't think it can  
 make  the distinction between array appending and simply calling some  
 other  function.

 Well, if

 	a ~= S();

 does result in a temporary which get copied and then destroyed, why have  
 move semantics at all? Move semantics are not just an optimization, they  
 actually change the semantics. If you have a struct with a  disabled  
 postblit, should it still be appendable?

Good question.  I don't even know how the runtime could avoid calling  
postblit; there is no flag saying the postblit is disabled in the typeinfo  
(that I know of).

But think about it this way: if you have a function foo,

void foo(S)(ref S s, S[] arr)
{
    arr[0] = s;
}

Isn't this copy semantics?  This is exactly how the D runtime gets the  
data.  The only difference is, the runtime function is allowed to accept a  
temporary as a reference (not possible in a normal function).

Now, you could force move semantics, if you know the argument is an  
rvalue, but I don't know enough about what postblit is used for in order  
to say it's fine to use move semantics to move the struct into the heap.

The reason I say move semantics are an optimization is because:

{
   S tmp;
   arr ~= tmp;
}

is essentially equivalent to:

arr ~= S();

But the former is copy semantics, the latter can be considered move.  It  
seems like a smart compiler during optimization could rewrite the former  
as the latter, unless the semantics truly are different.  Which is why I'm  
trying to figure out how postblit can be used ;)

 If the issue of array assignment is fixed, do you think it's worth  
 putting  the change in, and then filing a bug against the GC?  I still  
 think the  current cases that "work" are fundamentally broken anyways.

 That depends. I'm not too sure currently whether the S destructor is  
 called for this code:

 	a ~= S();

It is, I tested it.  I ran this code:


import std.stdio;

struct Test
{
    this(this) { writeln("copy done"); }
    void opAssign(Test rhs) { writeln("assignment done"); }
    ~this() { writeln("destructor called"); }
}

void main()
{
    Test[] tests = new Test[1];
    {
       // Test test;
       // tests ~= test;
       tests ~= Test();
    }
    writeln("done");
}

and saw "destructor called" in the output, no matter which option was  
commented out.

 All in all, I don't think it's important enough to justify wasting
 hours debating in what order we should fix those bugs. Do what you think
 is right. If it becomes a problem or it introduces a bug here or there,
 we'll adjust; at worst that means a revert of your commit.

OK, then I'll push the change.  I already filed a bug against _d_arraycopy.

 As for the issue that destructors aren't called for arrays on the
 heap, it's a serious problem. But it's also a separate problem that
 concerns purely the runtime, as far as I am aware. Is there
 someone working on it?

  I think we need precise scanning to get a complete solution.  Another   
 option is to increase the information the array runtime stores in the   
 memory block (currently it only stores the "used" length) and then hook  
  the GC to call the dtors.  This might be a quick fix that doesn't  
 require  precise scanning, but it also fixes the most common case of  
 allocating a  single struct or an array of structs on the heap.

 The GC calling the destructor doesn't require precise scanning. Although  
 it's true that both problems require adding type information to memory  
 blocks, beyond that requirement they're both independent. It'd be really  
 nice if struct destructors were called correctly.

Yes, the more I think about it, the more this solution looks attractive.   
All that is required is to flag the block as having a finalizer, store the  
TypeInfo pointer somewhere, and the GC should call it.

I'll put in a bugzilla enhancement so it's not forgotten.

-Steve

Jun 21 2011

Michel Fortin <michel.fortin michelf.com> writes:

On 2011-06-21 07:34:24 -0400, "Steven Schveighoffer" 
<schveiguy yahoo.com> said:

 On Mon, 20 Jun 2011 21:59:49 -0400, Michel Fortin  
 <michel.fortin michelf.com> wrote:
 
 Well, if
 
 	a ~= S();
 
 does result in a temporary which gets copied and then destroyed, why
 have move semantics at all? Move semantics are not just an
 optimization, they actually change the semantics. If you have a struct
 with a @disabled postblit, should it still be appendable?

 
 Good question.  I don't even know how the runtime could avoid calling  
 postblit, there is no flag saying the postblit is disabled in the 
 typeinfo  (that I know of).
 
 But think about it this way, if you have a function foo:
 
 foo(S)(ref S s, S[] arr)
 {
     arr[0] = s;
 }
 
 Isn't this copy semantics?  This is exactly how the D runtime gets the  
 data.  The only difference is, the runtime function is allowed to 
 accept a  temporary as a reference (not possible in a normal function).

... and in the special case where the reference is an rvalue, then it 
should have move semantics. See below.


 Now, you could force move semantics, if you know the argument is an  
 rvalue, but I don't know enough about what postblit is used for in 
 order  to say it's fine to use move semantics to move the struct into 
 the heap.
 
 The reason I say move semantics are an optimization is because:
 
 {
    S tmp;
    arr ~= tmp;
 }
 
 is essentially equivalent to:
 
 arr ~= S();
 
 But the former is copy semantics, the latter can be considered move.  
 It  seems like a smart compiler during optimization could rewrite the 
 former  as the latter, unless the semantics truly are different.  Which 
 is why I'm  trying to figure out how postblit can be used ;)

Actually, this should be the equivalent:

	import std.algorithm;

	S tmp;
	arr ~= move(tmp);

While there is no doubt that 'moving' a struct can often be used as an 
optimization without changing the semantics, if you want the @disabled 
attribute to be useful on the postblit constructor then the language 
needs to define when its semantics require 'moving' data and when they 
require 'copying' data; it can't leave that only to the choice of the 
optimizer.

Things might be clearer if we had a move operator, but instead we have 
a 'move' function. There is only one case where I think we can assume 
to have move semantics: when a temporary (an rvalue) is assigned to 
somewhere. That's also all that's needed for the 'move' function to 
work. And that is broken currently when it comes to array appending.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jun 21 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 21 Jun 2011 08:25:40 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 While there is no doubt that 'moving' a struct can often be used as an
 optimization without changing the semantics, if you want the @disabled
 attribute to be useful on the postblit constructor then the language
 needs to define when its semantics require 'moving' data and when they
 require 'copying' data; it can't leave that only to the choice of the
 optimizer.

Another issue with appending a @disabled-postblit struct: what happens
when you have to reallocate a block to get more space?  This cannot
possibly be a move, because the compiler has no idea at the time of
appending whether anything else has a reference to the original data.  So
should it just be a runtime error?

I'm starting to think that @disabled postblit structs *shouldn't* be able
to be appended.

-Steve

Jun 21 2011

Michel Fortin <michel.fortin michelf.com> writes:

On 2011-06-21 08:38:05 -0400, "Steven Schveighoffer" 
<schveiguy yahoo.com> said:

 On Tue, 21 Jun 2011 08:25:40 -0400, Michel Fortin  
 <michel.fortin michelf.com> wrote:
 
 While there is no doubt that 'moving' a struct can often be used as an
 optimization without changing the semantics, if you want the @disabled
 attribute to be useful on the postblit constructor then the language
 needs to define when its semantics require 'moving' data and when they
 require 'copying' data; it can't leave that only to the choice of the
 optimizer.

 
 Another issue with appending a @disabled-postblit struct, what happens
 when you have to reallocate a block to get more space?  This cannot  
 possibly be a move, because the compiler has no idea at the time of  
 appending whether anything else has a reference to the original data.  
 So  should it just be a runtime error?

That's indeed a problem.

 I'm starting to think that @disabled postblit structs *shouldn't* be
 able to be appended.

That would make sense. It should be a compile-time error.

It would also turn appending using move into an optimization, because all 
the types you can append will be guaranteed to be copyable.
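
A rough sketch of what such a compile-time rejection could look like from the library side. The `appendCopyable` helper and the `NoCopy` and `Plain` types are hypothetical names made up for illustration; only `std.traits.isCopyable` is an existing Phobos trait, and none of this is how the compiler or druntime actually handles `~=`.

```d
import std.traits : isCopyable;

struct NoCopy
{
    int id;
    @disable this(this);    // postblit disabled: instances cannot be copied
}

struct Plain
{
    int id;
}

// Hypothetical helper (not part of druntime or Phobos): accepts only
// element types that are guaranteed to be copyable.
void appendCopyable(T)(ref T[] arr, T value)
    if (isCopyable!T)
{
    arr ~= value;
}

void main()
{
    static assert(isCopyable!Plain);
    static assert(!isCopyable!NoCopy);

    Plain[] ps;
    appendCopyable(ps, Plain(1));
    assert(ps.length == 1 && ps[0].id == 1);

    // The non-copyable type is rejected at compile time, not at run time.
    NoCopy[] ns;
    static assert(!__traits(compiles, appendCopyable(ns, NoCopy(2))));
}
```

With a constraint like this, a reallocation during the append can always fall back to copying, so moving stays a pure optimization.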


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jun 21 2011

so <so so.so> writes:

On Tue, 21 Jun 2011 15:25:40 +0300, Michel Fortin  
<michel.fortin michelf.com> wrote:

 Actually, this should be the equivalent:

 	import std.algorithm;

 	S tmp;
 	arr ~= move(tmp);

 While there is no doubt that 'moving' a struct can often be used as an
 optimization without changing the semantics, if you want the @disabled
 attribute to be useful on the postblit constructor then the language
 needs to define when its semantics require 'moving' data and when they
 require 'copying' data; it can't leave that only to the choice of the
 optimizer.

 Things might be clearer if we had a move operator, but instead we have a
 'move' function. There is only one case where I think we can assume to
 have move semantics: when a temporary (an rvalue) is assigned to
 somewhere. That's also all that's needed for the 'move' function to
 work. And that is broken currently when it comes to array appending.

It should be something else, because move(tmp) in std.algorithm takes its
argument by reference and returns it by value, actually moving it. Thanks
to value semantics in D and the ability to differentiate a value from a
reference, it doesn't need any other syntax, which is much better.

I think it is pretty neat, yet i still have some trouble understanding its  
effect here.

S tmp;
arr ~= move(tmp); // would make an unnecessary copy.

Move should do some kind of a magic there and treat its argument like a  
value, and return it.

Something like:

T move(T)(ref T a)
{
    return cast(T) a; // treat the argument like a value and return it
}

Maybe it makes no sense at all but i tried!

Jun 21 2011

Michel Fortin <michel.fortin michelf.com> writes:

On 2011-06-21 09:24:29 -0400, so <so so.so> said:

 On Tue, 21 Jun 2011 15:25:40 +0300, Michel Fortin  
 <michel.fortin michelf.com> wrote:
 
 Actually, this should be the equivalent:
 
 	import std.algorithm;
 
 	S tmp;
 	arr ~= move(tmp);
 
 While there is no doubt that 'moving' a struct can often be used as an
 optimization without changing the semantics, if you want the @disabled
 attribute to be useful on the postblit constructor then the language
 needs to define when its semantics require 'moving' data and when they
 require 'copying' data; it can't leave that only to the choice of the
 optimizer.
 
 Things might be clearer if we had a move operator, but instead we have
 a 'move' function. There is only one case where I think we can assume
 to have move semantics: when a temporary (an rvalue) is assigned to
 somewhere. That's also all that's needed for the 'move' function to
 work. And that is broken currently when it comes to array appending.

 
 It should be something else, because move(tmp) in std.algorithm takes
 its argument by reference and returns it by value, actually moving it.
 Thanks to value semantics in D and the ability to differentiate a value
 from a reference, it doesn't need any other syntax, which is much better.
 
 I think it is pretty neat, yet i still have some trouble understanding 
 its  effect here.
 
 S tmp;
 arr ~= move(tmp); // would make an unnecessary copy.
 
 Move should do some kind of a magic there and treat its argument like a 
  value, and return it.

Actually, no copy is needed. Move takes the argument by ref so it can 
obliterate it. Obliteration consists of replacing its bytes with those 
in S.init. That way if you have a smart pointer, it gets returned 
without having to update the reference count (since the source's 
content has been destroyed). It has effectively been moved, not copied.

Note 1: Currently 'move' obliterates the source only if the type has a 
destructor or a postblit. I think it should always do it, but without 
inlining that might be a performance bottleneck.

Note 2: Making move efficient in the case of appending might require a 
total rework of how the compiler interacts with the runtime. And I 
don't think you can optimize away all blitting unless the move function 
was treated specially by the compiler (or became a special operator).
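
To make the obliteration concrete, here is a small sketch assuming a current Phobos, where `move` is reachable through `std.algorithm`. The `Counted` struct and its `copies` counter are made up for illustration and only count postblit calls.

```d
import std.algorithm : move;

struct Counted
{
    static int copies;          // number of postblit calls so far
    int payload;
    this(this) { ++copies; }    // runs on every real copy
}

void main()
{
    Counted a = Counted(42);

    Counted b = a;              // copy: the postblit runs once
    assert(Counted.copies == 1);

    Counted c = move(a);        // move: bits are transferred, no postblit
    assert(Counted.copies == 1);
    assert(c.payload == 42);

    // Counted defines a postblit, so move resets the source to Counted.init.
    assert(a.payload == 0);
}
```

The counter stays at 1 after the move, which is the behaviour a reference-counted smart pointer would rely on.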

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jun 21 2011

so <so so.so> writes:

On Tue, 21 Jun 2011 18:18:26 +0300, Michel Fortin  
<michel.fortin michelf.com> wrote:

 Actually, no copy is needed. Move takes the argument by ref so it can
 obliterate it. Obliteration consists of replacing its bytes with those
 in S.init. That way if you have a smart pointer, it gets returned
 without having to update the reference count (since the source's content
 has been destroyed). It has effectively been moved, not copied.

T move(T)(ref T a) {
   T b;
   move(a, b);
   return b;
}

T a;
whatever = move(a);

If T is a struct, i don't see how a copy is not needed looking at the  
current state of move.

Jun 21 2011

Michel Fortin <michel.fortin michelf.com> writes:

On 2011-06-21 12:13:32 -0400, so <so so.so> said:

 On Tue, 21 Jun 2011 18:18:26 +0300, Michel Fortin  
 <michel.fortin michelf.com> wrote:
 
 Actually, no copy is needed. Move takes the argument by ref so it can
 obliterate it. Obliteration consists of replacing its bytes with those
 in S.init. That way if you have a smart pointer, it gets returned
 without having to update the reference count (since the source's
 content has been destroyed). It has effectively been moved, not copied.

 
 T move(T)(ref T a) {
    T b;
    move(a, b);
    return b;
 }
 
 T a;
 whatever = move(a);
 
 If T is a struct, i don't see how a copy is not needed looking at the  
 current state of move.

Actually, that depends on how you look at this.

The essence of a move operation is that you just copy the bits and then 
obliterate the old ones. So yes, there's indeed a copy to do, but 
there's no need to call a copy constructor or a destructor because no 
new instance has been created, it has just been moved. If you don't 
call the copy constructor (postblit) then it's a move operation, not a 
copy operation, even though there's still a bitwise copy inside the 
move operation.

In the return statement above, 'b' gets copied to 'whatever', then 
disappears along with the stack frame belonging to the function. So it 
becomes a move operation. (And it's even more direct than that with the 
named return value optimization.)
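
A minimal sketch of that return case, assuming a current compiler. `Tracked` and `make` are hypothetical names used only to count postblit calls.

```d
struct Tracked
{
    static int postblits;       // number of postblit calls so far
    int value;
    this(this) { ++postblits; }
}

// Returns a nonstatic local by value.
Tracked make(int v)
{
    Tracked t;
    t.value = v;
    return t;                   // moved out (or built in place), not copied
}

void main()
{
    Tracked x = make(7);
    assert(x.value == 7);
    assert(Tracked.postblits == 0); // no postblit ran for the return
}
```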

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jun 21 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

(resending)

On 6/21/11 11:13 AM, so wrote:
 On Tue, 21 Jun 2011 18:18:26 +0300, Michel Fortin
 <michel.fortin michelf.com> wrote:

 Actually, no copy is needed. Move takes the argument by ref so it can
 obliterate it. Obliteration consists of replacing its bytes with
 those in S.init. That way if you have a smart pointer, it gets
 returned without having to update the reference count (since the
 source's content has been destroyed). It has effectively been moved, not
 copied.

 T move(T)(ref T a) {
    T b;
    move(a, b);
    return b;
 }

 T a;
 whatever = move(a);

 If T is a struct, i don't see how a copy is not needed looking at the
 current state of move.

The rule that move and TDPL rely on but is not fully implemented is that 
returning a nonstatic local value never does a postblit nor a destructor 
- it just copies the bits.

Andrei

Jun 21 2011

Sean Kelly <sean invisibleduck.org> writes:

On Jun 21, 2011, at 11:26 AM, Andrei Alexandrescu wrote:
 The rule that move and TDPL rely on but is not fully implemented is
 that returning a nonstatic local value never does a postblit nor a
 destructor - it just copies the bits.

So it's effectively illegal to have a struct containing a pointer that
references itself, correct?

Jun 21 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 6/21/11 4:24 PM, Sean Kelly wrote:
 On Jun 21, 2011, at 11:26 AM, Andrei Alexandrescu wrote:
 The rule that move and TDPL rely on but is not fully implemented is that
 returning a nonstatic local value never does a postblit nor a destructor - it
 just copies the bits.

 So it's effectively illegal to have a struct containing a pointer that
 references itself, correct?

Illegal. All D structs must be transparently relocatable without 
breaking their invariant.

Andrei

Jun 21 2011

so <so so.so> writes:

On Tue, 21 Jun 2011 04:59:49 +0300, Michel Fortin  
<michel.fortin michelf.com> wrote:

 Well, if

 	a ~= S();

 does result in a temporary which gets copied and then destroyed, why have
 move semantics at all? Move semantics are not just an optimization, they
 actually change the semantics.

There was a similar discussion on struct constructors, which ended with
the conclusion that it is an optimization.
I fully agree that it is not; move exists precisely for reasons like this.

Jun 21 2011
