www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Feedback Thread: DIP 1040--Copying, Moving, and Forwarding--Community

reply Mike Parker <aldacron gmail.com> writes:
This is the feedback thread for the first round of Community 
Review of DIP 1040, "Copying, Moving, and Forwarding".

===================================
**THIS IS NOT A DISCUSSION THREAD**

Posts in this thread must adhere to the feedback thread rules 
outlined in the Reviewer Guidelines (and listed at the bottom of 
this post).

https://github.com/dlang/DIPs/blob/master/docs/guidelines-reviewers.md

That document also provides guidelines on contributing feedback 
to a DIP review. Please read it before posting here. If you would 
like to discuss this DIP, please do so in the discussion thread:

https://forum.dlang.org/post/ncoqnixvllbjsxdzbyxi forum.dlang.org
==================================

You can find DIP 1040 here:

https://github.com/dlang/DIPs/blob/a9c553b0dbab1c2983a801b5e89b51c5c33d5180/DIPs/DIP1040.md

The review period will end at 11:59 PM ET on March 19, or when I 
make a post declaring it complete. Feedback posted to this thread 
after that point may be ignored.

At the end of this review round, the DIP will be moved into the 
Post-Community Round 1 state. Significant revisions resulting 
from this review round may cause the DIP manager to require 
another round of Community Review, otherwise the DIP will be 
queued for the Final Review.

==================================
Posts in this thread that do not adhere to the following rules 
will be deleted at the DIP author's discretion:

* All posts must be a direct reply to the DIP manager's initial 
post, with only two exceptions:

     - Any commenter may reply to their own posts to retract 
feedback contained in the original post

     - The DIP author may (and is encouraged to) reply to any 
feedback solely to acknowledge the feedback with agreement or 
disagreement (preferably with supporting reasons in the latter 
case)

* Feedback must be actionable, i.e., there must be some action 
the DIP author can choose to take in response to the feedback, 
such as changing details, adding new information, or even 
retracting the proposal.

* Feedback related to the merits of the proposal rather than to 
the contents of the DIP (e.g., "I'm against this DIP.") is 
allowed in Community Review (not Final Review), but must be 
backed by supporting arguments (e.g., "I'm against this DIP 
because..."). The supporting arguments must be reasonable. 
Obviously frivolous arguments waste everyone's time.

* Feedback should be clear and concise, preferably listed as 
bullet points (those who take the time to do an in-depth review 
and provide feedback in the form of answers to the questions in 
the documentation linked above will receive much gratitude). 
Information irrelevant to the DIP or which is not provided in 
service of clarifying the feedback is unwelcome.
Mar 05
next sibling parent Ben Jones <fake fake.fake> writes:
On Friday, 5 March 2021 at 12:20:36 UTC, Mike Parker wrote:
 This is the feedback thread for the first round of Community 
 Review of DIP 1040, "Copying, Moving, and Forwarding".
Minor bug: The functions `gun` and `sun` in the last use section look like they're missing an S parameter.
Mar 05
prev sibling next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Friday, 5 March 2021 at 12:20:36 UTC, Mike Parker wrote:
 This is the feedback thread for the first round of Community 
 Review of DIP 1040, "Copying, Moving, and Forwarding".
In the section on "Last Use":
 Module level functions that contain nested functions or lambda 
 functions that access an outer local variable will not be 
 subject to last use dataflow analysis. The reason is that a 
 pointer to the nested function or lambda could be escaped from 
 the containing function
This rule has the potential to cause "spooky action at a distance," where a change to one part of a function silently causes the meaning of seemingly-unrelated code in the same function to change. For example: struct EMO { /* ... */ } version (Before) void example() { int a; EMO b; fun(a); gun(b); // last use of b -> move } version (After) void example() { int a; EMO b; void helper() { fun(a); } // DFA disabled here helper(); gun(b); // no DFA -> copy } Ideally, only variables that are actually accessed from nested functions would have their last-use analysis disabled. In the example above, that would include `a`, which is accessed in `helper`, but not `b`.
Mar 05
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/5/2021 11:15 AM, Paul Backus wrote:
 On Friday, 5 March 2021 at 12:20:36 UTC, Mike Parker wrote:
 This is the feedback thread for the first round of Community Review of DIP 
 1040, "Copying, Moving, and Forwarding".
In the section on "Last Use":
 Module level functions that contain nested functions or lambda functions that 
 access an outer local variable will not be subject to last use dataflow 
 analysis. The reason is that a pointer to the nested function or lambda could 
 be escaped from the containing function
This rule has the potential to cause "spooky action at a distance," where a change to one part of a function silently causes the meaning of seemingly-unrelated code in the same function to change. For example:     struct EMO { /* ... */ }     version (Before)     void example()     {         int a;         EMO b;         fun(a);         gun(b); // last use of b -> move     }     version (After)     void example()     {         int a;         EMO b;         void helper() { fun(a); } // DFA disabled here         helper();         gun(b); // no DFA -> copy     } Ideally, only variables that are actually accessed from nested functions would have their last-use analysis disabled. In the example above, that would include `a`, which is accessed in `helper`, but not `b`.
It is possible to do the DFA through nested functions. I plan on doing that with live functions. What the DIP should make clear, however, is that changing copies to moves should be regarded as implementation dependent, i.e. it is a mistake if the user attempts to rely on a particular sequence of moves and copies. Much like relying on this for NRVO is a mistake. NRVO has always been an optional optimization, and if it happens or not is difficult to determine just by looking at the code. In practice, the only problem that has cropped up has been pressure to find more cases where NRVO can be applied, so I think we're good with labeling the copy=>move semantics as an implementation-dependent optimization.
Mar 08
parent deadalnix <deadalnix gmail.com> writes:
On Monday, 8 March 2021 at 10:27:07 UTC, Walter Bright wrote:
 What the DIP should make clear, however, is that changing 
 copies to moves should be regarded as implementation dependent, 
 i.e. it is a mistake if the user attempts to rely on a 
 particular sequence of moves and copies. Much like relying on 
 this for NRVO is a mistake. NRVO has always been an optional 
 optimization, and if it happens or not is difficult to 
 determine just by looking at the code. In practice, the only 
 problem that has cropped up has been pressure to find more 
 cases where NRVO can be applied, so I think we're good with 
 labeling the copy=>move semantics as an 
 implementation-dependent optimization.
That seems like the best approach. It allows for thing like fixed point analysis to bail in case it needs too many iterations. In this case, the list of cases presented should be presented as examples rather than spec.
Mar 08
prev sibling next sibling parent tsbockman <thomas.bockman gmail.com> writes:
On Friday, 5 March 2021 at 12:20:36 UTC, Mike Parker wrote:
 This is the feedback thread for the first round of Community 
 Review of DIP 1040, "Copying, Moving, and Forwarding".
There should be at least one example demonstrating WHY a non-default move constructor might be useful, not just a syntactical description of HOW to declare one. How about including an example struct with an interior pointer that needs to be updated whenever it is moved or copied?
Mar 05
prev sibling next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Friday, 5 March 2021 at 12:20:36 UTC, Mike Parker wrote:
 This is the feedback thread for the first round of Community 
 Review of DIP 1040, "Copying, Moving, and Forwarding".
First, very good proposal (in fact, I sent something very similar to Andrei, Walter and Atila in 2019, and I hope this had some influence over this proposal). For completeness, I will paste this email at the end of this reply. It cover some subtle but important point, notably: - The argument is invalid after this move, and is not destructed. - Moving an EMO means that the responsibility for destruction of that value moves along with it. This is something I stressed out at the time, because it has profound consequences at the ABI level. It is unfortunate that this design goal - while achieved - has dropped from the rationale. To stress out how important it is, let me contrast with C++. In C++, the caller remains in charge of the destruction of an object. This means that structs that are not trivially destructible, movable or copyable need to be passed by reference. This introduce many indirections in the generated code, but, worse, adds a ton of pressure on the alias analysis, which will prevent the optimizer to do its job properly. The second downside is that structs always need to have a "null" state. They need to remain destructible, even after the callee move the object. This forces the callee to do extra work in order to set the moved object into a null state and extra work from the caller to check for the null state during destruction. A concrete example of this is for instance that a lock which locks a mutex in its constructor and unlocks it in its destructor now must check for the mutex being nulled in its destruction, introducing an unecessary branch in all path that unlock the mutex. These are serious downside in the C++ way, and this proposal should make it explicit in its rationale that both problems must be avoided: - Ability to pass non POD as value even at the ABI level. - Ability to have movable structs without a "null" state. Another aspect of the proposal that needs improvement is the "Last Use" section. the description of this last use is incomplete - and maybe doesn't adopt the right approach, because listing all scenarii is always going to be a challenge. One such scenario is the case of divergent branches: S s; if (...) { // Even though it is fun(s); } else { // ... } The way to cover all base is to go for a fixed point analysis. A fixed point analysis approach would go ass follow: define a state for all local variable, then follow the control flow updating the state of said variable. When a merge happen in the control flow, verify that both both path through which you arrive at the merge point have a different state, then apply a fix and reprocess affected branches of the control flow, up to the point all merge point are consistent - you reached a fixed point. Adopting such an approach will ensure the behavior is not only defined for current control flow constructs, but also whatever might come up in the future. In the pasted email, I go over in more detail about what a fixed point analysis could look like for this. While it is likely not formal enough to make it into the DIP as this, it makes for a good starting point, IMO. ====== Hi both of you, This is a topic I discussed with Atila and Andrei informally, and now I'd to loop you in Walter. If this gains traction, it'll be time for a DIP, but I'd really like to gather some feedback before, as putting everything in a format that is formal enough is seriously time consuming. As I assume Walter already knows, C++ copy/destruction semantic is actually quite an important barrier to optimization. For the sake of making the point, I'm going to use unique_ptr and shared_ptr as example, and what we imagine equivalent in D would look like. We will also consider that the compiler inline ctor/dtor and copies. Let's consider this sample code: void foo(unique<T> ptr): unique<T> t = ...; foo(move(t)); The first thing we consider s that unique<T>, as well as shared<T> are not PODs. This is rather obvious, as this is the whole point of having them, if we wanted a POD, we'd use a raw pointer. In C++, non POD must have an address. in D, they must have an address for the ctor to run, but after this, the compiler is free to do whatever it wants as D struct are movable by default. An interesting side effect of that fact is that non POD are ALWAYS passed by reference in C++, never by value. The compiler creates a temporary in the caller in which the copy is stored. The code is lowered to something that look like this: foo(T** ptrref); T* t = ...; T* tmp = nullptr; swap(t, tmp); foo(&tmp); if (tmp != null) destroy(tmp); if (t != null) destroy(t); The compiler will have no problem figuring out that t is always null, and therefore optimize away its destruction, but the tmp one remains. A second order problem is that EVERY struct must have some sort of 'null' state. While this is fairly obvious what it is for a pointer (!) it is the source of many trap when it comes to correctness for others types of structs. Imagine a lock, for instance, it now has to check every time if the mutex it refers to has been nulled or not, introducing a branch every time you unlock the mutex. Now immagine that foo does something like: void foo(unique<T> ptr) { global = move(ptr); } Now foo end up having to go through an indirection to get the value of ptr, which right there is 3 cycles at best, a cache miss at worse. But worse, it MUST store null into ptr that the caller will then dutifully check is null and decide to not destroy anything. t will do so in 100% of the case, yet the compiler is unable to do anything to optimize it away. In addition, store to load forwarding has some bizarre perf edge cases on many CPUs (that is, loading an address that is still in the store buffer and hasn't hit cache yet) so it is something you'd like to see optimized. If we replace the unique<T> by a shared<T> in the example, it gets worse because the increment/decrement operation cannot be optimized away at all. This result in code where you got to sprinkle calls to move all over the place constantly and it's a guarantee that you'll forget to put it sometime and end up with a ton of unexpected copies. Here is what I propose. 1/ The ABI for non POD is the exact same as for POD. it means that the responsibility of the destruction falls on the the foo method in our example, not on the caller. This means it can be passed in registers in the case of unique ptr. The obvious gotcha is that the adress of the this pointer will be different in foo that it was when the copy constructor was called. D provides no guarantee on that front anyways, so I think we are good on that front. Another side effect is that we lose the "russian doll" effect of construction destruction when things are moved to another function. However, by specifying that the copy need to be made in LTR order for parameters, and destruction from the last parameter to the first, we end up with this remaining true almost all the time, which is probably the right tradeof. In our example, it means that foo can receive the unique<T> in a register, then the compiler can see that the value when we exit the function is always null and can optimize away the destruction. 2/ Decide when to copy using the following algorithm: We go over the function body, keeping track of a status per lvalue that exist: destroyed, alive or consumed. All parameters, globals, or value reached through indirections are considered initially alive. All locals are considered destroyed initially, then alive once the constructor as run/initial value is set. When an lvalue is converted into an rvalue, for instance when returned, when passed as argument, when assigned to another lvalue, etc... the following happens: - if it is alive, it is marked consumed. - if it is consumed, then we backtrack to the point at which it was consumed and insert a copy operation. The lvalue is reconsidered alive after the copy, the the algorithm restart from there. - if it is destroyed, then this is an error (this should basically never happen and is likely a compiler bug). When an lvalue needs to be destroyed, the compiler does as follow: - if it is alive, call the desructor. - if it is consumed, do nothing. - if it is already destroyed, then once again, you probably have a compiler bug. This is relatively straightforward, but there is one more tricky situation, is when you reach a "merge" point i the control flow, for instance, after an if/then/else construct. In this case, a variable can have different state for each codepath. The resolution happens as follow: - alive/alive : all good. - alive/destroyed: error. Some codepath did not initialize the variable or you have compiler bug. - alive/consume: backtrack the consume path and insert copies. The lvalue marked alive. - consume/destroyed: Mark destroyed. When an lvalue is assigned to, it's previous value is move to a temporary and then destroyed immediately after the assignment finishes. The temporary also keeps the state so destruction only occurs when the variable was alive. All rvalues must be immediately returned, assigned, passed as argument or destroyed by calling the constructor. Existing optimizations such as RVO naturaly arise out of this process, but this is much more generic and powerful, as it generate the minimal number of copies possible assuming structs are movable, which is what we want. In case of loops, the state tracked by the algorithm at the end of the loop can affect the merge at the top of the loop, so you may have to pass over the loop twice. Same wise, backtracking may cause you to have to go over some code twice, but if you play with the idea you'll figure out that it converge pretty damn quickly and that it is not easy to come up with example requiring many passes, and it ALWAYS converges, if for no other reason that it inserted copy at each point a C++ compiler would. In practice, this means that shared<T> t = ...; foo(t); Would simple move t's ownership to foo and never generate a copy. However, if later one does shared<T> t = ...; foo(t); bar(t); Then the call to foo will be patched by the compiler to insert a copy. In both cases, t's destructor will never be called because it is consumed by the end of the control flow, so destruction is a noop. This is really where we need to go IMO. We avoid so many copy by basically moving by default that and solution such as a RC solution will be significantly better than C++'s, and on par with stuff like ARC. In addition, the same algorithm can be used to ensure that all fields are initialized for constructors. The start in the destroyed state and must all be in the alive state by the end of the constructor's execution.
Mar 06
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/6/2021 12:28 PM, deadalnix wrote:
 First, very good proposal (in fact, I sent something very similar to Andrei, 
 Walter and Atila in 2019, and I hope this had some influence over this 
 proposal). For completeness, I will paste this email at the end of this reply.
 
 It cover some subtle but important point, notably:
   - The argument is invalid after this move, and is not destructed.
   - Moving an EMO means that the responsibility for destruction of that value 
 moves along with it.
 
 This is something I stressed out at the time, because it has profound 
 consequences at the ABI level. It is unfortunate that this design goal - while 
 achieved - has dropped from the rationale. To stress out how important it is, 
 let me contrast with C++.
 
 In C++, the caller remains in charge of the destruction of an object. This
means 
 that structs that are not trivially destructible, movable or copyable need to
be 
 passed by reference. This introduce many indirections in the generated code, 
 but, worse, adds a ton of pressure on the alias analysis, which will prevent
the 
 optimizer to do its job properly.
 The second downside is that structs always need to have a "null" state. They 
 need to remain destructible, even after the callee move the object. This
forces 
 the callee to do extra work in order to set the moved object into a null state 
 and extra work from the caller to check for the null state during destruction.
A 
 concrete example of this is for instance that a lock which locks a mutex in
its 
 constructor and unlocks it in its destructor now must check for the mutex
being 
 nulled in its destruction, introducing an unecessary branch in all path that 
 unlock the mutex.
 
 These are serious downside in the C++ way, and this proposal should make it 
 explicit in its rationale that both problems must be avoided:
   - Ability to pass non POD as value even at the ABI level.
   - Ability to have movable structs without a "null" state.
Ok, good idea, although to be clear this is implicit in the Description.
 Another aspect of the proposal that needs improvement is the "Last Use"
section. 
 the description of this last use is incomplete - and maybe doesn't adopt the 
 right approach, because listing all scenarii is always going to be a
challenge. 
 One such scenario is the case of divergent branches:
 
 S s;
 if (...) {
      // Even though it is
      fun(s);
 } else {
      // ...
 }
 
 The way to cover all base is to go for a fixed point analysis. A fixed point 
 analysis approach would go ass follow: define a state for all local variable, 
 then follow the control flow updating the state of said variable. When a merge 
 happen in the control flow, verify that both both path through which you
arrive 
 at the merge point have a different state, then apply a fix and reprocess 
 affected branches of the control flow, up to the point all merge point are 
 consistent - you reached a fixed point.
That's exactly how the live analysis is currently implemented. I didn't know it was called a fixed point analysis. The downside of this is the data flow analysis in live is a bit expensive. Perhaps the DIP should allow for the DFA being optional, and assume worst case (i.e. never last use) for an initial implementation.
 Adopting such an approach will ensure the behavior is not only defined for 
 current control flow constructs, but also whatever might come up in the
future. 
 In the pasted email, I go over in more detail about what a fixed point
analysis 
 could look like for this. While it is likely not formal enough to make it into 
 the DIP as this, it makes for a good starting point, IMO.
The idea with DFA is to not think in terms of if, while, foreach, etc. But to think instead of flow as nodes connected by edges. In other words, replace all the structured code with goto's. Anything that can be lowered to gotos will then work naturally. Andrei and I have discussed "last use" optimizations for over a decade :-) but I always balked because it needed DFA. Thanks for reposting the email. It's full of good information.
Mar 08
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 08.03.21 11:21, Walter Bright wrote:
 The way to cover all base is to go for a fixed point analysis. A fixed 
 point analysis approach would go ass follow: define a state for all 
 local variable, then follow the control flow updating the state of 
 said variable. When a merge happen in the control flow, verify that 
 both both path through which you arrive at the merge point have a 
 different state, then apply a fix and reprocess affected branches of 
 the control flow, up to the point all merge point are consistent - you 
 reached a fixed point.
That's exactly how the live analysis is currently implemented. I didn't know it was called a fixed point analysis.
The general framework is also called abstract interpretation, where the merge operation is called a join of abstract elements. There's also a meet operation that can introduce information from conditionals into the state. (E.g. if you want to do value range propagation this can be useful.) In general, abstract interpretation does not converge to a fixed point in finite time, then you need to introduce some sort of widening operator. In practice, this is often the main challenge to make it work.
Mar 08
parent reply Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Monday, 8 March 2021 at 11:28:53 UTC, Timon Gehr wrote:
 On 08.03.21 11:21, Walter Bright wrote:
 [...]
That's exactly how the live analysis is currently implemented. I didn't know it was called a fixed point analysis.
The general framework is also called abstract interpretation, where the merge operation is called a join of abstract elements. There's also a meet operation that can introduce information from conditionals into the state. (E.g. if you want to do value range propagation this can be useful.) In general, abstract interpretation does not converge to a fixed point in finite time, then you need to introduce some sort of widening operator. In practice, this is often the main challenge to make it work.
Just curious, is there any connection here to symbolic execution?
Mar 08
next sibling parent Tobias Pankrath <tobias+dlang pankrath.net> writes:
On Monday, 8 March 2021 at 14:01:33 UTC, Imperatorn wrote:
 Just curious, is there any connection here to symbolic 
 execution?
When I look at the wikipedia article about symbolic execution, it looks a lot like symbolic model checking, and Patrick Cousot has somewhere shown that model checking can be seen as a subform of abstract interpretation.
Mar 08
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 08.03.21 15:01, Imperatorn wrote:
 On Monday, 8 March 2021 at 11:28:53 UTC, Timon Gehr wrote:
 On 08.03.21 11:21, Walter Bright wrote:
 [...]
That's exactly how the live analysis is currently implemented. I didn't know it was called a fixed point analysis.
The general framework is also called abstract interpretation, where the merge operation is called a join of abstract elements. There's also a meet operation that can introduce information from conditionals into the state. (E.g. if you want to do value range propagation this can be useful.) In general, abstract interpretation does not converge to a fixed point in finite time, then you need to introduce some sort of widening operator. In practice, this is often the main challenge to make it work.
Just curious, is there any connection here to symbolic execution?
Most program analysis is abstract interpretation in some way as it is, well, abstract. You can consider symbolic execution as an abstract domain consisting of symbolic formulas such that the concretization function γ maps each symbolic formula to the set of program states that satisfy it. Possible choices for meet and join are symbolic conjunction and disjunction. fully model some aspects of the program semantics you will get the Widening for this domain is invariant synthesis and if you do it (automated or manually), you get something that's getting close to what's usually called a program verifier instead of a static analyzer.
Mar 08
parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Monday, 8 March 2021 at 17:42:16 UTC, Timon Gehr wrote:
 On 08.03.21 15:01, Imperatorn wrote:
 On Monday, 8 March 2021 at 11:28:53 UTC, Timon Gehr wrote:
 [...]
Just curious, is there any connection here to symbolic execution?
Most program analysis is abstract interpretation in some way as it is, well, abstract. You can consider symbolic execution as an abstract domain consisting of symbolic formulas such that the concretization function γ maps each symbolic formula to the set of program states that satisfy it. Possible choices for meet and join are symbolic conjunction and disjunction. execution does not fully model some aspects of the program semantics you will get the ordinary soundness condition f[γ(φ)] Widening for this domain is invariant synthesis and if you do it (automated or manually), you get something that's getting close to what's usually called a program verifier instead of a static analyzer.
Interesting. That sounds quite powerful. But what about undecidability? Wait... The abstract semantics is a super set here right? Hmm, so that's why that works. This is actually kinda cool. Do you have any good books on the subject to recommend?
Mar 08
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Monday, 8 March 2021 at 10:21:37 UTC, Walter Bright wrote:
 Ok, good idea, although to be clear this is implicit in the 
 Description.
I assumed so, but IMO this is very much adding to the rationale/requirement part of the DIP, because this will be forgotten or overlooked by most readers. When it is in there then it is easy to point further discussions that break this assumption to it. Without it, this has to be explained again and again.
Mar 08
parent Mike Parker <aldacron gmail.com> writes:
On Monday, 8 March 2021 at 21:50:19 UTC, deadalnix wrote:
 On Monday, 8 March 2021 at 10:21:37 UTC, Walter Bright wrote:
 Ok, good idea, although to be clear this is implicit in the 
 Description.
I would like to remind everyone of the Feedback Thread rules. All posts must be a reply to my original post and must provide actionable feedback. DIP authors can reply to feedback, but there should be no replies to their posts unless they ask for clarity. Discussions belong in the discussion thread. I’ll let all the posts above this one stand, but I will delete any further such posts.
Mar 08
prev sibling next sibling parent Atila Neves <atila.neves gmail.com> writes:
On Friday, 5 March 2021 at 12:20:36 UTC, Mike Parker wrote:
 This is the feedback thread for the first round of Community 
 Review of DIP 1040, "Copying, Moving, and Forwarding".

 ===================================
 **THIS IS NOT A DISCUSSION THREAD**

 Posts in this thread must adhere to the feedback thread rules 
 outlined in the Reviewer Guidelines (and listed at the bottom 
 of this post).

 https://github.com/dlang/DIPs/blob/master/docs/guidelines-reviewers.md

 That document also provides guidelines on contributing feedback 
 to a DIP review. Please read it before posting here. If you 
 would like to discuss this DIP, please do so in the discussion 
 thread:

 https://forum.dlang.org/post/ncoqnixvllbjsxdzbyxi forum.dlang.org
 ==================================

 You can find DIP 1040 here:

 https://github.com/dlang/DIPs/blob/a9c553b0dbab1c2983a801b5e89b51c5c33d5180/DIPs/DIP1040.md

 The review period will end at 11:59 PM ET on March 19, or when 
 I make a post declaring it complete. Feedback posted to this 
 thread after that point may be ignored.

 At the end of this review round, the DIP will be moved into the 
 Post-Community Round 1 state. Significant revisions resulting 
 from this review round may cause the DIP manager to require 
 another round of Community Review, otherwise the DIP will be 
 queued for the Final Review.

 ==================================
 Posts in this thread that do not adhere to the following rules 
 will be deleted at the DIP author's discretion:

 * All posts must be a direct reply to the DIP manager's initial 
 post, with only two exceptions:

     - Any commenter may reply to their own posts to retract 
 feedback contained in the original post

     - The DIP author may (and is encouraged to) reply to any 
 feedback solely to acknowledge the feedback with agreement or 
 disagreement (preferably with supporting reasons in the latter 
 case)

 * Feedback must be actionable, i.e., there must be some action 
 the DIP author can choose to take in response to the feedback, 
 such as changing details, adding new information, or even 
 retracting the proposal.

 * Feedback related to the merits of the proposal rather than to 
 the contents of the DIP (e.g., "I'm against this DIP.") is 
 allowed in Community Review (not Final Review), but must be 
 backed by supporting arguments (e.g., "I'm against this DIP 
 because..."). The supporting arguments must be reasonable. 
 Obviously frivolous arguments waste everyone's time.

 * Feedback should be clear and concise, preferably listed as 
 bullet points (those who take the time to do an in-depth review 
 and provide feedback in the form of answers to the questions in 
 the documentation linked above will receive much gratitude). 
 Information irrelevant to the DIP or which is not provided in 
 service of clarifying the feedback is unwelcome.
Awesome! Now for the feedback. I think the DIP would be helped by links to definitions that come after their first usage, or being restructured so that such links are no longer necessary. There are multiple cases, but the first example I ran into was "The argument is invalid after this move, and is not destructed.", and I wanted to write "what does 'invalid' mean?". As it turns out it's defined later on in the document.
 Introduces the notion of a Moveable Reference
But then the DIP refers to these as "move refs" afterwards.
 where s is constructed into the parameter.
This confused me; it also seems to me that deleting it doesn't alter what the section is trying to say.
 A Move Constructor that throws is illegal.
I assume this means it'll fail to compile?
 The Move Assignment Operator is nothrow, even if nothrow is not 
 explicitly specified.
And also illegal if it happens to throw?
 An EMO is a struct that has both a Move Constructor and a Move 
 Assignment Operator
Including a compiler-generated version of either function?
 A Move Ref is a parameter that is a reference to an EMO
I eventually understood what this meant, but this confused me when I read it the first time. I'd reword it to mention that the syntax looks like a by-value parameter but ends up being passed by reference. It also confused me that the 2nd function had `ref` in there.
  If NRVO cannot be performed, s is copied to the return value 
 on the caller's stack.
Why is it copied instead of moved?
 When the call is made to f(), a copy is made.
I guess this only applies to f(S())?
 While it appears that C++ best practice is to use rvalue 
 references instead of values for parameters
Not always, sometimes the best practice can be to pass by value then move if a copy has to be made anyway: https://stackoverflow.com/questions/16724657/why-do-we-copy-then-move
Mar 10
prev sibling next sibling parent tsbockman <thomas.bockman gmail.com> writes:
On Friday, 5 March 2021 at 12:20:36 UTC, Mike Parker wrote:
 This is the feedback thread for the first round of Community 
 Review of DIP 1040, "Copying, Moving, and Forwarding".
From the DIP:
 A Move Assignment Operator is a struct member assignment 
 operator that moves, rather than copies, the argument 
 corresponding to its first parameter into the constructed 
 object. After the move is complete, the destructor is called on 
 the original contents of the constructed object. The argument 
 is invalid after this move, and is not destructed.
This paragraph seems garbled to me. - What is "the constructed object"? It's ambiguous: presumably both `this` and the source argument to the move `opAssign` were initialized at some time in the past, otherwise why call the move `opAssign` instead of the move constructor? Generally, the phrase "the constructed object" is terribly confusing and for clarity should be replaced throughout the DIP with less ambiguous phrasing, such as "the source of the move" or "the destination of the move". - "After the move is complete, the destructor is called on the original contents of the constructed object." This makes no sense: the original contents of the destination of the move cannot be destroyed *after* the move, as it has already been overwritten by the move, and is gone. It would have to either be destroyed *before* the move, or copied to a temporary first, which seems like a pointless waste. - If the destination is to be destroyed before the move, then there seems to be no need for the separate move `opAssign` at all; the compiler should instead just call the destructor on the destination, if it has already been initialized, before calling the move constructor. - If there is some good reason for the distinct move `opAssign` that I have missed, it should be explained clearly in the DIP.
Mar 10
prev sibling next sibling parent reply Dukc <ajieskola gmail.com> writes:
On Friday, 5 March 2021 at 12:20:36 UTC, Mike Parker wrote:
 This is the feedback thread for the first round of Community 
 Review of DIP 1040, "Copying, Moving, and Forwarding".
It is labourous for the reader to get the difference between move and copy constructors, which is that copy constructors have a `ref` parameter. It'd be easier if this was explicitly mentioned. I don't like the term "move assignment operator". The operator is the same assignment operator we already have, no need to rename it. Just talk about how identity assignments are handled when the argument can be moved. You can call that a move assignment, but leave the trailing "operator" out, since there's no new operator. There's no discussion how a move constructor will interact with a possible `opPostMove` in a struct or one of it's fields (DIP1014). As for all the stuff about EMOs, I don't like that at all. We already allow the compiler to move a struct instead of copying it, if it can detect a last use. And also we have the possibility to manually elide copying via `std.algorithm.move`. If I recall TDPL book correctly, returning a value or passing a Rvalue to a function is already guaranteed to move. And those rules are for all structs, not just for those that have some fancy operator overloads. I also don't like all the rules about detecting last uses. Those are optimizations, not something the language spec should complicate itself for. There is a breaking change that hasn't been mentioned. An identity assignment with no `ref` for the argument, or an `auto ref` argument, is going to become a move assignment. Thus it will break if it throws something. All in all, I like the idea of move constructor - it is the same improvement over `opPostMove` that copy constructors are over postblits. But I cannot say the same about rest of the DIP.
Mar 16
parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/16/2021 10:19 AM, Dukc wrote:
 It is labourous for the reader to get the difference between move and copy 
 constructors, which is that copy constructors have a `ref` parameter. It'd be 
 easier if this was explicitly mentioned.
There'll be some more text added to deal with Rvalue Constructors, that should take care of this.
 I don't like the term "move assignment operator". The operator is the same 
 assignment operator we already have, no need to rename it. Just talk about how 
 identity assignments are handled when the argument can be moved. You can call 
 that a move assignment, but leave the trailing "operator" out, since there's
no 
 new operator.
It's good to distinguish between a move and a copy.
 There's no discussion how a move constructor will interact with a possible 
 `opPostMove` in a struct or one of it's fields (DIP1014).
https://github.com/dlang/DIPs/blob/master/DIPs/accepted/DIP1014.md Although #DIP1014 was approved, this DIP replaces it. That should be covered in the text.
 As for all the stuff about EMOs, I don't like that at all. We already allow
the 
 compiler to move a struct instead of copying it, if it can detect a last use. 
 And also we have the possibility to manually elide copying via 
 `std.algorithm.move`. If I recall TDPL book correctly, returning a value or 
 passing a Rvalue to a function is already guaranteed to move. And those rules 
 are for all structs, not just for those that have some fancy operator
overloads.
 
 I also don't like all the rules about detecting last uses. Those are 
 optimizations, not something the language spec should complicate itself for.
 
 There is a breaking change that hasn't been mentioned. An identity assignment 
 with no `ref` for the argument, or an `auto ref` argument, is going to become
a 
 move assignment. Thus it will break if it throws something.
Only if the move assignment is defined, so it shouldn't be breaking.
 All in all, I like the idea of move constructor - it is the same improvement 
 over `opPostMove` that copy constructors are over postblits. But I cannot say 
 the same about rest of the DIP.
Mar 20
prev sibling next sibling parent reply Elronnd <elronnd elronnd.net> writes:
 If a Move Constructor is not defined for a struct that has a 
 Move Assignment Operator, a default Move Constructor is defined 
 and implemented as a move for each of its fields, in lexical 
 order.
I think the order should be undefined. Forcing the moves to happen in lexical order may inhibit some optimizations; and code which relies on the order of implicit moves is fragile anyway and shouldn't be encouraged. If the order is important for legitimate reasons, then those reasons should be made explicit in the code with an explicit move constructor.
Mar 17
parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/17/2021 11:14 PM, Elronnd wrote:
 If a Move Constructor is not defined for a struct that has a Move Assignment 
 Operator, a default Move Constructor is defined and implemented as a move for 
 each of its fields, in lexical order.
I think the order should be undefined.  Forcing the moves to happen in lexical order may inhibit some optimizations; and code which relies on the order of implicit moves is fragile anyway and shouldn't be encouraged.  If the order is important for legitimate reasons, then those reasons should be made explicit in the code with an explicit move constructor.
The trouble with that is the user may inadvertently rely on the implementation-defined order, and his code will break when using another implementation. The optimization advantage, if any, would be slight and not worth the risk.
Mar 20
prev sibling next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Friday, 5 March 2021 at 12:20:36 UTC, Mike Parker wrote:
 [...]
Looking at how the discussion is going, this seems to me like this DIP should be split into two DIPs because it's really doing two things. One of these things seems to generally be agreed upon, while the other is quite intensively discussed. 1/ The transformation of copy/destruct couples into move if the object isn't used in between the copy and the destruction. This seems to be fairly uncontroversial and the proposal is sound. 2/ The addition of new move constructor/assignment operators. There seems to be a lot of discussion around these, and I am myself not convinced that the design is sound. Splitting this DIP in two would allow to move forward with 1/ relatively quickly and provide value to D user right away, while providing time and space for 2/ to ironed out.
Mar 18
parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/18/2021 4:22 PM, deadalnix wrote:
 On Friday, 5 March 2021 at 12:20:36 UTC, Mike Parker wrote:
 [...]
Looking at how the discussion is going, this seems to me like this DIP should be split into two DIPs because it's really doing two things. One of these things seems to generally be agreed upon, while the other is quite intensively discussed. 1/ The transformation of copy/destruct couples into move if the object isn't used in between the copy and the destruction. This seems to be fairly uncontroversial and the proposal is sound. 2/ The addition of new move constructor/assignment operators. There seems to be a lot of discussion around these, and I am myself not convinced that the design is sound. Splitting this DIP in two would allow to move forward with 1/ relatively quickly and provide value to D user right away, while providing time and space for 2/ to ironed out.
While this is an attractive idea, (1) without some help from the programmer via adding the Move functions seems likely to cause problems.
Mar 20
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
The Move Constructor is defined as:

     this(S s) { ... } // move constructor

It's been pointed out (not sure which post) that this conflicts with an Rvalue 
Constructor, which has the same syntax:

     this(S s) { ... } // rvalue constructor

Note that an Rvalue Constructor is *not* considered to be a Copy Constructor:

     this(ref S s) { ... } // copy constructor

So the syntax is already in use. Need to resolve the conflict.

---------

The Rvalue Constructor and Copy Constructor are currently not allowed to
coexist:

     this(S);
     this(ref S); // error, can't have both an rvalue and copy constructor

which suggests a straightforward resolution: it's an Rvalue Constructor if 
there's no Copy Constructor, and a Move Constructor if there is.

While this sounds confusing, I suspect in practice the MC is simply a more 
advanced RC.
Mar 20
prev sibling next sibling parent Mike Parker <aldacron gmail.com> writes:
On Friday, 5 March 2021 at 12:20:36 UTC, Mike Parker wrote:
 
 The review period will end at 11:59 PM ET on March 19, or when 
 I make a post declaring it complete. Feedback posted to this 
 thread after that point may be ignored.
This review round is complete. Thanks to everyone who participated. The DIP authors may reply to unanswered feedback, but I ask that others please refrain from leaving any more posts in this thread.
Mar 20
prev sibling parent reply Manu <turkeyman gmail.com> writes:
On Fri, Mar 5, 2021 at 10:25 PM Mike Parker via Digitalmars-d <
digitalmars-d puremagic.com> wrote:

 This is the feedback thread for the first round of Community
 Review of DIP 1040, "Copying, Moving, and Forwarding".
I'm really late to the party here, and there's a lot to catch up, so maybe I'm repeating discussion that already happened... but I think there's a few points that need consideration. First, I like this DIP, I think this is good for D. That said, I also care deeply about C++ compatibility, and this DIP introduces some serious issues. 1. This changes the extern(C++) ABI, and it's a major breaking change, yet it's not mentioned in the breaking changes section; every by-val constructor call will have an ABI change, and C++ compatibility will completely explode. 2. By-val declarations matching f(T&&) is not a great default. It's relatively uncommon in C++ for these arguments to exist unless it makes specific sense, and many cases where it is present C++ uses 'universal references', that infer rval-ness. 3. ...and even when they do exist, this DIP has different ABI semantics! The C++ `T&&` function still expects the caller to destruct the argument, but in D, the caller will never destruct a moved argument. You talk about this, saying that it's a requirement to pay special attention to the semantic difference, but I don't think that's reasonable. You're going to have endless reports of issues from people that don't read or don't *understand* the difference, and complain their program is crashing. 4. Is it that C++'s criteria for an EMO matches your definition, or will there emerge cases where a differing EMO assessment will produce a mismatch in calling convention? I think there are more problems I can imagine if I dive into the weeds, but at least these need to be addressed in the DIP. My off-the-cuff thoughts are: 1 & 2. Rather than applying EMO semantics to extern(C++) functions by default, and using value to nominate the default behaviour; since you've accepted an attribute just for C++ disambiguation, why not assume by-val semantics as default, and require an rval-ref attribute instead to select the C++ move ABI? 3. Since your EMO semantics don't actually apply to C++, that is, destruction responsibility does NOT carry into the call, I don't think EMO semantics as described actually apply to C++ at all, and shouldn't be attempted. We should use a separate solution for C++ compat (like the attribute I mention above) and EMO semantics have no impact on C++. So, in the presence of extern(C++), f(T arg) behaves 'traditionally' (as you describe when applying ` value`), and maybe something like f(T rvalref arg) introduces the C++ rules (together with some attention to overloading). EMO semantics as described have no applicability to C++. Your C++ compatibility section needs a complete do-over as far as I'm concerned. They don't improve the situation for C++ compatibility, they make it worse. Perhaps it should be tackled as a separate matter, and in that event, I think the simplest thing to do is to assume f(T) for EMO remains by-val for extern(C++), that is "EMO calling only applies to extern(D) code", and then we make an rvalref attribute that applies only to extern(C++) for the sake of matching calling convention in a future DIP? I might be interested in attempting a DIP that adds an extern(C++)-only rval-ref attribute on top of this DIP; It needs more work. Existing work (for instance, last SAOC) was taken from the perspective that it solves move semantics in D at the same time as interfacing C++. With this DIP addressing elaborate move in D, I think a new approach which ONLY addresses interfacing C++ is an easier sell, and keeps the feature much more isolated.
Mar 22
next sibling parent Max Haughton <maxhaton gmail.com> writes:
On Tuesday, 23 March 2021 at 00:35:31 UTC, Manu wrote:
 On Fri, Mar 5, 2021 at 10:25 PM Mike Parker via Digitalmars-d < 
 digitalmars-d puremagic.com> wrote:

 [...]
I'm really late to the party here, and there's a lot to catch up, so maybe I'm repeating discussion that already happened... but I think there's a few points that need consideration. [...]
Thank you for the feedback, I'm still mulling over it, but the idea of having move semantics for D that we can make the best they can possibly be, and then a solution for C++ compatibility seems like a good one. I'm not a huge fan of magic quasi-UDA things on parameters personally. One thing I have put to Walter and would like more opinions on, is being a little more like Rust and less like C++ by having more violent move semantics that (often described as destructive) bypass the .init process described here. Although I'd like it, I'm not convinced of how practical it is, but it would greatly simplify analysis of the program both for a static analyzer (presumably the successor to ob.d) and for the programmers brain (e.g. think about how many value categories C++ has, and how use of move semantics in C++ can end up introducing a subtle new state to objects).
Mar 22
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Tuesday, 23 March 2021 at 00:35:31 UTC, Manu wrote:
 I'm really late to the party here, and there's a lot to catch 
 up, so maybe I'm repeating discussion that already happened... 
 but I think there's a few points that need consideration.
The extern(C++) case has not been discussed much so far. It goes without saying that it needs to be as close as C++ as it can be. I assumed the DIP to not apply to extern(C++).
Mar 23
parent reply Manu <turkeyman gmail.com> writes:
On Tue, Mar 23, 2021 at 10:05 PM deadalnix via Digitalmars-d <
digitalmars-d puremagic.com> wrote:

 On Tuesday, 23 March 2021 at 00:35:31 UTC, Manu wrote:
 I'm really late to the party here, and there's a lot to catch
 up, so maybe I'm repeating discussion that already happened...
 but I think there's a few points that need consideration.
The extern(C++) case has not been discussed much so far. It goes without saying that it needs to be as close as C++ as it can be. I assumed the DIP to not apply to extern(C++).
The DIP mentions C++ compat, and it's bad. I think it would be best if the whole DIP apply only to extern(D), and we'll address C++ compat separately.
Mar 23
parent Max Haughton <maxhaton gmail.com> writes:
On Wednesday, 24 March 2021 at 00:14:27 UTC, Manu wrote:
 On Tue, Mar 23, 2021 at 10:05 PM deadalnix via Digitalmars-d < 
 digitalmars-d puremagic.com> wrote:

 On Tuesday, 23 March 2021 at 00:35:31 UTC, Manu wrote:
 I'm really late to the party here, and there's a lot to 
 catch up, so maybe I'm repeating discussion that already 
 happened... but I think there's a few points that need 
 consideration.
The extern(C++) case has not been discussed much so far. It goes without saying that it needs to be as close as C++ as it can be. I assumed the DIP to not apply to extern(C++).
The DIP mentions C++ compat, and it's bad. I think it would be best if the whole DIP apply only to extern(D), and we'll address C++ compat separately.
I think this is the right approach.
Mar 23