
digitalmars.D.bugs - [Issue 9238] New: Support rvalue references

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9238

           Summary: Support rvalue references
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: bugzilla digitalmars.com
            Blocks: 9218



16:49:22 PST ---
Discussion here:

http://forum.dlang.org/thread/4F84D6DD.5090405 digitalmars.com#post-4F84D6DD.5090405:40digitalmars.com

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Dec 28 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9238


Andrej Mitrovic <andrej.mitrovich gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrej.mitrovich gmail.com



07:35:01 PST ---
Does it really block Issue 9218? We've had a discussion in the forums recently
to make `auto ref` a non-template by making the compiler convert this call:

void main()
{
    auto b = S() > S();  // assume S has 'int opCmp(const ref A a) const'
}

Into this:

void main()
{
    S _hidden1, _hidden2;
    auto b = _hidden1 > _hidden2;
}

See
http://forum.dlang.org/thread/mailman.2989.1356370854.5162.digitalmars-d puremagic.com?page=2#post-kbcc62:24192v:242:40digitalmars.com

Dec 29 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9238




07:36:31 PST ---

 Does it really block Issue 9218? We've had a discussion in the forums recently
 to make `auto ref` a non-template by making the compiler convert this call:
     auto b = S() > S();  // assume S has 'int opCmp(const ref A a) const'
I think I meant: int opCmp()(const auto ref A a)
Dec 29 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9238




07:37:52 PST ---


 Does it really block Issue 9218? We've had a discussion in the forums recently
 to make `auto ref` a non-template by making the compiler convert this call:
     auto b = S() > S();  // assume S has 'int opCmp(const ref A a) const'
I think I meant: int opCmp()(const auto ref A a)
Argh, this: int opCmp(const auto ref A a)

Essentially it isn't a template, but it is special enough that the compiler
converts literals into hidden lvalues, which it passes to the function.
Dec 29 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9238




I'd like to propose using `in ref` rather than `auto ref` for the purpose.

Reasons:

1. `in ref` implies `const scope ref`. 

If the reference binds a temporary rvalue, its address must not escape. We don't
have correct `scope` semantics yet, but we can allow these semantics as a
limited case.

2. `in ref` has only been allowed since 2.060, by the fix for issue 8105.

https://github.com/d-programming-language/dmd/commit/687044996a06535210801577e5d68b72edfa3985

We can guess that few programmers use `in ref` yet.

3. For normal functions, we cannot implement exactly the same `auto ref`
semantics as for template functions.

That means `auto ref` must be used with template functions.
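A minimal sketch of why the current semantics are tied to templates, assuming a present-day D compiler: templated `auto ref` produces a separate instance for lvalue and rvalue arguments, and `__traits(isRef, x)` lets the body see which one it got. The names `passedByRef` and `make` are hypothetical, used only for illustration.

```d
// Hypothetical helper: reports whether this template instance took x by ref.
// With templated auto ref, the compiler instantiates one version for lvalue
// arguments (ref parameter) and another for rvalue arguments (by-value parameter).
bool passedByRef(T)(auto ref T x)
{
    return __traits(isRef, x);
}

int make() { return 42; } // hypothetical source of an rvalue

void main()
{
    int local = 1;
    assert(passedByRef(local));   // lvalue: instance with `ref int x`
    assert(!passedByRef(make())); // rvalue: instance with plain `int x`
}
```

Because the choice is made per instantiation, a single non-template function has no equivalent mechanism.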

Dec 29 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9238


Jonathan M Davis <jmdavisProg gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jmdavisProg gmx.com



PST ---
I'm against the in ref idea.

1. It subverts what in currently does. You _really_ don't want to require that
scope be used in this case. Once scope has actually been fixed to actually
check for escaping references, you'll either end up with conflicting behavior
with in depending on whether it's ref or not, or you'll end up with scope's
restrictions on it, which would be horrendously over-restrictive. Not to
mention, I'd argue that in is already too overloaded as it is. Too many people
use it because they like the idea that it's the opposite of out without taking
into account that it means not only const but _scope_. We should _not_
encourage its use further, let alone give it a conflicting meaning. It's
causing enough trouble as it is.

2. I think that the fact that auto ref allows you to accept both rvalues
and lvalues without const is very valuable. Yes, that means that if the
function actually mutates the parameter, then lvalue arguments will get mutated
whereas the change to rvalue arguments will be lost, but it means that you can
get the efficiency benefit without requiring const. And given how restrictive
D's const is, lots of people are avoiding it, and there are plenty of
legitimate use cases where you _can't_ use it.
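A small sketch of the behavior described in point 2, using the templated `auto ref` that already exists: without const, a mutation reaches an lvalue argument but is silently lost for an rvalue; adding const keeps the same binding rules while forbidding the mutation. `bump`, `peek`, and `make` are hypothetical names.

```d
// Hypothetical example: templated auto ref without const.
void bump(T)(auto ref T x)
{
    x += 1; // reaches the caller only when x was bound to an lvalue
}

// With const added, the same binding rules apply but x cannot be modified.
T peek(T)(auto ref const T x)
{
    return x;
}

int make() { return 10; } // hypothetical rvalue source

void main()
{
    int n = 10;
    bump(n);      // lvalue: passed by ref, n is modified
    assert(n == 11);
    bump(make()); // rvalue: passed by value, the increment is lost
    assert(peek(n) == 11 && peek(make()) == 10);
}
```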

So, I'd strongly argue for using auto ref such that

void foo(auto ref int param);

becomes

void foo(ref int param);

and

foo(bar());

gets lowered to something like

auto _temp = bar();
foo(_temp);

Then if you _don't_ want foo to be able to mutate its argument, you use const

void foo(auto ref const int param);

and if you don't care, you don't have to. And of course, if you _want_ it to
mutate the argument, then you just use plain ref.

I am _extremely_ leery of overloading in any further, and we do _not_ want
these types of parameters to have scope on them.

Honestly, if it were up to me, we'd make in on function parameters outright
illegal, since I think that overloading it like we already have is confusing
and is going to cause a lot of problems once scope is fixed. Let's not make the
problem worse.

Dec 29 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9238





A serious problem is:

We cannot make "rvalue references" with template functions, if we use `auto
ref`.

Dec 29 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9238




PST ---
 We cannot make "rvalue references" with template functions, if we use `auto
 ref`.

And what's the problem with leaving auto ref as it is with templated functions
and then making it work as previously described with non-templated functions?
rvalues already work just fine with auto ref and templated functions. It's just
non-templated functions which lack a solution.
Dec 29 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9238





 And what's the problem with leaving auto ref as it is with templated functions
Recently, I used `auto ref` in std.algorithm.forward.

Yes, ideally we could remove the current 'templated auto ref'. There is an
alternative solution (make two overloaded functions, one receiving an rvalue
and the other an lvalue, and then disable either one). But removing it is a
breaking change, just as you say of `in ref` that "It subverts what in
currently does". `in ref` has only been allowed very recently, since 2.060,
whereas `auto ref` has existed since 2.038. So removing the current `auto ref`
would have much more impact than changing the meaning of `in ref`.
 rvalues already work just fine with auto ref and templated functions. It's just
 non-templated functions which lack a solution.
 Once scope has actually been fixed to actually check for escaping references,
you'll either end up with conflicting behavior with in depending on whether
it's ref or not, or you'll end up with scope's restrictions on it, which would
be horrendously over-restrictive. 
At least it is a necessary restriction for `in ref`. For example, we should not
allow the following code:

ref const(T) foo(in ref T t) { return t; }

If foo _actually_ receives an lvalue, returning t by ref is valid. But if foo
receives an rvalue, foo accidentally returns a dangling reference, which is
completely unsafe. So we must take the conservative approach at this point.

---

Here, I want to double-check the feature being discussed.

The current `auto ref` with template functions makes one or more template
instances based on the lvalue-ness of the actual arguments. This can cause
template bloat, and for large rvalues the object bit-copy is inefficient.

On the other hand, the feature discussed in the forum is like "const T&" in
C++. It can bind both lvalues and rvalues, and the argument is passed to the
function by "reference" (e.g. a pointer). And it works with non-template
functions, so template instantiation is not involved. Right?
Dec 29 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9238




PST ---
The current proposal is to leave auto ref for templates exactly as it is but to
make auto ref work with non-templated functions differently. I don't know how
const& is implemented in C++, so I don't know how close the proposal is to
that, though the use case would be similar. The proposal for non-templated
functions would be that

auto foo(auto ref T param) {...}

would become

auto foo(ref T param) {...}

and that if it were called with an rvalue, a local variable would be declared
to hold that rvalue so that it could be passed to the function by ref, and that
variable would leave scope as soon as the statement with the function call
completed. So,

foo(bar());

would become something like

auto _temp = bar();
foo(_temp);
//_temp leaves scope and is destroyed here

That way, auto ref would work with non-templated functions. But it was _not_
proposed that templated functions would change at all.
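A hand-written sketch of the proposed lowering, assuming today's D where a plain `ref` parameter rejects an rvalue by default. The struct `Big` and the `destroyed` counter are hypothetical, added only to make the temporary's lifetime observable; the extra braces stand in for "the end of the statement containing the call".

```d
// Hypothetical counter of how many Big instances have been destroyed.
int destroyed;

struct Big
{
    int value;
    ~this() { ++destroyed; }
}

Big bar() { return Big(7); } // returns an rvalue

int foo(ref Big param) { return param.value; } // plain ref: lvalues only

void main()
{
    // foo(bar()); // rejected by default: an rvalue does not bind to ref
    {
        auto _temp = bar();     // hand-written version of the lowering
        assert(foo(_temp) == 7);
        assert(destroyed == 0); // _temp is still alive here
    }                           // _temp leaves scope and is destroyed here
    assert(destroyed == 1);
}
```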

As for scope and auto ref / in ref, ref alone has the problem. You can do
something like

auto ref bar(ref int i)
{
    return i;
}

auto ref foo()
{
    int i;
    return bar(i);
}

and you've now escaped a reference. The fact that auto ref could take an rvalue
has zero effect on that. ref is plenty. So, unless you're proposing that ref in
general use scope, I don't think that requiring that scope be used with auto
ref / in ref fixes much.

Also, I think that requiring that const be used is a big problem. const in D is
far more restrictive than it is in C++, so making it so that our counterpart to
C++'s const& has to use const is far too restrictive. auto ref with templates
works without const just fine. You run the risk of mutating the lvalue inside
the function, because there's no protection against it, but if you want to
prevent that you can just use const, and plenty of code _can't_ use const. So,
allowing auto ref to work without const is valuable, and I think that the
non-templated solution should do the same.

auto ref should basically be saying that the programmer wants unnecessary
copies to be avoided and doesn't care about protecting against lvalues being
mutated, whereas auto ref const says that they want to avoid unnecessary copies
and are willing to put up with the extra restrictions of const to get the
guarantee that lvalues won't be mutated.

in ref goes against that goal.

I think that we should either use auto ref for non-templated functions as I've
described (without touching how templated functions work at all) or that we
should come up with a new keyword to indicate the new thing that we want (even
if it starts with @ rather than being an actual keyword). Overloading in
further is a bad idea IMHO, and I think that requiring either scope or const is
a bad idea. Certainly, if we need scope in this situation, then we need scope
for _all_ situations where ref is used, not just this.

Dec 30 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9238


Andrei Alexandrescu <andrei erdani.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrei erdani.com



15:03:08 PST ---
Desiderata
==========

Design choices may sometimes invalidate important use cases, so let's start
with what we'd like to have:

1. Safety

We'd like most or all uses of ref to be safe. If not all are safe, we should
have easy means to distinguish safe from unsafe cases statically. If that's not
possible, we should be able to enforce safety with simple runtime checks in
@safe code.

2. Efficient passing of values

The canonical use case of ref parameters is to allow the callee to modify a
value in the caller. However, a significant secondary use case is as an
optimization for passing arguments into a function. In such cases, the caller
is not concerned with mutation and may actually want to prevent it. The
remaining problem is that ref traditionally assumes the caller holds an actual
lvalue, whereas in such cases the caller may want to pass an rvalue.

3. Transparently returning references to ref parameters

One important use case is functions that return one of their reference
parameters, the simplest being:

ref T identity(T)(ref T obj) { return obj; }

We'd like to allow identity and to make it safe by design. If we don't, we
disallow a family of use cases such as min() and max() that return by
reference, call chaining idioms etc.
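To make desideratum 3 concrete, here is a sketch of a min() that returns by reference, written for current D (where this compiles in non-@safe code). `refMin` is a hypothetical name, not a Phobos function.

```d
// Hypothetical ref-returning min: returns a reference to whichever argument is smaller.
ref T refMin(T)(ref T a, ref T b)
{
    if (b < a)
        return b;
    return a;
}

void main()
{
    int x = 3, y = 5;
    refMin(x, y) = 0; // assigns through the returned reference, i.e. to x
    assert(x == 0 && y == 5);

    ++refMin(x, y);   // the result can be used like any lvalue
    assert(x == 1);
}
```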

4. Sealed containers

This important use case is motivated by efficient and safe allocators. We want
to support scoped and region-based allocation, and at the same time we want to
combine such allocators with containers that return references to their data.

Consider as a simple example a scoped container:

struct ScopedContainer(T)
{
    private T[] payload;
    this(size_t n) { payload = new T[n]; }
    this(this) { payload = payload.dup; }
    ~this() { delete payload; }
    void opAssign(ref ScopedContainer rhs) {
      payload = rhs.payload.dup;
    }
    ref T opIndex(size_t n) { return payload[n]; }
}

The container eagerly allocates its state and deallocates it when it leaves
scope. We'd like to allow opIndex to typecheck and guarantee safety.

5. Simplicity

We wish to get the design right with maximum economy in language design. One
thing easily forgotten when focusing on minutiae while carrying significant context
in mind is that whatever language additions we make come on top of an already
large machinery.

There have been ideas based on defining "scope ref", "in ref", or "@attribute
ref". We'd like to avoid such and instead make sure plain "ref" is useful,
safe, and easy to understand.  

------------

These desiderata and the interactions among them impose constraints on the
design space. In the following post I'll sketch some possible designs dictated
by prioritizing desiderata, and analyze the emerging tradeoffs.

Jan 09 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9238




PST ---
 There have been ideas based on defining "scope ref", "in ref", or "@attribute
 ref". We'd like to avoid such and instead make sure plain "ref" is useful,
 safe, and easy to understand.  
I would argue that it's vital that ref which requires an lvalue and ref which
doesn't care whether it's given an lvalue or rvalue be distinguished. You're
just begging for bugs otherwise. It should be clear from a function's signature
whether it intends to take an argument by ref and mutate it or whether it's
simply trying to avoid unnecessary copying.
Jan 09 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9238




16:07:04 PST ---

================================

One possible design is to give desideratum "4. Sealed containers" priority and
start from there.

Continuing the ScopedContainer example, we notice that to make it work we need
the lifetime of c[n] to be bounded by the lifetime of c. We set out to enforce
that statically. The simplest and most conservative rule would be:

----------
For functions returning ref, the lifetime of the returned object spans at least
through the scope of the caller.
----------

Impact on desiderata:

To enforce safety we'd need to disallow any ref-returning function from
returning a value with too short a scope. Examples:

ref int fun(int a) { return a; }
// Error: escapes address of by-value parameter

ref int gun() { int a; return a; }
// Error: escapes address of local

ref int hun() { return *(new int); }
// fine

ref int iun(int* p) { return *p; }
// fine

ref int identity(ref int a) { return a; }
// Should work

This last function typechecks if and only if the argument is guaranteed to have
a lifetime that extends through the end of the scope of the caller. In turn, if
we want to observe (2) and allow rvalues to bind to ref, that means any rvalue
created in the caller must exist through the end of the scope in which the
rvalue was created. This is a larger extent than what D currently allows
(destroy rvalues immediately after the call) and also larger than what C++
allows (destroy rvalues at the end of the full expression). It is unclear
whether this has bad consequences; probably not.

One interesting consequence is that ref returns are intransitive, i.e. cannot
be passed "up". Consider:

ref int identityImpl(ref int a) { return a; }
ref int identity(ref int a) { return identityImpl(a); }

Under the rule above this code won't compile although it is safe. This is
because from the viewpoint of identity(), identityImpl returns an int that can
only last through the scope of identity(). Attempting to return that is
tantamount to returning a local as far as identity() is concerned, so it won't
typecheck.

This limitation is rather severe. One obvious issue is that creating wrappers
around objects will be seriously limited. For example, a range can't forward
the front of a member:

struct Range {
  private AnotherRange _source;
  // ... inside some Range implementation ...
  ref T front() { return _source.front; } // error
}

Summary
=======

1. Design is safe
2. Rvalues can be bound to ref (subject to unrelated limitations) ONLY if the
lifetime of rvalues is prolonged through the end of the scope they're created
in. (Assessment: fine)
3. Implementing identity(): possible but intransitive, i.e. references can't be
passed up call chains. (Assessment: limitation is problematic.)
4. Sealed containers: possible and safe, but presents wrapping problems due to
(3).
5. Simplicity: good

I'll next present a refinement of this design that improves on its
disadvantages without losing the advantages.

Jan 09 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9238




18:32:23 PST ---

============================================


The previous design does not allow references to be passed
upwards even when it's safe to do so. To improve on that, let's devise a
refined rule:

----------
For functions returning ref, the lifetime of the returned object spans at least
the lifetime of its shortest-lived argument.
----------

Impact on desiderata:

Reconsidering the troublesome example:

ref int identityImpl(ref int a) { return a; }
ref int identity(ref int a) { return identityImpl(a); }

When compiling identity(), the compiler (without seeing the body of
identityImpl) figures that the lifetime of the value returned by
identityImpl(a) is at least as long as the lifetime of a itself. Therefore
identity() typechecks, because it is allowed to return `a` itself.

Safety is still guaranteed however. This is because a function can never escape
a reference to an object of shorter lifetime than the lifetime of the
reference. Reconsidering the front() example:

struct Range {
  private AnotherRange _source;
  // ... inside some Range implementation ...
  ref T front() { return _source.front; } // fine
}

front() compiles because front is really a regular function taking a "ref Range
this". Then _source is scoped inside "this" so from a lifetime standpoint
"this", _source, and the result are in good order.

ref int fun() {
   Range r;
   return r.front; // error
}

fun() does not compile because the call r.front returns a value with the
lifetime of r, so returning a ref is tantamount to escaping the address of a
local.

ref int gun(Range r) {
   return r.front; // error
}

This also doesn't compile because the result of r.front has the lifetime of r,
which is passed by value into gun.

ref int gun(ref Range r) {
   return r.front; // fine
}

This does work because the result has the same lifetime as r.

The question remains on how to handle rvalues bound to ref parameters. The
previous design required that rvalues live as long as the scope, and this
design would allow that too. But this design also allows the C++-style
destruction of rvalues: in the call foo(bar()), if foo returns a ref, the result
must be used immediately, because the temporary returned by bar() will be
destroyed at the end of the full expression.

If we want to keep the current D rule of destroying rvalue parameters right
after the call to the function, that effectively disallows any use of the ref
result. This may actually be a meaningful choice.

The largest problem of this design is lifetime pollution. Consider the
ScopedContainer example:

ref T opIndex(size_t n) { return payload[n]; }

In the call c[42], the shortest lifetime is actually that of n, which binds to
the rvalue 42. So the compiler is forced to a shorter guarantee of the result
lifetime than the actual lifetime, because of an unrelated parameter.
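The "lifetime pollution" problem can be illustrated with a toy model of the refined rule. This is purely hypothetical and not how any compiler represents lifetimes: each argument gets an ordered lifetime, and the rule takes the minimum over all of them, including the implicit `this`.

```d
// Hypothetical model: lifetimes are ordered so a larger value lives longer.
enum Lifetime : int
{
    fullExpression, // an rvalue such as the literal 42
    callerScope,    // a local in the caller
    container       // the container c, which outlives the call
}

// The refined rule: a ref result is only guaranteed to live as long as
// the shortest-lived argument (including the implicit `this`).
Lifetime resultLifetime(const(Lifetime)[] args)
{
    assert(args.length > 0);
    Lifetime shortest = args[0];
    foreach (a; args[1 .. $])
        if (a < shortest)
            shortest = a;
    return shortest;
}

void main()
{
    // A member call whose only argument is `this`: result keeps c's lifetime.
    assert(resultLifetime([Lifetime.container]) == Lifetime.container);

    // c[42] is c.opIndex(42): the rvalue 42 drags the guarantee down.
    assert(resultLifetime([Lifetime.container, Lifetime.fullExpression])
           == Lifetime.fullExpression);
}
```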

Summary
=======

1. Design is safe
2. Design allows binding rvalues to ref parameters. For usability, temporaries
must last at least as long as the current expression (C++ style).
3. Returning ref parameters works with fewer restrictions than the previous
design.
4. Sealed containers are implementable.
5. Difficulty is moderate on the implementation side and moderate on the user
side.

Next iteration of the design will attempt to refine the lifetime of results so
as to avoid pollution.

Jan 09 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9238




12:03:18 PDT ---
Adding an example by Steve that should work:
http://forum.dlang.org/thread/ylebrhjnrrcajnvtthtt forum.dlang.org?page=11

struct S
{
    int x;
    ref S opOpAssign(string op : "+")(ref S other) { x += other.x; return this; }
}

ref S add5(ref S s)
{
    auto o = S(5);
    return s += o;
}

void main()
{
    auto s = S(5);
    S s2 = add5(s);
}

Apr 23 2013