digitalmars.D - Slice expressions - exact evaluation order, dollar

kinke <noone nowhere.com> writes:
The following snippet is interesting:

<<<
__gshared int step = 0;
__gshared int[] globalArray;

ref int[] getBase()
{
     assert(step == 0);
     ++step;
     return globalArray;
}

int getLowerBound(size_t dollar)
{
     assert(step == 1);
     ++step;
     assert(dollar == 0);
     globalArray = [ 666 ];
     return 1;
}

int getUpperBound(size_t dollar)
{
     assert(step == 2);
     ++step;
     assert(dollar == 1);
     globalArray = [ 1, 2, 3 ];
     return 3;
}


void main()
{
     auto r = getBase()[getLowerBound($) .. getUpperBound($)];
     assert(r == [ 2, 3 ]);
}
>>>

Firstly, it fails with DMD 2.071 because $ in the upper bound expression is 0, i.e., it doesn't reflect the updated length (1) after evaluating the lower bound expression. With LDC, it does.

Secondly, DMD 2.071 throws a RangeError, most likely because it's using the initial length for the bounds checks too.

Most interesting IMO, though, is the question of when the slicee's pointer is to be loaded. This is only relevant if the base is an lvalue and may therefore be modified when evaluating the bound expressions. Should the returned slice be based on the slicee's buffer before or after evaluating the bounds expressions?

This has been triggered by https://github.com/ldc-developers/ldc/issues/1433 as LDC loads the pointer before evaluating the bounds.
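To make the `$` question concrete, here is a minimal hand-written model of the two behaviours described above. It does not rely on any compiler's actual lowering of slice expressions. The names `arr`, `lowerBound`, `upperBound`, `upperDollarCapturedOnce` and `upperDollarReRead` are hypothetical and exist only for this sketch.

```d
// Hand-written model only: each function spells out one possible way
// of supplying `$` to the two bound expressions.
int[] arr;

size_t lowerBound(size_t dollar)
{
    arr = [666]; // side effect: the array now has length 1
    return 0;
}

size_t upperBound(size_t dollar)
{
    return dollar; // report the value of `$` this bound was given
}

// `$` is captured once, before either bound is evaluated
// (the behaviour reported for DMD 2.071 above).
size_t upperDollarCapturedOnce()
{
    arr = null;
    size_t dollar = arr.length; // 0
    lowerBound(dollar);
    return upperBound(dollar);
}

// `$` is re-read from the array for each bound
// (the behaviour reported for LDC above).
size_t upperDollarReRead()
{
    arr = null;
    lowerBound(arr.length);
    return upperBound(arr.length);
}

void main()
{
    assert(upperDollarCapturedOnce() == 0);
    assert(upperDollarReRead() == 1);
}
```

The only difference between the two functions is when `arr.length` is read, which is exactly the detail the test case probes.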
Jun 17 2016
kinke <noone nowhere.com> writes:
Ping. Let's clearly define these hairy evaluation order details 
and add corresponding tests; that'd be another advantage over C++.
Jun 25 2016
Timon Gehr <timon.gehr gmx.ch> writes:
On 17.06.2016 21:59, kinke wrote:
 Most interesting IMO though is the question when the slicee's pointer is
 to be loaded. This is only relevant if the base is an lvalue and may
 therefore be modified when evaluating the bound expressions. Should the
 returned slice be based on the slicee's buffer before or after
 evaluating the bounds expressions?
 This has been triggered by
 https://github.com/ldc-developers/ldc/issues/1433 as LDC loads the
 pointer before evaluating the bounds.
Evaluation order should be strictly left-to-right. DMD and GDC get it wrong here.
Jun 25 2016
Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 26 June 2016 at 03:30, Timon Gehr via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 17.06.2016 21:59, kinke wrote:
 Most interesting IMO though is the question when the slicee's pointer is
 to be loaded. This is only relevant if the base is an lvalue and may
 therefore be modified when evaluating the bound expressions. Should the
 returned slice be based on the slicee's buffer before or after
 evaluating the bounds expressions?
 This has been triggered by
 https://github.com/ldc-developers/ldc/issues/1433 as LDC loads the
 pointer before evaluating the bounds.
 Evaluation order should be strictly left-to-right. DMD and GDC get it wrong here.
It is evaluated left-to-right. getBase() -> getLowerBound() -> getUpperBound().
Jun 26 2016
Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 26 June 2016 at 09:36, Iain Buclaw <ibuclaw gdcproject.org> wrote:

 On 26 June 2016 at 03:30, Timon Gehr via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On 17.06.2016 21:59, kinke wrote:
 Most interesting IMO though is the question when the slicee's pointer is
 to be loaded. This is only relevant if the base is an lvalue and may
 therefore be modified when evaluating the bound expressions. Should the
 returned slice be based on the slicee's buffer before or after
 evaluating the bounds expressions?
 This has been triggered by
 https://github.com/ldc-developers/ldc/issues/1433 as LDC loads the
 pointer before evaluating the bounds.
 Evaluation order should be strictly left-to-right. DMD and GDC get it wrong here.
 It is evaluated left-to-right. getBase() -> getLowerBound() -> getUpperBound().
Ah, I see what you mean. I think you may be using an old GDC version. Before I used to cache the result of getBase().

Old codegen:

_base = *(getBase());
_lwr = getLowerBound(_base.length);
_upr = getUpperBound(_base.length);
r = {.length=(_upr - _lwr), .ptr=_base.ptr + _lwr * 4};

---

Now when creating temporaries of references, the reference is stabilized instead.

New codegen:

*(_ptr = getBase());
_lwr = getLowerBound(_ptr.length);
_upr = getUpperBound(_ptr.length);
r = {.length=(_upr - _lwr), .ptr=_ptr.ptr + _lwr * 4};

---

I suggest you fix LDC if it doesn't already do this. :-)
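The two lowerings above can be imitated by hand in plain D, which may help to see how the results differ. This is only a sketch under stated assumptions: it is not GDC's real output, it leaves `$` out entirely, and `base`, `lwr`, `upr`, `sliceOfSnapshot` and `sliceThroughReference` are hypothetical names.

```d
// Hand-written model of "cache the value" versus "stabilize the reference".
int[] base;

size_t lwr()
{
    base = [10, 20, 30, 40]; // rebinds base to a different buffer
    return 1;
}

size_t upr()
{
    return 3;
}

// Models the old codegen: the slice (ptr and length) is copied up front,
// so the later reassignment of base is not seen.
int[] sliceOfSnapshot()
{
    base = [1, 2, 3, 4];
    int[] snapshot = base;
    size_t lo = lwr();
    size_t hi = upr();
    return snapshot[lo .. hi];
}

// Models the new codegen: only a pointer to the array variable is kept,
// and ptr/length are read after both bounds have been evaluated.
int[] sliceThroughReference()
{
    base = [1, 2, 3, 4];
    int[]* p = &base;
    size_t lo = lwr();
    size_t hi = upr();
    return (*p)[lo .. hi];
}

void main()
{
    assert(sliceOfSnapshot() == [2, 3]);
    assert(sliceThroughReference() == [20, 30]);
}
```

Both functions call `lwr()` and `upr()` in the same left-to-right order. They differ only in whether the base is read before or after the bounds.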
Jun 26 2016
kinke <noone nowhere.com> writes:
On Sunday, 26 June 2016 at 08:08:58 UTC, Iain Buclaw wrote:
 Now when creating temporaries of references, the reference is 
 stabilized instead.

 New codegen:

 *(_ptr = getBase());
 _lwr = getLowerBound(_ptr.length);
 _upr = getUpperBound(_ptr.length);
 r = {.length=(_upr - _lwr), .ptr=_ptr.ptr + _lwr * 4};
 ---

 I suggest you fix LDC if it doesn't already do this. :-)
Thx for the replies - so my testcase works for GDC already? So since what GDC is doing is what I came up for independently for
Jun 26 2016
Timon Gehr <timon.gehr gmx.ch> writes:
On 26.06.2016 10:08, Iain Buclaw via Digitalmars-d wrote:
 Evaluation order should be strictly left-to-right. DMD and GDC get it wrong here.

 It is evaluated left-to-right. getBase() -> getLowerBound() -> getUpperBound().

 Ah, I see what you mean.  I think you may be using an old GDC version.
 Before I used to cache the result of getBase().

 Old codegen:

 _base = *(getBase());
 _lwr = getLowerBound(_base.length);
 _upr = getUpperBound(_base.length);
 r = {.length=(_upr - _lwr), .ptr=_base.ptr + _lwr * 4};

 ---
This seems to be what I'd expect. It's also what CTFE does. CTFE and run time behaviour should be identical. (So either one of them needs to be fixed.)
 Now when creating temporaries of references, the reference is stabilized
 instead.

 New codegen:

 *(_ptr = getBase());
 _lwr = getLowerBound(_ptr.length);
 _upr = getUpperBound(_ptr.length);
 r = {.length=(_upr - _lwr), .ptr=_ptr.ptr + _lwr * 4};
 ---

 I suggest you fix LDC if it doesn't already do this. :-)
I'm not convinced this is a good idea. It makes (()=>base)()[lwr()..upr()] behave differently from base[lwr()..upr()].
Jun 26 2016
Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 26 June 2016 at 14:33, Timon Gehr via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 26.06.2016 10:08, Iain Buclaw via Digitalmars-d wrote:
 Old codegen:

 _base = *(getBase());
 _lwr = getLowerBound(_base.length);
 _upr = getUpperBound(_base.length);
 r = {.length=(_upr - _lwr), .ptr=_base.ptr + _lwr * 4};

 ---
 This seems to be what I'd expect. It's also what CTFE does. CTFE and run time behaviour should be identical. (So either one of them needs to be fixed.)
Very likely CTFE. Anyway, this isn't the only thing where CTFE and Runtime do things differently.
 Now when creating temporaries of references, the reference is stabilized
 instead.

 New codegen:

 *(_ptr = getBase());
 _lwr = getLowerBound(_ptr.length);
 _upr = getUpperBound(_ptr.length);
 r = {.length=(_upr - _lwr), .ptr=_ptr.ptr + _lwr * 4};
 ---

 I suggest you fix LDC if it doesn't already do this. :-)
 I'm not convinced this is a good idea. It makes (()=>base)()[lwr()..upr()] behave differently from base[lwr()..upr()].
No, sorry, I'm afraid you are wrong there. They should both behave exactly the same.

I may need to step aside and explain what changed in GDC, as it had nothing to do with this LDC bug.

==> Step

What made this subtle change was in relation to fixing bug 42 and 228 in GDC, which involved turning on TREE_ADDRESSABLE(type) bit in our codegen trees, which in turn makes NRVO work consistently regardless of optimization flags used - no more optimizer being confused by us "faking it".

How is the above jargon related? Well, one of the problems faced was that it must be ensured that lvalues continue being lvalues when considering creating a temporary in the codegen pass. Lvalue references must have the reference stabilized, not the value that is being dereferenced. This also came with an added assurance that GDC will now *never* create a temporary of a decl with a cpctor or dtor, else it'll die with an internal compiler error trying. :-)

<== Step

(() => base)[lwr()..up()] will make a temporary of (() => base), but guarantees that references are stabilized first.

base[lwr()..upr()] will create no temporary if base has no side effects. And so if lwr() modifies base, then upr() will get the updated copy.
Jun 26 2016
Timon Gehr <timon.gehr gmx.ch> writes:
On 26.06.2016 20:08, Iain Buclaw via Digitalmars-d wrote:
 On 26 June 2016 at 14:33, Timon Gehr via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On 26.06.2016 10:08, Iain Buclaw via Digitalmars-d wrote:
 Old codegen:

 _base = *(getBase());
 _lwr = getLowerBound(_base.length);
 _upr = getUpperBound(_base.length);
 r = {.length=(_upr - _lwr), .ptr=_base.ptr + _lwr * 4};

 ---
 This seems to be what I'd expect. It's also what CTFE does. CTFE and run time behaviour should be identical. (So either one of them needs to be fixed.)
 Very likely CTFE. Anyway, this isn't the only thing where CTFE and Runtime do things differently. ...
All arbitrary differences should be eradicated.
 Now when creating temporaries of references, the reference is stabilized
 instead.

 New codegen:

 *(_ptr = getBase());
 _lwr = getLowerBound(_ptr.length);
 _upr = getUpperBound(_ptr.length);
 r = {.length=(_upr - _lwr), .ptr=_ptr.ptr + _lwr * 4};
 ---

 I suggest you fix LDC if it doesn't already do this. :-)
 I'm not convinced this is a good idea. It makes (()=>base)()[lwr()..upr()] behave differently from base[lwr()..upr()].
 No, sorry, I'm afraid you are wrong there. They should both behave exactly the same. ...
I don't see how that is possible, unless I misunderstood your previous explanation. As far as I understand, for the first expression, code gen will generate a reference to a temporary copy of base, and for the second expression, it will generate a reference to base directly. If lwr() or upr() then update the ptr and/or the length of base, those changes will be seen for the second slice expression, but not for the first.
 I may need to step aside and explain what changed in GDC, as it had
 nothing to do with this LDC bug.

 ==> Step

 What made this subtle change was in relation to fixing bug 42 and 228
 in GDC, which involved turning on TREE_ADDRESSABLE(type) bit in our
 codegen trees, which in turn makes NRVO work consistently regardless
 of optimization flags used - no more optimizer being confused by us
 "faking it".

 How is the above jargon related? Well, one of the problems faced was
 that it must be ensured that lvalues continue being lvalues when
 considering creating a temporary in the codegen pass.  Lvalue
 references must have the reference stabilized, not the value that is
 being dereferenced.  This also came with an added assurance that GDC
 will now *never* create a temporary of a decl with a cpctor or dtor,
 else it'll die with an internal compiler error trying. :-)
 ...
What is the justification why the base should be evaluated as an lvalue?
 <== Step

 (() => base)[lwr()..up()] will make a temporary of (() => base), but
 guarantees that references are stabilized first.
(I assume you meant (() => base)()[lwr()..upr()].) The lambda returns by value, so you will stabilize the reference to a temporary copy of base? (Unless I misunderstand your terminology.)
 base[lwr()..upr()] will create no temporary if base has no side
 effects.  And so if lwr() modifies base, then upr() will get the
 updated copy.
Yes, it is clear that upr() should see modifications to memory that lwr() makes. The point is that the slice expression itself does or does not see the updates based on whether I wrap base in a lambda or not.
Jun 26 2016
Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 27 June 2016 at 04:38, Timon Gehr via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 26.06.2016 20:08, Iain Buclaw via Digitalmars-d wrote:
 On 26 June 2016 at 14:33, Timon Gehr via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On 26.06.2016 10:08, Iain Buclaw via Digitalmars-d wrote:
 Old codegen:

 _base = *(getBase());
 _lwr = getLowerBound(_base.length);
 _upr = getUpperBound(_base.length);
 r = {.length=(_upr - _lwr), .ptr=_base.ptr + _lwr * 4};

 ---
 This seems to be what I'd expect. It's also what CTFE does. CTFE and run time behaviour should be identical. (So either one of them needs to be fixed.)
 Very likely CTFE. Anyway, this isn't the only thing where CTFE and Runtime do things differently. ...
 All arbitrary differences should be eradicated.
 Now when creating temporaries of references, the reference is stabilized
 instead.

 New codegen:

 *(_ptr = getBase());
 _lwr = getLowerBound(_ptr.length);
 _upr = getUpperBound(_ptr.length);
 r = {.length=(_upr - _lwr), .ptr=_ptr.ptr + _lwr * 4};
 ---

 I suggest you fix LDC if it doesn't already do this. :-)
 I'm not convinced this is a good idea. It makes (()=>base)()[lwr()..upr()] behave differently from base[lwr()..upr()].
 No, sorry, I'm afraid you are wrong there. They should both behave exactly the same. ...
 I don't see how that is possible, unless I misunderstood your previous explanation. As far as I understand, for the first expression, code gen will generate a reference to a temporary copy of base, and for the second expression, it will generate a reference to base directly. If lwr() or upr() then update the ptr and/or the length of base, those changes will be seen for the second slice expression, but not for the first.
 I may need to step aside and explain what changed in GDC, as it had
 nothing to do with this LDC bug.

 ==> Step

 What made this subtle change was in relation to fixing bug 42 and 228
 in GDC, which involved turning on TREE_ADDRESSABLE(type) bit in our
 codegen trees, which in turn makes NRVO work consistently regardless
 of optimization flags used - no more optimizer being confused by us
 "faking it".

 How is the above jargon related? Well, one of the problems faced was
 that it must be ensured that lvalues continue being lvalues when
 considering creating a temporary in the codegen pass.  Lvalue
 references must have the reference stabilized, not the value that is
 being dereferenced.  This also came with an added assurance that GDC
 will now *never* create a temporary of a decl with a cpctor or dtor,
 else it'll die with an internal compiler error trying. :-)
 ...
 What is the justification why the base should be evaluated as an lvalue?
Because changes made to a temporary get lost as they never bind back to the original reference.

Regardless, creating a temporary of a struct with a cpctor violates the semantics of the type - it's the job of the frontend to generate all the code for lifetime management for us.

(Sorry for the belated response, I have been distracted).
Jul 12 2016
Timon Gehr <timon.gehr gmx.ch> writes:
On 12.07.2016 23:56, Iain Buclaw via Digitalmars-d wrote:
 What is the justification why the base should be evaluated as an lvalue?
 Because changes made to a temporary get lost as they never bind back to the original reference. ...
Which I'd expect. It is just like:

int x = 0;
assert(3 == ++x + ++x);

If the first '++x' was evaluated by reference, this would be 4, not 3.
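The arithmetic behind the 3 versus 4 can be spelled out by hand. This is a minimal sketch that makes "by value" and "by reference" explicit with copies and pointers rather than relying on how any compiler evaluates `++x + ++x`. The function names `sumByValue` and `sumByReference` are hypothetical.

```d
// Each operand is copied as soon as it is evaluated.
int sumByValue()
{
    int x = 0;
    int a = ++x; // a holds a copy: 1
    int b = ++x; // b holds a copy: 2
    return a + b;
}

// Each operand is kept as a reference to x and only read at the end.
int sumByReference()
{
    int x = 0;
    ++x;
    int* a = &x; // a refers to x itself, not to the value 1
    ++x;
    int* b = &x;
    return *a + *b; // both read the final x, which is 2
}

void main()
{
    assert(sumByValue() == 3);
    assert(sumByReference() == 4);
}
```

The slice case is analogous: loading the base as a value fixes its ptr and length at that point, while keeping a reference defers the read until after the bounds have run.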
 Regardless, creating a temporary of a struct with a cpctor violates
 the semantics of the type - it's the job of the frontend to generate
 all the code for lifetime management for us.
 ...
Yes, but the front end can also be wrong. What is unclear here is if/why the front end should evaluate the array base by reference.
 (Sorry for the belated response, I have been distracted).
(Me too.)
Jul 18 2016
kinke <noone nowhere.com> writes:
On Monday, 27 June 2016 at 02:38:22 UTC, Timon Gehr wrote:
 As far as I understand, for the first expression, code gen will 
 generate a reference to a temporary copy of base, and for the 
 second expression, it will generate a reference to base 
 directly. If lwr() or upr() then update the ptr and/or the 
 length of base, those changes will be seen for the second slice 
 expression, but not for the first.
Exactly. That's what I initially asked in
 Should the returned slice be based on the slicee's buffer 
 before or after evaluating the bounds expressions?
So Timon prefers the pre-buffer (apparently what DMD does), GDC does the post-buffer, and LDC buggily does something in between (for $, we treat base.length as an lvalue, but we load base.ptr before evaluating the bounds, hence treating base as an rvalue there).

Can we agree on something, add corresponding tests and make sure CTFE works exactly the same? %)
 The point is that the slice expression itself does or does not 
 see the updates based on whether I wrap base in a lambda or not.
I don't really see a necessity for the lambda to return the same kind (lvalue/rvalue) of value as the expression directly.
Jul 13 2016
kinke <noone nowhere.com> writes:
On Wednesday, 13 July 2016 at 21:06:28 UTC, kinke wrote:
 On Monday, 27 June 2016 at 02:38:22 UTC, Timon Gehr wrote:
 The point is that the slice expression itself does or does not 
 see the updates based on whether I wrap base in a lambda or 
 not.
 I don't really see a necessity for the lambda to return the same kind (lvalue/rvalue) of value as the expression directly.
Oh, that's actually https://issues.dlang.org/show_bug.cgi?id=16271. So lambda wrapping isn't the issue here. It's just that both ways of dealing with the base are possible and arguably plausible. Is the current DMD way (base treated as rvalue) the one to be followed, or has nobody given this deeper thought yet?
Jul 13 2016
Michael Coulombe <kirsybuu gmail.com> writes:
On Friday, 17 June 2016 at 19:59:09 UTC, kinke wrote:

 void main()
 {
     auto r = getBase()[getLowerBound($) .. getUpperBound($)];
     assert(r == [ 2, 3 ]);
 }

 Firstly, it fails with DMD 2.071 because $ in the upper bound 
 expression is 0, i.e., it doesn't reflect the updated length 
 (1) after evaluating the lower bound expression. LDC does.
The docs aren't fully detailed, but this is explicit behavior in the DMD front end that is the same no matter what type getBase() returns:

"Note that opDollar!i is only evaluated once for each i where $ occurs in the corresponding position in the indexing operation."
- https://dlang.org/spec/operatoroverloading.html

"PostfixExpression is evaluated. if PostfixExpression is an expression of type static array or dynamic array, the special variable $ is declared and set to be the length of the array."
- https://dlang.org/spec/expression.html
Jul 13 2016