digitalmars.D - Proposal: Operator overloading without temporaries

Don Clugston (90/90) Mar 27 2006 Background: Operator overloading, in the form it exists in C++ and

Jarrett Billingsley (4/5) Mar 27 2006 I really, really like this idea. Granted, I don't do too much with oper...
Craig Black (2/2) Mar 27 2006 Beautiful! An elegant solution to a long standing and annoying problem.
Dave (8/98) Mar 27 2006 Hear, hear! And my opinions are (I) yes, (II) yes and (III) no. And I'd ...
Charles (6/111) Mar 27 2006 Wow looks good ... too good. How could this have gone

Sean Kelly (7/10) Mar 27 2006 I have a feeling this may be a lot more difficult in C++ because of

kris (2/17) Mar 27 2006 ADL? Does that stand for Attention Deficit Level?

pragma (4/21) Mar 27 2006 Sorry, I missed this entire thread. What's this about now?
Sean Kelly (7/22) Mar 27 2006 Argument Dependent Lookup. ie. the complex overload resolution

Charles (2/37) Mar 27 2006

Walter Bright (1/1) Mar 27 2006 I think it's a great idea.
Regan Heath (23/24) Mar 27 2006 I was wondering the exact same thing recently:
Sean Kelly (10/35) Mar 27 2006 Very nice. And much better than expression templates. I'm all for it.
James Dunne (38/143) Mar 27 2006 I guess I'll be the "Negative Nancy" here for purposes of strengthening

Don Clugston (23/56) Mar 27 2006 I don't know - but I don't think so. My feeling is that if it strays too...

Sean Kelly (11/16) Mar 28 2006 I think that's something that could be added as a Quality of

Norbert Nemec (21/27) Mar 28 2006 I don't see any obvious reasons against this proposal, but one should

Don Clugston (11/43) Mar 28 2006 Obviously not with real matrices (C*D is not a pointwise operation), but...

Norbert Nemec (14/23) Mar 28 2006 Point accepted. For matrices, the issues are much more complicated, but

James Dunne (13/43) Mar 28 2006 If possible, can someone lay out a clear definition of both "array

xs0 (37/43) Mar 28 2006 afaik, array expressions are just expressions which get evaluated

James Dunne (10/74) Mar 28 2006 Yes, thank you very much!

Sean Kelly (22/34) Mar 28 2006 It almost seems like this could be handled via a special opIndex

Norbert Nemec (9/11) Mar 28 2006 I believe people should not be overly afraid of expression templates. In

Oskar Linde (12/16) Mar 28 2006 I fully support this proposal. It makes sense to place stricter semantic...
Bruno Medeiros (8/22) Apr 02 2006 Ok, I'm new to this, so it took me a while to understand the problem.

Don Clugston (5/25) Apr 03 2006 Not really, it applies everywhere that you can have overloaded

Bruno Medeiros (13/40) Apr 04 2006 But with structs (more generally, with stack-based value types), can't

Don Clugston (8/47) Apr 05 2006 True, but for objects on the stack, the cost is really just in the

Don Clugston <dac nospam.com.au> writes:

Background: Operator overloading, in the form it exists in C++ and 
currently in D, inherently results in sub-optimal code, because it 
always results in unnecessary temporary objects being created.

For example,
X = A - ((B*C) + D)* E;

becomes:
T1 = B * C;
T2 = T1 + D;
T3 = T2 * E;
T4 = A - T3;
X = T4;
Four objects were created, whereas only one was strictly required.
In C++, there are libraries like Blitz++ which use complicated 
expression templates to avoid creating these temporaries, 
and provide performance comparable with FORTRAN. I think D can do much 
better...
Note that temporaries are avoided when using the opXXXAssign() operators 
like +=.
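
The temporary count above can be checked with a small experiment. This is only a sketch in C++ (the other language the post names), not D: `Vec`, `countNaive` and `countCompound` are hypothetical names, and the counts assume the usual copy elision (guaranteed since C++17).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element-wise vector that counts every object it constructs.
struct Vec {
    std::vector<double> v;

    static int& constructed() { static int n = 0; return n; }

    Vec(std::size_t n, double x) : v(n, x) { ++constructed(); }
    Vec(const Vec& o) : v(o.v) { ++constructed(); }
    Vec& operator=(const Vec&) = default;  // assignment creates no new object

    // Result of a binary operator, built directly as one new object.
    template <class F>
    Vec(const Vec& a, const Vec& b, F f) : v(a.v.size()) {
        ++constructed();
        for (std::size_t i = 0; i < v.size(); ++i) v[i] = f(a.v[i], b.v[i]);
    }

    Vec& operator+=(const Vec& r) {
        for (std::size_t i = 0; i < v.size(); ++i) v[i] += r.v[i];
        return *this;
    }
    Vec& operator-=(const Vec& r) {
        for (std::size_t i = 0; i < v.size(); ++i) v[i] -= r.v[i];
        return *this;
    }
    Vec& operator*=(const Vec& r) {
        for (std::size_t i = 0; i < v.size(); ++i) v[i] *= r.v[i];
        return *this;
    }
};

// Conventional binary operators: every call returns a new object.
Vec operator+(const Vec& a, const Vec& b) {
    return Vec(a, b, [](double x, double y) { return x + y; });
}
Vec operator-(const Vec& a, const Vec& b) {
    return Vec(a, b, [](double x, double y) { return x - y; });
}
Vec operator*(const Vec& a, const Vec& b) {
    return Vec(a, b, [](double x, double y) { return x * y; });
}

// X = A - ((B*C) + D) * E written with plain binary operators.
int countNaive(Vec& X, const Vec& A, const Vec& B, const Vec& C,
               const Vec& D, const Vec& E) {
    const int before = Vec::constructed();
    X = A - ((B * C) + D) * E;
    return Vec::constructed() - before;
}

// The same value, using compound assignment wherever possible.
int countCompound(Vec& X, const Vec& A, const Vec& B, const Vec& C,
                  const Vec& D, const Vec& E) {
    const int before = Vec::constructed();
    Vec T = B * C;  // the one object that is strictly required
    T += D;
    T *= E;
    X = A;          // reuses the object X already owns
    X -= T;
    return Vec::constructed() - before;
}
```

Under those assumptions countNaive reports four constructed objects and countCompound reports one, matching the figures in the post.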

===========
   Proposal
===========
(1) Allow the compiler to assume that b = b + c  can be replaced with b 
+= c. (In C++, operator + and operator += are just symbols, the compiler 
doesn't know that there is any relationship between them).
In the example above, this would allow the compiler to generate:
T1 = B * C;
T1 += D;
T1 *= E;

and we have eliminated two of the three unnecessary temporaries.
(2) Fill in the gaps in the operator overloading table by introducing
opAddAssign_r, opSubAssign_r, etc.

Just as A.opSubAssign(B)
is the operation  A -= B  or equivalently  A = A - B, similarly

A.opSubAssign_r(B)
would mean
A = B - A.
and would only occur when temporaries are generated in expressions. Like 
-=, it's an operation which can frequently be performed very 
efficiently, but at present the language has no way of expressing it.

Our original example then becomes:

T1 = B.opMul(C);
T1.opAddAssign(D);
T1.opMulAssign(E);
T1.opSubAssign_r(A);
X = T1;
... and all the useless temporaries are gone!
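
To make the reverse form concrete, here is a rough C++ sketch of the lowered sequence. C++ has no reverse compound operator, so the named members below (`mulAssign`, `addAssign`, `subAssignR`) are hypothetical stand-ins for opMulAssign, opAddAssign and the proposed opSubAssign_r; `Vec` and `lowered` are likewise invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec {
    std::vector<double> v;

    // In-place forms: none of these creates a new object.
    Vec& mulAssign(const Vec& r) {           // this = this * r
        assert(v.size() == r.v.size());
        for (std::size_t i = 0; i < v.size(); ++i) v[i] *= r.v[i];
        return *this;
    }
    Vec& addAssign(const Vec& r) {           // this = this + r
        assert(v.size() == r.v.size());
        for (std::size_t i = 0; i < v.size(); ++i) v[i] += r.v[i];
        return *this;
    }
    Vec& subAssignR(const Vec& r) {          // this = r - this (reverse form)
        assert(v.size() == r.v.size());
        for (std::size_t i = 0; i < v.size(); ++i) v[i] = r.v[i] - v[i];
        return *this;
    }
};

// Stand-in for opMul: the only call that has to create an object.
Vec mul(const Vec& a, const Vec& b) {
    Vec t = a;
    t.mulAssign(b);
    return t;
}

// X = A - ((B*C) + D) * E, lowered step by step as in the post.
Vec lowered(const Vec& A, const Vec& B, const Vec& C,
            const Vec& D, const Vec& E) {
    Vec T1 = mul(B, C);
    T1.addAssign(D);
    T1.mulAssign(E);
    T1.subAssignR(A);   // T1 = A - T1, without a fresh temporary
    return T1;
}
```

The reverse form matters because subtraction is not commutative: without it, the last step would need either a new object or a negate-then-add workaround.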

More formally, when the expression tree for an expression is generated:
With a binary operator XXX, operating on left & right nodes:

if (the left node is *not* an original leaf node) {
    // the left node is a temporary, does not need to be preserved.
    // we don't care if the right node is a temporary or not
    look for opXXXAssign().
} else if (the right node is *not* an original leaf node) {
    // the right node is a temporary
    look for opXXXAssign_r()
} else {
    // both left and right nodes are leaf nodes, we have to
    // create a temporary
    look for opXXX(), just as it does now.
}

These rules also cope with the situation where temporaries are required,
e.g.
X = (A*B) + (C*D);
becomes
T1 = A*B;
T2 = C*D;
T1 += T2;
X = T1;
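
The three-way rule can be tried out on a toy expression tree. The following is a minimal sketch in C++, not compiler code: `Node`, `leaf`, `bin` and `Lowerer` are hypothetical names, and the lowerer only prints the calls it would pick, assuming every opXXX, opXXXAssign and opXXXAssign_r exists.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Node {
    char op = 0;                      // 0 marks a leaf
    std::string name;                 // variable name (leaves only)
    std::unique_ptr<Node> lhs, rhs;   // children (interior nodes only)
};
using NodePtr = std::unique_ptr<Node>;

NodePtr leaf(const std::string& name) {
    NodePtr n(new Node);
    n->name = name;
    return n;
}

NodePtr bin(char op, NodePtr l, NodePtr r) {
    NodePtr n(new Node);
    n->op = op;
    n->lhs = std::move(l);
    n->rhs = std::move(r);
    return n;
}

struct Lowerer {
    std::vector<std::string> code;    // emitted calls, in order
    int temps = 0;                    // number of temporaries created

    static std::string opName(char op) {
        switch (op) {
            case '+': return "Add";
            case '-': return "Sub";
            case '*': return "Mul";
            default:  return "Div";
        }
    }

    // Returns the name of the variable that holds the node's value.
    std::string lower(const Node& n) {
        if (!n.lhs) return n.name;                 // original leaf
        const std::string l = lower(*n.lhs);
        const std::string r = lower(*n.rhs);
        const std::string op = "op" + opName(n.op);
        if (n.lhs->lhs) {                          // left is a temporary
            code.push_back(l + "." + op + "Assign(" + r + ")");
            return l;
        }
        if (n.rhs->lhs) {                          // right is a temporary
            code.push_back(r + "." + op + "Assign_r(" + l + ")");
            return r;
        }
        const std::string t = "T" + std::to_string(++temps);
        code.push_back(t + " = " + l + "." + op + "(" + r + ")");
        return t;                                  // both leaves: new temporary
    }
};
```

Feeding it the tree for A - ((B*C) + D) * E yields the four calls listed earlier with a single temporary; the tree for (A*B) + (C*D) yields two temporaries and one opAddAssign.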

If this were implemented, it would permanently eradicate (for D) the 
most significant advantage which Fortran has managed to retain over 
object-oriented languages. And I really don't think it would be 
difficult to implement, or have negative side-effects.

There are a couple of decisions to be made:
(I) should the compiler use opAdd() and generate a temporary, if 
opAddAssign_r() doesn't exist, to preserve existing behaviour? I think 
the answer to this is YES.
(II) should the compiler use opAdd() and generate a temporary, if 
opAddAssign() doesn't exist, to preserve existing behaviour? Again, I'm 
inclined to answer YES.
(III) If the code includes +=, and there is an opAdd() but no 
opAddAssign(), should the compiler accept this, and just generate an 
opAdd() followed by an assignment? This would mean that opAdd() would 
generate the += operation as well as +, while opAddAssign() would be a 
performance enhancement. (It would still be possible to have 
opAddAssign() without opAdd(), to have += but not +, but it would not be 
possible to have + without +=). This would mean that += would be 
*purely* syntactic sugar.

Decision III would be a little more difficult to implement and is of 
less obvious merit; I only mention it as a possibility.
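
The fallback choices can be written down as a small decision function. This is a sketch in C++ under the assumption that (I) and (II) are answered YES and that (III) is a switch; `Call`, `Available`, `selectBinary` and `selectCompound` are hypothetical names, not anything a compiler exposes.

```cpp
// Which kind of call the compiler would emit for one operator.
enum class Call { None, Op, OpAssign, OpAssignR };

// Which overloads the operand type actually defines.
struct Available {
    bool op;         // opXXX
    bool opAssign;   // opXXXAssign
    bool opAssignR;  // opXXXAssign_r
};

// Binary operator inside an expression, with (I) and (II) answered YES:
// prefer the in-place form the rules ask for, else fall back to opXXX.
Call selectBinary(bool leftIsTemp, bool rightIsTemp, const Available& a) {
    if (leftIsTemp && a.opAssign) return Call::OpAssign;
    if (!leftIsTemp && rightIsTemp && a.opAssignR) return Call::OpAssignR;
    return a.op ? Call::Op : Call::None;  // None means a compile error
}

// Explicit compound assignment such as a += b in the source.
// With decision (III) enabled, opXXX followed by an assignment is accepted.
Call selectCompound(const Available& a, bool allowDecisionIII) {
    if (a.opAssign) return Call::OpAssign;
    if (allowDecisionIII && a.op) return Call::Op;
    return Call::None;
}
```

With only opXXX defined, selectBinary always returns Call::Op, so existing code keeps its current behaviour; selectCompound then accepts += only when (III) is switched on.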

Comments?

Mar 27 2006

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Don Clugston" <dac nospam.com.au> wrote in message 
news:e087or$dgm$1 digitaldaemon.com...
 Comments?

I really, really like this idea.  Granted, I don't do too much with operator 
overloading, but this seems like a very solid improvement to it.

Mar 27 2006

"Craig Black" <cblack ara.com> writes:

Beautiful! An elegant solution to a long standing and annoying problem.

-Craig

Mar 27 2006

Dave <Dave_member pathlink.com> writes:

Hear, hear! And my opinions are (I) yes, (II) yes and (III) no. And I'd like to
suggest that this optimization also be applied to built-ins, e.g. strX = strX ~
strY; => strX ~= strY;

The problem this addresses has driven me nuts in the C++ world as I've
maintained/optimized code. (i.e.: operator += is defined but not used where it
could be).

- Dave


Mar 27 2006

Charles <noone nowhere.com> writes:

Wow looks good ... too good.  How could this have gone 
un[noticed|implemented] in the last 20 years?  I'm anxious to hear 
Walter's take.

1. yes, 2. yes, 3. over my head :).

Charlie




Mar 27 2006

Sean Kelly <sean f4.ca> writes:

Charles wrote:
 Wow looks good ... too good.  How could this have gone 
 un[noticed|implemented] in the last 20 years?  I'm anxious to hear 
 Walter's take.

I have a feeling this may be a lot more difficult in C++ because of 
ADL--there are simply a lot more functions to be evaluated when building 
expression trees.  Also, the standard doesn't seem to consider things 
from a compiler writer's perspective, which this three-value code 
optimization requires.


Sean

Mar 27 2006

kris <foo bar.com> writes:

Sean Kelly wrote:
 Charles wrote:
 
 Wow looks good ... too good.  How could this have gone 
 un[noticed|implemented] in the last 20 years?  I'm anxious to hear 
 Walter's take.

 
 
 I have a feeling this may be a lot more difficult in C++ because of 
 ADL--there are simply a lot more functions to be evaluated when building 
 expression trees.  Also, the standard doesn't seem to consider things 
 from a compiler writer's perspective, which this three-value code 
 optimization requires.
 
 
 Sean

ADL? Does that stand for Attention Deficit Level?

Mar 27 2006

pragma <pragma_member pathlink.com> writes:

In article <e09v1r$2tb0$1 digitaldaemon.com>, kris says...
Sean Kelly wrote:
 Charles wrote:
 
 Wow looks good ... too good.  How could this have gone 
 un[noticed|implemented] in the last 20 years?  I'm anxious to hear 
 Walter's take.

 
 
 I have a feeling this may be a lot more difficult in C++ because of 
 ADL--there are simply a lot more functions to be evaluated when building 
 expression trees.  Also, the standard doesn't seem to consider things 
 from a compiler writer's perspective, which this three-value code 
 optimization requires.
 
 
 Sean

ADL? Does that stand for Attention Deficit Level?

Sorry, I missed this entire thread.  What's this about now? 

<g>

- EricAnderton at yahoo

Mar 27 2006

Sean Kelly <sean f4.ca> writes:

kris wrote:
 Sean Kelly wrote:
 Charles wrote:

 Wow looks good ... too good.  How could this have gone 
 un[noticed|implemented] in the last 20 years?  I'm anxious to hear 
 Walter's take.


 I have a feeling this may be a lot more difficult in C++ because of 
 ADL--there are simply a lot more functions to be evaluated when 
 building expression trees.  Also, the standard doesn't seem to 
 consider things from a compiler writer's perspective, which this 
 three-value code optimization requires.

 
 ADL? Does that stand for Attention Deficit Level?

Argument Dependent Lookup.  ie. the complex overload resolution 
semantics in C++.  Another potential issue is the lack of "_r" functions 
in C++, since while free functions can do quite a bit they must either 
use temporaries, be friend functions with very odd semantics, or do 
something akin to expression templates.


Sean

Mar 27 2006

Charles <noone nowhere.com> writes:

 Argument Dependent Lookup.  ie. the complex overload resolution
 semantics in C++.  Another potential issue is the lack of "_r" functions
 in C++, since while free functions can do quite a bit they must either
 use temporaries, be friend functions with very odd semantics, or do
 something akin to expression templates.

Ahh , I see.  Well I think this will be huge for D, great idea Don!



Mar 27 2006

"Walter Bright" <newshound digitalmars.com> writes:

I think it's a great idea.

Mar 27 2006

"Regan Heath" <regan netwin.co.nz> writes:

On Mon, 27 Mar 2006 10:29:13 +0200, Don Clugston <dac nospam.com.au> wrote:
 Comments?

I was wondering the exact same thing recently:
http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/35382

Do you think we gain anything by using the lhs variable where possible?  
i.e.

Instead of:

T1 = B.opMul(C);
T1.opAddAssign(D);
T1.opMulAssign(E);
T1.opSubAssign_r(A);
X = T1;

We have:

X = B;
X.opMulAssign(C);
X.opAddAssign(D);
X.opMulAssign(E);
X.opSubAssign_r(A);

It seems to me that this results in 1 less temporary and therefore 1 less  
assignment.

Of course, it doesn't help/work in cases where there is no existing lhs,  
i.e.

   foo(A - ((B*C) + D)* E);
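
A quick C++ sketch of evaluating straight into the lhs, for comparison. `Vec`, `subAssignR` (a stand-in for the proposed opSubAssign_r) and `evalIntoTarget` are hypothetical names. The aliasing check is an assumption added here, not something raised in the thread: if X also appears as a later operand, its old value has already been overwritten by the first step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec {
    std::vector<double> v;

    Vec& operator+=(const Vec& r) {
        assert(v.size() == r.v.size());
        for (std::size_t i = 0; i < v.size(); ++i) v[i] += r.v[i];
        return *this;
    }
    Vec& operator*=(const Vec& r) {
        assert(v.size() == r.v.size());
        for (std::size_t i = 0; i < v.size(); ++i) v[i] *= r.v[i];
        return *this;
    }
    // Stand-in for the proposed opSubAssign_r: this = r - this.
    Vec& subAssignR(const Vec& r) {
        assert(v.size() == r.v.size());
        for (std::size_t i = 0; i < v.size(); ++i) v[i] = r.v[i] - v[i];
        return *this;
    }
};

// Evaluates X = A - ((B*C) + D) * E directly in X, with no temporary object.
void evalIntoTarget(Vec& X, const Vec& A, const Vec& B, const Vec& C,
                    const Vec& D, const Vec& E) {
    // X is clobbered by the first step, so it must not be a later operand.
    assert(&X != &A && &X != &C && &X != &D && &X != &E);
    X = B;             // copy into the object X already owns
    X *= C;
    X += D;
    X *= E;
    X.subAssignR(A);   // X = A - X
}
```

So the saving is real when X is independent of the right-hand side; an expression such as X = A - ((B*C) + X) * E would still need a separate temporary.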

Regan

Mar 27 2006

Sean Kelly <sean f4.ca> writes:

Don Clugston wrote:
 Background: Operator overloading, in the form it exists in C++ and 
 currently in D, inherently results in sub-optimal code, because it 
 always results in unnecessary temporary objects being created.

...
 If this were implemented, it would permanently eradicate (for D) the 
 most significant advantage which Fortran has managed to retain over 
 object-oriented languages. And I really don't think it would be 
 difficult to implement, or have negative side-effects.

Very nice.  And much better than expression templates.  I'm all for it.

 There are a couple of decisions to be made:
 (I) should the compiler use opAdd() and generate a temporary, if 
 opAddAssign_r() doesn't exist, to preserve existing behaviour? I think 
 the answer to this is YES.

Yes.

 (II) should the compiler use opAdd() and generate a temporary, if 
 opAddAssign() doesn't exist, to preserve existing behaviour? Again, I'm 
 inclined to answer YES.

Yes.

 (III) If the code includes +=, and there is an opAdd() but no 
 opAddAssign(), should the compiler accept this, and just generate an 
 opAdd() followed by an assignment? This would mean that opAdd() would 
 generate the += operation as well as +, while opAddAssign() would be a 
 performance enhancement. (It would still be possible to have 
 opAddAssign() without opAdd(), to have += but not +, but it would not be 
 possible to have + without +=). This would mean that += would be 
 *purely* syntactic sugar.
 
 Decision III would be a little more difficult to implement and is of 
 less obvious merit; I only mention it as a possibility.

I'd say no to this initially, and see how things sort out.  It may be 
that this turns out to be desirable and it may not.  But theoretically, 
I'd prefer to know when an OpAssign fn is required that I haven't 
provided than to have the compiler silently accept the syntax anyway.


Sean

Mar 27 2006

James Dunne <james.jdunne gmail.com> writes:


I guess I'll be the "Negative Nancy" here for purposes of strengthening 
your proposal...

While being well laid out and well thought through, this proposal still 
screams to me that it's concentrating on the mathematical problem 
domain.  This is fine for assuming that classes implementing operators 
will be mimicking real-world mathematical entities, such as vectors, 
matrices, etc.  But will this affect other problem domains adversely?

I usually like to come from the "everything explicit" angle and don't 
want the compiler making decisions on my behalf; especially when I'm not 
aware of them.  My suggestion would be to add a keyword in the operator 
definition (or class definition) to indicate that you want this sort of 
operator overloading behavior, such that one could leave it off if the 
default behavior is desired for other such cases.

In what specific problem domain are you experiencing issues with the 
current operator overloading syntax/semantics?  Or is it just that you 
feel that the current syntax/semantics are not quite fully developed?

And last but not least, another problem is in the order of evaluation 
for the operator overload calls.  What do you propose for this?  I think 
in order for this _not_ to matter, you'd have to guarantee that the 
classes themselves are self-contained and would have no references to 
(or have any effect on) the other classes involved in the expression 
statement.

This brings me to another related issue: these temporaries are going to 
be allocated on the GC heap no matter what, correct?  What if a silent 
out-of-memory exception was thrown from a line of code appearing to have 
no effect on memory allocation whatsoever?

There's basically no control over the number of temporaries that could 
be generated.  Also, there'd be "no going back" from a GC to a manual 
allocation strategy (i.e. memory pools) because you've effectively lost 
handles to those blocks of memory for the temporaries.  One could use 
custom allocators on the class for this purpose, but that would have an 
adverse effect on normal usage of the class.  These findings lead me to 
believe that classes which overload operators should have the 
requirement of being 'auto' (as in RAII or RR).

-- 
Regards,
James Dunne

Mar 27 2006

Don Clugston <dac nospam.com.au> writes:

James Dunne wrote:
 
 While being well laid out and well thought through, this proposal still 
 screams to me that it's concentrating on the mathematical problem 
 domain.  This is fine for assuming that classes implementing operators 
 will be mimicking real-world mathematical entities, such as vectors, 
 matrices, etc.  But will this affect other problem domains adversely?

I don't know - but I don't think so. My feeling is that if it strays too 
far from a mathematical domain, it probably shouldn't be using 
overloading of the arithmetical operators. In particular, I think that 
it's very hard to justify a+=b being different to a=a+b.

 I usually like to come from the "everything explicit" angle and don't 
 want the compiler making decisions on my behalf; especially when I'm not 
 aware of them. 

I'll take this as another vote against (III).

 My suggestion would be to add a keyword in the operator
 definition (or class definition) to indicate that you want this sort of 
 operator overloading behavior, such that one could leave it off if the 
 default behavior is desired for other such cases.


 In what specific problem domain are you experiencing issues with the 
 current operator overloading syntax/semantics?  Or is it just that you 
 feel that the current syntax/semantics are not quite fully developed?

I was specifically interested in linear algebra. In thinking about 
Norbert's matrix proposal, I was thinking that it doesn't make sense to 
work on the syntax when there's an inherent inefficiency underneath.
Ultimately, operator overloading is just syntactic sugar for function 
calls. The problem with the C++ approach is that it only provides 
function calls for two of the three situations. Consequently, you suffer 
an unnecessary performance hit every time you use operator +.


 And last but not least, another problem is in the order of evaluation 
 for the operator overload calls.  What do you propose for this?  I think 
 in order for this _not_ to matter, you'd have to guarantee that the 
 classes themselves are self-contained and would have no references to 
 (or have any effect on) the other classes involved in the expression 
 statement.

True, but I think this already applies to operator +.

 This brings me to another related issue: these temporaries are going to 
 be allocated on the GC heap no matter what, correct?  What if a silent 
 out-of-memory exception was thrown from a line of code appearing to have 
 no effect on memory allocation whatsoever?

Again, this already applies to opAdd. The only thing this proposal 
changes is that avoidable temporaries are not created. Unavoidable 
temporaries are unchanged.
You raise a good point, though -- unavoidable temporaries could be 
treated better (e.g. with memory pools). This proposal does not let you 
distinguish between "new temporary = a+b" and "new result = a+b"; the 
former could be stored in a memory pool. I think that's a minor issue, 
though.

 There's basically no control over the number of temporaries that could 
 be generated.  Also, there'd be "no going back" from a GC to a manual 
 allocation strategy (i.e. memory pools) because you've effectively lost 
 handles to those blocks of memory for the temporaries.  One could use 
 custom allocators on the class for this purpose, but that would have an 
 adverse effect on normal usage of the class.  These findings lead me to 
 believe that classes which overload operators should have the 
 requirement of being 'auto' (as in RAII or RR).

Mar 27 2006

Sean Kelly <sean f4.ca> writes:

Don Clugston wrote:
 You raise a good point, though -- unavoidable temporaries could be 
 treated better (e.g. with memory pools). This proposal does not let you 
 distinguish between "new temporary = a+b" and "new result = a+b"; the 
 former could be stored in a memory pool. I think that's a minor issue, 
 though.

I think that's something that could be added as a Quality of 
Implementation issue without violating the rules you've outlined, i.e. 
it doesn't matter where the memory comes from.  Temporaries could even 
be allocated using alloca in some cases.

As for other issues with the behavior--so long as this is spelled out in 
the spec then I don't see any problems with it.  As you've said, it's 
what's actually happening behind the scenes anyway, but defining it this 
way makes for the most efficient code generation possible, and this is a 
fantastic guarantee to have in the case of large objects.


Sean

Mar 28 2006

Norbert Nemec <Norbert Nemec-online.de> writes:

I don't see any obvious reasons against this proposal, but one should
not overestimate it!

It is true that it allows a number of optimizations and helps avoid
some unnecessary temporaries, but it is not a replacement for expression
templates or vectorized expressions (aka array expressions).

Imagine A,B,C,D and X being arrays of the same size and consider the
last example in the proposal:

 X = (A*B) + (C*D);
 becomes
 T1 = A*B;
 T2 = C*D;
 T1 += T2;
 X = T1;

Fortran90 could translate the original expression into something like
	for(int i=0;i<N;i++)
		X[i] = (A[i]*B[i]) + (C[i]*D[i]);
which not only eliminates *all* temporaries, but does something more:
it handles all calculations in one loop, so the memory is read in a
cache-friendly order and all the calculations are done in registers.

C++ expression templates as used in blitz++ et al allow the same kind of
optimizations. Array expressions in D could do the same thing. The
proposed operator optimization cannot do this.
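
To make the difference concrete, here is a rough sketch in present-day D. It is only an illustration under some assumptions: it relies on the array operations available in current compilers (e.g. `t1[] = a[] * b[]`), and the function names `viaTemporaries` and `fused` are made up for this example.

```d
// Lowering from the proposal: T1 = A*B; T2 = C*D; T1 += T2; X = T1.
// Needs two array-sized buffers and three passes over memory.
double[] viaTemporaries(const double[] a, const double[] b,
                        const double[] c, const double[] d)
{
    assert(a.length == b.length && a.length == c.length
           && a.length == d.length);
    auto t1 = new double[a.length];
    auto t2 = new double[a.length];
    t1[] = a[] * b[];   // T1 = A*B
    t2[] = c[] * d[];   // T2 = C*D
    t1[] += t2[];       // T1 += T2
    return t1;          // X = T1
}

// Fused form: one pass, and the only array allocated is the result.
// The per-element products can stay in registers.
double[] fused(const double[] a, const double[] b,
               const double[] c, const double[] d)
{
    assert(a.length == b.length && a.length == c.length
           && a.length == d.length);
    auto x = new double[a.length];
    foreach (i; 0 .. x.length)
        x[i] = a[i] * b[i] + c[i] * d[i];
    return x;
}

unittest
{
    double[] a = [1.0, 2.0], b = [3.0, 4.0];
    double[] c = [5.0, 6.0], d = [7.0, 8.0];
    assert(viaTemporaries(a, b, c, d) == fused(a, b, c, d));
}
```

Both functions return the same values; the first needs an extra array-sized buffer and three passes over memory, while the second makes a single pass.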

So, as it stands I have no objections against the proposal, but it
should *NOT* be used as an excuse against expression templates or array
expressions in the long term.

Greetings,
Norbert

Mar 28 2006

Don Clugston <dac nospam.com.au> writes:

Norbert Nemec wrote:
 I don't see any obvious reasons against this proposal, but one should
 not overestimate it!
 
 It is true that it allows a number of optimizations and helps avoiding
 some unnecessary temporaries, but it is not a replacement for expression
 templates or vectorized expressions (aka array expressions).
 
 Imagine A,B,C,D and X being arrays of the same size and consider the
 last example in the proposal:
 
 X = (A*B) + (C*D);
 becomes
 T1 = A*B;
 T2 = C*D;
 T1 += T2;
 X = T1;

 
 Fortran90 could translate the original expression into something like
 	for(int i=0;i<N;i++)
 		X[i] = (A[i]*B[i]) + (C[i]*D[i]);
 which not only eliminates *all* temporaries, but does something more:
 handle all calculations in one loop, allowing the memory to be read
 cache friendly and all the calculations being done in registers.

Obviously not with real matrices (C*D is not a pointwise operation), but 
point taken. (BTW, the temporaries are still there; they're just in 
registers this time: A[i]*B[i] and C[i]*D[i]. The proposal does get rid of 
all unnecessary temporaries; the problem is that there's no vectorisation.)

 C++ expression templates as used in blitz++ et al allow the same kind of
 optimizations. Array expressions in D could do the same thing. The
 operator optimization cannot handle this optimization.
 
 So, as it stands I have no objections against the proposal, but it
 should *NOT* be used as excuse against expression templates or array
 expressions in the long term.

I completely agree. I see this as fixing the general case, but it does 
nothing for vectorisation. I suspect that array expressions are somewhat 
special as regards expressions, because of the vectorisation 
possibility. If we had array expressions, this might eliminate the 
necessity for expression templates. Do you think that's right?

Thanks for putting this into perspective.

 Greetings,
 Norbert

Mar 28 2006

Norbert Nemec <Norbert Nemec-online.de> writes:

Don Clugston wrote:
 Obviously not with real matrices (C*D is not a pointwise operation), but
  point taken. (BTW, the temporaries are still there, they're just in
 registers this time (A[i]*B[i], C[i]*D[i]). The proposal does get rid of
 all unnecessary temporaries, the problem is that there's no vectorisation).

Point accepted. For matrices, the issues are much more complicated, but
there is still quite a bit of optimization possible when vectorization
is taken into account. For example, the expression A*B+C can be computed
very efficiently when done in one shot. (There are even BLAS routines for
this kind of combined operation, which is very common in many fields
of application.)

 I completely agree. I see this as fixing the general case, but it does
 nothing for vectorisation. I suspect that array expressions are somewhat
 special as regards expressions, because of the vectorisation
 possibility. If we had array expressions, this might eliminate the
 necessity for expression templates. Do you think that's right?

Both have a right to exist and should be supported by the
language: array expressions are more comfortable to use and to optimize
than a corresponding ET-library, but expression templates can be used
for a much larger field of applications than just vectorized
expressions. Just consider the linear algebra example above, where an
expression template library might automatically optimize A*B+C into a
single BLAS function call.

Mar 28 2006

James Dunne <james.jdunne gmail.com> writes:

Norbert Nemec wrote:
 Don Clugston wrote:
 
Obviously not with real matrices (C*D is not a pointwise operation), but
 point taken. (BTW, the temporaries are still there, they're just in
registers this time (A[i]*B[i], C[i]*D[i]). The proposal does get rid of
all unnecessary temporaries, the problem is that there's no vectorisation).

 
 
 Point accepted. For matrices, the issues are much more complicated, but
 there is still quite a bit of optimization possible when vectorization
 is taken into account. For example, the expression A*B+C can be computed
 very efficiently when done in one shot. (There are even BLAS routines for
 this kind of combined operation, which is very common in many fields
 of application.)
 
 
I completely agree. I see this as fixing the general case, but it does
nothing for vectorisation. I suspect that array expressions are somewhat
special as regards expressions, because of the vectorisation
possibility. If we had array expressions, this might eliminate the
necessity for expression templates. Do you think that's right?

 
 
 Both have a right to exist and should be supported by the
 language: array expressions are more comfortable to use and to optimize
 than a corresponding ET-library, but expression templates can be used
 for a much larger field of applications than just vectorized
 expressions. Just consider the linear algebra example above, where an
 expression template library might automatically optimize A*B+C into a
 single BLAS function call.

If possible, can someone lay out a clear definition of both "array 
expressions" and "expression templates"?  I'd really like to fully 
understand what's possible in this area for my own research.

Thanks,

-- 
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/MU/S d-pu s:+ a-->? C++++$ UL+++ P--- L+++ !E W-- N++ o? K? w--- O 
M--  V? PS PE Y+ PGP- t+ 5 X+ !R tv-->!tv b- DI++(+) D++ G e++>e 
h>--->++ r+++ y+++
------END GEEK CODE BLOCK------

James Dunne

Mar 28 2006

xs0 <xs0 xs0.com> writes:

James Dunne wrote:
 If possible, can someone lay out a clear definition of both "array 
 expressions" and "expression templates"?  I'd really like to fully 
 understand what's possible in this area for my own research.
 
 Thanks,
 

afaik, array expressions are just expressions which get evaluated 
element-wise over whole arrays:

a[] = b[] + c[]; // must be same length

is the same as

for (int i=0; i<a.length; i++)
     a[i] = b[i] + c[i];

The advantage of having them instead of doing for loops (in addition to 
aesthetics) is that the compiler can optimize the code much better (for 
example, by doing vectorization == AltiVec/MMX/SSE), because it clearly 
knows what you're doing - with a for loop, it's just a bunch of 
single-element operations.
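
For what it's worth, here is a small sketch of the two forms side by side, assuming a present-day D compiler in which the `a[] = b[] + c[]` syntax is supported. The helper names `addLoop` and `addArrayOp` are invented for the example.

```d
// Element-wise sum written as an explicit loop.
double[] addLoop(const double[] lhs, const double[] rhs)
{
    assert(lhs.length == rhs.length);   // must be same length
    auto dst = new double[lhs.length];
    foreach (i; 0 .. dst.length)
        dst[i] = lhs[i] + rhs[i];
    return dst;
}

// The same sum written as an array expression: a single statement
// with no index variable, so the intent is visible to the compiler.
double[] addArrayOp(const double[] lhs, const double[] rhs)
{
    assert(lhs.length == rhs.length);   // must be same length
    auto dst = new double[lhs.length];
    dst[] = lhs[] + rhs[];
    return dst;
}

unittest
{
    double[] b = [1.0, 2.0, 3.0];
    double[] c = [10.0, 20.0, 30.0];
    assert(addLoop(b, c) == addArrayOp(b, c));
}
```

Whether the array form actually gets vectorized depends on the compiler and target; the sketch only shows that the two spellings compute the same thing.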


Expression templates, otoh, are a somewhat complex template technique, 
which allows efficient evaluation of expressions over arbitrary types. 
Instead of evaluating the expression one operation at a time:

a = b + c * d

usually becomes

_t1 = c.opMul(d);
_t2 = b.opAdd(_t1);
a = _t2

the expressions first evaluate to template instances, which the compiler 
can then inline and optimize. That usually results in faster 
execution. The above example would become something like:

auto expr=Sum!(b, Product!(c, d));
a.length=expr.length;
for (int i=0; i<a.length; i++)
     a[i]=expr.evaluate(i); // hopefully inlines to b[i]+c[i]*d[i]

http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html

One problem in doing them in D is that you can't overload the = 
operator, so the best one can hope for is

(b+c*d).assignTo(a);
// or
a = (b+c*d).eval();

Another problem is that expression templates rely heavily on implicit 
instantiation, which is currently quite basic in D (but getting better).
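
A minimal sketch of the idea, written against present-day D rather than the compiler of this thread. Everything here is an assumption made for illustration: the names `Vec`, `BinExpr` and `makeNode` are hypothetical, the operators use the current `opBinary` template instead of `opAdd`/`opMul`, and only `+` and `*` on `double` arrays are handled. Assignment goes through an `assignTo` method, as in the workaround above.

```d
// Helper that builds an expression node and lets the types be inferred.
auto makeNode(string op, L, R)(L lhs, R rhs)
{
    return BinExpr!(op, L, R)(lhs, rhs);
}

// Interior node: records "lhs op rhs"; nothing is computed until
// evaluate(i) is called for a particular element.
struct BinExpr(string op, L, R)
{
    L lhs;
    R rhs;

    size_t length() const { return lhs.length(); }

    double evaluate(size_t i) const
    {
        static if (op == "+")
            return lhs.evaluate(i) + rhs.evaluate(i);
        else
            return lhs.evaluate(i) * rhs.evaluate(i);
    }

    auto opBinary(string o, Rhs)(Rhs other)
        if (o == "+" || o == "*")
    {
        return makeNode!o(this, other);
    }

    // Stand-in for an overloaded '=': one loop, no array temporaries.
    void assignTo(ref double[] dst) const
    {
        dst.length = length();
        foreach (i; 0 .. dst.length)
            dst[i] = evaluate(i);
    }
}

// Leaf node: wraps a slice without copying its elements.
struct Vec
{
    const(double)[] data;

    size_t length() const { return data.length; }
    double evaluate(size_t i) const { return data[i]; }

    auto opBinary(string o, Rhs)(Rhs other)
        if (o == "+" || o == "*")
    {
        return makeNode!o(this, other);
    }
}

unittest
{
    double[] b = [1.0, 2.0], c = [3.0, 4.0], d = [5.0, 6.0];
    double[] a;
    // Builds BinExpr!("+", Vec, BinExpr!("*", Vec, Vec)), then runs one loop.
    (Vec(b) + Vec(c) * Vec(d)).assignTo(a);
    assert(a == [16.0, 26.0]);
}
```

The expression `Vec(b) + Vec(c) * Vec(d)` only builds a small tree of structs; the arithmetic happens inside `assignTo`, where the nested `evaluate` calls are candidates for inlining.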


Hope that helped :)


xs0

Mar 28 2006

James Dunne <james.jdunne gmail.com> writes:

xs0 wrote:
 James Dunne wrote:
 
 If possible, can someone lay out a clear definition of both "array 
 expressions" and "expression templates"?  I'd really like to fully 
 understand what's possible in this area for my own research.

 Thanks,

 
 afaik, array expressions are just expressions which get evaluated 
 element-wise over whole arrays:
 
 a[] = b[] + c[]; // must be same length
 
 is the same as
 
 for (int i=0; i<a.length; i++)
     a[i] = b[i] + c[i];
 
 The advantage of having them instead of doing for loops (in addition to 
 aesthetics) is that the compiler can optimize the code much better (for 
 example, by doing vectorization == AltiVec/MMX/SSE), because it clearly 
 knows what you're doing - with a for loop, it's just a bunch of 
 single-element operations.
 
 
 Expression templates, otoh, are a somewhat complex template technique, 
 which allows efficient evaluation of expressions over arbitrary types. 
 Instead of evaluating the expression one operation at a time:
 
 a = b + c * d
 
 usually becomes
 
 _t1 = c.opMul(d);
 _t2 = b.opAdd(_t1);
 a = _t2
 
 the expressions first evaluate to template instances, which can then be 
 inlined and optimized by the compiler. That obviously results in faster 
 execution. The above example would become something like:
 
 auto expr=Sum!(b, Product!(c, d));
 a.length=expr.length;
 for (int i=0; i<a.length; i++)
     a[i]=expr.evaluate(i); // hopefully inlines to b[i]+c[i]*d[i]
 
 http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html
 
 One problem in doing them in D is that you can't overload the = 
 operator, so the best one can hope for is
 
 (b+c*d).assignTo(a);
 // or
 a = (b+c*d).eval();
 
 Another problem is that expression templates rely heavily on implicit 
 instantiation, which is currently quite basic in D (but getting better).
 
 
 Hope that helped :)
 
 
 xs0

Yes, thank you very much!

-- 
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/MU/S d-pu s:+ a-->? C++++$ UL+++ P--- L+++ !E W-- N++ o? K? w--- O 
M--  V? PS PE Y+ PGP- t+ 5 X+ !R tv-->!tv b- DI++(+) D++ G e++>e 
h>--->++ r+++ y+++
------END GEEK CODE BLOCK------

James Dunne

Mar 28 2006

Sean Kelly <sean f4.ca> writes:

Norbert Nemec wrote:
 Don Clugston wrote:
 Obviously not with real matrices (C*D is not a pointwise operation), but
  point taken. (BTW, the temporaries are still there, they're just in
 registers this time (A[i]*B[i], C[i]*D[i]). The proposal does get rid of
 all unnecessary temporaries, the problem is that there's no vectorisation).

 
 Point accepted. For matrices, the issues are much more complicated, but
 there is still quite a bit of optimization possible when vectorization
 is taken into account. For example, the expression A*B+C can be computed
 very efficiently when done in one shot. (There are even BLAS routines for
 this kind of combined operation, which is very common in many fields
 of application.)

It almost seems like this could be handled via a special opIndex 
function: opIndexCalc or some such.  If the method exists for all 
involved types, then:

A = B + C

could be translated to:

for( size_t i = 0; i < A.length; ++i )
     A[i] = B[i] + C[i];

where the subscripting calls opIndexCalc instead of the standard 
opIndex.  But this leaves out array length checking, and simply throwing 
an IndexOutOfBounds exception if something goes wrong would leave A 
corrupted.  So perhaps some checking would also be required to see if 
opIndexCalc should be called?  The only catch is that this would likely 
need to occur at run-time:

if( A.matches( B ) && A.matches( C ) )
     for( size_t i = 0; i < A.length; ++i )
         A[i] = B[i] + C[i];
else
     A = B + C; // standard method using temporaries

I don't have enough experience to know what might work here, but it 
would be great if an alternative to expression templates could be devised.
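
Roughly what I have in mind, written out by hand in present-day D. This is a sketch only: `opIndexCalc` does not exist, so the fused path below indexes the underlying array directly, and `Series`, `matches` and `assignSum` are hypothetical names.

```d
struct Series
{
    double[] data;

    // Run-time shape check: here simply "same number of elements".
    bool matches(const Series other) const
    {
        return data.length == other.data.length;
    }

    // Ordinary operator: allocates a fresh result on every call.
    Series opBinary(string op : "+")(const Series rhs) const
    {
        assert(matches(rhs), "operand lengths differ");
        auto result = new double[data.length];
        result[] = data[] + rhs.data[];
        return Series(result);
    }
}

// Hand-written version of what the compiler would emit for A = B + C.
// Returns true when the fused, allocation-free path was taken.
bool assignSum(ref Series a, const Series b, const Series c)
{
    if (a.matches(b) && a.matches(c))
    {
        // All shapes agree, so A cannot be left half-written.
        foreach (i; 0 .. a.data.length)
            a.data[i] = b.data[i] + c.data[i];
        return true;
    }
    a = b + c;   // standard method using temporaries
    return false;
}

unittest
{
    auto a = Series(new double[2]);
    assert(assignSum(a, Series([1.0, 2.0]), Series([3.0, 4.0])));
    assert(a.data == [4.0, 6.0]);
}
```

The check happens once, before the loop, so a mismatch is detected before any element of A has been overwritten.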


Sean

Mar 28 2006

Norbert Nemec <Norbert Nemec-online.de> writes:

Sean Kelly wrote:
 I don't have enough experience to know what might work here, but it
 would be great if an alternative to expression templates could be devised.

I believe people should not be overly afraid of expression templates. In
C++ they are ugly because the whole template system is ugly. In
principle they are a tremendously powerful concept that should
definitely be supported in D as well as possible.

When I say that there should be support for array expressions that does
not rely on expression templates, that is only because I believe that
arrays are crucial enough for the language to justify this special
treatment.

Mar 28 2006

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

I fully support this proposal. It makes sense to place stricter semantic 
requirements on overloaded operators. I cannot see any problems. You 
seem to have everything covered. What restrictions should the compiler 
place on the operator overloading signatures? Should it for instance be 
illegal to define an opAdd with a different return type than opAddAssign?

Don Clugston wrote:

 In C++, there are libraries like Blitz++ which use complicated 
 expression templates in order to avoid creating these temporaries, 
 and provide performance comparable with FORTRAN. I think D can do much 
 better...

Expression templates would still be useful for other cases though. 
Consider: (A * B) % C.
Here, expression templates could allow evaluating a much more efficient 
modMul(A,B,C). Expression templates could also help write less complex 
but still efficient code by allowing lazy evaluation.
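
A rough sketch of that idea in present-day D, under several assumptions: `Num`, `MulExpr` and this particular `modMul` are hypothetical, the operators use the current `opBinary` template, and the modulus must be at most 2^63 so that the intermediate additions cannot overflow.

```d
// (a * b) % m computed by double-and-add, so the full product a * b
// is never formed. Assumes 0 < m <= 2^63.
ulong modMul(ulong a, ulong b, ulong m)
{
    assert(m > 0 && m <= (1UL << 63));
    ulong result = 0;
    a %= m;
    while (b > 0)
    {
        if (b & 1)
            result = (result + a) % m;
        a = (a * 2) % m;
        b >>= 1;
    }
    return result;
}

struct Num
{
    ulong value;

    // A * B does not multiply; it only records the two operands.
    MulExpr opBinary(string op : "*")(Num rhs) const
    {
        return MulExpr(value, rhs.value);
    }
}

struct MulExpr
{
    ulong lhs, rhs;

    // (A * B) % C is recognised here and routed to modMul.
    Num opBinary(string op : "%")(Num m) const
    {
        return Num(modMul(lhs, rhs, m.value));
    }

    // Plain product for when no '%' follows (may overflow).
    Num eval() const { return Num(lhs * rhs); }
}

unittest
{
    assert(((Num(7) * Num(8)) % Num(5)).value == 1);
}
```

The point is only that the pending multiplication is visible to the `%` operator as a type, so the combined operation can be chosen at compile time.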

/Oskar

Mar 28 2006

Bruno Medeiros <daiphoenixNO SPAMlycos.com> writes:

Don Clugston wrote:
 Background: Operator overloading, in the form it exists in C++ and 
 currently in D, inherently results in sub-optimal code, because it 
 always results in unnecessary temporary objects being created.
 
 For example,
 X = A - ((B*C) + D)* E;
 
 becomes:
 T1 = B * C;
 T2 = T1 + D;
 T3 = T2 * E;
 T4 = A - T3;
 X = T4;
 Four objects were created, whereas only one was strictly required.

Ok, I'm new to this, so it took me a while to understand the problem. 
Let's see if I got it right: this is actually only a problem when the 
operator methods explicitly instantiate a *class object*, to be used as 
the return of the method, right?



-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Apr 02 2006

Don Clugston <dac nospam.com.au> writes:

Bruno Medeiros wrote:
 Don Clugston wrote:
 Background: Operator overloading, in the form it exists in C++ and 
 currently in D, inherently results in sub-optimal code, because it 
 always results in unnecessary temporary objects being created.

 For example,
 X = A - ((B*C) + D)* E;

 becomes:
 T1 = B * C;
 T2 = T1 + D;
 T3 = T2 * E;
 T4 = A - T3;
 X = T4;
 Four objects were created, whereas only one was strictly required.

 
 Ok, I'm new to this, so it took me a while to understand the problem. 
 Let's see if I got it right: this is actually only a problem when the 
 operator methods explicitly instantiate a *class object*, to be used as 
 the return of the method, right?

Not really, it applies everywhere that you can have overloaded 
operators. The cost of a temporary will be greater with classes, but for 
structs, eliminating temporaries will make it much easier for the 
compiler to optimise.
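
To put numbers on the class case, here is a hedged sketch in present-day D. The class `Big`, its allocation counter and the functions `naive` and `lowered` are all invented for illustration, and the operators use the current `opBinary`/`opOpAssign` templates rather than opAdd/opAddAssign. The hand-lowered version only uses `+=` and `*=`, so it still allocates for the final subtraction; it is not meant to reproduce the proposal's exact rewriting rules.

```d
final class Big
{
    static size_t allocations;   // counts every Big constructed
    double[] data;

    this(double[] data)
    {
        this.data = data;
        ++allocations;
    }

    // a + b, a - b, a * b: each call creates a new object.
    Big opBinary(string op)(Big rhs)
        if (op == "+" || op == "-" || op == "*")
    {
        auto r = new double[data.length];
        mixin("r[] = data[] " ~ op ~ " rhs.data[];");
        return new Big(r);
    }

    // a += b, a *= b: update this object in place, no new object.
    Big opOpAssign(string op)(Big rhs)
        if (op == "+" || op == "*")
    {
        mixin("data[] " ~ op ~ "= rhs.data[];");
        return this;
    }
}

// X = A - ((B*C) + D) * E with ordinary operators: four new objects.
Big naive(Big a, Big b, Big c, Big d, Big e)
{
    return a - ((b * c) + d) * e;
}

// The same expression lowered by hand: two new objects in this sketch.
Big lowered(Big a, Big b, Big c, Big d, Big e)
{
    Big t = b * c;   // new object 1
    t += d;          // reuses t
    t *= e;          // reuses t
    return a - t;    // new object 2
}

unittest
{
    auto a = new Big([10.0]), b = new Big([2.0]), c = new Big([3.0]);
    auto d = new Big([4.0]), e = new Big([5.0]);
    assert(naive(a, b, c, d, e).data == lowered(a, b, c, d, e).data);
}
```

With the counter you can check that `naive` creates four objects per call and `lowered` creates two, while both produce the same result.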

Apr 03 2006

Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:

Don Clugston wrote:
 Bruno Medeiros wrote:
 Don Clugston wrote:
 Background: Operator overloading, in the form it exists in C++ and 
 currently in D, inherently results in sub-optimal code, because it 
 always results in unnecessary temporary objects being created.

 For example,
 X = A - ((B*C) + D)* E;

 becomes:
 T1 = B * C;
 T2 = T1 + D;
 T3 = T2 * E;
 T4 = A - T3;
 X = T4;
 Four objects were created, whereas only one was strictly required.

 Ok, I'm new to this, so it took me a while to understand the problem. 
 Let's see if I got it right: this is actually only a problem when the 
 operator methods explicitly instantiate a *class object*, to be used 
 as the return of the method, right?

 
 Not really, it applies everywhere that you can have overloaded 
 operators. The cost of a temporary will be greater with classes, but for 
 structs, eliminating temporaries will make it much easier for the 
 compiler to optimise.
 

But with structs (more generally, with stack-based value types), can't 
the compiler already optimize it? In your example, it seems to me that 
the compiler could generate code that uses only two temporaries:

T1 = B * C;
T2 = T1 + D; // T1 is now free for use
T1 = T2 * E; // T2 is now free for use
T2 = A - T1; // T1 is now free for use
X = T2;

And, if inlining occurs, it can be made to use only one temporary, no?


-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Apr 04 2006

Don Clugston <dac nospam.com.au> writes:

Bruno Medeiros wrote:
 Don Clugston wrote:
 Bruno Medeiros wrote:
 Don Clugston wrote:
 Background: Operator overloading, in the form it exists in C++ and 
 currently in D, inherently results in sub-optimal code, because it 
 always results in unnecessary temporary objects being created.

 For example,
 X = A - ((B*C) + D)* E;

 becomes:
 T1 = B * C;
 T2 = T1 + D;
 T3 = T2 * E;
 T4 = A - T3;
 X = T4;
 Four objects were created, whereas only one was strictly required.

 Ok, I'm new to this, so it took me a while to understand the problem. 
 Let's see if I got it right: this is actually only a problem when the 
 operator methods explicitly instantiate a *class object*, to be used 
 as the return of the method, right?

 Not really, it applies everywhere that you can have overloaded 
 operators. The cost of a temporary will be greater with classes, but 
 for structs, eliminating temporaries will make it much easier for the 
 compiler to optimise.

 
 But with structs (more generally, with stack-based value types), can't 
 the compiler already optimize it? In your example, it seems to me that 
 the compiler could generate code that uses only two temporaries:
 
 T1 = B * C;
 T2 = T1 + D; // T1 is now free for use
 T1 = T2 * E; // T2 is now free for use
 T2 = A - T1; // T1 is now free for use
 X = T2;

True, but for objects on the stack, the cost is really just in the 
copying of data, not the memory allocation. T1 and T2 still get 
initialised twice.

 And, if inlining occurs, it can be made to use only one temporary, no?

Indeed, the compiler might optimise it, if the structs are small enough. 
Which is why I said that it makes it "much easier for the compiler to 
optimise". It might be able to do it without this help, but my 
experience with C++ has been that inlining is unreliable.

Apr 05 2006
