
digitalmars.D - Can we fix reverse operator overloading (opSub_r et al.)?

reply Don <nospam nospam.com> writes:
Currently, template versions of opXXX_r appear to be useless.
Consider:

struct A
{
    int opSub(T)(T x) { return 1; }
    int opSub_r(T)(T x) { return -1; }
}

A a;
int x = a - a; // is this a.opSub(a) or a.opSub_r(a) ???

This is annoying and rather silly. I can't imagine a case where you 
would template opXXX_r without also templating opXXX. And by 
definition, a.opSub(a) is identical to a.opSub_r(a). Templated opSub_r 
has a broken design in both D1 and D2.

We could fix it by adding a rule to operator overloading:

[New rule] 1. If a and b have the same type, the expression is written 
as a.opfunc(b).
[Existing rules] 2. If any a.opfunc or b.opfunc_r functions exist, then 
overloading is applied across all of them and the best match is used.
...

Patching this is pretty easy. For example, in DMD2.031, opover.c, line 
263, simply add:

  	if (t1 == t2) {
		//  x OP x can only be x.opOP(x), not x.opOP_r(x)
		s_r = NULL;
	}
Jul 07 2009
next sibling parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Don wrote:
 Currently, template versions of opXXX_r appear to be useless.
 Consider:
 
 struct A
 {
    int opSub(T)(T x) { return 1; }
    int opSub_r(T)(T x) { return -1; }
 }
 
 A a;
 int x = a - a; // is this a.opSub(a) or a.opSub_r(a) ???
 
 This is annoying and rather silly. I can't imagine a case where you 
 would template opXXX_r, without also templating opXXX. And by 
 definition, a.opSub(a) is identical to a.opSub_r(a). Templated opSub_r 
 has a broken design in both D1 and D2.
 
 We could fix it by adding a rule to operator overloading:
 
 [New rule] 1. If a and b have the same type, the expression is written 
 as a.opfunc(b).
 [Existing rules] 2. If any a.opfunc or b.opfunc_r functions exist, then 
 overloading is applied across all of them and the best match is used.
 ...
 
 Patching this is pretty easy. For example, in DMD2.031, opover.c, line 
 263, simply add:
 
      if ((t1 == t2)) {
         //  x OP x can only be x.opOP(x), not x.opOP_r(x)
         s_r = NULL;
     }

I recently ran into the exact same issue, so I agree completely. -Lars
Jul 07 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Don wrote:
 Currently, template versions of opXXX_r appear to be useless.
 Consider:
 
 struct A
 {
    int opSub(T)(T x) { return 1; }
    int opSub_r(T)(T x) { return -1; }
 }
 
 A a;
 int x = a - a; // is this a.opSub(a) or a.opSub_r(a) ???
 
 This is annoying and rather silly. I can't imagine a case where you 
 would template opXXX_r, without also templating opXXX. And by 
 definition, a.opSub(a) is identical to a.opSub_r(a). Templated opSub_r 
 has a broken design in both D1 and D2.
 
 We could fix it by adding a rule to operator overloading:
 
 [New rule] 1. If a and b have the same type, the expression is written 
 as a.opfunc(b).
 [Existing rules] 2. If any a.opfunc or b.opfunc_r functions exist, then 
 overloading is applied across all of them and the best match is used.
 ...
 
 Patching this is pretty easy. For example, in DMD2.031, opover.c, line 
 263, simply add:
 
      if ((t1 == t2)) {
         //  x OP x can only be x.opOP(x), not x.opOP_r(x)
         s_r = NULL;
     }

Good point. This also reminds me that the entire operator overloading feature must be thrown away and redesigned. Andrei
Jul 07 2009
next sibling parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Andrei Alexandrescu wrote:
 Don wrote:
 Currently, template versions of opXXX_r appear to be useless.
 Consider:

 struct A
 {
    int opSub(T)(T x) { return 1; }
    int opSub_r(T)(T x) { return -1; }
 }

 A a;
 int x = a - a; // is this a.opSub(a) or a.opSub_r(a) ???

 This is annoying and rather silly. I can't imagine a case where you 
 would template opXXX_r, without also templating opXXX. And by 
 definition, a.opSub(a) is identical to a.opSub_r(a). Templated opSub_r 
 has a broken design in both D1 and D2.

 We could fix it by adding a rule to operator overloading:

 [New rule] 1. If a and b have the same type, the expression is written 
 as a.opfunc(b).
 [Existing rules] 2. If any a.opfunc or b.opfunc_r functions exist, 
 then overloading is applied across all of them and the best match is 
 used.
 ...

 Patching this is pretty easy. For example, in DMD2.031, opover.c, line 
 263, simply add:

      if ((t1 == t2)) {
         //  x OP x can only be x.opOP(x), not x.opOP_r(x)
         s_r = NULL;
     }

Good point. This also reminds me that the entire operator overloading feature must be thrown away and redesigned. Andrei

I assume that means you haven't written the "operator overloading" chapter of your book yet? ;) -Lars
Jul 07 2009
prev sibling next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 This also reminds me that the entire operator overloading 
 feature must be thrown away and redesigned.

:-(
Jul 09 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Andrei Alexandrescu wrote:
 This also reminds me that the entire operator overloading feature must 
 be thrown away and redesigned.

:-(

It's run its course like an old BMW. We need to do new things, and bolting them on what we have won't work. Andrei
Jul 09 2009
parent reply Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
 This also reminds me that the entire operator overloading feature 
 must be thrown away and redesigned.

:-(

It's run its course like an old BMW. We need to do new things, and bolting them on what we have won't work. Andrei

Indeed, I even think that the concept of operator overloading is wrong. In C++, operator overloading was just for syntax sugar. That's wrong: pretty much everything that you want overloaded operators for is performance-critical. And that implies you need to deal on the level of complete expressions, not individual operations.

Despite that, I'd still like to see this patch (or something like it), especially in DMD1, so that we can have a _reasonable_ syntax ASAP.
Jul 10 2009
next sibling parent reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Don wrote:
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
 This also reminds me that the entire operator overloading feature 
 must be thrown away and redesigned.

:-(

It's run its course like an old BMW. We need to do new things, and bolting them on what we have won't work. Andrei

Indeed, I even think that the concept of operator overloading is wrong. In C++, operator overloading was just for syntax sugar. That's wrong: pretty much everything that you want overloaded operators for, is performance-critical. And that implies you need to deal on the level of complete expressions, not individual operations.

That is true. There is, for instance, a good reason why the basic BLAS matrix multiplication routine calculates

   a A B + b C        (a, b: scalars; A, B, C: matrices)

instead of just A B.

Could one gain something, performance-wise, by having such "expression overloading" as a built-in feature of the language itself, rather than as a library? BLADE has already shown that it is possible to do stuff like this in a library, but I think it goes without saying that if it was built into the language the syntax could be made considerably nicer. Compare:

   auto m = MatrixOp!("a*A*B + b*C")(aVal, bVal, aMtrx, bMtrx, cMtrx);

   auto m = a*A*B + b*C;

If D could do this, I think it would become the next FORTRAN. :)

-Lars
Jul 10 2009
next sibling parent reply Don <nospam nospam.com> writes:
Lars T. Kyllingstad wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
 This also reminds me that the entire operator overloading feature 
 must be thrown away and redesigned.

:-(

It's run its course like an old BMW. We need to do new things, and bolting them on what we have won't work. Andrei

Indeed, I even think that the concept of operator overloading is wrong. In C++, operator overloading was just for syntax sugar. That's wrong: pretty much everything that you want overloaded operators for, is performance-critical. And that implies you need to deal on the level of complete expressions, not individual operations.

That is true. There is, for instance, a good reason why the basic BLAS matrix multiplication routine calculates a A B + b C (a,b: scalars; A,B,C: matrices) instead of just AB. Would/could one could gain something, performance-wise, by having such "expression overloading" as a built-in feature of the language itself, rather than as a library? BLADE has already shown that it is possible to do stuff like this in a library, but I think it goes without saying that if it was built into the language the syntax could be made considerably nicer. Compare: auto m = MatrixOp!("a*A*B + b*C")(aVal, bVal, aMtrx, bMtrx, cMtrx); auto m = a*A*B + b*C; If D could do this, I think it would become the next FORTRAN. :) -Lars

Curiously, in the DMD front-end, that's almost what happens with array operations.

   x[] = a*y[] + c*z[];

gets translated into:

   __somemangledarrayfuncname(x, y, z, a, c);

and creates:

   __somemangledarrayfuncname(T[] p1, T[] p2, T[] p3, T c1, T c2)
   {
       for (int p = 0; p < p1.length; ++p)
       {
           p1[p] = c1*p2[p] + c2*p3[p];
       }
   }

And there are many bugs associated with this, since the compiler doesn't distinguish x[] from x properly (where x is a dynamic array); you can get internal compiler errors referring to type mismatches of 'p'. This could, I think, be done better by putting a simple version of BLADE in the compiler runtime.

A big issue with matrix overloading is, what do you do with dot product? It's just as fundamental as '+' or '*', but it doesn't have an operator symbol. Consider that a 1x5 matrix multiplied by a 5x1 matrix is just a dot product of two vectors, and a smart compiler would be able to recognize that.

The other thing that's desperately missing from D is multi-dimensional indexing.
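For readers who don't know the DMD front end, the lowering Don describes can be sketched in C++ (the front end's own implementation language). The helper name below is invented, standing in for the mangled name:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the compiler-generated helper behind
//   x[] = a*y[] + c*z[];
// One fused loop, no temporaries: each element is read once and written once.
template <typename T>
void arrayOpMulAddMul(std::vector<T>& p1, const std::vector<T>& p2,
                      const std::vector<T>& p3, T c1, T c2)
{
    for (std::size_t p = 0; p < p1.size(); ++p)
        p1[p] = c1 * p2[p] + c2 * p3[p];
}

// Usage: x[] = a*y[] + c*z[]  becomes  arrayOpMulAddMul(x, y, z, a, c);
```

The point is that the whole right-hand side becomes a single pass over memory, with no intermediate arrays.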
Jul 10 2009
next sibling parent reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Don wrote:
 Lars T. Kyllingstad wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
 This also reminds me that the entire operator overloading feature 
 must be thrown away and redesigned.

:-(

It's run its course like an old BMW. We need to do new things, and bolting them on what we have won't work. Andrei

Indeed, I even think that the concept of operator overloading is wrong. In C++, operator overloading was just for syntax sugar. That's wrong: pretty much everything that you want overloaded operators for, is performance-critical. And that implies you need to deal on the level of complete expressions, not individual operations.

That is true. There is, for instance, a good reason why the basic BLAS matrix multiplication routine calculates a A B + b C (a,b: scalars; A,B,C: matrices) instead of just AB. Would/could one could gain something, performance-wise, by having such "expression overloading" as a built-in feature of the language itself, rather than as a library? BLADE has already shown that it is possible to do stuff like this in a library, but I think it goes without saying that if it was built into the language the syntax could be made considerably nicer. Compare: auto m = MatrixOp!("a*A*B + b*C")(aVal, bVal, aMtrx, bMtrx, cMtrx); auto m = a*A*B + b*C; If D could do this, I think it would become the next FORTRAN. :) -Lars

Curiously, in the DMD front-end, that's almost what happens with array operations. x[] = a*y[] + c*z[]; gets translated into: __somemangledarrayfuncname(x, y, z, a, c); and creates: __somemangledarrayfuncname(T[] p1, T[] p2, T[] p3, T c1, T c2) { for(int p=0; p < p1.length; ++p) { p1[p] = c1*p2[p] + c2*p3[p]; } } And there are many bugs associated with this, since the compiler doesn't distinguish x[] from x properly (where x is a dynamic array); you can get internal compiler errors referring to type mismatches of 'p'. This could, I think, be done better by putting a simple version of BLADE in the compiler runtime. A big issue with matrix overloading is, what do you with dot product? It's just as fundamental as '+' or '*', but it doesn't have an operator symbol. Consider that a 1x5 matrix multiplied by a 5x1 matrix is just a dot product of two vectors, and a smart compiler would be able to recognize that.

There are actually three (four) basic types of vector/matrix multiplication, and the * operator would more or less be fitting for any of them:
   - element-by-element multiplication, which is what * means now
   - dot product
   - matrix multiplication
  (- cross product )

I wish more operators were easily accessible on keyboards. (Hey, I just found out that "×" is Shift-AltGr-* on my keyboard!) Perhaps the modulus operator, %, could be reused for one of them? If you drink a few beers, add some goodwill, and perhaps squint a little, it almost looks like a cross. ;)
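To make the overload conflict concrete, here is a minimal C++ sketch of the three products in Lars's list (the type and function names are ours, not from any library):

```cpp
#include <array>

using Vec3 = std::array<double, 3>;

// Element-by-element product: what '*' means for D array operations.
inline Vec3 elementwise(const Vec3& a, const Vec3& b)
{
    return {a[0] * b[0], a[1] * b[1], a[2] * b[2]};
}

// Dot (inner) product: associates two vectors with a scalar.
inline double dot(const Vec3& a, const Vec3& b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Cross product: specific to 3-element vectors.
inline Vec3 cross(const Vec3& a, const Vec3& b)
{
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}
```

All three are plausible meanings for a single `a * b`, which is exactly the problem: there is only one '*'.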
 The other thing that's desperately missing from D is multi-dimensional 
 indexing.

Agreed. But with the aforementioned "expression overloading", one could make extremely elegant multidimensional library types... -Lars
Jul 10 2009
next sibling parent =?ISO-8859-15?Q?=22J=E9r=F4me_M=2E_Berger=22?= <jeberger free.fr> writes:
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: quoted-printable

Robert Jacques wrote:
 On Fri, 10 Jul 2009 06:24:04 -0400, Lars T. Kyllingstad=20
 <public kyllingen.nospamnet> wrote:
 There are actually three (four) basic types of vector/matrix=20
 multiplication, and the * operator would more or less be fitting for=20
 any of them:
    - element-by-element multiplication, which is what * means now
    - dot product
    - matrix multiplication
   (- cross product )

Actually, matrix multiplication and the dot product are both cases of=20 the inner product and the cross product is specific to 3D vectors.

And you can add the outer product... Jerome --=20 mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.fr
Jul 11 2009
prev sibling parent reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Robert Jacques wrote:
 On Fri, 10 Jul 2009 06:24:04 -0400, Lars T. Kyllingstad 
 <public kyllingen.nospamnet> wrote:
 There are actually three (four) basic types of vector/matrix 
 multiplication, and the * operator would more or less be fitting for 
 any of them:
    - element-by-element multiplication, which is what * means now
    - dot product
    - matrix multiplication
   (- cross product )

Actually, matrix multiplication and the dot product are both cases of the inner product and the cross product is specific to 3D vectors.

Actually, the dot product is both a special case of matrix multiplication and an inner product. Matrix multiplication in general is not an inner product, since an inner product always associates two vectors with a scalar. That aside, my point was simply that there are several operations for which one may want to use the '*' operator, and there is only one '*'. :) -Lars
Jul 13 2009
parent Daniel Keep <daniel.keep.lists gmail.com> writes:
Lars T. Kyllingstad wrote:
 Robert Jacques wrote:
 On Fri, 10 Jul 2009 06:24:04 -0400, Lars T. Kyllingstad
 <public kyllingen.nospamnet> wrote:
 There are actually three (four) basic types of vector/matrix
 multiplication, and the * operator would more or less be fitting for
 any of them:
    - element-by-element multiplication, which is what * means now
    - dot product
    - matrix multiplication
   (- cross product )

Actually, matrix multiplication and the dot product are both cases of the inner product and the cross product is specific to 3D vectors.

Actually, the dot product is both a special case of matrix multiplication and an inner product. Matrix multiplication in general is not an inner product, since an inner product always associates two vectors with a scalar. That aside, my point was simply that there are several operations for which one may want to use the '*' operator, and there is only one '*'. :) -Lars

You could always convert to using downscode:

  a /inner/ b
  a /dot/ b
  a /cross/ b

As evil as it is... there's something strangely alluring about being able to define your own infix operators...
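For what it's worth, C++ has an equivalent of this trick, usually called "named operators": a marker object plus two `operator/` overloads on a proxy. A minimal sketch, with all names invented here:

```cpp
#include <array>

using Vec3 = std::array<double, 3>;

struct DotMarker {};                // the 'dot' token in  a /dot/ b
inline constexpr DotMarker dot{};

struct DotLhs { Vec3 a; };          // holds the captured left operand

// a / dot  -> capture the left-hand side
inline DotLhs operator/(const Vec3& a, DotMarker) { return {a}; }

// (a / dot) / b  -> evaluate the dot product
inline double operator/(const DotLhs& l, const Vec3& b)
{
    return l.a[0] * b[0] + l.a[1] * b[1] + l.a[2] * b[2];
}

// Usage: double s = a /dot/ b;   // parses as (a / dot) / b
```

The D idiom works the same way, abusing '/' (via opDiv-style overloads) as the delimiter of a fake infix operator.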
Jul 13 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Don wrote:
 The other thing that's desperately missing from D is multi-dimensional 
 indexing.

What are the limitations of multiple-argument []? Andrei
Jul 10 2009
next sibling parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Bill Baxter wrote:
 On Fri, Jul 10, 2009 at 8:15 AM, Andrei
 Alexandrescu<SeeWebsiteForEmail erdani.org> wrote:
 Don wrote:
 The other thing that's desperately missing from D is multi-dimensional
 indexing.

Andrei

I don't know what Don had in mind, but multi-dimensional slice is ugly. It should be A[ a..b, c..d, e..f ], but currently you have to do something like A[ [a,c,e]..[b,d,f] ], or A[a..b][c..d][e..f] and bend over backwards to avoid creating too many temporary proxy objects. Also you'd like to be able to mix and match slices and indices, like A[ a..b, 3, e..f ].

Also a basic built-in multi-dim array would be nice. I think most of the time when people make a new T(10,20) they don't really want 10 arrays of length 20, each of which can be individually resized (and must be if more capacity is needed). They want one array with size 10 x 20. --bb

Either D needs built-in multidimensional arrays, or they must be implemented in Phobos. To support slicing of built-in multidim arrays, I think it is necessary to make slices a different type than arrays, because they would need to have a stride in order to slice across rows. -Lars
Jul 13 2009
prev sibling parent Don <nospam nospam.com> writes:
Bill Baxter wrote:
 On Fri, Jul 10, 2009 at 8:15 AM, Andrei
 Alexandrescu<SeeWebsiteForEmail erdani.org> wrote:
 Don wrote:
 The other thing that's desperately missing from D is multi-dimensional
 indexing.

Andrei

I don't know what Don had in mind, but multi-dimensional slice is ugly. should be A[ a..b, c..d, e..f ] but currently you have to do something like A[ [a,c,e]..[b,d,f] ]. Or A[a..b][c..d][e..f] and bend over backwards to avoid creating too many temporary proxy objects.

And I don't think the latter option is necessarily correct. If A is an array type, A[a..b][c..d] currently means assert(d-c <= b-a), A[a+c..a+(d-c)], with $=b while evaluating d and c.

 Also you'd like to be able to mix and match slices and
 indices like  A[ a..b, 3, e..f ].

This one, in particular, is what I had in mind.
 Also a basic built in multi-dim array would be nice.  I think most of
 the time when people make a new T(10,20) they don't really want 10
 arrays of length 20, each of which can be individually resized (and
 must be if more capacity is needed).   They want one array with size
 10 x 20.

It's a bit trickier to provide that (other than just providing an operator-overloaded opIndex) since you'd then need to limit slicing on it, or else provide strided slices as a built-in type, as well. Maybe that's better as a library type. Dunno. Definitely we need a more powerful opIndex().
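As a sketch of what a more powerful indexing operator buys, here is a C++ stand-in (all names hypothetical) where a stride-carrying view gives both two-argument indexing and copy-free rectangular slices, which is also the point Lars makes about slices needing a stride:

```cpp
#include <cstddef>
#include <vector>

// A non-owning 2-D view: a pointer plus a row stride, so slicing never copies.
struct MatView {
    double* data;
    std::size_t rows, cols, stride; // stride = column count of the parent

    // Stand-in for a two-argument opIndex: A(i, j)
    double& operator()(std::size_t i, std::size_t j)
    {
        return data[i * stride + j];
    }

    // Stand-in for A[ r0..r1, c0..c1 ]: same buffer, adjusted origin/extent.
    MatView slice(std::size_t r0, std::size_t r1,
                  std::size_t c0, std::size_t c1)
    {
        return {data + r0 * stride + c0, r1 - r0, c1 - c0, stride};
    }
};
```

A column slice works only because the stride stays that of the parent matrix; with arrays-of-arrays this view cannot be expressed at all.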
Jul 13 2009
prev sibling parent reply =?ISO-8859-1?Q?=22J=E9r=F4me_M=2E_Berger=22?= <jeberger free.fr> writes:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: quoted-printable

Lars T. Kyllingstad wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
 This also reminds me that the entire operator overloading feature=20
 must be thrown away and redesigned.

:-(

It's run its course like an old BMW. We need to do new things, and=20 bolting them on what we have won't work. Andrei

Indeed, I even think that the concept of operator overloading is wrong=


 In C++, operator overloading was just for syntax sugar. That's wrong: =


 pretty much everything that you want overloaded operators for, is=20
 performance-critical. And that implies you need to deal on the level=20
 of complete expressions, not individual operations.

=20 That is true. There is, for instance, a good reason why the basic BLAS =

 matrix multiplication routine calculates
=20
   a A B + b C        (a,b: scalars; A,B,C: matrices)
=20
 instead of just AB.
=20
 Would/could one could gain something, performance-wise, by having such =

 "expression overloading" as a built-in feature of the language itself, =

 rather than as a library?
=20
 BLADE has already shown that it is possible to do stuff like this in a =

 library, but I think it goes without saying that if it was built into=20
 the language the syntax could be made considerably nicer. Compare:
=20
   auto m =3D MatrixOp!("a*A*B + b*C")(aVal, bVal, aMtrx, bMtrx, cMtrx);=

=20
   auto m =3D a*A*B + b*C;
=20
 If D could do this, I think it would become the next FORTRAN. :)
=20
 -Lars

Actually, this has already been done in C++:=20 http://flens.sourceforge.net/ It should be possible to port it to D... Jerome --=20 mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.fr
Jul 11 2009
parent reply Lutger <lutger.blijdestijn gmail.com> writes:
"Jérôme M. Berger" wrote:
(...)
 
 BLADE has already shown that it is possible to do stuff like this in a
 library, but I think it goes without saying that if it was built into
 the language the syntax could be made considerably nicer. Compare:
 
   auto m = MatrixOp!("a*A*B + b*C")(aVal, bVal, aMtrx, bMtrx, cMtrx);
 
   auto m = a*A*B + b*C;
 
 If D could do this, I think it would become the next FORTRAN. :)
 
 -Lars

Actually, this has already been done in C++: http://flens.sourceforge.net/ It should be possible to port it to D... Jerome

Someone correct me if I'm wrong, but I think what Blade does is a bit more advanced than FLENS. Blade performs optimizations on the AST level and generates (near) optimal assembly at compile time. I couldn't find info on what FLENS does exactly beyond inlining through template expressions, but from the looks of it it doesn't do any of the AST level optimizations Blade does. Anyone care to provide more info? Can Blade also generate better asm than is possible with libraries such as FLENS?
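For readers unfamiliar with the machinery being compared here, the Blitz/FLENS-style approach can be boiled down to a toy C++ sketch (this is not FLENS's actual API): each '+' builds a lightweight proxy, and the single loop over memory runs only at assignment.

```cpp
#include <cstddef>
#include <vector>

// Toy expression template: '+' builds a proxy; the loop runs at assignment.
struct Vec {
    std::vector<double> v;
    template <typename Expr> Vec& operator=(const Expr& e);
    double operator[](std::size_t i) const { return v[i]; }
    std::size_t size() const { return v.size(); }
};

template <typename L, typename R>
struct Add {                         // represents l + r, evaluated lazily
    const L& l; const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

template <typename L, typename R>
Add<L, R> operator+(const L& l, const R& r) { return {l, r}; }

template <typename Expr>
Vec& Vec::operator=(const Expr& e)   // one pass, no temporary vectors
{
    v.resize(e.size());
    for (std::size_t i = 0; i < e.size(); ++i) v[i] = e[i];
    return *this;
}

// Usage: a = b + c + d;  // builds Add<Add<Vec,Vec>,Vec>, then one loop
```

This is the "encode the arguments of each operation into a struct" approach discussed below: no allocations, one pass over memory, but no algebraic rewriting of the expression tree, which is what BLADE adds on top.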
Jul 11 2009
parent =?ISO-8859-15?Q?=22J=E9r=F4me_M=2E_Berger=22?= <jeberger free.fr> writes:
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: quoted-printable

Robert Jacques wrote:
 On Sat, 11 Jul 2009 06:14:56 -0400, Lutger=20
 <lutger.blijdestijn gmail.com> wrote:
=20
 "J=E9r=F4me M. Berger" wrote:
 (...)
 BLADE has already shown that it is possible to do stuff like this in=




 library, but I think it goes without saying that if it was built int=




 the language the syntax could be made considerably nicer. Compare:

   auto m =3D MatrixOp!("a*A*B + b*C")(aVal, bVal, aMtrx, bMtrx, cMtr=




   auto m =3D a*A*B + b*C;

 If D could do this, I think it would become the next FORTRAN. :)

 -Lars

Actually, this has already been done in C++: http://flens.sourceforge.net/ It should be possible to port it to D..=



 Jerome

Someone correct me if I'm wrong, but I think what Blade does is a bit =


 more
 advanced than FLENS. Blade performs optimizations on the AST level and=


 generates (near) optimal assembly at compile time. I couldn't find=20
 info on
 what FLENS does exactly beyond inlining through template expressions, =


 from the looks of it it doesn't do any of the AST level optimizations =


 Blade
 does. Anyone care to provide more info? Can Blade also generate better=


 asm
 than is possible with libraries such as FLENS?

Flens (and several other libraries like it) just provide a syntactic=20 sugar for BLAS using expression templates. So calling something like a =

 b + c + d gets transformed into (if you're lucky) a =3D b + c; a =3D a =

 (and if you're unlucky) temp =3D b + c; a =3D temp + d;

The way I understand it, FLENS will never take the second solution.=20 If it cannot do the first with BLAS, it will fail to compile in=20 order to let you choose the best way to decompose your operation.
 So you end up looping through memory multiple times. The next best opti=

 expression templates, (Blitz or Boost come to mind) which encode the=20
 arguments of each operation into a struct. This results in a lot of=20
 temporaries and is a major performance hit for small-ish vectors, but=20
 you only loop through memory once and make no allocations, which is a=20
 win on larger vectors. Then you have BLADE and D array ops which don't =

 create any temporaries and are faster still. The counter is that a BLAS=

 library can be tuned for each specific CPU or GPU and select the fastes=

 library at runtime.
=20

of operands, looping through memory multiple may in theory be the=20 best option in some cases. This is because it will use the cache=20 (and the swap) more efficiently than a solution that loops only once=20 through all arrays at the same time. Jerome --=20 mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.fr
Jul 12 2009
prev sibling parent reply Rainer Deyke <rainerd eldwood.com> writes:
Don wrote:
 Indeed, I even think that the concept of operator overloading is wrong.
 In C++, operator overloading was just for syntax sugar. That's wrong:
 pretty much everything that you want overloaded operators for, is
 performance-critical. And that implies you need to deal on the level of
 complete expressions, not individual operations.

I disagree for two reasons:

1. Operator syntax /is/ just syntax sugar for function calls, which is a concept orthogonal to expression-level optimization. Consider the following expressions:

   a + b + c
   add(add(a, b), c)

If one of these is reduced to a single function call, both should be.

2. Expression-level optimization is the compiler's job, not the programmer's.

-- Rainer Deyke - rainerd eldwood.com
Jul 10 2009
next sibling parent reply BCS <ao pathlink.com> writes:
Reply to Rainer,

 2. Expression-level optimization is the compiler's job, not the
 programmer's.
 

Some (many?) of the interesting optimizations that can be applied to expressions with overloaded operators are not algebraic in nature. Many of the use cases for operator overloading don't work if the compiler is allowed to assume all the same identities it can for scalars (e.g. commutativity doesn't hold for matrix multiplication). Unless you want to make a way for the programmer to both add and remove expression-level optimizations, and require an optimization engine powerful enough to use them, you can't do optimization of expressions with overloaded operators purely in the compiler.
Jul 10 2009
parent reply Rainer Deyke <rainerd eldwood.com> writes:
BCS wrote:
 Reply to Rainer,
 
 2. Expression-level optimization is the compiler's job, not the
 programmer's.

Some (many?) of the interesting optimizations that can be applied to expressions with overloaded operators are not algebraic in nature.

I'm not sure what you mean by "algebraic" here, but...
 Many of the use cases for operator overloading don't work if the
 compiler is allowed to assume all the same identities it can for scalars
 (e.g. the transitive identity doesn't hold for matrix math).

The compiler shouldn't make any assumptions. The compiler doesn't /need/ to make any assumptions, since the compiler has access to the actual code. For example, take a simple vector expression, with 3-element vectors:

  a = cross_product(b, c + d + e)

First the compiler rewrites this to use a temporary variable:

  tmp = c + d
  tmp = tmp + e
  a = cross_product(b, tmp)

Then the compiler inlines both functions and unrolls the loops:

  tmp[0] = c[0] + d[0]
  tmp[1] = c[1] + d[1]
  tmp[2] = c[2] + d[2]
  tmp[0] = tmp[0] + e[0]
  tmp[1] = tmp[1] + e[1]
  tmp[2] = tmp[2] + e[2]
  a[0] = b[1] * tmp[2] - b[2] * tmp[1]
  a[1] = b[2] * tmp[0] - b[0] * tmp[2]
  a[2] = b[0] * tmp[1] - b[1] * tmp[0]

The compiler then reduces the temporary vector to a set of temporary scalars, probably stored in registers:

  tmp0 = c[0] + d[0] + e[0]
  tmp1 = c[1] + d[1] + e[1]
  tmp2 = c[2] + d[2] + e[2]
  a[0] = b[1] * tmp2 - b[2] * tmp1
  a[1] = b[2] * tmp0 - b[0] * tmp2
  a[2] = b[0] * tmp1 - b[1] * tmp0

The compiler sees no further optimizations, so it leaves the code as is. This is about as good as a human optimizer could have done. If the CPU has built-in support for vector cross products, the compiler can use that at the code generation stage.
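That rewrite is easy to sanity-check mechanically. Here is a C++ rendition (function names ours), asserting that the fully fused scalar form computes the same vector as the naive cross_product(b, c + d + e):

```cpp
#include <array>

using Vec3 = std::array<double, 3>;

Vec3 add(const Vec3& x, const Vec3& y)
{
    return {x[0] + y[0], x[1] + y[1], x[2] + y[2]};
}

Vec3 cross_product(const Vec3& x, const Vec3& y)
{
    return {x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0]};
}

// The hand-optimized form: temporaries reduced to scalars, both functions
// inlined, loops unrolled.
Vec3 fused(const Vec3& b, const Vec3& c, const Vec3& d, const Vec3& e)
{
    double tmp0 = c[0] + d[0] + e[0];
    double tmp1 = c[1] + d[1] + e[1];
    double tmp2 = c[2] + d[2] + e[2];
    return {b[1] * tmp2 - b[2] * tmp1,
            b[2] * tmp0 - b[0] * tmp2,
            b[0] * tmp1 - b[1] * tmp0};
}
```

Both versions perform the same floating-point operations in the same order, so the results match exactly.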
 Unless you wan to make a way for the programmer to both add and remove
 expression level optimizations, and require an optimization engine
 powerful enough to use them, you can't do optimization of expressions
 with overloaded operators purely in the compiler.

Why should the programmer need to give hints to the compiler when the compiler already has enough information to figure out how to optimize properly? -- Rainer Deyke - rainerd eldwood.com
Jul 10 2009
next sibling parent reply Rainer Deyke <rainerd eldwood.com> writes:
Jarrett Billingsley wrote:
 1) The compiler doesn't always have the code.  If you're linking
 against a library, where the operator overloads are not given in the
 headers, the compiler doesn't have their code.  Link-time optimization
 might be able to help here, but without the original syntax tree to
 work with, it's probably not going to be able to do the same things.
 
 2) The compiler doesn't necessarily have enough information to
 optimize properly.  Virtual calls destroy any guarantees.  OK, sure,
 you could say "vector and matrix types really should be structs and
 therefore not use virtual calls," but in the general case, the
 compiler can't figure these things out.

Yes and no in both cases. Yes, the way DMD is set up makes these kinds of optimizations exceedingly difficult: cross-library function calls are slow, and virtual function calls are slow. I tend to assume that all performance-critical code is templated anyway, which is incompatible with both libraries and virtual functions. Better to move as many calculations as possible from run time to compile time.

On the other hand, JIT compilers for fully dynamic languages can perform these types of optimizations. A clever D implementation could do the same.
 You're right, however, that in the vast majority of cases, the
 compiler *could* optimize calls to these functions.  But you still
 might need some way of informing the compiler "hey!  I really _really_
 would like you to inline this."

I'm not really opposed to the idea of optional optimization hints, but I'd like to see a compiler clever enough to not need them. -- Rainer Deyke - rainerd eldwood.com
Jul 10 2009
parent BCS <none anon.com> writes:
Hello Rainer,

 You're right, however, that in the vast majority of cases, the
 compiler *could* optimize calls to these functions.  But you still
 might need some way of informing the compiler "hey!  I really
 _really_ would like you to inline this."
 

 I'm not really opposed to the idea of optional optimization hints, but
 I'd like to see a compiler clever enough to not need them.

When that happens we can all retire. :)
Jul 11 2009
prev sibling parent reply BCS <none anon.com> writes:
Hello Rainer,

 BCS wrote:
 
 Reply to Rainer,
 
 2. Expression-level optimization is the compiler's job, not the
 programmer's.
 

 Some (many?) of the interesting optimizations that can be applied to
 expressions with overloaded operators are not algebraic in nature.

 I'm not sure what you mean by "algebraic" here, but...


Algebraic optimizations would be optimizations that work by applying stuff you learned in algebra class.
 Many of the use cases for operator overloading don't work if the
 compiler is allowed to assume all the same identities it can for
 scalars (e.g. the transitive identity doesn't hold for matrix math).
 

 The compiler shouldn't make any assumptions. The compiler doesn't
 /need/ to make any assumptions, since the compiler has access to the
 actual code. For example, take a simple vector expression, with
 3-element vectors:

Ok, try another case:

  y = a*b + c*a;

For scalars, the compiler can (and should) convert those 3 ops to 2:

  y = a*(b + c);

For a matrix you can't. And I'm sure there are matrix cases where you can get improvements by doing algebraic manipulations before you inline that would be darn near impossible to get the compiler to figure out after you inline.
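This is easy to demonstrate with 2x2 matrices. A quick C++ check (types and helpers ours): the scalar rewrite a*b + c*a -> a*(b + c) silently assumes c*a == a*c, which fails as soon as multiplication stops commuting.

```cpp
#include <array>

using Mat2 = std::array<std::array<double, 2>, 2>; // row-major 2x2

Mat2 add(const Mat2& x, const Mat2& y)
{
    Mat2 r{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            r[i][j] = x[i][j] + y[i][j];
    return r;
}

Mat2 mul(const Mat2& x, const Mat2& y)
{
    Mat2 r{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 2; ++k)
                r[i][j] += x[i][k] * y[k][j];
    return r;
}
```

With a = [[0,1],[0,0]], b = 0 and c = [[0,0],[1,0]], a*b + c*a and a*(b + c) come out different, so a compiler that applies the scalar identity to overloaded '*' would miscompile the expression.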
 Unless you wan to make a way for the programmer to both add and
 remove expression level optimizations, and require an optimization
 engine powerful enough to use them, you can't do optimization of
 expressions with overloaded operators purely in the compiler.
 

Why should the programmer need to give hints to the compiler when the compiler already has enough information to figure out how to optimize properly?

Because the compiler doesn't have enough information. I've tried to work with systems that are supposed to do math all by themselves. They didn't work well. Computational Algebra Systems don't work without hints.
Jul 11 2009
parent reply Rainer Deyke <rainerd eldwood.com> writes:
BCS wrote:
 Ok try another case:
 
 y = a*b + c*a;
 
 for scalars, the compiler can (and should) convert those 3 ops to 2:
 
 y = a*(b+c);
 
 for a matrix you can't. And I'm sure there are matrix cases where you
 can get improvements by doing algebraic manipulations before you inline
 that would be darn near impossible to get the compiler to figure out
 after you inline.

There are two possible cases:

- The algebraically manipulated expression is functionally equivalent to the original expression. In this case, the optimizer should produce the exact same code for both expressions, so the high-level algebraic manipulation is unnecessary. (Yes, I'm asking a lot from the optimizer.)

- The algebraically manipulated expression is /not/ functionally equivalent to the original expression. In this case the algebraic manipulation is invalid, and should not be performed.

Optimizing complex expressions is not algorithmically more difficult than optimizing simple expressions, it just takes more memory and CPU time. -- Rainer Deyke - rainerd eldwood.com
Jul 11 2009
parent BCS <none anon.com> writes:
Hello Rainer,

 BCS wrote:
 
 Ok try another case:
 
 y = a*b + c*a;
 
 for scalars, the compiler can (and should) convert those 3 ops to 2:
 
 y = a*(b+c);
 
 for a matrix you can't. And I'm sure there are matrix cases where you
 can get improvements by doing algebraic manipulations before you
 inline that would be darn near impossible to get the compiler to
 figure out after you inline.
 

- The algebraically manipulated expression is functionally equivalent to the original expression. In this case, the optimizer should produce the exact same code for both expressions, so the high-level algebraic manipulation is unnecessary. (Yes, I'm asking a lot from the optimizer.)

Now that's a bit of an understatement :)
 - The algebraically manipulated expression is /not/ functionally
 equivalent to the original expression.  In this case the algebraic
 manipulation is invalid, and should not be performed.
 

Duh :)
 Optimizing complex expressions is not algorithmically more difficult
 than optimizing simple expressions, it just takes more memory and CPU
 time.

My argument is coming from my understanding of what Andrei Alexandrescu was proposing. I think he is suggesting that operator overloading be done via AST macros. The end effect could be something like my backmath library, in that it could manipulate expressions under user control before the optimizer gets its crack at things.
Jul 11 2009
prev sibling parent Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Fri, Jul 10, 2009 at 10:18 PM, Rainer Deyke<rainerd eldwood.com> wrote:
 BCS wrote:
 Reply to Rainer,

 2. Expression-level optimization is the compiler's job, not the
 programmer's.

Some (many?) of the interesting optimizations that can be applied to expressions with overloaded operators are not algebraic in nature.

I'm not sure what you mean by "algebraic" here, but...
 Many of the use cases for operator overloading don't work if the
 compiler is allowed to assume all the same identities it can for scalars
 (e.g. the transitive identity doesn't hold for matrix math).

The compiler shouldn't make any assumptions.  The compiler doesn't
/need/ to make any assumptions, since the compiler has access to the
actual code.  For example, take a simple vector expression, with
3-element vectors:

  a = cross_product(b, c + d + e)

First the compiler rewrites this to use a temporary variable:

  tmp = c + d
  tmp = tmp + e
  a = cross_product(b, tmp)

Then the compiler inlines both functions and unrolls the loops:

  tmp[0] = c[0] + d[0]
  tmp[1] = c[1] + d[1]
  tmp[2] = c[2] + d[2]
  tmp[0] = tmp[0] + e[0]
  tmp[1] = tmp[1] + e[1]
  tmp[2] = tmp[2] + e[2]
  a[0] = b[1] * tmp[2] - b[2] * tmp[1]
  a[1] = b[2] * tmp[0] - b[0] * tmp[2]
  a[2] = b[0] * tmp[1] - b[1] * tmp[0]

The compiler then reduces the temporary vector to a set of temporary
scalars, probably stored in registers:

  tmp0 = c[0] + d[0] + e[0]
  tmp1 = c[1] + d[1] + e[1]
  tmp2 = c[2] + d[2] + e[2]
  a[0] = b[1] * tmp2 - b[2] * tmp1
  a[1] = b[2] * tmp0 - b[0] * tmp2
  a[2] = b[0] * tmp1 - b[1] * tmp0

The compiler sees no further optimizations, so it leaves the code as
is.  This is about as good as a human optimizer could have done.  If the
target CPU has built-in support for vector cross-multiplication, the
compiler can use this at the code generation stage.
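As a sanity check, the unrolled form really does compute the same thing as the original expression; a quick verification (in Python, purely for illustration):

```python
# Verify that the fully inlined/unrolled form computes cross_product(b, c+d+e).
def cross(u, v):
    return [u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0]]

b, c, d, e = [1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, 1]

# Straightforward evaluation:
expected = cross(b, [c[i] + d[i] + e[i] for i in range(3)])

# The compiler's final, scalarized version:
tmp0 = c[0] + d[0] + e[0]
tmp1 = c[1] + d[1] + e[1]
tmp2 = c[2] + d[2] + e[2]
a = [b[1]*tmp2 - b[2]*tmp1,
     b[2]*tmp0 - b[0]*tmp2,
     b[0]*tmp1 - b[1]*tmp0]

assert a == expected  # same result, no temporary vectors
```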

 Unless you wan to make a way for the programmer to both add and remove
 expression level optimizations, and require an optimization engine
 powerful enough to use them, you can't do optimization of expressions
 with overloaded operators purely in the compiler.

Why should the programmer need to give hints to the compiler when the compiler already has enough information to figure out how to optimize properly?

You're somewhat wrong on two points:

1) The compiler doesn't always have the code. If you're linking against
a library, where the operator overloads are not given in the headers,
the compiler doesn't have their code. Link-time optimization might be
able to help here, but without the original syntax tree to work with,
it's probably not going to be able to do the same things.

2) The compiler doesn't necessarily have enough information to optimize
properly. Virtual calls destroy any guarantees. OK, sure, you could say
"vector and matrix types really should be structs and therefore not use
virtual calls," but in the general case, the compiler can't figure these
things out.

You're right, however, that in the vast majority of cases, the compiler
*could* optimize calls to these functions. But you still might need some
way of informing the compiler "hey! I really _really_ would like you to
inline this."
Jul 10 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Fri, Jul 10, 2009 at 8:15 AM, Andrei
Alexandrescu<SeeWebsiteForEmail erdani.org> wrote:
 Don wrote:
 The other thing that's desperately missing from D is multi-dimensional
 indexing.

What are the limitations of multiple-argument []? Andrei

I don't know what Don had in mind, but multi-dimensional slicing is
ugly. It should be A[ a..b, c..d, e..f ], but currently you have to do
something like A[ [a,c,e]..[b,d,f] ], or A[a..b][c..d][e..f] and bend
over backwards to avoid creating too many temporary proxy objects. Also
you'd like to be able to mix and match slices and indices, like
A[ a..b, 3, e..f ].

Also, a basic built-in multi-dim array would be nice. I think most of
the time when people write new T(10,20) they don't really want 10 arrays
of length 20, each of which can be individually resized (and must be if
more capacity is needed). They want one array with size 10 x 20.

--bb
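For what it's worth, the mix-and-match indexing described above can be prototyped with an indexing operator that accepts a tuple of slices and indices; a minimal Python sketch (class and method names are illustrative only):

```python
# Sketch of multi-dimensional indexing over a nested list: A[0:2, 1] style,
# mixing slices and plain indices, as wished for with A[ a..b, 3, e..f ].
class Mat:
    def __init__(self, rows):
        self.rows = rows

    def __getitem__(self, key):
        rsel, csel = key                 # a 2-argument index: A[r, c]
        picked = self.rows[rsel]
        if isinstance(rsel, slice):      # slice in the row position
            return [row[csel] for row in picked]
        return picked[csel]              # plain index in the row position

A = Mat([[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]])
assert A[1, 2] == 6                      # index + index
assert A[0:2, 1] == [2, 5]               # slice + index
assert A[0:2, 0:2] == [[1, 2], [4, 5]]   # slice + slice
```

The wished-for D syntax would presumably dispatch a multi-argument opIndex/opSlice the same way.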
Jul 10 2009
prev sibling next sibling parent Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Fri, Jul 10, 2009 at 12:45 PM, Bill Baxter<wbaxter gmail.com> wrote:

 Also a basic built in multi-dim array would be nice.  I think most of
 the time when people make a new T(10,20) they don't really want 10
 arrays of length 20, each of which can be individually resized (and
 must be if more capacity is needed).  They want one array with size
 10 x 20.

Something like C#'s would be nice:
http://msdn.microsoft.com/en-us/library/aa288453(VS.71).aspx

If we were to D-ize them, they might look something like:

  int[,] a = new int[,](3, 4);
  a[,] = 5;   // slice-assign entire array
  a[2,] = 10; // slice-assign only the third row

  for(int i = 0; i < a.length(0); i++) // notice .length(0)
      for(int j = 0; j < a.length(1); j++)
          writeln(a[i, j]);

  auto b = a.dup;
  a[0,] = b[1,]; // copy b's second row into a's first - hmm

Slicing them might be tricky, because I'm not sure what type the slice
should be. I think maybe a[0, ] should be int[], but what the heck would
a[,0] be? int[,4]? It'd require a lot more thought.
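The "one array with size 10 x 20" idea boils down to a single flat buffer plus a shape, with index arithmetic i * ncols + j. A minimal sketch (Python, names purely illustrative):

```python
# A rectangular 2-D array as one flat buffer plus a shape, rather than
# an array of independently resizable arrays.
class Rect:
    def __init__(self, nrows, ncols, fill=0):
        self.nrows, self.ncols = nrows, ncols
        self.data = [fill] * (nrows * ncols)   # single allocation

    def __getitem__(self, key):
        i, j = key
        return self.data[i * self.ncols + j]   # row-major index arithmetic

    def __setitem__(self, key, value):
        i, j = key
        self.data[i * self.ncols + j] = value

    def fill_row(self, i, value):              # analogue of a[2,] = 10
        for j in range(self.ncols):
            self[i, j] = value

a = Rect(3, 4)
a.fill_row(2, 10)
assert a[2, 0] == 10 and a[0, 0] == 0
assert len(a.data) == 12                       # one 3x4 block, not 3 arrays
```

This is essentially what C#'s rectangular `int[,]` does under the hood, as opposed to D's current jagged new int[][].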
Jul 10 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 10 Jul 2009 11:15:15 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Don wrote:
 The other thing that's desperately missing from D is multi-dimensional  
 indexing.

What are the limitations of multiple-argument []? Andrei

Don might be referring to the $ operator, but that more applies to slicing. Personally, I find the lack of multi-dimensional slicing the bigger gap: i.e. matrix[[1,2]..[matrix.lengths[0]-3,5]] vs matrix[1..$-3, 2..5]
Jul 10 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 10 Jul 2009 06:24:04 -0400, Lars T. Kyllingstad  
<public kyllingen.nospamnet> wrote:
 There are actually three (four) basic types of vector/matrix  
 multiplication, and the * operator would more or less be fitting for any  
 of them:
    - element-by-element multiplication, which is what * means now
    - dot product
    - matrix multiplication
   (- cross product )

Actually, matrix multiplication and the dot product are both special cases of the inner product, and the cross product is specific to 3D vectors.
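To see why the dot product is a case of the same operation as matrix multiplication: treat v as a 1xN row and w as an Nx1 column, and the 1x1 matrix product equals dot(v, w). A small check (Python, purely for illustration):

```python
# Dot product as a 1xN by Nx1 matrix multiplication (illustrative sketch).
def matmul(A, B):
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def dot(v, w):
    return sum(x * y for x, y in zip(v, w))

v, w = [1, 2, 3], [4, 5, 6]
# Treat v as a row vector (1x3) and w as a column vector (3x1):
as_matrix = matmul([v], [[x] for x in w])[0][0]
assert as_matrix == dot(v, w)  # both are the same inner product
```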
Jul 11 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 11 Jul 2009 06:14:56 -0400, Lutger <lutger.blijdestijn gmail.com>  
wrote:

 "Jérôme M. Berger" wrote:
 (...)
 BLADE has already shown that it is possible to do stuff like this in a
 library, but I think it goes without saying that if it was built into
 the language the syntax could be made considerably nicer. Compare:

   auto m = MatrixOp!("a*A*B + b*C")(aVal, bVal, aMtrx, bMtrx, cMtrx);

   auto m = a*A*B + b*C;

 If D could do this, I think it would become the next FORTRAN. :)

 -Lars

Actually, this has already been done in C++: http://flens.sourceforge.net/ It should be possible to port it to D... Jerome

Someone correct me if I'm wrong, but I think what Blade does is a bit more advanced than FLENS. Blade performs optimizations on the AST level and generates (near) optimal assembly at compile time. I couldn't find info on what FLENS does exactly beyond inlining through template expressions, but from the looks of it it doesn't do any of the AST level optimizations Blade does. Anyone care to provide more info? Can Blade also generate better asm than is possible with libraries such as FLENS?

Flens (and several other libraries like it) just provide syntactic
sugar for BLAS using expression templates. So calling something like

  a = b + c + d

gets transformed into (if you're lucky)

  a = b + c;
  a = a + d;

(and if you're unlucky)

  temp = b + c;
  a = temp + d;

So you end up looping through memory multiple times. The next best
option is expression templates (Blitz or Boost come to mind), which
encode the arguments of each operation into a struct. This results in a
lot of temporaries and is a major performance hit for small-ish vectors,
but you only loop through memory once and make no allocations, which is
a win on larger vectors. Then you have BLADE and D array ops, which
don't create any temporaries and are faster still. The counter is that a
BLAS library can be tuned for each specific CPU or GPU and select the
fastest implementation at runtime.
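The struct-encoding trick behind expression templates can be sketched in a handful of lines: '+' builds a node instead of a vector, and assignment walks the whole tree once per element, so no temporary vectors are allocated (Python used for brevity; names are illustrative, C++ templates would encode the same tree in types):

```python
# Illustrative sketch: a deferred expression tree fuses a = b + c + d
# into a single pass with no temporary vectors.
class Vec:
    def __init__(self, data):
        self.data = list(data)
    def __add__(self, other):          # deferred: build a node, do no work yet
        return Add(self, other)
    def at(self, i):
        return self.data[i]

class Add:
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs
    def __add__(self, other):          # chains: (b + c) + d
        return Add(self, other)
    def at(self, i):                   # evaluate element i of the whole tree
        return self.lhs.at(i) + self.rhs.at(i)

def assign(dst, expr):                 # single loop, no temporary vectors
    for i in range(len(dst.data)):
        dst.data[i] = expr.at(i)

b, c, d = Vec([1, 2]), Vec([10, 20]), Vec([100, 200])
a = Vec([0, 0])
assign(a, b + c + d)                   # one pass over memory
assert a.data == [111, 222]
```

The node objects here are the "lot of temporaries" mentioned above; in C++ they live in registers or on the stack, which is why they hurt mainly for very small vectors.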
Jul 12 2009
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sun, 12 Jul 2009 11:14:02 -0400, Jérôme M. Berger <jeberger free.fr>  
wrote:

 Robert Jacques wrote:
 On Sat, 11 Jul 2009 06:14:56 -0400, Lutger
 <lutger.blijdestijn gmail.com> wrote:

 "Jérôme M. Berger" wrote:
 (...)
 BLADE has already shown that it is possible to do stuff like this in  
 a
 library, but I think it goes without saying that if it was built into
 the language the syntax could be made considerably nicer. Compare:

   auto m = MatrixOp!("a*A*B + b*C")(aVal, bVal, aMtrx, bMtrx, cMtrx);

   auto m = a*A*B + b*C;

 If D could do this, I think it would become the next FORTRAN. :)

 -Lars

Actually, this has already been done in C++: http://flens.sourceforge.net/ It should be possible to port it to D... Jerome

Someone correct me if I'm wrong, but I think what Blade does is a bit more advanced than FLENS. Blade performs optimizations on the AST level and generates (near) optimal assembly at compile time. I couldn't find info on what FLENS does exactly beyond inlining through template expressions, but from the looks of it it doesn't do any of the AST level optimizations Blade does. Anyone care to provide more info? Can Blade also generate better asm than is possible with libraries such as FLENS?

Flens (and several other libraries like it) just provide syntactic
sugar for BLAS using expression templates. So calling something like

  a = b + c + d

gets transformed into (if you're lucky)

  a = b + c;
  a = a + d;

(and if you're unlucky)

  temp = b + c;
  a = temp + d;

The way I understand it, FLENS will never take the second solution. If it cannot do the first with BLAS, it will fail to compile in order to let you choose the best way to decompose your operation.
 So you end up looping through memory multiple times. The next best  
 option are
 expression templates, (Blitz or Boost come to mind) which encode the
 arguments of each operation into a struct. This results in a lot of
 temporaries and is a major performance hit for small-ish vectors, but
 you only loop through memory once and make no allocations, which is a
 win on larger vectors. Then you have BLADE and D array ops which don't
 create any temporaries and are faster still. The counter is that a BLAS
 library can be tuned for each specific CPU or GPU and select the fastest
 library at runtime.

Depending on the number of operands, looping through memory multiple times may in theory be the best option in some cases. This is because it will use the cache (and the swap) more efficiently than a solution that loops only once through all arrays at the same time.

For arrays this is never, ever the case. Array ops are memory bandwidth bound. (Okay, with enough operands you could mess up the cache, but 100s of operands are pathological.) What you're probably thinking about are the matrix multiply algorithms, but those algorithms inherently loop through memory multiple times. It's not something 'added' by the limitation of a fixed API.
Jul 12 2009