## digitalmars.D - The Matrix to end all Matrix classes (Let's dream!)

- Don Clugston <dac nospam.com.au> Nov 20 2007
- Oskar Linde <oskar.lindeREM OVEgmail.com> Nov 20 2007
- Bill Baxter <dnewsgroup billbaxter.com> Nov 20 2007
- TomD <t_demmer nospam.web.de> Nov 20 2007
- Bill Baxter <dnewsgroup billbaxter.com> Nov 20 2007
- Don Clugston <dac nospam.com.au> Nov 21 2007
- TomD <t_demmer nospam.web.de> Nov 20 2007
- Bill Baxter <dnewsgroup billbaxter.com> Nov 22 2007
- "Craig Black" <craigblack2 cox.net> Nov 20 2007
- "Craig Black" <craigblack2 cox.net> Nov 20 2007
- 0ffh <frank frankhirsch.youknow.what.todo.net> Nov 20 2007
- Don Clugston <dac nospam.com.au> Nov 20 2007
- "Craig Black" <cblack ara.com> Nov 20 2007
- Don Clugston <dac nospam.com.au> Nov 20 2007
- "Craig Black" <cblack ara.com> Nov 20 2007
- Sean Kelly <sean f4.ca> Nov 20 2007
- "Craig Black" <cblack ara.com> Nov 20 2007
- =?ISO-8859-1?Q?Pablo_Ripoll=e9s?= <in-call gmx.net> Nov 20 2007
- Neal Becker <ndbecker2 gmail.com> Nov 20 2007
- 0ffh <frank frankhirsch.youknow.what.todo.net> Nov 20 2007
- Bill Baxter <dnewsgroup billbaxter.com> Nov 20 2007
- 0ffh <frank frankhirsch.youknow.what.todo.net> Nov 20 2007
- Knud Soerensen <4tuu4k002 sneakemail.com> Dec 01 2007

No, I don't have the perfect Matrix. I just want to explore the question of whether it exists. There are a plethora of C++ matrix libraries, all of which are far from perfect; they all involve trade-offs between various language limitations. In particular, the usage of expression templates creates all kinds of evil. Requirements: 1. Must be compatible with future multi-dimensional array syntax such as double [7,5] m; 2. Must be theoretically capable of optimal performance. 3. Needs to support slices. This means we also need to support strided vectors. Therefore we also need matrix slices, and matrix references. Probably distinguishing between the T[] and T[new] concepts. 4. For storage we need packed, triangular, symmetric, banded, and sparse matrix support. (Possibly even 'calculated' matrix support, for user-defined matrix types which generate entries on-demand). 5. Can be generalized to tensors. 6. Nice error messages. I now have enough experience with my BLADE library to be confident that all of these features can be achieved in existing D. In particular, the crucial requirement 2 can be completely decoupled from the others. But, there are many requirements (dozens?) which are not on my brief list, and perhaps they cannot all be met simultaneously, even in theory. Yet I'm not convinced that it's impossible. What are the other requirements?

Nov 20 2007

Don Clugston wrote:No, I don't have the perfect Matrix. I just want to explore the question of whether it exists.

A very interesting topic indeed. I will just give some quick thoughts to start with. First of all, a matrix is (generally) 2-dimensional, but there are many uses for higher-dimensional structures as well. Also, the 1 dimensional vector is a useful structure that could be handled in the same generic framework. Is the target just 2-dimensional matrices or also higher dimensional arrays? The matrix as a mathematical structure is quite different in semantics and use from generic multi-dimensional arrays and should perhaps be treated separately. To avoid confusion, I will use "size" to refer to the number of elements of a matrix/array and "dimensionality" to refer to the number of dimensions of an array (a vector is 1-dimensional, a matrix is 2-dimensional). First, there is the separation of compile time and run time. One can identify (at least) 3 different needs, with different requirements for run-time/compile-time dynamics: * Small static matrices/arrays (where the size and dimensionality are known at compile time) * Dynamic matrices/arrays (where the sizes are determined at run-time, but dimensionality is fixed at compile time) * Fully dynamic multidimensional arrays (where the dimensionality is determined at run time) The third case could probably in most cases be left out, but the first two make an interesting separation. Knowing the size at compile time is mostly an advantage for smaller matrices (eg 4x4, as commonly used for geometric translations). Larger matrices (eg 1000x1000) probably gain much less from compile time known sizes. There is also the matter of template bloat that means that compile time quantities are best left to the smaller numbers.Requirements: 1. Must be compatible with future multi-dimensional array syntax such as double [7,5] m; 2. Must be theoretically capable of optimal performance.

Seems reasonable.3. Needs to support slices. This means we also need to support strided vectors. Therefore we also need matrix slices, and matrix references.

Yes, both strided views and non-strided references are definitely needed.Probably distinguishing between the T[] and T[new] concepts.

Such a distinction is definitely of an advantage. For large matrices, ownership of memory becomes important, as such structures may be too large to be able to rely on the GC.4. For storage we need packed, triangular, symmetric, banded, and sparse matrix support. (Possibly even 'calculated' matrix support, for user-defined matrix types which generate entries on-demand).

For generality, at least sparse (and possibly calculated) are useful for vectors and higher dimensional arrays too.5. Can be generalized to tensors. 6. Nice error messages. I now have enough experience with my BLADE library to be confident that all of these features can be achieved in existing D. In particular, the crucial requirement 2 can be completely decoupled from the others.

Sounds excellent.But, there are many requirements (dozens?) which are not on my brief list, and perhaps they cannot all be met simultaneously, even in theory. Yet I'm not convinced that it's impossible. What are the other requirements?

Naturally, different element types (as well UDT). Compatibilty with external libraries (Fortran/C) with the possibility of switching between Fortran and C layout. -- Oskar

Nov 20 2007

Oskar Linde wrote:Don Clugston wrote:No, I don't have the perfect Matrix. I just want to explore the question of whether it exists.

A very interesting topic indeed. I will just give some quick thoughts to start with. First of all, a matrix is (generally) 2-dimensional, but there are many uses for higher-dimensional structures as well. Also, the 1 dimensional vector is a useful structure that could be handled in the same generic framework. Is the target just 2-dimensional matrices or also higher dimensional arrays? The matrix as a mathematical structure is quite different in semantics and use from generic multi-dimensional arrays and should perhaps be treated separately. To avoid confusion, I will use "size" to refer to the number of elements of a matrix/array and "dimensionality" to refer to the number of dimensions of an array (a vector is 1-dimensional, a matrix is 2-dimensional). First, there is the separation of compile time and run time. One can identify (at least) 3 different needs, with different requirements for run-time/compile-time dynamics: * Small static matrices/arrays (where the size and dimensionality are known at compile time) * Dynamic matrices/arrays (where the sizes are determined at run-time, but dimensionality is fixed at compile time) * Fully dynamic multidimensional arrays (where the dimensionality is determined at run time) The third case could probably in most cases be left out,

I implemented the third case in Murray (nee MultiArray) www.dsource.org/projects/multiarray Mainly because I originally wanted an easy way port my NumPy code to D. But I have to agree that after spending a lot of time on it, I would go with #2 if I were starting over. Fix the number of dimensions at compile time.but the first two make an interesting separation. Knowing the size at compile time is mostly an advantage for smaller matrices (eg 4x4, as commonly used for geometric translations). Larger matrices (eg 1000x1000) probably gain much less from compile time known sizes. There is also the matter of template bloat that means that compile time quantities are best left to the smaller numbers.

For #1, I've never had a need for anything besides 2x2,3x3,4x4. Maybe occasionally things like 2x3 to represent affine transformations, but then interpretation starts getting in the way. You can't handle it like a generic 2x3 matrix. So to me for the compile time size case, you can pretty much get by with just a fixed set of classes. Of course if you can generate those efficiently from a big template, then that's great. But there are usually a lot of special cases. If you look at a geometry-oriented linalg lib like Helix you can find many little differences in the API between Matrix22 and Matrix33. They can be taken care of with static ifs, but the question is if the loss of readability it's worth it in the end.Requirements: 1. Must be compatible with future multi-dimensional array syntax such as double [7,5] m; 2. Must be theoretically capable of optimal performance.

Seems reasonable.3. Needs to support slices. This means we also need to support strided vectors. Therefore we also need matrix slices, and matrix references.

Yes, both strided views and non-strided references are definitely needed.Probably distinguishing between the T[] and T[new] concepts.

Such a distinction is definitely of an advantage. For large matrices, ownership of memory becomes important, as such structures may be too large to be able to rely on the GC.

Sorry, I forgot what this distinction was. Can someone remind me?4. For storage we need packed, triangular, symmetric, banded, and sparse matrix support. (Possibly even 'calculated' matrix support, for user-defined matrix types which generate entries on-demand).

For generality, at least sparse (and possibly calculated) are useful for vectors and higher dimensional arrays too.5. Can be generalized to tensors.

Numbers 4 and 5 are a bit conflicting. There is no "triangular storage" for anything other than 2-D arrays. Same goes for the commonly used compressed matrix formats: they are 2D-specific. Convenient usage for linear algebra also suggests that opMul should do linear algebraic multiplication, but that doesn't make sense for tensors. Changing opMul to do different things depending on the dimension seems inconsistent. My feeling is that the generic N-D array concept needs to be separated from the linear algebra concepts. However, I think all these needs can be accommodated by layering things appropriately. first Storage then N-D array using some specific storage then Matrix using 2-D array, and Vector using 1-D array. Implementations of the storage concept can be dimension-specific (like CCS sparse), or generic (like strided memory). N-D array will have only operators that make sense for N-D arrays without reference to specific interpretations. So opMul would be element-wise multiplication if anything. Matrix and Vector can implement the linear algebraic rules.Yet I'm not convinced that it's impossible. What are the otherrequirements?

Naturally, different element types (as well UDT).

I think the UDT part in particular is difficult to do well without reference return values. Even now I think with a complex matrix you have the problem that you can't set the real part of an element using opIndexAssign: mat[3,2].re = 5. // oops you just set the real part of a copy! Or for increment for any type of matrix (+= 1.0). But that's old news. I think you can achieve something decent using current D, but the syntax will not be ideal. And doing all the above will be a very large effort. I started collecting a list of possible libraries to draw inspiration from over at the Murray forum: http://www.dsource.org/forums/viewtopic.php?t=3307&sid=b05ae6bad8b91d51286b0096cd4ef9d2 --bb

Nov 20 2007

Bill Baxter Wrote: [...]The third case could probably in most cases be left out,

I implemented the third case in Murray (nee MultiArray) www.dsource.org/projects/multiarray Mainly because I originally wanted an easy way port my NumPy code to D. But I have to agree that after spending a lot of time on it, I would go with #2 if I were starting over. Fix the number of dimensions at compile time.but the first two make an interesting separation. Knowing the size at compile time is mostly an advantage for smaller matrices (eg 4x4, as commonly used for geometric translations). Larger matrices (eg 1000x1000) probably gain much less from compile time known sizes. There is also the matter of template bloat that means that compile time quantities are best left to the smaller numbers.

For #1, I've never had a need for anything besides 2x2,3x3,4x4. Maybe occasionally things like 2x3 to represent affine transformations, but then interpretation starts getting in the way. You can't handle it like a generic 2x3 matrix. So to me for the compile time size case, you can pretty much get by with just a fixed set of classes. Of course if you can generate those efficiently from a big template, then that's great. But there are usually a lot of special cases. If you look at a geometry-oriented linalg lib like Helix you can find many little differences in the API between Matrix22 and Matrix33. They can be taken care of with static ifs, but the question is if the loss of readability it's worth it in the end.

In 2D Finite Elements some other fixed size small arrays are common, 6x6, 9x9, depending on the type of elements you use, also some non-square ones. [...]4. For storage we need packed, triangular, symmetric, banded, and sparse matrix support. (Possibly even 'calculated' matrix support, for user-defined matrix types which generate entries on-demand).

For generality, at least sparse (and possibly calculated) are useful for vectors and higher dimensional arrays too.

5. Can be generalized to tensors.

Numbers 4 and 5 are a bit conflicting. There is no "triangular storage" for anything other than 2-D arrays. Same goes for the commonly used compressed matrix formats: they are 2D-specific.

Well, that becomes something like "pyramidal" structure in 3D. They are quite common in 3D partial differential equations, like Laplace solvers, or in Fluid Mechanics.Convenient usage for linear algebra also suggests that opMul should do linear algebraic multiplication, but that doesn't make sense for tensors. Changing opMul to do different things depending on the dimension seems inconsistent.

For Tensors we'd need something like ., :, and x, dot product, double dot product and cross product. It may seem inconsistent, but it is math :-) Ciao TomMy feeling is that the generic N-D array concept needs to be separated from the linear algebra concepts. However, I think all these needs can be accommodated by layering things appropriately. first Storage then N-D array using some specific storage then Matrix using 2-D array, and Vector using 1-D array. Implementations of the storage concept can be dimension-specific (like CCS sparse), or generic (like strided memory). N-D array will have only operators that make sense for N-D arrays without reference to specific interpretations. So opMul would be element-wise multiplication if anything. Matrix and Vector can implement the linear algebraic rules.Yet I'm not convinced that it's impossible. What are the otherrequirements?

Naturally, different element types (as well UDT).

I think the UDT part in particular is difficult to do well without reference return values. Even now I think with a complex matrix you have the problem that you can't set the real part of an element using opIndexAssign: mat[3,2].re = 5. // oops you just set the real part of a copy! Or for increment for any type of matrix (+= 1.0). But that's old news. I think you can achieve something decent using current D, but the syntax will not be ideal. And doing all the above will be a very large effort. I started collecting a list of possible libraries to draw inspiration from over at the Murray forum: http://www.dsource.org/forums/viewtopic.php?t=3307&sid=b05ae6bad8b91d51286b0096cd4ef9d2 --bb

Nov 20 2007

TomD wrote:Bill Baxter Wrote: [...]The third case could probably in most cases be left out,

www.dsource.org/projects/multiarray Mainly because I originally wanted an easy way port my NumPy code to D. But I have to agree that after spending a lot of time on it, I would go with #2 if I were starting over. Fix the number of dimensions at compile time.but the first two make an interesting separation. Knowing the size at compile time is mostly an advantage for smaller matrices (eg 4x4, as commonly used for geometric translations). Larger matrices (eg 1000x1000) probably gain much less from compile time known sizes. There is also the matter of template bloat that means that compile time quantities are best left to the smaller numbers.

occasionally things like 2x3 to represent affine transformations, but then interpretation starts getting in the way. You can't handle it like a generic 2x3 matrix. So to me for the compile time size case, you can pretty much get by with just a fixed set of classes. Of course if you can generate those efficiently from a big template, then that's great. But there are usually a lot of special cases. If you look at a geometry-oriented linalg lib like Helix you can find many little differences in the API between Matrix22 and Matrix33. They can be taken care of with static ifs, but the question is if the loss of readability it's worth it in the end.

In 2D Finite Elements some other fixed size small arrays are common, 6x6, 9x9, depending on the type of elements you use, also some non-square ones.

And wouldn't you know it. Right after I wrote that I sat down and started working on some code where I really needed 2x3 matrices. Doh! I guess I just never noticed needing them before.[...]

vectors and higher dimensional arrays too.

5. Can be generalized to tensors.

for anything other than 2-D arrays. Same goes for the commonly used compressed matrix formats: they are 2D-specific.

Well, that becomes something like "pyramidal" structure in 3D. They are quite common in 3D partial differential equations, like Laplace solvers, or in Fluid Mechanics.Convenient usage for linear algebra also suggests that opMul should do linear algebraic multiplication, but that doesn't make sense for tensors. Changing opMul to do different things depending on the dimension seems inconsistent.

For Tensors we'd need something like ., :, and x, dot product, double dot product and cross product. It may seem inconsistent, but it is math :-)

So the thing to do seems to be to have different wrappers for the underlying storage methods that expose different operations. Or just different sets of functions that treat the storage in different ways. linalg.dot, tensor.dot, ... It's quite a large software engineering problem. But I'm looking forward to the day when Don is done writing it. :-) --bb

Nov 20 2007

Bill Baxter wrote:TomD wrote:Bill Baxter Wrote: [...]The third case could probably in most cases be left out,

www.dsource.org/projects/multiarray Mainly because I originally wanted an easy way port my NumPy code to D. But I have to agree that after spending a lot of time on it, I would go with #2 if I were starting over. Fix the number of dimensions at compile time.

Maybe occasionally things like 2x3 to represent affine transformations, but then interpretation starts getting in the way. You can't handle it like a generic 2x3 matrix. So to me for the compile time size case, you can pretty much get by with just a fixed set of classes. Of course if you can generate those efficiently from a big template, then that's great. But there are usually a lot of special cases. If you look at a geometry-oriented linalg lib like Helix you can find many little differences in the API between Matrix22 and Matrix33. They can be taken care of with static ifs, but the question is if the loss of readability it's worth it in the end.

In 2D Finite Elements some other fixed size small arrays are common, 6x6, 9x9, depending on the type of elements you use, also some non-square ones.

And wouldn't you know it. Right after I wrote that I sat down and started working on some code where I really needed 2x3 matrices. Doh! I guess I just never noticed needing them before.[...]

for vectors and higher dimensional arrays too.

5. Can be generalized to tensors.

storage" for anything other than 2-D arrays. Same goes for the commonly used compressed matrix formats: they are 2D-specific.

Well, that becomes something like "pyramidal" structure in 3D. They are quite common in 3D partial differential equations, like Laplace solvers, or in Fluid Mechanics.Convenient usage for linear algebra also suggests that opMul should do linear algebraic multiplication, but that doesn't make sense for tensors. Changing opMul to do different things depending on the dimension seems inconsistent.

For Tensors we'd need something like ., :, and x, dot product, double dot product and cross product. It may seem inconsistent, but it is math :-)

So the thing to do seems to be to have different wrappers for the underlying storage methods that expose different operations. Or just different sets of functions that treat the storage in different ways. linalg.dot, tensor.dot, ... It's quite a large software engineering problem. But I'm looking forward to the day when Don is done writing it. :-)

Me too :-) Quite possibly it's all too hard. Yet it seems to me that fundamentally, the storage class just needs to expose how to get from one element to another efficiently. eg for a matrix, expose some compile-time traits m.packingType --- row, column, transposable(at runtime, will always be either row or column, please generate code for row & column and choose between them at runtime), none (no optimisation potential). and then if row-based: m.firstElementInRow(y) m.lastElementInRow(y) m.rowstride and then optimal access is something like: for (int i=0; i<m.numColumns; ++i) { for (auto t= &m[i][firstElementInRow(i)]; t<=&m[i][lastElementInRow(i)]; t+=m.rowStride) { // *t is m[i][j] } } It all gets very complicated, but the saving grace is that it only needs to be optimal for the common cases.

Nov 21 2007

Bill Baxter Wrote: [...]The third case could probably in most cases be left out,

I implemented the third case in Murray (nee MultiArray) www.dsource.org/projects/multiarray Mainly because I originally wanted an easy way port my NumPy code to D. But I have to agree that after spending a lot of time on it, I would go with #2 if I were starting over. Fix the number of dimensions at compile time.

For #1, I've never had a need for anything besides 2x2,3x3,4x4. Maybe occasionally things like 2x3 to represent affine transformations, but then interpretation starts getting in the way. You can't handle it like a generic 2x3 matrix. So to me for the compile time size case, you can pretty much get by with just a fixed set of classes. Of course if you can generate those efficiently from a big template, then that's great. But there are usually a lot of special cases. If you look at a geometry-oriented linalg lib like Helix you can find many little differences in the API between Matrix22 and Matrix33. They can be taken care of with static ifs, but the question is if the loss of readability it's worth it in the end.

In 2D Finite Elements some other fixed size small arrays are common, 6x6, 9x9, depending on the type of elements you use, also some non-square ones. [...]

For generality, at least sparse (and possibly calculated) are useful for vectors and higher dimensional arrays too.

5. Can be generalized to tensors.

Numbers 4 and 5 are a bit conflicting. There is no "triangular storage" for anything other than 2-D arrays. Same goes for the commonly used compressed matrix formats: they are 2D-specific.

Well, that becomes something like "pyramidal" structure in 3D. They are quite common in 3D partial differential equations, like Laplace solvers, or in Fluid Mechanics.

For Tensors we'd need something like ., :, and x, dot product, double dot product and cross product. It may seem inconsistent, but it is math :-) Ciao TomMy feeling is that the generic N-D array concept needs to be separated from the linear algebra concepts. However, I think all these needs can be accommodated by layering things appropriately. first Storage then N-D array using some specific storage then Matrix using 2-D array, and Vector using 1-D array. Implementations of the storage concept can be dimension-specific (like CCS sparse), or generic (like strided memory). N-D array will have only operators that make sense for N-D arrays without reference to specific interpretations. So opMul would be element-wise multiplication if anything. Matrix and Vector can implement the linear algebraic rules.Yet I'm not convinced that it's impossible. What are the otherrequirements?

Naturally, different element types (as well UDT).

I think the UDT part in particular is difficult to do well without reference return values. Even now I think with a complex matrix you have the problem that you can't set the real part of an element using opIndexAssign: mat[3,2].re = 5. // oops you just set the real part of a copy! Or for increment for any type of matrix (+= 1.0). But that's old news. I think you can achieve something decent using current D, but the syntax will not be ideal. And doing all the above will be a very large effort. I started collecting a list of possible libraries to draw inspiration from over at the Murray forum: http://www.dsource.org/forums/viewtopic.php?t=3307&sid=b05ae6bad8b91d51286b0096cd4ef9d2 --bb

Nov 20 2007

Oskar Linde wrote:Don Clugston wrote:

First, there is the separation of compile time and run time. One can identify (at least) 3 different needs, with different requirements for run-time/compile-time dynamics: * Small static matrices/arrays (where the size and dimensionality are known at compile time)

I just implemented a version of this one this morning: http://www.dsource.org/projects/openmeshd/browser/trunk/OpenMeshD/OpenMesh/Core/Geometry/MatrixT.d (There's a VectorT there too, but I haven't made any connections between the two yet -- mat*vec ops and such.) http://www.dsource.org/projects/openmeshd/browser/trunk/OpenMeshD/OpenMesh/Core/Geometry/MatrixT.d --bb

Nov 22 2007

Requirements: 1. Must be compatible with future multi-dimensional array syntax such as double [7,5] m; 2. Must be theoretically capable of optimal performance. 3. Needs to support slices. This means we also need to support strided vectors. Therefore we also need matrix slices, and matrix references. Probably distinguishing between the T[] and T[new] concepts. 4. For storage we need packed, triangular, symmetric, banded, and sparse matrix support. (Possibly even 'calculated' matrix support, for user-defined matrix types which generate entries on-demand). 5. Can be generalized to tensors. 6. Nice error messages.

Another requirement would be elegant syntax that "fits" with D syntax. Is that what you were trying to communicate with requirement 1? If I remember correctly doesn't BLAS syntax look a little funny because it relies on mixins? If you want requirement 2 then you must be able to leverage any available GPU's or GPGPU's. This is going to be especially useful when Nvidia and AMD release their GPU-like processors that will be able to do double precision floating point operations. Nvidia's (yet unnamed) processor is due out in December. AMD's Firestorm is due in the first quarter of '08. My company has plans to develop a matrix library for physics simulations that will support these new processors. However, if one was already available for us then we wouldn't need to develop it. -Craig

Nov 20 2007

Another requirement would be elegant syntax that "fits" with D syntax. Is that what you were trying to communicate with requirement 1? If I remember correctly doesn't BLAS syntax look a little funny because it relies on mixins?

Er, I meant BLADE not BLAS.

Nov 20 2007

Don Clugston wrote:[about TMtEAMC]

Very interesting indeed! I do a lot of scientific computation with Matlab, and I'd love to have a powerful matrix lib for D. It's just I'm currently swamped with other work, so I couldn't be much help... :-( The first thing that springs in my mind is which operations to choose for operator overloading? Given that a distinction is made between column and row vectors, we can use the operand order to choose between inner and outer product. But what about element-wise product? Same goes for division. Maybe "fake infix operators" could be used? regards, frank

Nov 20 2007

For reference, here's the state of BLADE: Syntax is: mixin(vectorize("expression")); eg, mixin(vectorize("some_array -= 3.2 * (5.8 * another_array) * some_constant")); It would be so nice to get rid of the mixin() and the quotes. But they don't impede development in any way. Ideally it would be: vectorize(some_array -= 3.2 * (5.8 * another_array) * some_constant); Most significant changes since I my last post about it. (a) it finally works in a usable form :-). Thanks Walter for some great bugfixes in the past few releases! I'm using dmd 1.023. (b) generates X87, SSE, SSE2, or inline D code, depending on the complexity of the expression and the types involved. (c) an expression rewriting step has been added, which performs scalar/const folding, and indexing/slicing folding (eg, (A+7*B)[0..5] ---> A[0..5] + 7*(B[0..5]) ). (d) it gives really, really nice error messages and debug output. No template garbage from the library internals -- just a straightforward one-line error message, giving the line in YOUR code which contains the error. eg, with an array a[], mixin(vectorize("any+= old*garbage")); mixin(vectorize("a+= 2")); you get this output (and NOTHING ELSE): demo.d(38): static assert "BLADE: Undefined symbols: any old garbage" demo.d(39): static assert "BLADE: Rank mismatch (addition or subtraction)" It's still not terribly useful, since it only generates code for packed vectors. But the infrastructure is very solid. Here's a particularly nasty example which the constant folding can cope with, to generate SSE2 code: double [] a = new double[4]; double [] d = [0.5, 2.8, 17.0, 28.25, 1, 56.2, 3.4]; a[0..$] = [3.4, 565, 31.3, 41.8]; double [4][] another = [[33.1, 4543, 43, 878.7], [5.14, 455, 554, 2.43]]; mixin(vectorize( ` a += (d[2..$-1]*2.01*a[2]-another[][1])["abc".length-3..$]`)); ------- Generates this front-end code (compile with -debug=BladeFrontEnd to see it). Note that there are many asserts to give nice debug info at runtime, but the only runtime code is a single function call, which passes 3 pointers and a double into an asm function (there's no inlining work for the compiler to do): ------ // bladedemo.d(34) a += (d[2..$-1]*2.01*a[2]-another[][1])["abc".length-3..$] assert(a.length==another[][1][(3u-3)..$].length, `Vector length mismatch`); assert(d[2..($-1)][(3u-3)..$].length==another[][1][(3u-3)..$].length, `Vector length mismatch`); assert( (cast(size_t)(a.ptr)& 0x0F) == 0, `SSE Vector misalignment: a`); assert( (cast(size_t)(d[2..($-1)][(3u-3)..$].ptr)& 0x0F) == 0, `SSE Vector misalignment: d[2..($-1)][(3u-3)..$]`); assert( (cast(size_t)(another[][1][(3u-3)..$].ptr)& 0x0F) == 0, `SSE Vector misalignment: another[][1][(3u-3)..$]`); SSEVECGEN!(2,"A+=((B*C)-D)",double*,double,double*,double*)(another[][1][(3u-3)..$].length,&a[0],((a[2])*2.01),&d[2..($-1)][(3u-3)..$][0],&another[][1][(3u-3)..$][0]); ----- The function consists of this ASM code (compile with -debug=BladeBackEnd to see it; BTW all comments are auto-generated). Note there are only 8 asm instructions in the inner loop: ------- // Operation : ACB*D-+A= asm { push EBX; mov EAX, veclength; lea ECX, [8*EAX]; add ECX, values[0]; //A movsd XMM0, values[1]; shufpd XMM0, XMM0,0; //B lea EDX, [8*EAX]; add EDX, values[2]; //C lea EBX, [8*EAX]; add EBX, values[3]; //D xor EAX, EAX; sub EAX, veclength; // counter=-length jz short L2; // test for length==0 align 16; L1: movapd XMM1, [ECX + 8*EAX]; // A movapd XMM2, [EDX + 8*EAX]; // C mulpd XMM2, XMM0; // B* subpd XMM2, [EBX + 8*EAX]; // D- addpd XMM1, XMM2; //+ movapd [ECX + 8*EAX], XMM1; // A= add EAX,2; js L1; L2: sub EAX, 2; jns L4; movsd XMM1, [ECX + 8*EAX+16]; // A movsd XMM2, [EDX + 8*EAX+16]; // C mulsd XMM2, XMM0; // B* subsd XMM2, [EBX + 8*EAX+16]; // D- addsd XMM1, XMM2; //+ movsd [ECX + 8*EAX+16], XMM1; // A= L4: ; pop EBX; } -------

Nov 20 2007

"Don Clugston" <dac nospam.com.au> wrote in message news:fhv00l$13eq$1 digitalmars.com...For reference, here's the state of BLADE: Syntax is: mixin(vectorize("expression")); eg, mixin(vectorize("some_array -= 3.2 * (5.8 * another_array) * some_constant")); It would be so nice to get rid of the mixin() and the quotes. But they don't impede development in any way. Ideally it would be: vectorize(some_array -= 3.2 * (5.8 * another_array) * some_constant); Most significant changes since I my last post about it. (a) it finally works in a usable form :-). Thanks Walter for some great bugfixes in the past few releases! I'm using dmd 1.023. (b) generates X87, SSE, SSE2, or inline D code, depending on the complexity of the expression and the types involved. (c) an expression rewriting step has been added, which performs scalar/const folding, and indexing/slicing folding (eg, (A+7*B)[0..5] ---> A[0..5] + 7*(B[0..5]) ). (d) it gives really, really nice error messages and debug output. No template garbage from the library internals -- just a straightforward one-line error message, giving the line in YOUR code which contains the error. eg, with an array a[], mixin(vectorize("any+= old*garbage")); mixin(vectorize("a+= 2")); you get this output (and NOTHING ELSE): demo.d(38): static assert "BLADE: Undefined symbols: any old garbage" demo.d(39): static assert "BLADE: Rank mismatch (addition or subtraction)" It's still not terribly useful, since it only generates code for packed vectors. But the infrastructure is very solid. Here's a particularly nasty example which the constant folding can cope with, to generate SSE2 code: double [] a = new double[4]; double [] d = [0.5, 2.8, 17.0, 28.25, 1, 56.2, 3.4]; a[0..$] = [3.4, 565, 31.3, 41.8]; double [4][] another = [[33.1, 4543, 43, 878.7], [5.14, 455, 554, 2.43]]; mixin(vectorize( ` a += (d[2..$-1]*2.01*a[2]-another[][1])["abc".length-3..$]`)); ------- Generates this front-end code (compile with -debug=BladeFrontEnd to see it). Note that there are many asserts to give nice debug info at runtime, but the only runtime code is a single function call, which passes 3 pointers and a double into an asm function (there's no inlining work for the compiler to do): ------ // bladedemo.d(34) a += (d[2..$-1]*2.01*a[2]-another[][1])["abc".length-3..$] assert(a.length==another[][1][(3u-3)..$].length, `Vector length mismatch`); assert(d[2..($-1)][(3u-3)..$].length==another[][1][(3u-3)..$].length, `Vector length mismatch`); assert( (cast(size_t)(a.ptr)& 0x0F) == 0, `SSE Vector misalignment: a`); assert( (cast(size_t)(d[2..($-1)][(3u-3)..$].ptr)& 0x0F) == 0, `SSE Vector misalignment: d[2..($-1)][(3u-3)..$]`); assert( (cast(size_t)(another[][1][(3u-3)..$].ptr)& 0x0F) == 0, `SSE Vector misalignment: another[][1][(3u-3)..$]`); SSEVECGEN!(2,"A+=((B*C)-D)",double*,double,double*,double*)(another[][1][(3u-3)..$].length,&a[0],((a[2])*2.01),&d[2..($-1)][(3u-3)..$][0],&another[][1][(3u-3)..$][0]); ----- The function consists of this ASM code (compile with -debug=BladeBackEnd to see it; BTW all comments are auto-generated). Note there are only 8 asm instructions in the inner loop: ------- // Operation : ACB*D-+A= asm { push EBX; mov EAX, veclength; lea ECX, [8*EAX]; add ECX, values[0]; //A movsd XMM0, values[1]; shufpd XMM0, XMM0,0; //B lea EDX, [8*EAX]; add EDX, values[2]; //C lea EBX, [8*EAX]; add EBX, values[3]; //D xor EAX, EAX; sub EAX, veclength; // counter=-length jz short L2; // test for length==0 align 16; L1: movapd XMM1, [ECX + 8*EAX]; // A movapd XMM2, [EDX + 8*EAX]; // C mulpd XMM2, XMM0; // B* subpd XMM2, [EBX + 8*EAX]; // D- addpd XMM1, XMM2; //+ movapd [ECX + 8*EAX], XMM1; // A= add EAX,2; js L1; L2: sub EAX, 2; jns L4; movsd XMM1, [ECX + 8*EAX+16]; // A movsd XMM2, [EDX + 8*EAX+16]; // C mulsd XMM2, XMM0; // B* subsd XMM2, [EBX + 8*EAX+16]; // D- addsd XMM1, XMM2; //+ movsd [ECX + 8*EAX+16], XMM1; // A= L4: ; pop EBX; } -------

Very very cool Don! This obviously has huge potential. Do you have any plans to support GPU's or multicores? Have you looked into SSE3 or SSE4 to see if there are any new instructions that would be useful? Double precision support is the most important to me, but I am also interested in single precision and extended precision. To what degree are they supported? I assume that SSE instructions can't be used for extended precision operations. So there is only support for one dimensional arrays? When will two dimensional arrays be supported? Once two dimensional arrays are implemented, a simple test for the code generator would be to generate optimized code for single precision multiplication of 3x3 or 4x4 matrices. Optimal code for single precision 3x3 and 4x4 matrix multiplication is widely available on the internet. -Craig

Nov 20 2007

Craig Black wrote:"Don Clugston" <dac nospam.com.au> wrote in message news:fhv00l$13eq$1 digitalmars.com...For reference, here's the state of BLADE: -------

Very very cool Don! This obviously has huge potential. Do you have any plans to support GPU's or multicores? Have you looked into SSE3 or SSE4 to see if there are any new instructions that would be useful?

Yes. Although it seems to me that the GPU world is about to change radically -- using OpenGL looks like a horrible hack without long-term future. I'll wait until we get a decent API, I think. I have been looking at SSE3, 4, and 5, but I don't even have a Core2 machine...Double precision support is the most important to me, but I am also interested in single precision and extended precision. To what degree are they supported?

Yes. (b) generates X87, SSE, SSE2, or inline D code, depending on the complexity of the expression and the types involved. I assume that SSE instructions can't be used for extendedprecision operations.

Yes. I don't think they're even worth doing when strided arrays are involved, except in special cases.So there is only support for one dimensional arrays? When will two dimensional arrays be supported?

Strided arrays are next, then 2D. Once two dimensional arrays areimplemented, a simple test for the code generator would be to generate optimized code for single precision multiplication of 3x3 or 4x4 matrices. Optimal code for single precision 3x3 and 4x4 matrix multiplication is widely available on the internet. -Craig

Nov 20 2007

"Don Clugston" <dac nospam.com.au> wrote in message news:fhv35u$18nc$1 digitalmars.com...Craig Black wrote:"Don Clugston" <dac nospam.com.au> wrote in message news:fhv00l$13eq$1 digitalmars.com...For reference, here's the state of BLADE: -------

Very very cool Don! This obviously has huge potential. Do you have any plans to support GPU's or multicores? Have you looked into SSE3 or SSE4 to see if there are any new instructions that would be useful?

Yes. Although it seems to me that the GPU world is about to change radically -- using OpenGL looks like a horrible hack without long-term future. I'll wait until we get a decent API, I think. I have been looking at SSE3, 4, and 5, but I don't even have a Core2 machine...Double precision support is the most important to me, but I am also interested in single precision and extended precision. To what degree are they supported?

Yes. (b) generates X87, SSE, SSE2, or inline D code, depending on the complexity of the expression and the types involved. I assume that SSE instructions can't be used for extendedprecision operations.

Yes. I don't think they're even worth doing when strided arrays are involved, except in special cases.So there is only support for one dimensional arrays? When will two dimensional arrays be supported?

Strided arrays are next, then 2D.

What are strided arrays? Are they similar to sparse arrays?

Nov 20 2007

Craig Black wrote:What are strided arrays? Are they similar to sparse arrays?

Say you have a two-dimensional array and you want to slice a specific column. By design, the elements are not adjacent in memory so a normal slice won't work. I believe such a slice is called a strided array. Sean

Nov 20 2007

"Don Clugston" <dac nospam.com.au> wrote in message news:fhv00l$13eq$1 digitalmars.com...For reference, here's the state of BLADE: Syntax is: mixin(vectorize("expression")); eg, mixin(vectorize("some_array -= 3.2 * (5.8 * another_array) * some_constant")); It would be so nice to get rid of the mixin() and the quotes. But they don't impede development in any way. Ideally it would be: vectorize(some_array -= 3.2 * (5.8 * another_array) * some_constant); Most significant changes since I my last post about it. (a) it finally works in a usable form :-). Thanks Walter for some great bugfixes in the past few releases! I'm using dmd 1.023. (b) generates X87, SSE, SSE2, or inline D code, depending on the complexity of the expression and the types involved. (c) an expression rewriting step has been added, which performs scalar/const folding, and indexing/slicing folding (eg, (A+7*B)[0..5] ---> A[0..5] + 7*(B[0..5]) ). (d) it gives really, really nice error messages and debug output. No template garbage from the library internals -- just a straightforward one-line error message, giving the line in YOUR code which contains the error. eg, with an array a[], mixin(vectorize("any+= old*garbage")); mixin(vectorize("a+= 2")); you get this output (and NOTHING ELSE): demo.d(38): static assert "BLADE: Undefined symbols: any old garbage" demo.d(39): static assert "BLADE: Rank mismatch (addition or subtraction)" It's still not terribly useful, since it only generates code for packed vectors. But the infrastructure is very solid. Here's a particularly nasty example which the constant folding can cope with, to generate SSE2 code: double [] a = new double[4]; double [] d = [0.5, 2.8, 17.0, 28.25, 1, 56.2, 3.4]; a[0..$] = [3.4, 565, 31.3, 41.8]; double [4][] another = [[33.1, 4543, 43, 878.7], [5.14, 455, 554, 2.43]]; mixin(vectorize( ` a += (d[2..$-1]*2.01*a[2]-another[][1])["abc".length-3..$]`)); ------- Generates this front-end code (compile with -debug=BladeFrontEnd to see it). Note that there are many asserts to give nice debug info at runtime, but the only runtime code is a single function call, which passes 3 pointers and a double into an asm function (there's no inlining work for the compiler to do): ------ // bladedemo.d(34) a += (d[2..$-1]*2.01*a[2]-another[][1])["abc".length-3..$] assert(a.length==another[][1][(3u-3)..$].length, `Vector length mismatch`); assert(d[2..($-1)][(3u-3)..$].length==another[][1][(3u-3)..$].length, `Vector length mismatch`); assert( (cast(size_t)(a.ptr)& 0x0F) == 0, `SSE Vector misalignment: a`); assert( (cast(size_t)(d[2..($-1)][(3u-3)..$].ptr)& 0x0F) == 0, `SSE Vector misalignment: d[2..($-1)][(3u-3)..$]`); assert( (cast(size_t)(another[][1][(3u-3)..$].ptr)& 0x0F) == 0, `SSE Vector misalignment: another[][1][(3u-3)..$]`); SSEVECGEN!(2,"A+=((B*C)-D)",double*,double,double*,double*)(another[][1][(3u-3)..$].length,&a[0],((a[2])*2.01),&d[2..($-1)][(3u-3)..$][0],&another[][1][(3u-3)..$][0]); ----- The function consists of this ASM code (compile with -debug=BladeBackEnd to see it; BTW all comments are auto-generated). Note there are only 8 asm instructions in the inner loop: ------- // Operation : ACB*D-+A= asm { push EBX; mov EAX, veclength; lea ECX, [8*EAX]; add ECX, values[0]; //A movsd XMM0, values[1]; shufpd XMM0, XMM0,0; //B lea EDX, [8*EAX]; add EDX, values[2]; //C lea EBX, [8*EAX]; add EBX, values[3]; //D xor EAX, EAX; sub EAX, veclength; // counter=-length jz short L2; // test for length==0 align 16; L1: movapd XMM1, [ECX + 8*EAX]; // A movapd XMM2, [EDX + 8*EAX]; // C mulpd XMM2, XMM0; // B* subpd XMM2, [EBX + 8*EAX]; // D- addpd XMM1, XMM2; //+ movapd [ECX + 8*EAX], XMM1; // A= add EAX,2; js L1; L2: sub EAX, 2; jns L4; movsd XMM1, [ECX + 8*EAX+16]; // A movsd XMM2, [EDX + 8*EAX+16]; // C mulsd XMM2, XMM0; // B* subsd XMM2, [EBX + 8*EAX+16]; // D- addsd XMM1, XMM2; //+ movsd [ECX + 8*EAX+16], XMM1; // A= L4: ; pop EBX; } -------

Another question. How practical would this mixin technique be to generate optimal floating point code for regular floating point operations (not vectors or matrices)? The DMD backend currently doesn't optimize floating point very well. Perhaps this approach could be used for floating point operations in general.

Nov 20 2007

Hello! Just a few sort comments and ideas... Don Clugston Wrote:No, I don't have the perfect Matrix. I just want to explore the question of whether it exists. There are a plethora of C++ matrix libraries, all of which are far from perfect; they all involve trade-offs between various language limitations.

Appart from these C++ matrix libs, I've come to appreciate these array/matrix API's: http://numpy.scipy.org/ Fortran 95 arrays MATLAB arrays Maple linear algebra API'sIn particular, the usage of expression templates creates all kinds of evil. Requirements: 1. Must be compatible with future multi-dimensional array syntax such as double [7,5] m;

As it has already been mentioned, is the target of this Matrix only 2-dimensional? In mathematical terms a matrix is always a 2-dimensional array with a proper algebra. For example, AFAIK the transpose operation is not defined for a 3-dimensional array.2. Must be theoretically capable of optimal performance. 3. Needs to support slices. This means we also need to support strided vectors. Therefore we also need matrix slices, and matrix references. Probably distinguishing between the T[] and T[new] concepts. 4. For storage we need packed, triangular, symmetric, banded, and sparse matrix support. (Possibly even 'calculated' matrix support, for user-defined matrix types which generate entries on-demand). 5. Can be generalized to tensors.

What does this mean? AFAIK tensors are not a mere dimensional generalization of matrices. Tensors are "geometric" entities that are independent of the coordinate system. Besides all this play with the covariant/contravariant you haven't got with matrices. 0-order tensors are scalars, 1-order tensors are vectors. We can use matrices to represent scalars, different covariant/contravariant vectors and 2-order tensors. I would like to see a Matrix type but respecting the mathematical abstraction it should represent.6. Nice error messages. I now have enough experience with my BLADE library to be confident that all of these features can be achieved in existing D. In particular, the crucial requirement 2 can be completely decoupled from the others. But, there are many requirements (dozens?) which are not on my brief list, and perhaps they cannot all be met simultaneously, even in theory. Yet I'm not convinced that it's impossible. What are the other requirements?

IMHO I think that when implementing (classic) mathematical abstractions, its primitive conceptual design should be retained. Pablo

Nov 20 2007

I'm fond of blitz++ approach: 1. An array is a view of memory, with dimension, strides, and (possibly) non-zero bases. This design allows multiple, independent views of the same memory. 2. Memory allocation itself is ref counted. Benefits: * This design allows a slice, or view to be an lvalue. * Plays nicely with python - semantics are similar to numpy/numeric. specifically, I can return a view as a python object. Python can hold it, and ref counting keeps the underlying memory alive. * Generalized views. For example real(complex_array) is a view, and is an lvalue.

Nov 20 2007

Hi, regarding the organisation of matrices maybe it'd pay to take a look at how Lush is doing them, as it's designed (and known) to do them quite well. I'll quote from the Lush tutorial: Matrices are really composed of two separate entities: * the storage or srg which contains the following fields: a pointer to the actual data , an element type identifier (and the size thereof) , and flags that indicate if the data is writable or not , if it is in RAM or memory-mapped from a file. * the index or idx points to a srg and contains the following fields: the number of dimensions of the tensor , the size in each dimension , the address increment from one element to the next in each dimension (called modulo ) , and the offset of the first tensor element in the storage . This structure allows multiple idx to point to the same srg thereby allowing the data to be accessed in multiple ways without copying. regards, frank

Nov 20 2007

0ffh wrote:Hi, regarding the organisation of matrices maybe it'd pay to take a look at how Lush is doing them, as it's designed (and known) to do them quite well. I'll quote from the Lush tutorial: Matrices are really composed of two separate entities: * the storage or srg which contains the following fields: a pointer to the actual data , an element type identifier (and the size thereof) , and flags that indicate if the data is writable or not , if it is in RAM or memory-mapped from a file. * the index or idx points to a srg and contains the following fields: the number of dimensions of the tensor , the size in each dimension , the address increment from one element to the next in each dimension (called modulo ) , and the offset of the first tensor element in the storage . This structure allows multiple idx to point to the same srg thereby allowing the data to be accessed in multiple ways without copying.

That sounds like the way NumPy in python does it too. And maybe Blitz++ as well? --bb

Nov 20 2007

Bill Baxter wrote:That sounds like the way NumPy in python does it too. And maybe Blitz++ as well?

Humm, upon reading the last few posts on this topic I decided that I was probably way behind with this comment... I should read more first. Seems this way of handling things is widely used; the more reason to suspect there might be something to it. =) regards, frank

Nov 20 2007

Don Clugston wrote:No, I don't have the perfect Matrix. I just want to explore the question of whether it exists. There are a plethora of C++ matrix libraries, all of which are far from perfect; they all involve trade-offs between various language limitations. In particular, the usage of expression templates creates all kinds of evil. Requirements: 1. Must be compatible with future multi-dimensional array syntax such as double [7,5] m; 2. Must be theoretically capable of optimal performance. 3. Needs to support slices. This means we also need to support strided vectors. Therefore we also need matrix slices, and matrix references. Probably distinguishing between the T[] and T[new] concepts. 4. For storage we need packed, triangular, symmetric, banded, and sparse matrix support. (Possibly even 'calculated' matrix support, for user-defined matrix types which generate entries on-demand). 5. Can be generalized to tensors. 6. Nice error messages.

Take a look at the old vectorization debate. http://all-technology.com/eigenpolls/dwishlist/index.php?it=10 http://www.google.com/search?q=vectorization+&domains=www.digitalmars.com&sitesearch=www.digitalmars.com

Dec 01 2007