digitalmars.D - The Matrix to end all Matrix classes (Let's dream!)

Don Clugston (23/23) Nov 20 2007 No, I don't have the perfect Matrix. I just want to explore the question...

Oskar Linde (43/65) Nov 20 2007 A very interesting topic indeed. I will just give some quick thoughts to...

Bill Baxter (51/119) Nov 20 2007 I implemented the third case in Murray (nee MultiArray)

TomD (12/100) Nov 20 2007 In 2D Finite Elements some other fixed size small arrays are common, 6x6...

Bill Baxter (11/66) Nov 20 2007 And wouldn't you know it. Right after I wrote that I sat down and

Don Clugston (21/92) Nov 21 2007 Me too :-) Quite possibly it's all too hard.

TomD (12/100) Nov 20 2007 In 2D Finite Elements some other fixed size small arrays are common, 6x6...

Bill Baxter (7/14) Nov 22 2007 I just implemented a version of this one this morning:

Craig Black (13/25) Nov 20 2007 Another requirement would be elegant syntax that "fits" with D syntax. ...

Craig Black (1/5) Nov 20 2007 Er, I meant BLADE not BLAS.

0ffh (15/16) Nov 20 2007 Very interesting indeed! I do a lot of scientific
Don Clugston (90/90) Nov 20 2007 For reference, here's the state of BLADE:

Craig Black (16/113) Nov 20 2007 Very very cool Don! This obviously has huge potential. Do you have any...

Don Clugston (13/36) Nov 20 2007 Yes. Although it seems to me that the GPU world is about to change radic...

Craig Black (3/32) Nov 20 2007 What are strided arrays? Are they similar to sparse arrays?

Sean Kelly (5/7) Nov 20 2007 Say you have a two-dimensional array and you want to slice a specific

Craig Black (7/104) Nov 20 2007 Another question. How practical would this mixin technique be to genera...

=?ISO-8859-1?Q?Pablo_Ripoll=e9s?= (12/38) Nov 20 2007 Hello!
Neal Becker (12/12) Nov 20 2007 I'm fond of blitz++ approach:
Dan (10/25) Nov 20 2007 Recreating the Python square wheel in D won't serve the needs of very ma...

Bill Baxter (22/49) Nov 20 2007 Maybe not, but if you'd like to take it for a test drive, go visit

Robert DaSilva (8/27) Nov 20 2007 Can you propose a syntax for that?

Ary Borenszweig (2/20) Nov 20 2007 [1 .. 2, 3 .. 4]

Bill Baxter (13/34) Nov 20 2007 :-) What about the other end? How do you define the opSlice that gets

Dan (6/19) Nov 20 2007 I would certainly agree that the [n,n] notation seems to have some major...

Bill Baxter (7/42) Nov 20 2007 Robert DaSilva wrote:

Don Clugston (4/49) Nov 20 2007 Not really. I was saying, * what does the ultimate goal look like?

0ffh (15/15) Nov 20 2007 Hi, regarding the organisation of matrices maybe it'd pay to take a look...

Bill Baxter (4/24) Nov 20 2007 That sounds like the way NumPy in python does it too. And maybe Blitz++...

0ffh (6/8) Nov 20 2007 Humm, upon reading the last few posts on this topic I decided that I was

Knud Soerensen (4/22) Dec 01 2007 Take a look at the old vectorization debate.

Don Clugston <dac nospam.com.au> writes:

No, I don't have the perfect Matrix. I just want to explore the question of 
whether it exists.
There are a plethora of C++ matrix libraries, all of which are far from
perfect; 
they all involve trade-offs between various language limitations.
In particular, the usage of expression templates creates all kinds of evil.

Requirements:
1. Must be compatible with future multi-dimensional array syntax such as
double [7,5] m;
2. Must be theoretically capable of optimal performance.
3. Needs to support slices. This means we also need to support strided
vectors. Therefore we also need matrix slices, and matrix references.
Probably distinguishing between the T[] and T[new] concepts.
4. For storage we need packed, triangular, symmetric, banded, and sparse
matrix support. (Possibly even 'calculated' matrix support, for
user-defined matrix types which generate entries on-demand).
5. Can be generalized to tensors.
6. Nice error messages.

I now have enough experience with my BLADE library to be confident that all of 
these features can be achieved in existing D. In particular, the crucial 
requirement 2 can be completely decoupled from the others.

But, there are many requirements (dozens?) which are not on my brief list, and 
perhaps they cannot all be met simultaneously, even in theory. Yet I'm not 
convinced that it's impossible. What are the other requirements?

Nov 20 2007

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Don Clugston wrote:
 No, I don't have the perfect Matrix. I just want to explore the question 
 of whether it exists.

A very interesting topic indeed. I will just give some quick thoughts to 
start with. First of all, a matrix is (generally) 2-dimensional, but 
there are many uses for higher-dimensional structures as well. Also, the 
1 dimensional vector is a useful structure that could be handled in the 
same generic framework. Is the target just 2-dimensional matrices or 
also higher dimensional arrays?

The matrix as a mathematical structure is quite different in semantics 
and use from generic multi-dimensional arrays and should perhaps be 
treated separately.

To avoid confusion, I will use "size" to refer to the number of elements 
of a matrix/array and "dimensionality" to refer to the number of 
dimensions of an array (a vector is 1-dimensional, a matrix is 
2-dimensional).

First, there is the separation of compile time and run time. One can 
identify (at least) 3 different needs, with different requirements for 
run-time/compile-time dynamics:

* Small static matrices/arrays (where the size and dimensionality are 
known at compile time)
* Dynamic matrices/arrays (where the sizes are determined at run-time, 
but dimensionality is fixed at compile time)
* Fully dynamic multidimensional arrays (where the dimensionality is 
determined at run time)

The third case could probably in most cases be left out, but the first 
two make an interesting separation. Knowing the size at compile time is 
mostly an advantage for smaller matrices (eg 4x4, as commonly used for 
geometric translations). Larger matrices (eg 1000x1000) probably gain 
much less from compile time known sizes. There is also the matter of 
template bloat that means that compile time quantities are best left to 
the smaller numbers.

 Requirements:
 1. Must be compatible with future multi-dimensional array syntax such as
 double [7,5] m;
 2. Must be theoretically capable of optimal performance.

Seems reasonable.

 3. Needs to support slices. This means we also need to support strided
 vectors. Therefore we also need matrix slices, and matrix references.

Yes, both strided views and non-strided references are definitely needed.

 Probably distinguishing between the T[] and T[new] concepts.

Such a distinction is definitely of an advantage. For large matrices, 
ownership of memory becomes important, as such structures may be too 
large to be able to rely on the GC.

 4. For storage we need packed, triangular, symmetric, banded, and sparse
 matrix support. (Possibly even 'calculated' matrix support, for
 user-defined matrix types which generate entries on-demand).

For generality, at least sparse (and possibly calculated) are useful for 
vectors and higher dimensional arrays too.

 5. Can be generalized to tensors.
 6. Nice error messages.
 
 I now have enough experience with my BLADE library to be confident that 
 all of these features can be achieved in existing D. In particular, the 
 crucial requirement 2 can be completely decoupled from the others.

Sounds excellent.

 But, there are many requirements (dozens?) which are not on my brief 
 list, and perhaps they cannot all be met simultaneously, even in theory. 
 Yet I'm not convinced that it's impossible. What are the other 
 requirements?

Naturally, different element types (as well UDT). Compatibilty with 
external libraries (Fortran/C) with the possibility of switching between 
Fortran and C layout.

-- 
Oskar

Nov 20 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Oskar Linde wrote:
 Don Clugston wrote:
 No, I don't have the perfect Matrix. I just want to explore the 
 question of whether it exists.

 
 A very interesting topic indeed. I will just give some quick thoughts to 
 start with. First of all, a matrix is (generally) 2-dimensional, but 
 there are many uses for higher-dimensional structures as well. Also, the 
 1 dimensional vector is a useful structure that could be handled in the 
 same generic framework. Is the target just 2-dimensional matrices or 
 also higher dimensional arrays?
 
 The matrix as a mathematical structure is quite different in semantics 
 and use from generic multi-dimensional arrays and should perhaps be 
 treated separately.
 
 To avoid confusion, I will use "size" to refer to the number of elements 
 of a matrix/array and "dimensionality" to refer to the number of 
 dimensions of an array (a vector is 1-dimensional, a matrix is 
 2-dimensional).
 
 First, there is the separation of compile time and run time. One can 
 identify (at least) 3 different needs, with different requirements for 
 run-time/compile-time dynamics:
 
 * Small static matrices/arrays (where the size and dimensionality are 
 known at compile time)
 * Dynamic matrices/arrays (where the sizes are determined at run-time, 
 but dimensionality is fixed at compile time)
 * Fully dynamic multidimensional arrays (where the dimensionality is 
 determined at run time)
 
 The third case could probably in most cases be left out, 

I implemented the third case in Murray (nee MultiArray)
    www.dsource.org/projects/multiarray

Mainly because I originally wanted an easy way port my NumPy code to D.
But I have to agree that after spending a lot of time on it, I would go 

compile time.

 but the first 
 two make an interesting separation. Knowing the size at compile time is 
 mostly an advantage for smaller matrices (eg 4x4, as commonly used for 
 geometric translations). Larger matrices (eg 1000x1000) probably gain 
 much less from compile time known sizes. There is also the matter of 
 template bloat that means that compile time quantities are best left to 
 the smaller numbers.


occasionally things like 2x3 to represent affine transformations, but 
then interpretation starts getting in the way.  You can't handle it like 
a generic 2x3 matrix.  So to me for the compile time size case, you can 
pretty much get by with just a fixed set of classes.  Of course if you 
can generate those efficiently from a big template, then that's great. 
But there are usually a lot of special cases.  If you look at a 
geometry-oriented linalg lib like Helix you can find many little 
differences in the API between Matrix22 and Matrix33.  They can be taken 
care of with static ifs, but the question is if the loss of readability 
it's worth it in the end.

 
 Requirements:
 1. Must be compatible with future multi-dimensional array syntax such as
 double [7,5] m;
 2. Must be theoretically capable of optimal performance.

 
 Seems reasonable.
 
 3. Needs to support slices. This means we also need to support strided
 vectors. Therefore we also need matrix slices, and matrix references.

 
 Yes, both strided views and non-strided references are definitely needed.
 
 Probably distinguishing between the T[] and T[new] concepts.

 
 Such a distinction is definitely of an advantage. For large matrices, 
 ownership of memory becomes important, as such structures may be too 
 large to be able to rely on the GC.

Sorry, I forgot what this distinction was.  Can someone remind me?

 4. For storage we need packed, triangular, symmetric, banded, and sparse
 matrix support. (Possibly even 'calculated' matrix support, for
 user-defined matrix types which generate entries on-demand).

 
 For generality, at least sparse (and possibly calculated) are useful for 
 vectors and higher dimensional arrays too.

 5. Can be generalized to tensors.


Numbers 4 and 5 are a bit conflicting.  There is no "triangular storage" 
for anything other than 2-D arrays.  Same goes for the commonly used 
compressed matrix formats: they are 2D-specific.

Convenient usage for linear algebra also suggests that opMul should do 
linear algebraic multiplication, but that doesn't make sense for 
tensors.  Changing opMul to do different things depending on the 
dimension seems inconsistent.

My feeling is that the generic N-D array concept needs to be separated 
from the linear algebra concepts.

However, I think all these needs can be accommodated by layering things 
appropriately.

first Storage
then  N-D array using some specific storage
then  Matrix using 2-D array, and Vector using 1-D array.

Implementations of the storage concept can be dimension-specific (like 
CCS sparse), or generic (like strided memory).

N-D array will have only operators that make sense for N-D arrays 
without reference to specific interpretations.  So opMul would be 
element-wise multiplication if anything.

Matrix and Vector can implement the linear algebraic rules.

 Yet I'm not convinced that it's impossible. What are the other
 requirements?

 
 Naturally, different element types (as well UDT). 

I think the UDT part in particular is difficult to do well without 
reference return values.  Even now I think with a complex matrix you 
have the problem that you can't set the real part of an element using 
opIndexAssign:
   mat[3,2].re = 5.  // oops you just set the real part of a copy!

Or for increment for any type of matrix (+= 1.0).  But that's old news.


I think you can achieve something decent using current D, but the syntax 
will not be ideal.   And doing all the above will be a very large effort.

I started collecting a list of possible libraries to draw inspiration 
from over at the Murray forum: 
http://www.dsource.org/forums/viewtopic.php?t=3307&sid=b05ae6bad8b91d51286b0096cd4ef9d2

--bb

Nov 20 2007

TomD <t_demmer nospam.web.de> writes:

Bill Baxter Wrote:
[...]
 The third case could probably in most cases be left out, 

 
 I implemented the third case in Murray (nee MultiArray)
     www.dsource.org/projects/multiarray
 
 Mainly because I originally wanted an easy way port my NumPy code to D.
 But I have to agree that after spending a lot of time on it, I would go 

 compile time.
 
 but the first 
 two make an interesting separation. Knowing the size at compile time is 
 mostly an advantage for smaller matrices (eg 4x4, as commonly used for 
 geometric translations). Larger matrices (eg 1000x1000) probably gain 
 much less from compile time known sizes. There is also the matter of 
 template bloat that means that compile time quantities are best left to 
 the smaller numbers.

 

 occasionally things like 2x3 to represent affine transformations, but 
 then interpretation starts getting in the way.  You can't handle it like 
 a generic 2x3 matrix.  So to me for the compile time size case, you can 
 pretty much get by with just a fixed set of classes.  Of course if you 
 can generate those efficiently from a big template, then that's great. 
 But there are usually a lot of special cases.  If you look at a 
 geometry-oriented linalg lib like Helix you can find many little 
 differences in the API between Matrix22 and Matrix33.  They can be taken 
 care of with static ifs, but the question is if the loss of readability 
 it's worth it in the end.

In 2D Finite Elements some other fixed size small arrays are common, 6x6, 9x9, 
depending on the type of elements you use, also some non-square ones.
[...]
 4. For storage we need packed, triangular, symmetric, banded, and sparse
 matrix support. (Possibly even 'calculated' matrix support, for
 user-defined matrix types which generate entries on-demand).

 
 For generality, at least sparse (and possibly calculated) are useful for 
 vectors and higher dimensional arrays too.

  >
 5. Can be generalized to tensors.


 
 Numbers 4 and 5 are a bit conflicting.  There is no "triangular storage" 
 for anything other than 2-D arrays.  Same goes for the commonly used 
 compressed matrix formats: they are 2D-specific.

Well, that becomes something like "pyramidal" structure in 3D.
They are quite common in 3D partial differential equations, like
Laplace solvers, or in Fluid Mechanics.

 
 Convenient usage for linear algebra also suggests that opMul should do 
 linear algebraic multiplication, but that doesn't make sense for 
 tensors.  Changing opMul to do different things depending on the 
 dimension seems inconsistent.

For Tensors we'd need something like ., :, and x, dot product, double dot
product and cross product. It may seem inconsistent, but it is math :-)

Ciao
Tom

 
 My feeling is that the generic N-D array concept needs to be separated 
 from the linear algebra concepts.
 
 However, I think all these needs can be accommodated by layering things 
 appropriately.
 
 first Storage
 then  N-D array using some specific storage
 then  Matrix using 2-D array, and Vector using 1-D array.
 
 Implementations of the storage concept can be dimension-specific (like 
 CCS sparse), or generic (like strided memory).
 
 N-D array will have only operators that make sense for N-D arrays 
 without reference to specific interpretations.  So opMul would be 
 element-wise multiplication if anything.
 
 Matrix and Vector can implement the linear algebraic rules.
 
 Yet I'm not convinced that it's impossible. What are the other
 requirements?

 
 Naturally, different element types (as well UDT). 

 
 I think the UDT part in particular is difficult to do well without 
 reference return values.  Even now I think with a complex matrix you 
 have the problem that you can't set the real part of an element using 
 opIndexAssign:
    mat[3,2].re = 5.  // oops you just set the real part of a copy!
 
 Or for increment for any type of matrix (+= 1.0).  But that's old news.
 
 
 I think you can achieve something decent using current D, but the syntax 
 will not be ideal.   And doing all the above will be a very large effort.
 
 I started collecting a list of possible libraries to draw inspiration 
 from over at the Murray forum: 
 http://www.dsource.org/forums/viewtopic.php?t=3307&sid=b05ae6bad8b91d51286b0096cd4ef9d2
 
 --bb

Nov 20 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

TomD wrote:
 Bill Baxter Wrote:
 [...]
 The third case could probably in most cases be left out, 

 I implemented the third case in Murray (nee MultiArray)
     www.dsource.org/projects/multiarray

 Mainly because I originally wanted an easy way port my NumPy code to D.
 But I have to agree that after spending a lot of time on it, I would go 

 compile time.

 but the first 
 two make an interesting separation. Knowing the size at compile time is 
 mostly an advantage for smaller matrices (eg 4x4, as commonly used for 
 geometric translations). Larger matrices (eg 1000x1000) probably gain 
 much less from compile time known sizes. There is also the matter of 
 template bloat that means that compile time quantities are best left to 
 the smaller numbers.


 occasionally things like 2x3 to represent affine transformations, but 
 then interpretation starts getting in the way.  You can't handle it like 
 a generic 2x3 matrix.  So to me for the compile time size case, you can 
 pretty much get by with just a fixed set of classes.  Of course if you 
 can generate those efficiently from a big template, then that's great. 
 But there are usually a lot of special cases.  If you look at a 
 geometry-oriented linalg lib like Helix you can find many little 
 differences in the API between Matrix22 and Matrix33.  They can be taken 
 care of with static ifs, but the question is if the loss of readability 
 it's worth it in the end.

 
 In 2D Finite Elements some other fixed size small arrays are common, 6x6, 9x9, 
 depending on the type of elements you use, also some non-square ones.

And wouldn't you know it.  Right after I wrote that I sat down and 
started working on some code where I really needed 2x3 matrices.  Doh! 
I guess I just never noticed needing them before.

 [...]
 4. For storage we need packed, triangular, symmetric, banded, and sparse
 matrix support. (Possibly even 'calculated' matrix support, for
 user-defined matrix types which generate entries on-demand).

 For generality, at least sparse (and possibly calculated) are useful for 
 vectors and higher dimensional arrays too.

  >
 5. Can be generalized to tensors.


 Numbers 4 and 5 are a bit conflicting.  There is no "triangular storage" 
 for anything other than 2-D arrays.  Same goes for the commonly used 
 compressed matrix formats: they are 2D-specific.

 
 Well, that becomes something like "pyramidal" structure in 3D.
 They are quite common in 3D partial differential equations, like
 Laplace solvers, or in Fluid Mechanics.
 
 Convenient usage for linear algebra also suggests that opMul should do 
 linear algebraic multiplication, but that doesn't make sense for 
 tensors.  Changing opMul to do different things depending on the 
 dimension seems inconsistent.

 
 For Tensors we'd need something like ., :, and x, dot product, double dot
 product and cross product. It may seem inconsistent, but it is math :-)

So the thing to do seems to be to have different wrappers for the 
underlying storage methods that expose different operations.

Or just different sets of functions that treat the storage in different 
ways.  linalg.dot, tensor.dot, ...

It's quite a large software engineering problem.  But I'm looking 
forward to the day when Don is done writing it.  :-)

--bb

Nov 20 2007

Don Clugston <dac nospam.com.au> writes:

Bill Baxter wrote:
 TomD wrote:
 Bill Baxter Wrote:
 [...]
 The third case could probably in most cases be left out, 

 I implemented the third case in Murray (nee MultiArray)
     www.dsource.org/projects/multiarray

 Mainly because I originally wanted an easy way port my NumPy code to D.
 But I have to agree that after spending a lot of time on it, I would 

 compile time.

 but the first two make an interesting separation. Knowing the size 
 at compile time is mostly an advantage for smaller matrices (eg 4x4, 
 as commonly used for geometric translations). Larger matrices (eg 
 1000x1000) probably gain much less from compile time known sizes. 
 There is also the matter of template bloat that means that compile 
 time quantities are best left to the smaller numbers.


 Maybe occasionally things like 2x3 to represent affine 
 transformations, but then interpretation starts getting in the way.  
 You can't handle it like a generic 2x3 matrix.  So to me for the 
 compile time size case, you can pretty much get by with just a fixed 
 set of classes.  Of course if you can generate those efficiently from 
 a big template, then that's great. But there are usually a lot of 
 special cases.  If you look at a geometry-oriented linalg lib like 
 Helix you can find many little differences in the API between 
 Matrix22 and Matrix33.  They can be taken care of with static ifs, 
 but the question is if the loss of readability it's worth it in the end.

 In 2D Finite Elements some other fixed size small arrays are common, 
 6x6, 9x9, depending on the type of elements you use, also some 
 non-square ones.

 
 And wouldn't you know it.  Right after I wrote that I sat down and 
 started working on some code where I really needed 2x3 matrices.  Doh! I 
 guess I just never noticed needing them before.
 
 [...]
 4. For storage we need packed, triangular, symmetric, banded, and 
 sparse
 matrix support. (Possibly even 'calculated' matrix support, for
 user-defined matrix types which generate entries on-demand).

 For generality, at least sparse (and possibly calculated) are useful 
 for vectors and higher dimensional arrays too.

  >
 5. Can be generalized to tensors.


 Numbers 4 and 5 are a bit conflicting.  There is no "triangular 
 storage" for anything other than 2-D arrays.  Same goes for the 
 commonly used compressed matrix formats: they are 2D-specific.

 Well, that becomes something like "pyramidal" structure in 3D.
 They are quite common in 3D partial differential equations, like
 Laplace solvers, or in Fluid Mechanics.

 Convenient usage for linear algebra also suggests that opMul should 
 do linear algebraic multiplication, but that doesn't make sense for 
 tensors.  Changing opMul to do different things depending on the 
 dimension seems inconsistent.

 For Tensors we'd need something like ., :, and x, dot product, double dot
 product and cross product. It may seem inconsistent, but it is math :-)

 
 So the thing to do seems to be to have different wrappers for the 
 underlying storage methods that expose different operations.
 
 Or just different sets of functions that treat the storage in different 
 ways.  linalg.dot, tensor.dot, ...
 
 It's quite a large software engineering problem.  But I'm looking 
 forward to the day when Don is done writing it.  :-)

Me too :-) Quite possibly it's all too hard.

Yet it seems to me that fundamentally, the storage class just needs to expose 
how to get from one element to another efficiently.

eg for a matrix, expose some compile-time traits
m.packingType   --- row, column, transposable(at runtime, will always be either 
row or column, please generate code for row & column and choose between them at 
runtime), none (no optimisation potential).

and then if row-based:

m.firstElementInRow(y)
m.lastElementInRow(y)
m.rowstride

and then optimal access is something like:

for (int i=0; i<m.numColumns; ++i) {
   for (auto t= &m[i][firstElementInRow(i)];
	t<=&m[i][lastElementInRow(i)]; t+=m.rowStride) {
	// *t  is m[i][j]
   }
}

It all gets very complicated, but the saving grace is that it only needs to be 
optimal for the common cases.

Nov 21 2007

TomD <t_demmer nospam.web.de> writes:

Bill Baxter Wrote:
[...]
 The third case could probably in most cases be left out, 

 
 I implemented the third case in Murray (nee MultiArray)
     www.dsource.org/projects/multiarray
 
 Mainly because I originally wanted an easy way port my NumPy code to D.
 But I have to agree that after spending a lot of time on it, I would go 

 compile time.
 
 but the first 
 two make an interesting separation. Knowing the size at compile time is 
 mostly an advantage for smaller matrices (eg 4x4, as commonly used for 
 geometric translations). Larger matrices (eg 1000x1000) probably gain 
 much less from compile time known sizes. There is also the matter of 
 template bloat that means that compile time quantities are best left to 
 the smaller numbers.

 

 occasionally things like 2x3 to represent affine transformations, but 
 then interpretation starts getting in the way.  You can't handle it like 
 a generic 2x3 matrix.  So to me for the compile time size case, you can 
 pretty much get by with just a fixed set of classes.  Of course if you 
 can generate those efficiently from a big template, then that's great. 
 But there are usually a lot of special cases.  If you look at a 
 geometry-oriented linalg lib like Helix you can find many little 
 differences in the API between Matrix22 and Matrix33.  They can be taken 
 care of with static ifs, but the question is if the loss of readability 
 it's worth it in the end.

In 2D Finite Elements some other fixed size small arrays are common, 6x6, 9x9, 
depending on the type of elements you use, also some non-square ones.
[...]
 4. For storage we need packed, triangular, symmetric, banded, and sparse
 matrix support. (Possibly even 'calculated' matrix support, for
 user-defined matrix types which generate entries on-demand).

 
 For generality, at least sparse (and possibly calculated) are useful for 
 vectors and higher dimensional arrays too.

  >
 5. Can be generalized to tensors.


 
 Numbers 4 and 5 are a bit conflicting.  There is no "triangular storage" 
 for anything other than 2-D arrays.  Same goes for the commonly used 
 compressed matrix formats: they are 2D-specific.

Well, that becomes something like "pyramidal" structure in 3D.
They are quite common in 3D partial differential equations, like
Laplace solvers, or in Fluid Mechanics.

 
 Convenient usage for linear algebra also suggests that opMul should do 
 linear algebraic multiplication, but that doesn't make sense for 
 tensors.  Changing opMul to do different things depending on the 
 dimension seems inconsistent.

For Tensors we'd need something like ., :, and x, dot product, double dot
product and cross product. It may seem inconsistent, but it is math :-)

Ciao
Tom

 
 My feeling is that the generic N-D array concept needs to be separated 
 from the linear algebra concepts.
 
 However, I think all these needs can be accommodated by layering things 
 appropriately.
 
 first Storage
 then  N-D array using some specific storage
 then  Matrix using 2-D array, and Vector using 1-D array.
 
 Implementations of the storage concept can be dimension-specific (like 
 CCS sparse), or generic (like strided memory).
 
 N-D array will have only operators that make sense for N-D arrays 
 without reference to specific interpretations.  So opMul would be 
 element-wise multiplication if anything.
 
 Matrix and Vector can implement the linear algebraic rules.
 
 Yet I'm not convinced that it's impossible. What are the other
 requirements?

 
 Naturally, different element types (as well UDT). 

 
 I think the UDT part in particular is difficult to do well without 
 reference return values.  Even now I think with a complex matrix you 
 have the problem that you can't set the real part of an element using 
 opIndexAssign:
    mat[3,2].re = 5.  // oops you just set the real part of a copy!
 
 Or for increment for any type of matrix (+= 1.0).  But that's old news.
 
 
 I think you can achieve something decent using current D, but the syntax 
 will not be ideal.   And doing all the above will be a very large effort.
 
 I started collecting a list of possible libraries to draw inspiration 
 from over at the Murray forum: 
 http://www.dsource.org/forums/viewtopic.php?t=3307&sid=b05ae6bad8b91d51286b0096cd4ef9d2
 
 --bb

Nov 20 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Oskar Linde wrote:
 Don Clugston wrote:

 First, there is the separation of compile time and run time. One can 
 identify (at least) 3 different needs, with different requirements for 
 run-time/compile-time dynamics:
 
 * Small static matrices/arrays (where the size and dimensionality are 
 known at compile time)

I just implemented a version of this one this morning:
http://www.dsource.org/projects/openmeshd/browser/trunk/OpenMeshD/OpenMesh/Core/Geometry/MatrixT.d

(There's a VectorT there too, but I haven't made any connections between 
the two yet  -- mat*vec ops and such.)
http://www.dsource.org/projects/openmeshd/browser/trunk/OpenMeshD/OpenMesh/Core/Geometry/MatrixT.d

--bb

Nov 22 2007

"Craig Black" <craigblack2 cox.net> writes:

 Requirements:
 1. Must be compatible with future multi-dimensional array syntax such as
 double [7,5] m;
 2. Must be theoretically capable of optimal performance.
 3. Needs to support slices. This means we also need to support strided
 vectors. Therefore we also need matrix slices, and matrix references.
 Probably distinguishing between the T[] and T[new] concepts.
 4. For storage we need packed, triangular, symmetric, banded, and sparse
 matrix support. (Possibly even 'calculated' matrix support, for
 user-defined matrix types which generate entries on-demand).
 5. Can be generalized to tensors.
 6. Nice error messages.

Another requirement would be elegant syntax that "fits" with D syntax.  Is 
that what you were trying to communicate with requirement 1?  If I remember 
correctly doesn't BLAS syntax look a little funny because it relies on 
mixins?

If you want requirement 2 then you must be able to leverage any available 
GPU's or GPGPU's.  This is going to be especially useful when Nvidia and AMD 
release their GPU-like processors that will be able to do double precision 
floating point operations.  Nvidia's (yet unnamed) processor is due out in 
December.  AMD's Firestorm is due in the first quarter of '08.  My company 
has plans to develop a matrix library for physics simulations that will 
support these new processors.  However, if one was already available for us 
then we wouldn't need to develop it.

-Craig

Nov 20 2007

"Craig Black" <craigblack2 cox.net> writes:

 Another requirement would be elegant syntax that "fits" with D syntax.  Is
 that what you were trying to communicate with requirement 1?  If I 
 remember correctly doesn't BLAS syntax look a little funny because it 
 relies on mixins?

Er, I meant BLADE not BLAS.

Nov 20 2007

0ffh <frank frankhirsch.youknow.what.todo.net> writes:

Don Clugston wrote:
 [about TMtEAMC]

Very interesting indeed! I do a lot of scientific
computation with Matlab, and I'd love to have a
powerful matrix lib for D. It's just I'm currently
swamped with other work, so I couldn't be much
help... :-(
The first thing that springs in my mind is which
operations to choose for operator overloading?
Given that a distinction is made between column
and row vectors, we can use the operand order
to choose between inner and outer product.
But what about element-wise product?
Same goes for division.
Maybe "fake infix operators" could be used?

regards, frank

Nov 20 2007

Don Clugston <dac nospam.com.au> writes:

For reference, here's the state of BLADE:

Syntax is:

mixin(vectorize("expression"));

eg,

mixin(vectorize("some_array -= 3.2 * (5.8 * another_array) * some_constant"));

It would be so nice to get rid of the mixin() and the quotes. But they don't 
impede development in any way. Ideally it would be:

vectorize(some_array -= 3.2 * (5.8 * another_array) * some_constant);

Most significant changes since I my last post about it.
(a) it finally works in a usable form :-). Thanks Walter for some great
bugfixes 
in the past few releases!  I'm using dmd 1.023.
(b) generates X87, SSE, SSE2, or inline D code, depending on the complexity of 
the expression and the types involved.
(c) an expression rewriting step has been added, which performs scalar/const 
folding, and indexing/slicing folding
    (eg, (A+7*B)[0..5] --->   A[0..5] + 7*(B[0..5]) ).
(d) it gives really, really nice error messages and debug output. No template 
garbage from the library internals -- just a straightforward one-line error 
message, giving the line in YOUR code which contains the error.

eg,   with an array a[],
     mixin(vectorize("any+= old*garbage"));
     mixin(vectorize("a+= 2"));

you get this output (and NOTHING ELSE):
demo.d(38): static assert  "BLADE: Undefined symbols: any old garbage"
demo.d(39): static assert  "BLADE: Rank mismatch (addition or subtraction)"

It's still not terribly useful, since it only generates code for packed
vectors. 
But the infrastructure is very solid.

Here's a particularly nasty example which the constant folding can cope with,
to 
generate SSE2 code:

     double [] a = new double[4];
     double [] d = [0.5, 2.8, 17.0, 28.25, 1, 56.2, 3.4];
     a[0..$] = [3.4, 565, 31.3, 41.8];
     double [4][] another = [[33.1, 4543, 43, 878.7], [5.14, 455, 554, 2.43]];

     mixin(vectorize(
` a += (d[2..$-1]*2.01*a[2]-another[][1])["abc".length-3..$]`));

-------
Generates this front-end code (compile with -debug=BladeFrontEnd to see it). 
Note that there are many asserts to give nice debug info at runtime, but the 
only runtime code is a single function call, which passes 3 pointers and a 
double into an asm function (there's no inlining work for the compiler to do):

------
// bladedemo.d(34)  a += (d[2..$-1]*2.01*a[2]-another[][1])["abc".length-3..$]
assert(a.length==another[][1][(3u-3)..$].length, `Vector length mismatch`);
assert(d[2..($-1)][(3u-3)..$].length==another[][1][(3u-3)..$].length, `Vector 
length mismatch`);
assert( (cast(size_t)(a.ptr)& 0x0F) == 0, `SSE Vector misalignment: a`);
assert( (cast(size_t)(d[2..($-1)][(3u-3)..$].ptr)& 0x0F) == 0, `SSE Vector 
misalignment: d[2..($-1)][(3u-3)..$]`);
assert( (cast(size_t)(another[][1][(3u-3)..$].ptr)& 0x0F) == 0, `SSE Vector 
misalignment: another[][1][(3u-3)..$]`);

SSEVECGEN!(2,"A+=((B*C)-D)",double*,double,double*,double*)(another[][1][(3u-3)..$].length,&a[0],((a[2])*2.01),&d[2..($-1)][(3u-3)..$][0],&another[][1][(3u-3)..$][0]);

-----
The function consists of this ASM code (compile with -debug=BladeBackEnd to see 
it; BTW all comments are auto-generated). Note there are only 8 asm
instructions 
in the inner loop:

-------
// Operation : ACB*D-+A=

asm {
  push EBX;
   mov EAX, veclength;
   lea ECX, [8*EAX];     add ECX, values[0];  //A
   movsd XMM0, values[1];   shufpd XMM0, XMM0,0; //B
   lea EDX, [8*EAX];     add EDX, values[2];  //C
   lea EBX, [8*EAX];     add EBX, values[3];  //D
   xor EAX, EAX;
   sub EAX, veclength; // counter=-length
   jz short L2; // test for length==0

   align 16;
L1:
   movapd XMM1, [ECX + 8*EAX];  // A
   movapd XMM2, [EDX + 8*EAX];  // C
   mulpd XMM2, XMM0; // B*
   subpd  XMM2, [EBX + 8*EAX];  // D-
   addpd XMM1, XMM2;  //+
   movapd [ECX + 8*EAX], XMM1;  // A=
   add EAX,2;
   js L1;
L2:
   sub EAX, 2;
   jns L4;
   movsd XMM1, [ECX + 8*EAX+16];  // A
   movsd XMM2, [EDX + 8*EAX+16];  // C
   mulsd XMM2, XMM0; // B*
   subsd  XMM2, [EBX + 8*EAX+16];  // D-
   addsd XMM1, XMM2;  //+
   movsd [ECX + 8*EAX+16], XMM1;  // A=
L4:
;  pop EBX;
}
-------

Nov 20 2007

"Craig Black" <cblack ara.com> writes:

"Don Clugston" <dac nospam.com.au> wrote in message 
news:fhv00l$13eq$1 digitalmars.com...
 For reference, here's the state of BLADE:

 Syntax is:

 mixin(vectorize("expression"));

 eg,

 mixin(vectorize("some_array -= 3.2 * (5.8 * another_array) * 
 some_constant"));

 It would be so nice to get rid of the mixin() and the quotes. But they 
 don't impede development in any way. Ideally it would be:

 vectorize(some_array -= 3.2 * (5.8 * another_array) * some_constant);

 Most significant changes since I my last post about it.
 (a) it finally works in a usable form :-). Thanks Walter for some great 
 bugfixes in the past few releases!  I'm using dmd 1.023.
 (b) generates X87, SSE, SSE2, or inline D code, depending on the 
 complexity of the expression and the types involved.
 (c) an expression rewriting step has been added, which performs 
 scalar/const folding, and indexing/slicing folding
    (eg, (A+7*B)[0..5] --->   A[0..5] + 7*(B[0..5]) ).
 (d) it gives really, really nice error messages and debug output. No 
 template garbage from the library internals -- just a straightforward 
 one-line error message, giving the line in YOUR code which contains the 
 error.

 eg,   with an array a[],
     mixin(vectorize("any+= old*garbage"));
     mixin(vectorize("a+= 2"));

 you get this output (and NOTHING ELSE):
 demo.d(38): static assert  "BLADE: Undefined symbols: any old garbage"
 demo.d(39): static assert  "BLADE: Rank mismatch (addition or 
 subtraction)"

 It's still not terribly useful, since it only generates code for packed 
 vectors. But the infrastructure is very solid.

 Here's a particularly nasty example which the constant folding can cope 
 with, to generate SSE2 code:

     double [] a = new double[4];
     double [] d = [0.5, 2.8, 17.0, 28.25, 1, 56.2, 3.4];
     a[0..$] = [3.4, 565, 31.3, 41.8];
     double [4][] another = [[33.1, 4543, 43, 878.7], [5.14, 455, 554, 
 2.43]];

     mixin(vectorize(
 ` a += (d[2..$-1]*2.01*a[2]-another[][1])["abc".length-3..$]`));

 -------
 Generates this front-end code (compile with -debug=BladeFrontEnd to see 
 it). Note that there are many asserts to give nice debug info at runtime, 
 but the only runtime code is a single function call, which passes 3 
 pointers and a double into an asm function (there's no inlining work for 
 the compiler to do):

 ------
 // bladedemo.d(34)  a += 
 (d[2..$-1]*2.01*a[2]-another[][1])["abc".length-3..$]
 assert(a.length==another[][1][(3u-3)..$].length, `Vector length 
 mismatch`);
 assert(d[2..($-1)][(3u-3)..$].length==another[][1][(3u-3)..$].length, 
 `Vector length mismatch`);
 assert( (cast(size_t)(a.ptr)& 0x0F) == 0, `SSE Vector misalignment: a`);
 assert( (cast(size_t)(d[2..($-1)][(3u-3)..$].ptr)& 0x0F) == 0, `SSE Vector 
 misalignment: d[2..($-1)][(3u-3)..$]`);
 assert( (cast(size_t)(another[][1][(3u-3)..$].ptr)& 0x0F) == 0, `SSE 
 Vector misalignment: another[][1][(3u-3)..$]`);

 SSEVECGEN!(2,"A+=((B*C)-D)",double*,double,double*,double*)(another[][1][(3u-3)..$].length,&a[0],((a[2])*2.01),&d[2..($-1)][(3u-3)..$][0],&another[][1][(3u-3)..$][0]);

 -----
 The function consists of this ASM code (compile with -debug=BladeBackEnd 
 to see it; BTW all comments are auto-generated). Note there are only 8 asm 
 instructions in the inner loop:

 -------
 // Operation : ACB*D-+A=

 asm {
  push EBX;
   mov EAX, veclength;
   lea ECX, [8*EAX];     add ECX, values[0];  //A
   movsd XMM0, values[1];   shufpd XMM0, XMM0,0; //B
   lea EDX, [8*EAX];     add EDX, values[2];  //C
   lea EBX, [8*EAX];     add EBX, values[3];  //D
   xor EAX, EAX;
   sub EAX, veclength; // counter=-length
   jz short L2; // test for length==0

   align 16;
 L1:
   movapd XMM1, [ECX + 8*EAX];  // A
   movapd XMM2, [EDX + 8*EAX];  // C
   mulpd XMM2, XMM0; // B*
   subpd  XMM2, [EBX + 8*EAX];  // D-
   addpd XMM1, XMM2;  //+
   movapd [ECX + 8*EAX], XMM1;  // A=
   add EAX,2;
   js L1;
 L2:
   sub EAX, 2;
   jns L4;
   movsd XMM1, [ECX + 8*EAX+16];  // A
   movsd XMM2, [EDX + 8*EAX+16];  // C
   mulsd XMM2, XMM0; // B*
   subsd  XMM2, [EBX + 8*EAX+16];  // D-
   addsd XMM1, XMM2;  //+
   movsd [ECX + 8*EAX+16], XMM1;  // A=
 L4:
 ;  pop EBX;
 }
 -------

Very very cool Don!  This obviously has huge potential.  Do you have any 
plans to support GPU's or multicores?  Have you looked into SSE3 or SSE4 to 
see if there are any new instructions that would be useful?

Double precision support is the most important to me, but I am also 
interested in single precision and extended precision.  To what degree are 
they supported?  I assume that SSE instructions can't be used for extended 
precision operations.

So there is only support for one dimensional arrays?  When will two 
dimensional arrays be supported?  Once two dimensional arrays are 
implemented, a simple test for the code generator would be to generate 
optimized code for single precision multiplication of 3x3 or 4x4 matrices. 
Optimal code for single precision 3x3 and 4x4 matrix multiplication is 
widely available on the internet.

-Craig

Nov 20 2007

Don Clugston <dac nospam.com.au> writes:

Craig Black wrote:
 "Don Clugston" <dac nospam.com.au> wrote in message 
 news:fhv00l$13eq$1 digitalmars.com...
 For reference, here's the state of BLADE:
 -------

 
 Very very cool Don!  This obviously has huge potential.  Do you have any 
 plans to support GPU's or multicores?  Have you looked into SSE3 or SSE4 to 
 see if there are any new instructions that would be useful?

Yes. Although it seems to me that the GPU world is about to change radically -- 
using OpenGL looks like a horrible hack without long-term future. I'll wait 
until we get a decent API, I think.
I have been looking at SSE3, 4, and 5, but I don't even have a Core2 machine...

 Double precision support is the most important to me, but I am also 
 interested in single precision and extended precision.  To what degree are 
 they supported?  

Yes.

(b) generates X87, SSE, SSE2, or inline D code, depending on the complexity of 
the expression and the types involved.

I assume that SSE instructions can't be used for extended
 precision operations.

Yes. I don't think they're even worth doing when strided arrays are involved, 
except in special cases.

 
 So there is only support for one dimensional arrays?  When will two 
 dimensional arrays be supported? 

Strided arrays are next, then 2D.

  Once two dimensional arrays are
 implemented, a simple test for the code generator would be to generate 
 optimized code for single precision multiplication of 3x3 or 4x4 matrices. 
 Optimal code for single precision 3x3 and 4x4 matrix multiplication is 
 widely available on the internet.
 
 -Craig

Nov 20 2007

"Craig Black" <cblack ara.com> writes:

"Don Clugston" <dac nospam.com.au> wrote in message 
news:fhv35u$18nc$1 digitalmars.com...
 Craig Black wrote:
 "Don Clugston" <dac nospam.com.au> wrote in message 
 news:fhv00l$13eq$1 digitalmars.com...
 For reference, here's the state of BLADE:
 -------

 Very very cool Don!  This obviously has huge potential.  Do you have any 
 plans to support GPU's or multicores?  Have you looked into SSE3 or SSE4 
 to see if there are any new instructions that would be useful?

 Yes. Although it seems to me that the GPU world is about to change 
 radically -- 
 using OpenGL looks like a horrible hack without long-term future. I'll 
 wait until we get a decent API, I think.
 I have been looking at SSE3, 4, and 5, but I don't even have a Core2 
 machine...

 Double precision support is the most important to me, but I am also 
 interested in single precision and extended precision.  To what degree 
 are they supported?

 Yes.

 (b) generates X87, SSE, SSE2, or inline D code, depending on the 
 complexity of the expression and the types involved.

 I assume that SSE instructions can't be used for extended
 precision operations.

 Yes. I don't think they're even worth doing when strided arrays are 
 involved, except in special cases.

 So there is only support for one dimensional arrays?  When will two 
 dimensional arrays be supported?

 Strided arrays are next, then 2D.

What are strided arrays?  Are they similar to sparse arrays?

Nov 20 2007

Sean Kelly <sean f4.ca> writes:

Craig Black wrote:
 
 What are strided arrays?  Are they similar to sparse arrays? 

Say you have a two-dimensional array and you want to slice a specific 
column.  By design, the elements are not adjacent in memory so a normal 
slice won't work.  I believe such a slice is called a strided array.


Sean

Nov 20 2007

"Craig Black" <cblack ara.com> writes:

"Don Clugston" <dac nospam.com.au> wrote in message 
news:fhv00l$13eq$1 digitalmars.com...
 For reference, here's the state of BLADE:

 Syntax is:

 mixin(vectorize("expression"));

 eg,

 mixin(vectorize("some_array -= 3.2 * (5.8 * another_array) * 
 some_constant"));

 It would be so nice to get rid of the mixin() and the quotes. But they 
 don't impede development in any way. Ideally it would be:

 vectorize(some_array -= 3.2 * (5.8 * another_array) * some_constant);

 Most significant changes since I my last post about it.
 (a) it finally works in a usable form :-). Thanks Walter for some great 
 bugfixes in the past few releases!  I'm using dmd 1.023.
 (b) generates X87, SSE, SSE2, or inline D code, depending on the 
 complexity of the expression and the types involved.
 (c) an expression rewriting step has been added, which performs 
 scalar/const folding, and indexing/slicing folding
    (eg, (A+7*B)[0..5] --->   A[0..5] + 7*(B[0..5]) ).
 (d) it gives really, really nice error messages and debug output. No 
 template garbage from the library internals -- just a straightforward 
 one-line error message, giving the line in YOUR code which contains the 
 error.

 eg,   with an array a[],
     mixin(vectorize("any+= old*garbage"));
     mixin(vectorize("a+= 2"));

 you get this output (and NOTHING ELSE):
 demo.d(38): static assert  "BLADE: Undefined symbols: any old garbage"
 demo.d(39): static assert  "BLADE: Rank mismatch (addition or 
 subtraction)"

 It's still not terribly useful, since it only generates code for packed 
 vectors. But the infrastructure is very solid.

 Here's a particularly nasty example which the constant folding can cope 
 with, to generate SSE2 code:

     double [] a = new double[4];
     double [] d = [0.5, 2.8, 17.0, 28.25, 1, 56.2, 3.4];
     a[0..$] = [3.4, 565, 31.3, 41.8];
     double [4][] another = [[33.1, 4543, 43, 878.7], [5.14, 455, 554, 
 2.43]];

     mixin(vectorize(
 ` a += (d[2..$-1]*2.01*a[2]-another[][1])["abc".length-3..$]`));

 -------
 Generates this front-end code (compile with -debug=BladeFrontEnd to see 
 it). Note that there are many asserts to give nice debug info at runtime, 
 but the only runtime code is a single function call, which passes 3 
 pointers and a double into an asm function (there's no inlining work for 
 the compiler to do):

 ------
 // bladedemo.d(34)  a += 
 (d[2..$-1]*2.01*a[2]-another[][1])["abc".length-3..$]
 assert(a.length==another[][1][(3u-3)..$].length, `Vector length 
 mismatch`);
 assert(d[2..($-1)][(3u-3)..$].length==another[][1][(3u-3)..$].length, 
 `Vector length mismatch`);
 assert( (cast(size_t)(a.ptr)& 0x0F) == 0, `SSE Vector misalignment: a`);
 assert( (cast(size_t)(d[2..($-1)][(3u-3)..$].ptr)& 0x0F) == 0, `SSE Vector 
 misalignment: d[2..($-1)][(3u-3)..$]`);
 assert( (cast(size_t)(another[][1][(3u-3)..$].ptr)& 0x0F) == 0, `SSE 
 Vector misalignment: another[][1][(3u-3)..$]`);

 SSEVECGEN!(2,"A+=((B*C)-D)",double*,double,double*,double*)(another[][1][(3u-3)..$].length,&a[0],((a[2])*2.01),&d[2..($-1)][(3u-3)..$][0],&another[][1][(3u-3)..$][0]);

 -----
 The function consists of this ASM code (compile with -debug=BladeBackEnd 
 to see it; BTW all comments are auto-generated). Note there are only 8 asm 
 instructions in the inner loop:

 -------
 // Operation : ACB*D-+A=

 asm {
  push EBX;
   mov EAX, veclength;
   lea ECX, [8*EAX];     add ECX, values[0];  //A
   movsd XMM0, values[1];   shufpd XMM0, XMM0,0; //B
   lea EDX, [8*EAX];     add EDX, values[2];  //C
   lea EBX, [8*EAX];     add EBX, values[3];  //D
   xor EAX, EAX;
   sub EAX, veclength; // counter=-length
   jz short L2; // test for length==0

   align 16;
 L1:
   movapd XMM1, [ECX + 8*EAX];  // A
   movapd XMM2, [EDX + 8*EAX];  // C
   mulpd XMM2, XMM0; // B*
   subpd  XMM2, [EBX + 8*EAX];  // D-
   addpd XMM1, XMM2;  //+
   movapd [ECX + 8*EAX], XMM1;  // A=
   add EAX,2;
   js L1;
 L2:
   sub EAX, 2;
   jns L4;
   movsd XMM1, [ECX + 8*EAX+16];  // A
   movsd XMM2, [EDX + 8*EAX+16];  // C
   mulsd XMM2, XMM0; // B*
   subsd  XMM2, [EBX + 8*EAX+16];  // D-
   addsd XMM1, XMM2;  //+
   movsd [ECX + 8*EAX+16], XMM1;  // A=
 L4:
 ;  pop EBX;
 }
 -------

Another question.  How practical would this mixin technique be to generate 
optimal floating point code for regular floating point operations (not 
vectors or matrices)?  The DMD backend currently doesn't optimize floating 
point very well.  Perhaps this approach could be used for floating point 
operations in general.

Nov 20 2007

=?ISO-8859-1?Q?Pablo_Ripoll=e9s?= <in-call gmx.net> writes:

Hello!

Just a few sort comments and ideas...

Don Clugston Wrote:

 No, I don't have the perfect Matrix. I just want to explore the question of 
 whether it exists.
 There are a plethora of C++ matrix libraries, all of which are far from
perfect; 
 they all involve trade-offs between various language limitations.

Appart from these C++ matrix libs, I've come to appreciate these array/matrix
API's:

http://numpy.scipy.org/
Fortran 95 arrays
MATLAB arrays
Maple linear algebra API's

 In particular, the usage of expression templates creates all kinds of evil.
 
 Requirements:
 1. Must be compatible with future multi-dimensional array syntax such as
 double [7,5] m;

As it has already been mentioned, is the target of this Matrix only
2-dimensional? In mathematical terms a matrix is always a 2-dimensional array
with a proper algebra.  For example, AFAIK the transpose operation is not
defined for a 3-dimensional array.

 2. Must be theoretically capable of optimal performance.
 3. Needs to support slices. This means we also need to support strided
 vectors. Therefore we also need matrix slices, and matrix references.
 Probably distinguishing between the T[] and T[new] concepts.
 4. For storage we need packed, triangular, symmetric, banded, and sparse
 matrix support. (Possibly even 'calculated' matrix support, for
 user-defined matrix types which generate entries on-demand).
 5. Can be generalized to tensors.

What does this mean?  AFAIK tensors are not a mere dimensional generalization
of matrices.  Tensors are "geometric" entities that are independent of the
coordinate system.   Besides all this play with the covariant/contravariant you
haven't got with matrices.  0-order tensors are scalars, 1-order tensors are
vectors.  We can use matrices to represent scalars, different
covariant/contravariant vectors and 2-order tensors.  I would like to see a
Matrix type but respecting the mathematical abstraction it should represent.

 6. Nice error messages.
 
 I now have enough experience with my BLADE library to be confident that all of 
 these features can be achieved in existing D. In particular, the crucial 
 requirement 2 can be completely decoupled from the others.
 
 But, there are many requirements (dozens?) which are not on my brief list, and 
 perhaps they cannot all be met simultaneously, even in theory. Yet I'm not 
 convinced that it's impossible. What are the other requirements?

IMHO I think that when implementing (classic) mathematical abstractions, its
primitive conceptual design should be retained.

Pablo

Nov 20 2007

Neal Becker <ndbecker2 gmail.com> writes:

I'm fond of blitz++ approach:

1.  An array is a view of memory, with dimension, strides, and (possibly)
non-zero bases.  This design allows multiple, independent views of the same
memory.

2. Memory allocation itself is ref counted.

Benefits: 

* This design allows a slice, or view to be an lvalue.
* Plays nicely with python - semantics are similar to numpy/numeric.
  specifically, I can return a view as a python object.  Python can hold it,
and ref counting keeps the underlying memory alive.
* Generalized views.  For example real(complex_array) is a view, and is an
lvalue.

Nov 20 2007

Dan <murpsoft hotmail.com> writes:

Neal Becker Wrote:

 I'm fond of blitz++ approach:
 
 1.  An array is a view of memory, with dimension, strides, and (possibly)
 non-zero bases.  This design allows multiple, independent views of the same
 memory.

An array already has a clear definition.  Anyone attempting to redefine terms
doesn't have enough familiarity with them.  Arrays are 1 dimensional segments
of discrete information.

 2. Memory allocation itself is ref counted.
 
 Benefits: 
 
 * This design allows a slice, or view to be an lvalue.
 * Plays nicely with python - semantics are similar to numpy/numeric.
   specifically, I can return a view as a python object.  Python can hold it,
 and ref counting keeps the underlying memory alive.
 * Generalized views.  For example real(complex_array) is a view, and is an
 lvalue.

Recreating the Python square wheel in D won't serve the needs of very many
people.

Especially if there's no value added.

Can we please get over the cryptolect bullshit and talk straight?  

D's slices already perform 90% of the stuff we want in order to achieve a
Matrix.  You more or less just want a standard library that lets you do it,
right?

D let's us use the sugar coated format:

myArray.method() if the method is written to take a myArray as the first
argument.

Regards,
Dan

Nov 20 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Dan wrote:
 Neal Becker Wrote:
 
 I'm fond of blitz++ approach:

 1.  An array is a view of memory, with dimension, strides, and (possibly)
 non-zero bases.  This design allows multiple, independent views of the same
 memory.

 
 An array already has a clear definition.  Anyone attempting to redefine terms
doesn't have enough familiarity with them.  Arrays are 1 dimensional segments
of discrete information.
 
 2. Memory allocation itself is ref counted.

 Benefits: 

 * This design allows a slice, or view to be an lvalue.
 * Plays nicely with python - semantics are similar to numpy/numeric.
   specifically, I can return a view as a python object.  Python can hold it,
 and ref counting keeps the underlying memory alive.
 * Generalized views.  For example real(complex_array) is a view, and is an
 lvalue.

 
 Recreating the Python square wheel in D won't serve the needs of very many
people.

Maybe not, but if you'd like to take it for a test drive, go visit 
www.dsource.org/projects/multiarray .  :-)

 Especially if there's no value added.

The value added is that it's no longer interpreted python code where the 
use of a for loop kills your performance.  And being similar to numpy 
under the hood opens the door for using D and pyD to easily create 
extension modules for Numpy.  Though multiarray doesn't do that currently.

 D's slices already perform 90% of the stuff we want in order to achieve a
Matrix.  You more or less just want a standard library that lets you do it,
right?

I'd say it lets you do about 50% of the stuff you want for a Matrix, but 
putting a single percentage figure on it is not really possible, because 
different features have different importance to different users.

In particular:
1) $ can't be used for opIndex or opSlice in user-classes.
2) D lacks a clean syntax for slices of multiple dimensions.  opSlice 
calls must have one and only one ".."
3) D lacks a way to return a reference.  opIndexAssign only takes you so 
far.
4) D lacks a way to transparently reference count memory.  (No 
equivalent of struct copy constructors or destructors)
5) D lacks a way to pass by const-reference (even in D2 -- but 
presumably this is just a bug in the D2.x compiler)

 D let's us use the sugar coated format:
 
 myArray.method() if the method is written to take a myArray as the first
argument.

I have no idea what problem this is suggesting a solution to.

--bb

Nov 20 2007

Robert DaSilva <sp.unit.262+digitalmars gmail.com> writes:

Bill Baxter wrote:
 Dan wrote: 
 D's slices already perform 90% of the stuff we want in order to
 achieve a Matrix.  You more or less just want a standard library that
 lets you do it, right?

 
 I'd say it lets you do about 50% of the stuff you want for a Matrix, but
 putting a single percentage figure on it is not really possible, because
 different features have different importance to different users.
 
 In particular:
 1) $ can't be used for opIndex or opSlice in user-classes.

I think there's a proposal for that already.

 2) D lacks a clean syntax for slices of multiple dimensions.  opSlice
 calls must have one and only one ".."

Can you propose a syntax for that?

 3) D lacks a way to return a reference.  opIndexAssign only takes you so
 far.

Isn't that in the works?

 4) D lacks a way to transparently reference count memory.  (No
 equivalent of struct copy constructors or destructors)

Can you think of a something that really need reference counting when
you have GC?

 5) D lacks a way to pass by const-reference (even in D2 -- but
 presumably this is just a bug in the D2.x compiler)

The ref parameter storage type combined with const doesn't make sense,
use in and let the complier decide how to pass it.

Nov 20 2007

Ary Borenszweig <ary esperanto.org.ar> writes:

Robert DaSilva escribi�:
 Bill Baxter wrote:
 Dan wrote: 
 D's slices already perform 90% of the stuff we want in order to
 achieve a Matrix.  You more or less just want a standard library that
 lets you do it, right?

 I'd say it lets you do about 50% of the stuff you want for a Matrix, but
 putting a single percentage figure on it is not really possible, because
 different features have different importance to different users.

 In particular:
 1) $ can't be used for opIndex or opSlice in user-classes.

 
 I think there's a proposal for that already.
 
 2) D lacks a clean syntax for slices of multiple dimensions.  opSlice
 calls must have one and only one ".."

 
 Can you propose a syntax for that?

[1 .. 2, 3 .. 4]

Nov 20 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Ary Borenszweig wrote:
 Robert DaSilva escribi�:
 Bill Baxter wrote:
 Dan wrote:
 D's slices already perform 90% of the stuff we want in order to
 achieve a Matrix.  You more or less just want a standard library that
 lets you do it, right?

 I'd say it lets you do about 50% of the stuff you want for a Matrix, but
 putting a single percentage figure on it is not really possible, because
 different features have different importance to different users.

 In particular:
 1) $ can't be used for opIndex or opSlice in user-classes.

 I think there's a proposal for that already.

 2) D lacks a clean syntax for slices of multiple dimensions.  opSlice
 calls must have one and only one ".."

 Can you propose a syntax for that?

 
 [1 .. 2, 3 .. 4]

:-) What about the other end?  How do you define the opSlice that gets 
called by every possible combination of N-dimensional slices and indices 
like
    [1..2, N, 4..7]

One way to extend the current syntax would be if there's any .., in an 
argument then all arguments are treated as slices, and a single number 
argument would be treated as N..N+1.  So the above would call 
opSlice(1,2, N,N+1, 4,7)

Another more general solution would be to have 1..2 result in a slice 
object.  So the above would be like calling 
opSlice(slice(1,2),slice(N,N+1),slice(4,7))

--bb

Nov 20 2007

Dan <murpsoft hotmail.com> writes:

Bill Baxter Wrote:
 
 :-) What about the other end?  How do you define the opSlice that gets 
 called by every possible combination of N-dimensional slices and indices 
 like
     [1..2, N, 4..7]
 One way to extend the current syntax would be if there's any .., in an 
 argument then all arguments are treated as slices, and a single number 
 argument would be treated as N..N+1.  So the above would call 
 opSlice(1,2, N,N+1, 4,7)
 
 Another more general solution would be to have 1..2 result in a slice 
 object.  So the above would be like calling 
 opSlice(slice(1,2),slice(N,N+1),slice(4,7))

I would certainly agree that the [n,n] notation seems to have some major
limitations with regards to the other features of D.  If they're fixing them,
awesome.

The way I had thought to achieve this is to use [1..2][0..$][4..7].  This is
arrays of arrays, which I suppose isn't quite as efficient but would allow
slicing multiple ways and you could accomplish alot of cool stuff with that.

Heck, if you use [0..i][($-i)..($-i+1)][3] you can even get diagonal profiles
on a matrix any way you wanna cut it.

You don't have to worry about opSlice weirdness or much else.  You also don't
need templates or classes or any wizardry to get 90% of the stuff you want out
of it.

Instead of boxes or templates to handle the type nastiness, I recommend using
some parts of my Value struct from Walnut - it's BSD licensed and will let you
do different stuff to different types at runtime.  That will matter because
values in a matrix can be non-numerical.

Nov 20 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Robert DaSilva wrote:

Hey there Robert, I took Don's original message to be asking about what 
we could do *today*.  There are certainly some features in the works 
that will address a number of my points, and some potential additions 
that with a reasonable proposal, and enough lobbying and support might 
be added sometime in the future.  But that's not what I was writing about.

--bb

 Bill Baxter wrote:
 Dan wrote: 
 D's slices already perform 90% of the stuff we want in order to
 achieve a Matrix.  You more or less just want a standard library that
 lets you do it, right?

 I'd say it lets you do about 50% of the stuff you want for a Matrix, but
 putting a single percentage figure on it is not really possible, because
 different features have different importance to different users.

 In particular:
 1) $ can't be used for opIndex or opSlice in user-classes.

 
 I think there's a proposal for that already.
 
 2) D lacks a clean syntax for slices of multiple dimensions.  opSlice
 calls must have one and only one ".."

 
 Can you propose a syntax for that?
 
 3) D lacks a way to return a reference.  opIndexAssign only takes you so
 far.

 
 Isn't that in the works?
 
 4) D lacks a way to transparently reference count memory.  (No
 equivalent of struct copy constructors or destructors)

 
 Can you think of a something that really need reference counting when
 you have GC?
 
 5) D lacks a way to pass by const-reference (even in D2 -- but
 presumably this is just a bug in the D2.x compiler)

 
 The ref parameter storage type combined with const doesn't make sense,
 use in and let the complier decide how to pass it.

Nov 20 2007

Don Clugston <dac nospam.com.au> writes:

Bill Baxter wrote:
 Robert DaSilva wrote:
 
 Hey there Robert, I took Don's original message to be asking about what 
 we could do *today*.

Not really. I was saying, * what does the ultimate goal look like?
And the follow-up question is, * how far are we from the goal?

   There are certainly some features in the works
 that will address a number of my points, and some potential additions 
 that with a reasonable proposal, and enough lobbying and support might 
 be added sometime in the future.  But that's not what I was writing about.
 
 --bb
 
 Bill Baxter wrote:
 Dan wrote:
 D's slices already perform 90% of the stuff we want in order to
 achieve a Matrix.  You more or less just want a standard library that
 lets you do it, right?

 I'd say it lets you do about 50% of the stuff you want for a Matrix, but
 putting a single percentage figure on it is not really possible, because
 different features have different importance to different users.

 In particular:
 1) $ can't be used for opIndex or opSlice in user-classes.

 I think there's a proposal for that already.

 2) D lacks a clean syntax for slices of multiple dimensions.  opSlice
 calls must have one and only one ".."

 Can you propose a syntax for that?

 3) D lacks a way to return a reference.  opIndexAssign only takes you so
 far.

 Isn't that in the works?

 4) D lacks a way to transparently reference count memory.  (No
 equivalent of struct copy constructors or destructors)

 Can you think of a something that really need reference counting when
 you have GC?

 5) D lacks a way to pass by const-reference (even in D2 -- but
 presumably this is just a bug in the D2.x compiler)

 The ref parameter storage type combined with const doesn't make sense,
 use in and let the complier decide how to pass it.

Nov 20 2007

0ffh <frank frankhirsch.youknow.what.todo.net> writes:

Hi, regarding the organisation of matrices maybe it'd pay to take a look at 
how Lush is doing them, as it's designed (and known) to do them quite well.
I'll quote from the Lush tutorial:

Matrices are really composed of two separate entities:

  * the storage or srg which contains the following fields: a pointer to the
actual data , an element type identifier (and the size thereof) , and flags
that indicate if the data is writable or not , if it is in RAM or
memory-mapped from a file.

  * the index or idx points to a srg and contains the following fields: the 
number of dimensions of the tensor , the size in each dimension , the 
address increment from one element to the next in each dimension (called 
modulo ) , and the offset of the first tensor element in the storage . This 
structure allows multiple idx to point to the same srg thereby allowing the 
data to be accessed in multiple ways without copying.

regards, frank

Nov 20 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

0ffh wrote:
 
 Hi, regarding the organisation of matrices maybe it'd pay to take a look 
 at how Lush is doing them, as it's designed (and known) to do them quite 
 well.
 I'll quote from the Lush tutorial:
 
 Matrices are really composed of two separate entities:
 
  * the storage or srg which contains the following fields: a pointer to the
 actual data , an element type identifier (and the size thereof) , and flags
 that indicate if the data is writable or not , if it is in RAM or
 memory-mapped from a file.
 
  * the index or idx points to a srg and contains the following fields: 
 the number of dimensions of the tensor , the size in each dimension , 
 the address increment from one element to the next in each dimension 
 (called modulo ) , and the offset of the first tensor element in the 
 storage . This structure allows multiple idx to point to the same srg 
 thereby allowing the data to be accessed in multiple ways without copying.
 

That sounds like the way NumPy in python does it too.  And maybe Blitz++ 
as well?

--bb

Nov 20 2007

0ffh <frank frankhirsch.youknow.what.todo.net> writes:

Bill Baxter wrote:
 That sounds like the way NumPy in python does it too.  And maybe Blitz++ 
 as well?

Humm, upon reading the last few posts on this topic I decided that I was
probably way behind with this comment... I should read more first.
Seems this way of handling things is widely used; the more reason to
suspect there might be something to it. =)

regards, frank

Nov 20 2007

Knud Soerensen <4tuu4k002 sneakemail.com> writes:

Don Clugston wrote:
 No, I don't have the perfect Matrix. I just want to explore the question
 of whether it exists.
 There are a plethora of C++ matrix libraries, all of which are far from
 perfect; they all involve trade-offs between various language limitations.
 In particular, the usage of expression templates creates all kinds of evil.
 
 Requirements:
 1. Must be compatible with future multi-dimensional array syntax such as
 double [7,5] m;
 2. Must be theoretically capable of optimal performance.
 3. Needs to support slices. This means we also need to support strided
 vectors. Therefore we also need matrix slices, and matrix references.
 Probably distinguishing between the T[] and T[new] concepts.
 4. For storage we need packed, triangular, symmetric, banded, and sparse
 matrix support. (Possibly even 'calculated' matrix support, for
 user-defined matrix types which generate entries on-demand).
 5. Can be generalized to tensors.
 6. Nice error messages.

Take a look at the old vectorization debate.

http://all-technology.com/eigenpolls/dwishlist/index.php?it=10
http://www.google.com/search?q=vectorization+&domains=www.digitalmars.com&sitesearch=www.digitalmars.com

Dec 01 2007

D Programming

C/C++ Programming

Other

digitalmars.D - The Matrix to end all Matrix classes (Let's dream!)