digitalmars.D.announce - Multidimensional Arrays

Oskar Linde (175/175) Sep 08 2006 I have been implementing a strided multidimensional array type and am

Don Clugston (22/27) Sep 10 2006 [snip]
Bill Baxter (33/280) Feb 16 2007 Hey Oskar,

Neal Becker (4/4) Feb 17 2007 I think it makes a lot of sense to build arrays from 2 components. One

Bill Baxter (13/17) Feb 18 2007 You mean exactly how Oskar and I are doing it? Or you mean a more

Oskar Linde (48/102) Mar 16 2007 Hi Bill,

Bill Baxter (42/130) Mar 16 2007 Hmm, how do you accomplish that? I'll take a look at the code but it

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

I have been implementing a strided multidimensional array type and am 
interested in comments.

It is basically designed after Norbert Nemec's suggestion:
http://homepages.uni-regensburg.de/~nen10015/documents/D-multidimarray.html
+/- some things.

The basic array type looks like this:

struct Array(T,int dims = 1) {
	T* ptr;
	size_t[dims] range;
	ptrdiff_t[dims] stride;

	const dimensions = dims;
	alias T ElementType;

	...
}

Usage:

Initialization

// Create a 3x3 array
auto A = Array!(int,2)(3,3);

//This could use a nicer syntax
A.assign(5,6,7,
          3,1,9,
          8,8,1);

writefln("A = %s",A.toString);
/*
A = (3x3)
       5        6        7
       3        1        9
       8        8        1
*/

auto B = Array!(int,2)(3,3);
B.assign(0,0,1,
          0,2,1,
          7,8,9);

Element wise operators:
writefln("A*B = %s",(A*B).toString);

Scalar operators:
writefln("5*B = %s",(5*B).toString);

As in Norberts proposal:
C.transpose returns a transposed array view (all dimensions reversed)
C.transpose(dimA,dimB) swaps dimension dimA and dimB

C.diag returns a N-1 dimensional view of the diagonal of the matrix. 
E.g. a 5x5 identity matrix:

// Zeros is just a convenience wrapper to generate a zero initialized array
Auto A = zeros!(int)(5,5);
A.diag[] = 1;
writefln("A = %s",A.toString);

A = (5x5)
       1        0        0        0        0
       0        1        0        0        0
       0        0        1        0        0
       0        0        0        1        0
       0        0        0        0        1

C.reverse(dim) reverses the given dimension (inverts the stride)

Indexing and slicing:

auto D = Array!(int,3)(7,7,7); // 7x7x7 dimensional array

D[1,2,3] => int
D[1,2,all] => 7 dimensional slice
D[1,all,all] => 7x7 dimensional slice
D[range(1,end-1), all, range(1,end-1)] => 5x7x5 dimensional subarray
D[end-1, range(end-3,end-1), all] => 2x7 dimensional slice

End works in the same way as $ does for regular arrays.
range(a,b) is the replacement for a..b (D doesn't support mixed 
indexing/slicing otherwise)

Slices/subarrays are of course views that share the same data:

auto M = zeros!(int)(4,4);
M[range(0,2), range(0,2)][] = 1;
M[range(2,end), range(2,end)][] = 1;
writefln("M = %s", M.toString);

M = (4x4)
       1        1        0        0
       1        1        0        0
       0        0        1        1
       0        0        1        1


.dup => Returns a compact copy of the array

In place map over a function:

M.doMap((int y){return 1<<y;});

Functional map:

writefln("cos(M) = %s",M.map(&cos).toString);

cos(M) = (4x4)
  0.5403   0.5403   1.0000   1.0000
  0.5403   0.5403   1.0000   1.0000
  1.0000   1.0000   0.5403   0.5403
  1.0000   1.0000   0.5403   0.5403

Foreach iteration:
auto N = zeros!(int)(5,5);
int i = 0;
foreach(inout x; N)
   x = ++i;



Of course, custom types are supported. Some examples just for fun: :)

auto Q = ones!(Rational!(long))(5,5) * 3;
writefln("Q/N = %s",(Q/N).toString);

Q/N = (5x5)
       3      3/2        1      3/4      3/5
     1/2      3/7      3/8      1/3     3/10
    3/11      1/4     3/13     3/14      1/5
    3/16     3/17      1/6     3/19     3/20
     1/7     3/22     3/23      1/8     3/25



auto U = zeros!(Uncertain!(double))(4,4);

double val = -8;
foreach(inout T;U) {
     // +- 10% uncertainty + 0.1
     T += Uncertain!(double)(val,abs(0.1*val)+0.1);
     val += 1;
}
writefln("U = %s",U.toString);

U = (4x4)
   (-8.0 ± 0.9)    (-7.0 ± 0.8)    (-6.0 ± 0.7)    (-5.0 ± 0.6)
   (-4.0 ± 0.5)    (-3.0 ± 0.4)    (-2.0 ± 0.3)    (-1.0 ± 0.2)
     (0. ± 0.1)     (1.0 ± 0.2)     (2.0 ± 0.3)     (3.0 ± 0.4)
    (4.0 ± 0.5)     (5.0 ± 0.6)     (6.0 ± 0.7)     (7.0 ± 0.8)

writefln("U/U = %s",(U/U).toString);

U/U = (4x4)
    (1.0 ± 0.2)     (1.0 ± 0.2)     (1.0 ± 0.2)     (1.0 ± 0.2)
    (1.0 ± 0.3)     (1.0 ± 0.3)     (1.0 ± 0.3)     (1.1 ± 0.4)
    (nan ± inf)     (1.1 ± 0.4)     (1.0 ± 0.3)     (1.0 ± 0.3)
    (1.0 ± 0.3)     (1.0 ± 0.2)     (1.0 ± 0.2)     (1.0 ± 0.2)




auto V = zeros!(Voltage)(4,4);
   val = 0;
   foreach(inout v; V) {
     v = val * volt;
     val += 0.1;
   }

auto R = zeros!(Resistance)(4,4);
   val = 100;
   foreach(inout r; R) {
     r = val * ohm;
     val *= 2;
   }

writefln("V = %s",V.toString);
writefln("R = %s",R.toString);
writefln("V/R = %s",(V/R).toString);


V = (4x4)
             0 V           100 mV           200 mV           300 mV
          400 mV           500 mV           600 mV           700 mV
          800 mV           900 mV         1e+03 mV            1.1 V
           1.2 V            1.3 V            1.4 V            1.5 V

R = (4x4)
           100 Ω            200 Ω            400 Ω            800 Ω
          1.6 kΩ           3.2 kΩ           6.4 kΩ          12.8 kΩ
         25.6 kΩ          51.2 kΩ           102 kΩ           205 kΩ
          410 kΩ           819 kΩ          1.64 MΩ          3.28 MΩ

V/R = (4x4)
             0 A           500 µA           500 µA           375 µA
          250 µA           156 µA          93.7 µA          54.7 µA
         31.2 µA          17.6 µA          9.77 µA          5.37 µA
         2.93 µA          1.59 µA           854 nA           458 nA





Comments:

The only tricky part about all this was getting opIndex working with 
compile time variadic arguments. I have not yet gotten DMD to inline the 
generated indexing functions, but it should be doable as they are all 
trivial. I have not been able to pursue this due to a DMD inlining bug 
that in some cases generates "xx is not an lvalue" when compiling with 
-inline.

The main issue that remains is op_r operators that have to work a bit 
differently from non _r operators(*).

I see this all more as a proof of concept than anything else. If anyone 
is interested in the sources, I will find a way to put them up somewhere.

Future extensions would be writing Matrix types, and possibly hook them 
up with a BLAS library. Anyone know if there exists BLAS implementations 
that supports 80-bit reals?

Other things I want to look at are adding stride support to the range() 
type and investigate the possibilities of generating expression templates.

*) Currently, if you have an:

opAdd(T)(T x) {...}

you can't also have an:

opAdd_r(T)(T x) {...}

since this will create ambiguous overloads. I find myself wanting to 
define opAdd_r for all cases where there are no regular non _r operator, 
but there is (AFAIK) no way to do that.

I would go as far as suggesting a change to D: If both a matching opX 
and a matching opX_r overload exists, the opX overload is chosen. This 
will demote the opX_r overloads a bit and is consistent with the way 
commutative operators work, where if for example no a.opAdd(b) exists, 
b.opAdd(a) is called.

/Oskar

Sep 08 2006

Don Clugston <dac nospam.com.au> writes:

Oskar Linde wrote:
 I have been implementing a strided multidimensional array type and am 
 interested in comments.

[snip]

This looks really promising -- that improved IFTI support in the last 
release has clearly opened a lot of possibilities. For now, I'll just 
comment on one thing.

 Future extensions would be writing Matrix types, and possibly hook them 
 up with a BLAS library. Anyone know if there exists BLAS implementations 
 that supports 80-bit reals?

This is a really interesting issue. Personally I don't of any 
implementations, but I haven't really looked. My first reaction was, 
"you'd always be using doubles or floats in matrices, so you'd use 
SIMD". But even with 64-bit matrices, 80-bit temporary results can still 
be useful, especially for operations on the diagonal elements of 
matrices. Basically, all vectors parameters would still be 64-bit 
doubles, but scalar parameters and returned results would be 80-bit.

And then I got thinking -- it's not clear to me that SIMD would be very 
beneficial when dealing with strided matrices, unless you're extremely 
clever(and this is probably why the AMD BLAS libraries have hand-crafted 
ASM even for the BLAS3 functions, you'd don't get much benefit 
otherwise). Consequently, I strongly suspect that you'd always want to 
treat strided arrays differently from built-in arrays (ie, stride == 1).
I don't know to what extent expression templates could keep track of 
whether an array has a stride or not, but there'd be a massive 
performance boost if you could.

-Don

Sep 10 2006

Bill Baxter <dnewsgroup billbaxter.com> writes:

Oskar Linde wrote:
 I have been implementing a strided multidimensional array type and am 
 interested in comments.
 
 It is basically designed after Norbert Nemec's suggestion:
 http://homepages.uni-regensburg.de/~nen10015/documents/D-multidimarray.html
 +/- some things.
 
 The basic array type looks like this:
 
 struct Array(T,int dims = 1) {
     T* ptr;
     size_t[dims] range;
     ptrdiff_t[dims] stride;
 
     const dimensions = dims;
     alias T ElementType;
 
     ...
 }

Hey Oskar,
I've been working on implementing something similar.  I'd be interested 
to see some parts of your implementation.
Mine's at
http://www.dsource.org/projects/multiarray

The main difference between our approaches seems to be that I've made 
'dims' non-const (and therefore not a template parameter).

I thought that would be better since sometimes it's handy to reshape a 
1-d array into an equivalent 2d Nx1 or 3-d Nx1x1.  And you have the 
potential to do broadcasting more easily that way too (E.g. multiply an 
N dim 1-d array times an Nx5 array just by duplicating the one axis 
implicitly).  The down side is that range and stride then become dynamic 
arrays.  I guess with your approach you can create a new view with the 

need more templated member functions, parameterized by Array dimension.

Another tradeoff is that I can also have indexing and slicing return 
arrays of different dimension, but I can't have indexing return a single 
element (because the return type of opIndex is Array not float and you 
can't have two different return types).  Instead I overloaded opCall for 
that.  So M(4,3) is the value of element 4,3, but M[4,3] is a 1-element 
view.

I also made it a class in the end, though I started out with it being a 
struct.  Partly that decision was motivated by the idea that generally I 
don't really want to pass 6 or more 32-bit numbers around by value, plus 
the presence of pointer member data, so using a class seemed reasonable. 
  Having to remember to say 'new' everywhere has been a pain, though.

I have mine hooked up with some basic BLAS routines -- the routines for 
VV, MV, and MM multiplication, and the simplest least squares and linear 
system solvers.  I don't know of any 80-bit capable BLAS, but really 
single precision is good enough for most of what I do.  Double is mostly 
overkill, but I use it to be sure.

--bb

 
 Usage:
 
 Initialization
 
 // Create a 3x3 array
 auto A = Array!(int,2)(3,3);
 
 //This could use a nicer syntax
 A.assign(5,6,7,
          3,1,9,
          8,8,1);
 
 writefln("A = %s",A.toString);
 /*
 A = (3x3)
       5        6        7
       3        1        9
       8        8        1
 */
 
 auto B = Array!(int,2)(3,3);
 B.assign(0,0,1,
          0,2,1,
          7,8,9);
 
 Element wise operators:
 writefln("A*B = %s",(A*B).toString);
 
 Scalar operators:
 writefln("5*B = %s",(5*B).toString);
 
 As in Norberts proposal:
 C.transpose returns a transposed array view (all dimensions reversed)
 C.transpose(dimA,dimB) swaps dimension dimA and dimB
 
 C.diag returns a N-1 dimensional view of the diagonal of the matrix. 
 E.g. a 5x5 identity matrix:
 
 // Zeros is just a convenience wrapper to generate a zero initialized array
 Auto A = zeros!(int)(5,5);
 A.diag[] = 1;
 writefln("A = %s",A.toString);
 
 A = (5x5)
       1        0        0        0        0
       0        1        0        0        0
       0        0        1        0        0
       0        0        0        1        0
       0        0        0        0        1
 
 C.reverse(dim) reverses the given dimension (inverts the stride)
 
 Indexing and slicing:
 
 auto D = Array!(int,3)(7,7,7); // 7x7x7 dimensional array
 
 D[1,2,3] => int
 D[1,2,all] => 7 dimensional slice
 D[1,all,all] => 7x7 dimensional slice
 D[range(1,end-1), all, range(1,end-1)] => 5x7x5 dimensional subarray
 D[end-1, range(end-3,end-1), all] => 2x7 dimensional slice
 
 End works in the same way as $ does for regular arrays.
 range(a,b) is the replacement for a..b (D doesn't support mixed 
 indexing/slicing otherwise)
 
 Slices/subarrays are of course views that share the same data:
 
 auto M = zeros!(int)(4,4);
 M[range(0,2), range(0,2)][] = 1;
 M[range(2,end), range(2,end)][] = 1;
 writefln("M = %s", M.toString);
 
 M = (4x4)
       1        1        0        0
       1        1        0        0
       0        0        1        1
       0        0        1        1
 
 
 .dup => Returns a compact copy of the array
 
 In place map over a function:
 
 M.doMap((int y){return 1<<y;});
 
 Functional map:
 
 writefln("cos(M) = %s",M.map(&cos).toString);
 
 cos(M) = (4x4)
  0.5403   0.5403   1.0000   1.0000
  0.5403   0.5403   1.0000   1.0000
  1.0000   1.0000   0.5403   0.5403
  1.0000   1.0000   0.5403   0.5403
 
 Foreach iteration:
 auto N = zeros!(int)(5,5);
 int i = 0;
 foreach(inout x; N)
   x = ++i;
 
 
 
 Of course, custom types are supported. Some examples just for fun: :)
 
 auto Q = ones!(Rational!(long))(5,5) * 3;
 writefln("Q/N = %s",(Q/N).toString);
 
 Q/N = (5x5)
       3      3/2        1      3/4      3/5
     1/2      3/7      3/8      1/3     3/10
    3/11      1/4     3/13     3/14      1/5
    3/16     3/17      1/6     3/19     3/20
     1/7     3/22     3/23      1/8     3/25
 
 
 
 auto U = zeros!(Uncertain!(double))(4,4);
 
 double val = -8;
 foreach(inout T;U) {
     // +- 10% uncertainty + 0.1
     T += Uncertain!(double)(val,abs(0.1*val)+0.1);
     val += 1;
 }
 writefln("U = %s",U.toString);
 
 U = (4x4)
   (-8.0 ± 0.9)    (-7.0 ± 0.8)    (-6.0 ± 0.7)    (-5.0 ± 0.6)
   (-4.0 ± 0.5)    (-3.0 ± 0.4)    (-2.0 ± 0.3)    (-1.0 ± 0.2)
     (0. ± 0.1)     (1.0 ± 0.2)     (2.0 ± 0.3)     (3.0 ± 0.4)
    (4.0 ± 0.5)     (5.0 ± 0.6)     (6.0 ± 0.7)     (7.0 ± 0.8)
 
 writefln("U/U = %s",(U/U).toString);
 
 U/U = (4x4)
    (1.0 ± 0.2)     (1.0 ± 0.2)     (1.0 ± 0.2)     (1.0 ± 0.2)
    (1.0 ± 0.3)     (1.0 ± 0.3)     (1.0 ± 0.3)     (1.1 ± 0.4)
    (nan ± inf)     (1.1 ± 0.4)     (1.0 ± 0.3)     (1.0 ± 0.3)
    (1.0 ± 0.3)     (1.0 ± 0.2)     (1.0 ± 0.2)     (1.0 ± 0.2)
 
 
 
 
 auto V = zeros!(Voltage)(4,4);
   val = 0;
   foreach(inout v; V) {
     v = val * volt;
     val += 0.1;
   }
 
 auto R = zeros!(Resistance)(4,4);
   val = 100;
   foreach(inout r; R) {
     r = val * ohm;
     val *= 2;
   }
 
 writefln("V = %s",V.toString);
 writefln("R = %s",R.toString);
 writefln("V/R = %s",(V/R).toString);
 
 
 V = (4x4)
             0 V           100 mV           200 mV           300 mV
          400 mV           500 mV           600 mV           700 mV
          800 mV           900 mV         1e+03 mV            1.1 V
           1.2 V            1.3 V            1.4 V            1.5 V
 
 R = (4x4)
           100 Ω            200 Ω            400 Ω            800 Ω
          1.6 kΩ           3.2 kΩ           6.4 kΩ          12.8 kΩ
         25.6 kΩ          51.2 kΩ           102 kΩ           205 kΩ
          410 kΩ           819 kΩ          1.64 MΩ          3.28 MΩ
 
 V/R = (4x4)
             0 A           500 µA           500 µA           375 µA
          250 µA           156 µA          93.7 µA          54.7 µA
         31.2 µA          17.6 µA          9.77 µA          5.37 µA
         2.93 µA          1.59 µA           854 nA           458 nA
 
 
 
 
 
 Comments:
 
 The only tricky part about all this was getting opIndex working with 
 compile time variadic arguments. I have not yet gotten DMD to inline the 
 generated indexing functions, but it should be doable as they are all 
 trivial. I have not been able to pursue this due to a DMD inlining bug 
 that in some cases generates "xx is not an lvalue" when compiling with 
 -inline.
 
 The main issue that remains is op_r operators that have to work a bit 
 differently from non _r operators(*).
 
 I see this all more as a proof of concept than anything else. If anyone 
 is interested in the sources, I will find a way to put them up somewhere.
 
 Future extensions would be writing Matrix types, and possibly hook them 
 up with a BLAS library. Anyone know if there exists BLAS implementations 
 that supports 80-bit reals?
 
 Other things I want to look at are adding stride support to the range() 
 type and investigate the possibilities of generating expression templates.
 
 *) Currently, if you have an:
 
 opAdd(T)(T x) {...}
 
 you can't also have an:
 
 opAdd_r(T)(T x) {...}
 
 since this will create ambiguous overloads. I find myself wanting to 
 define opAdd_r for all cases where there are no regular non _r operator, 
 but there is (AFAIK) no way to do that.
 
 I would go as far as suggesting a change to D: If both a matching opX 
 and a matching opX_r overload exists, the opX overload is chosen. This 
 will demote the opX_r overloads a bit and is consistent with the way 
 commutative operators work, where if for example no a.opAdd(b) exists, 
 b.opAdd(a) is called.
 
 /Oskar

Feb 16 2007

Neal Becker <ndbecker2 gmail.com> writes:

I think it makes a lot of sense to build arrays from 2 components.  One
component is the storage.  The second component is the array, which is an
interpretation of the storage.  I think this simplifies slicing and
projections.

Feb 17 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Neal Becker wrote:
 I think it makes a lot of sense to build arrays from 2 components.  One
 component is the storage.  The second component is the array, which is an
 interpretation of the storage.  I think this simplifies slicing and
 projections.

You mean exactly how Oskar and I are doing it?  Or you mean a more 
flexible notion of storage?  I agree it would be nice to be able to use 
different storage back ends besides strided memory.  Diagonal, 
Triangular, Sparse, Block, Banded, etc. would all be nice.  The Matrix 
Template Library did something like that, though I'd say with mixed results:
    http://www.osl.iu.edu/research/mtl/

The high-level idea is nice, but the end result was overly complicated 
IMHO, both internally and externally.  And it depended critically on 
good optimization from the compiler to get decent performance since 
everything was in C++ (no tie-in with optimized Fortran back-ends like 
ATLAS).

--bb

Feb 18 2007

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Bill Baxter skrev:
 Oskar Linde wrote:
 I have been implementing a strided multidimensional array type and am 
 interested in comments.

 It is basically designed after Norbert Nemec's suggestion:
 http://homepages.uni-regensburg.de/~nen10015/documents/D-multidimarray.html 

 +/- some things.

 The basic array type looks like this:

 struct Array(T,int dims = 1) {
     T* ptr;
     size_t[dims] range;
     ptrdiff_t[dims] stride;

     const dimensions = dims;
     alias T ElementType;

     ...
 }

 
 Hey Oskar,
 I've been working on implementing something similar.  I'd be interested 
 to see some parts of your implementation.
 Mine's at
 http://www.dsource.org/projects/multiarray

Hi Bill,

I'm sorry for the late reply. I've had a quite hectic last few months 
and not had time to dig into this. I have now at least extracted my old 
multiarray code and turned it into something that compiles (and 
converted it to use the new D variadic templates instead of the old 
variadic.d hack):

http://www.csc.kth.se/~ol/multiarray.tar.gz

the multiarray module is sci.matrix.multiarray. Everything is in a very 
crude shape.

 The main difference between our approaches seems to be that I've made 
 'dims' non-const (and therefore not a template parameter).
 
 I thought that would be better since sometimes it's handy to reshape a 
 1-d array into an equivalent 2d Nx1 or 3-d Nx1x1.  And you have the 
 potential to do broadcasting more easily that way too (E.g. multiply an 
 N dim 1-d array times an Nx5 array just by duplicating the one axis 
 implicitly).  The down side is that range and stride then become dynamic 
 arrays.  

 I guess with your approach you can create a new view with the 


Yes. This is very powerful.

Given a 10x10x10 array arr, arr[0,all,all] will be a 10x10 array, not a 
1x10x10 array or anything else. And that is all deduced at compile time.

 On the other hand that probably means you 
 need more templated member functions, parameterized by Array dimension.

Possibly.

 Another tradeoff is that I can also have indexing and slicing return 
 arrays of different dimension, but I can't have indexing return a single 
 element (because the return type of opIndex is Array not float and you 
 can't have two different return types).  Instead I overloaded opCall for 
 that.  So M(4,3) is the value of element 4,3, but M[4,3] is a 1-element 
 view.

In addition to being able to have full indexing return a single element 
vs partial indexing returning slices I think the greatest advantage of 
having the dimensionality as a compile time constant is the improved 
compile time consistency checks.

I believe you will extremely rarely dynamically adjust the 
dimensionality of a multidimensional array. You are almost always 
required to know the dimensionality at compile time anyway. You can 
always make slices of different dimensionality. Having a dimension with 
a stride of 0, you could even duplicate the data over a dimension such 
as [1,1,1,1,... (10000 times) ...,1] only occupying a single unit of 
memory. Reshaping has to be done explicitly (which I see as a good thing).

In general, I think this falls back on the same conclusion as the 
discussion regarding intrinsic vector operations and small static arrays 
being value types a while ago. Low dimensionalities will most of the 
time turn out to be static, while larger dimensions are dynamic. The 
dimensionality of a multidimensional array will typically be static and 
of a low order (For example, I've personally never encountered a case 
needing more than 6 dimensions).

 I also made it a class in the end, though I started out with it being a 
 struct.  Partly that decision was motivated by the idea that generally I 
 don't really want to pass 6 or more 32-bit numbers around by value, plus 
 the presence of pointer member data, so using a class seemed reasonable. 
  Having to remember to say 'new' everywhere has been a pain, though.

The biggest disadvantage I see is that data shuffling (such as in a loop 
assigning slices of one array to slices of another one) will result in 
the creation of lots of temporary heap allocated classes instead of, 
when using a struct, a small constant stack memory usage.

One option could possibly be to make slicing return proxy structs that 
(in the future implicitly) converted to class instances.

Note that the by-value passing doesn't necessarily mean any reduction in 
performance. Being careful about making functions not changing the shape 
of the array take the array as an inout (future const/readonly ref) 
while those that need must make the same data-copy when dup-ing the 
class anyway.

Another option to avoid passing lots of data by value could possibly be 
to make a container struct containing a pointer to another struct 
containing all the data.

 I have mine hooked up with some basic BLAS routines -- the routines for 
 VV, MV, and MM multiplication, and the simplest least squares and linear 
 system solvers.  

That sounds really interesting. I will definitely give it a look.

/Oskar

Mar 16 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Oskar Linde wrote:
 Bill Baxter skrev:

 Hi Bill,
 
 I'm sorry for the late reply. I've had a quite hectic last few months 
 and not had time to dig into this. I have now at least extracted my old 
 multiarray code and turned it into something that compiles (and 
 converted it to use the new D variadic templates instead of the old 
 variadic.d hack):
 
 http://www.csc.kth.se/~ol/multiarray.tar.gz

Great!  Thanks for putting this online.

 the multiarray module is sci.matrix.multiarray. Everything is in a very 
 crude shape.
 
 The main difference between our approaches seems to be that I've made 
 'dims' non-const (and therefore not a template parameter).

 I thought that would be better since sometimes it's handy to reshape a 
 1-d array into an equivalent 2d Nx1 or 3-d Nx1x1.  And you have the 
 potential to do broadcasting more easily that way too (E.g. multiply 
 an N dim 1-d array times an Nx5 array just by duplicating the one axis 
 implicitly).  The down side is that range and stride then become 
 dynamic arrays.  

 

 of dims when needed.  

 
 Yes. This is very powerful.
 
 Given a 10x10x10 array arr, arr[0,all,all] will be a 10x10 array, not a 
 1x10x10 array or anything else. And that is all deduced at compile time.

Hmm, how do you accomplish that?  I'll take a look at the code but it 
seems like you need a lot of partial specializations for the 0 case. 
Any opIndex call that has a zero as one of its arguments needs to have a 
special implementation because it's returning a different type.  But I 
thought specialization and IFTI were mutually exclusive.

 On the other hand that probably means you need more templated member 
 functions, parameterized by Array dimension.

 
 Possibly.
 
 Another tradeoff is that I can also have indexing and slicing return 
 arrays of different dimension, but I can't have indexing return a 
 single element (because the return type of opIndex is Array not float 
 and you can't have two different return types).  Instead I overloaded 
 opCall for that.  So M(4,3) is the value of element 4,3, but M[4,3] is 
 a 1-element view.

 
 In addition to being able to have full indexing return a single element 
 vs partial indexing returning slices I think the greatest advantage of 
 having the dimensionality as a compile time constant is the improved 
 compile time consistency checks.
 
 I believe you will extremely rarely dynamically adjust the 
 dimensionality of a multidimensional array. You are almost always 
 required to know the dimensionality at compile time anyway. You can 
 always make slices of different dimensionality. Having a dimension with 
 a stride of 0, you could even duplicate the data over a dimension such 
 as [1,1,1,1,... (10000 times) ...,1] only occupying a single unit of 
 memory. Reshaping has to be done explicitly (which I see as a good thing).

What about allowing flexible expanding of dimensions?  I'm basically 
trying to make a pseudo-clone of numpy (www.numpy.org), and one thing it 
does a lot is "broadcasting".
   http://www.scipy.org/EricsBroadcastingDoc
   http://www.onlamp.com/pub/a/python/2000/09/27/numerically.html

 In general, I think this falls back on the same conclusion as the 
 discussion regarding intrinsic vector operations and small static arrays 
 being value types a while ago. Low dimensionalities will most of the 
 time turn out to be static, while larger dimensions are dynamic. The 
 dimensionality of a multidimensional array will typically be static and 
 of a low order (For example, I've personally never encountered a case 
 needing more than 6 dimensions).

Yeh, there is a limit.  Numpy is supposed to be dimension agnostic, but 
there is a hard-coded MAX_DIMS in the C code that implements the 
backend.  I think it's set to something like 32.

 I also made it a class in the end, though I started out with it being 
 a struct.  Partly that decision was motivated by the idea that 
 generally I don't really want to pass 6 or more 32-bit numbers around 
 by value, plus the presence of pointer member data, so using a class 
 seemed reasonable.  Having to remember to say 'new' everywhere has 
 been a pain, though.

 
 The biggest disadvantage I see is that data shuffling (such as in a loop 
 assigning slices of one array to slices of another one) will result in 
 the creation of lots of temporary heap allocated classes instead of, 
 when using a struct, a small constant stack memory usage.

Yep.  But basically since I was porting from Python code my thinking was 
"Python already does all those allocations, and millions more -- a D 
implementation can only be faster"  :-)

I think the thing that really got me to switch to class was that there 
were some IFTI bugs that only surfaced when using structs and which went 
away when using a class.  They're fixed now, so maybe it's time to go 
back to a struct.    *Especially* if we get some new constref goodness.

 One option could possibly be to make slicing return proxy structs that 
 (in the future implicitly) converted to class instances.
 
 Note that the by-value passing doesn't necessarily mean any reduction in 
 performance. Being careful about making functions not changing the shape 
 of the array take the array as an inout (future const/readonly ref) 
 while those that need must make the same data-copy when dup-ing the 
 class anyway.

Yeh, I hate to make things 'inout' when I really want const reference 
behavior.  I'm going to go an rewrite a bunch of stuff as soon as we get 
const ref.

Another nice thing about going the class route with current D, is that 
you can have a function which potentially returns the input unmodified. 
  For instance a function that makes sure the input is at least 2D, and 
if so just returns the input as-is, otherwise it creates a new view. 
With structs that'll turn into create a new view no matter what.

On the other hand, you need those kinds of functions in Numpy a lot 
because you tend to write functions that take an "A" where "A" could be 
a python array, or a python tuple, or a numpy array of any dimension, or 
any other sequence type.  Using numpy.asarray_2d(A) at the top of your 
functions is in many ways a substitute for not having static typing.

It's hard to say without actually going through the exercise.  I don't 
have much experience using big flexible array packages in statically 
typed languages like C++.  My experience is with the array handling in 
Matlab and Numpy, both of which are dynamic.  Maybe static dimension is 
the way to go with a statically typed language.

 Another option to avoid passing lots of data by value could possibly be 
 to make a container struct containing a pointer to another struct 
 containing all the data.

Sounds like significantly more complexity for not so much benefit.

 I have mine hooked up with some basic BLAS routines -- the routines 
 for VV, MV, and MM multiplication, and the simplest least squares and 
 linear system solvers.  

 
 That sounds really interesting. I will definitely give it a look.



--bb

Mar 16 2007

D Programming

C/C++ Programming

Other

digitalmars.D.announce - Multidimensional Arrays