www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.announce - Multidimensional Arrays

reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
I have been implementing a strided multidimensional array type and am 
interested in comments.

It is basically designed after Norbert Nemec's suggestion:
http://homepages.uni-regensburg.de/~nen10015/documents/D-multidimarray.html
+/- some things.

The basic array type looks like this:

struct Array(T,int dims = 1) {
	T* ptr;
	size_t[dims] range;
	ptrdiff_t[dims] stride;

	const dimensions = dims;
	alias T ElementType;

	...
}

Usage:

Initialization

// Create a 3x3 array
auto A = Array!(int,2)(3,3);

//This could use a nicer syntax
A.assign(5,6,7,
          3,1,9,
          8,8,1);

writefln("A = %s",A.toString);
/*
A = (3x3)
       5        6        7
       3        1        9
       8        8        1
*/

auto B = Array!(int,2)(3,3);
B.assign(0,0,1,
          0,2,1,
          7,8,9);

Element wise operators:
writefln("A*B = %s",(A*B).toString);

Scalar operators:
writefln("5*B = %s",(5*B).toString);

As in Norberts proposal:
C.transpose returns a transposed array view (all dimensions reversed)
C.transpose(dimA,dimB) swaps dimension dimA and dimB

C.diag returns a N-1 dimensional view of the diagonal of the matrix. 
E.g. a 5x5 identity matrix:

// Zeros is just a convenience wrapper to generate a zero initialized array
Auto A = zeros!(int)(5,5);
A.diag[] = 1;
writefln("A = %s",A.toString);

A = (5x5)
       1        0        0        0        0
       0        1        0        0        0
       0        0        1        0        0
       0        0        0        1        0
       0        0        0        0        1

C.reverse(dim) reverses the given dimension (inverts the stride)

Indexing and slicing:

auto D = Array!(int,3)(7,7,7); // 7x7x7 dimensional array

D[1,2,3] => int
D[1,2,all] => 7 dimensional slice
D[1,all,all] => 7x7 dimensional slice
D[range(1,end-1), all, range(1,end-1)] => 5x7x5 dimensional subarray
D[end-1, range(end-3,end-1), all] => 2x7 dimensional slice

End works in the same way as $ does for regular arrays.
range(a,b) is the replacement for a..b (D doesn't support mixed 
indexing/slicing otherwise)

Slices/subarrays are of course views that share the same data:

auto M = zeros!(int)(4,4);
M[range(0,2), range(0,2)][] = 1;
M[range(2,end), range(2,end)][] = 1;
writefln("M = %s", M.toString);

M = (4x4)
       1        1        0        0
       1        1        0        0
       0        0        1        1
       0        0        1        1


.dup => Returns a compact copy of the array

In place map over a function:

M.doMap((int y){return 1<<y;});

Functional map:

writefln("cos(M) = %s",M.map(&cos).toString);

cos(M) = (4x4)
  0.5403   0.5403   1.0000   1.0000
  0.5403   0.5403   1.0000   1.0000
  1.0000   1.0000   0.5403   0.5403
  1.0000   1.0000   0.5403   0.5403

Foreach iteration:
auto N = zeros!(int)(5,5);
int i = 0;
foreach(inout x; N)
   x = ++i;



Of course, custom types are supported. Some examples just for fun: :)

auto Q = ones!(Rational!(long))(5,5) * 3;
writefln("Q/N = %s",(Q/N).toString);

Q/N = (5x5)
       3      3/2        1      3/4      3/5
     1/2      3/7      3/8      1/3     3/10
    3/11      1/4     3/13     3/14      1/5
    3/16     3/17      1/6     3/19     3/20
     1/7     3/22     3/23      1/8     3/25



auto U = zeros!(Uncertain!(double))(4,4);

double val = -8;
foreach(inout T;U) {
     // +- 10% uncertainty + 0.1
     T += Uncertain!(double)(val,abs(0.1*val)+0.1);
     val += 1;
}
writefln("U = %s",U.toString);

U = (4x4)
   (-8.0 ± 0.9)    (-7.0 ± 0.8)    (-6.0 ± 0.7)    (-5.0 ± 0.6)
   (-4.0 ± 0.5)    (-3.0 ± 0.4)    (-2.0 ± 0.3)    (-1.0 ± 0.2)
     (0. ± 0.1)     (1.0 ± 0.2)     (2.0 ± 0.3)     (3.0 ± 0.4)
    (4.0 ± 0.5)     (5.0 ± 0.6)     (6.0 ± 0.7)     (7.0 ± 0.8)

writefln("U/U = %s",(U/U).toString);

U/U = (4x4)
    (1.0 ± 0.2)     (1.0 ± 0.2)     (1.0 ± 0.2)     (1.0 ± 0.2)
    (1.0 ± 0.3)     (1.0 ± 0.3)     (1.0 ± 0.3)     (1.1 ± 0.4)
    (nan ± inf)     (1.1 ± 0.4)     (1.0 ± 0.3)     (1.0 ± 0.3)
    (1.0 ± 0.3)     (1.0 ± 0.2)     (1.0 ± 0.2)     (1.0 ± 0.2)




auto V = zeros!(Voltage)(4,4);
   val = 0;
   foreach(inout v; V) {
     v = val * volt;
     val += 0.1;
   }

auto R = zeros!(Resistance)(4,4);
   val = 100;
   foreach(inout r; R) {
     r = val * ohm;
     val *= 2;
   }

writefln("V = %s",V.toString);
writefln("R = %s",R.toString);
writefln("V/R = %s",(V/R).toString);


V = (4x4)
             0 V           100 mV           200 mV           300 mV
          400 mV           500 mV           600 mV           700 mV
          800 mV           900 mV         1e+03 mV            1.1 V
           1.2 V            1.3 V            1.4 V            1.5 V

R = (4x4)
           100 Ω            200 Ω            400 Ω            800 Ω
          1.6 kΩ           3.2 kΩ           6.4 kΩ          12.8 kΩ
         25.6 kΩ          51.2 kΩ           102 kΩ           205 kΩ
          410 kΩ           819 kΩ          1.64 MΩ          3.28 MΩ

V/R = (4x4)
             0 A           500 µA           500 µA           375 µA
          250 µA           156 µA          93.7 µA          54.7 µA
         31.2 µA          17.6 µA          9.77 µA          5.37 µA
         2.93 µA          1.59 µA           854 nA           458 nA





Comments:

The only tricky part about all this was getting opIndex working with 
compile time variadic arguments. I have not yet gotten DMD to inline the 
generated indexing functions, but it should be doable as they are all 
trivial. I have not been able to pursue this due to a DMD inlining bug 
that in some cases generates "xx is not an lvalue" when compiling with 
-inline.

The main issue that remains is op_r operators that have to work a bit 
differently from non _r operators(*).

I see this all more as a proof of concept than anything else. If anyone 
is interested in the sources, I will find a way to put them up somewhere.

Future extensions would be writing Matrix types, and possibly hook them 
up with a BLAS library. Anyone know if there exists BLAS implementations 
that supports 80-bit reals?

Other things I want to look at are adding stride support to the range() 
type and investigate the possibilities of generating expression templates.

*) Currently, if you have an:

opAdd(T)(T x) {...}

you can't also have an:

opAdd_r(T)(T x) {...}

since this will create ambiguous overloads. I find myself wanting to 
define opAdd_r for all cases where there are no regular non _r operator, 
but there is (AFAIK) no way to do that.

I would go as far as suggesting a change to D: If both a matching opX 
and a matching opX_r overload exists, the opX overload is chosen. This 
will demote the opX_r overloads a bit and is consistent with the way 
commutative operators work, where if for example no a.opAdd(b) exists, 
b.opAdd(a) is called.

/Oskar
Sep 08 2006
next sibling parent Don Clugston <dac nospam.com.au> writes:
Oskar Linde wrote:
 I have been implementing a strided multidimensional array type and am 
 interested in comments.

[snip] This looks really promising -- that improved IFTI support in the last release has clearly opened a lot of possibilities. For now, I'll just comment on one thing.
 Future extensions would be writing Matrix types, and possibly hook them 
 up with a BLAS library. Anyone know if there exists BLAS implementations 
 that supports 80-bit reals?

This is a really interesting issue. Personally I don't of any implementations, but I haven't really looked. My first reaction was, "you'd always be using doubles or floats in matrices, so you'd use SIMD". But even with 64-bit matrices, 80-bit temporary results can still be useful, especially for operations on the diagonal elements of matrices. Basically, all vectors parameters would still be 64-bit doubles, but scalar parameters and returned results would be 80-bit. And then I got thinking -- it's not clear to me that SIMD would be very beneficial when dealing with strided matrices, unless you're extremely clever(and this is probably why the AMD BLAS libraries have hand-crafted ASM even for the BLAS3 functions, you'd don't get much benefit otherwise). Consequently, I strongly suspect that you'd always want to treat strided arrays differently from built-in arrays (ie, stride == 1). I don't know to what extent expression templates could keep track of whether an array has a stride or not, but there'd be a massive performance boost if you could. -Don
Sep 10 2006
prev sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Oskar Linde wrote:
 I have been implementing a strided multidimensional array type and am 
 interested in comments.
 
 It is basically designed after Norbert Nemec's suggestion:
 http://homepages.uni-regensburg.de/~nen10015/documents/D-multidimarray.html
 +/- some things.
 
 The basic array type looks like this:
 
 struct Array(T,int dims = 1) {
     T* ptr;
     size_t[dims] range;
     ptrdiff_t[dims] stride;
 
     const dimensions = dims;
     alias T ElementType;
 
     ...
 }

Hey Oskar, I've been working on implementing something similar. I'd be interested to see some parts of your implementation. Mine's at http://www.dsource.org/projects/multiarray The main difference between our approaches seems to be that I've made 'dims' non-const (and therefore not a template parameter). I thought that would be better since sometimes it's handy to reshape a 1-d array into an equivalent 2d Nx1 or 3-d Nx1x1. And you have the potential to do broadcasting more easily that way too (E.g. multiply an N dim 1-d array times an Nx5 array just by duplicating the one axis implicitly). The down side is that range and stride then become dynamic arrays. I guess with your approach you can create a new view with the right # of dims when needed. On the other hand that probably means you need more templated member functions, parameterized by Array dimension. Another tradeoff is that I can also have indexing and slicing return arrays of different dimension, but I can't have indexing return a single element (because the return type of opIndex is Array not float and you can't have two different return types). Instead I overloaded opCall for that. So M(4,3) is the value of element 4,3, but M[4,3] is a 1-element view. I also made it a class in the end, though I started out with it being a struct. Partly that decision was motivated by the idea that generally I don't really want to pass 6 or more 32-bit numbers around by value, plus the presence of pointer member data, so using a class seemed reasonable. Having to remember to say 'new' everywhere has been a pain, though. I have mine hooked up with some basic BLAS routines -- the routines for VV, MV, and MM multiplication, and the simplest least squares and linear system solvers. I don't know of any 80-bit capable BLAS, but really single precision is good enough for most of what I do. Double is mostly overkill, but I use it to be sure. --bb
 
 Usage:
 
 Initialization
 
 // Create a 3x3 array
 auto A = Array!(int,2)(3,3);
 
 //This could use a nicer syntax
 A.assign(5,6,7,
          3,1,9,
          8,8,1);
 
 writefln("A = %s",A.toString);
 /*
 A = (3x3)
       5        6        7
       3        1        9
       8        8        1
 */
 
 auto B = Array!(int,2)(3,3);
 B.assign(0,0,1,
          0,2,1,
          7,8,9);
 
 Element wise operators:
 writefln("A*B = %s",(A*B).toString);
 
 Scalar operators:
 writefln("5*B = %s",(5*B).toString);
 
 As in Norberts proposal:
 C.transpose returns a transposed array view (all dimensions reversed)
 C.transpose(dimA,dimB) swaps dimension dimA and dimB
 
 C.diag returns a N-1 dimensional view of the diagonal of the matrix. 
 E.g. a 5x5 identity matrix:
 
 // Zeros is just a convenience wrapper to generate a zero initialized array
 Auto A = zeros!(int)(5,5);
 A.diag[] = 1;
 writefln("A = %s",A.toString);
 
 A = (5x5)
       1        0        0        0        0
       0        1        0        0        0
       0        0        1        0        0
       0        0        0        1        0
       0        0        0        0        1
 
 C.reverse(dim) reverses the given dimension (inverts the stride)
 
 Indexing and slicing:
 
 auto D = Array!(int,3)(7,7,7); // 7x7x7 dimensional array
 
 D[1,2,3] => int
 D[1,2,all] => 7 dimensional slice
 D[1,all,all] => 7x7 dimensional slice
 D[range(1,end-1), all, range(1,end-1)] => 5x7x5 dimensional subarray
 D[end-1, range(end-3,end-1), all] => 2x7 dimensional slice
 
 End works in the same way as $ does for regular arrays.
 range(a,b) is the replacement for a..b (D doesn't support mixed 
 indexing/slicing otherwise)
 
 Slices/subarrays are of course views that share the same data:
 
 auto M = zeros!(int)(4,4);
 M[range(0,2), range(0,2)][] = 1;
 M[range(2,end), range(2,end)][] = 1;
 writefln("M = %s", M.toString);
 
 M = (4x4)
       1        1        0        0
       1        1        0        0
       0        0        1        1
       0        0        1        1
 
 
 .dup => Returns a compact copy of the array
 
 In place map over a function:
 
 M.doMap((int y){return 1<<y;});
 
 Functional map:
 
 writefln("cos(M) = %s",M.map(&cos).toString);
 
 cos(M) = (4x4)
  0.5403   0.5403   1.0000   1.0000
  0.5403   0.5403   1.0000   1.0000
  1.0000   1.0000   0.5403   0.5403
  1.0000   1.0000   0.5403   0.5403
 
 Foreach iteration:
 auto N = zeros!(int)(5,5);
 int i = 0;
 foreach(inout x; N)
   x = ++i;
 
 
 
 Of course, custom types are supported. Some examples just for fun: :)
 
 auto Q = ones!(Rational!(long))(5,5) * 3;
 writefln("Q/N = %s",(Q/N).toString);
 
 Q/N = (5x5)
       3      3/2        1      3/4      3/5
     1/2      3/7      3/8      1/3     3/10
    3/11      1/4     3/13     3/14      1/5
    3/16     3/17      1/6     3/19     3/20
     1/7     3/22     3/23      1/8     3/25
 
 
 
 auto U = zeros!(Uncertain!(double))(4,4);
 
 double val = -8;
 foreach(inout T;U) {
     // +- 10% uncertainty + 0.1
     T += Uncertain!(double)(val,abs(0.1*val)+0.1);
     val += 1;
 }
 writefln("U = %s",U.toString);
 
 U = (4x4)
   (-8.0 ± 0.9)    (-7.0 ± 0.8)    (-6.0 ± 0.7)    (-5.0 ± 0.6)
   (-4.0 ± 0.5)    (-3.0 ± 0.4)    (-2.0 ± 0.3)    (-1.0 ± 0.2)
     (0. ± 0.1)     (1.0 ± 0.2)     (2.0 ± 0.3)     (3.0 ± 0.4)
    (4.0 ± 0.5)     (5.0 ± 0.6)     (6.0 ± 0.7)     (7.0 ± 0.8)
 
 writefln("U/U = %s",(U/U).toString);
 
 U/U = (4x4)
    (1.0 ± 0.2)     (1.0 ± 0.2)     (1.0 ± 0.2)     (1.0 ± 0.2)
    (1.0 ± 0.3)     (1.0 ± 0.3)     (1.0 ± 0.3)     (1.1 ± 0.4)
    (nan ± inf)     (1.1 ± 0.4)     (1.0 ± 0.3)     (1.0 ± 0.3)
    (1.0 ± 0.3)     (1.0 ± 0.2)     (1.0 ± 0.2)     (1.0 ± 0.2)
 
 
 
 
 auto V = zeros!(Voltage)(4,4);
   val = 0;
   foreach(inout v; V) {
     v = val * volt;
     val += 0.1;
   }
 
 auto R = zeros!(Resistance)(4,4);
   val = 100;
   foreach(inout r; R) {
     r = val * ohm;
     val *= 2;
   }
 
 writefln("V = %s",V.toString);
 writefln("R = %s",R.toString);
 writefln("V/R = %s",(V/R).toString);
 
 
 V = (4x4)
             0 V           100 mV           200 mV           300 mV
          400 mV           500 mV           600 mV           700 mV
          800 mV           900 mV         1e+03 mV            1.1 V
           1.2 V            1.3 V            1.4 V            1.5 V
 
 R = (4x4)
           100 Ω            200 Ω            400 Ω            800 Ω
          1.6 kΩ           3.2 kΩ           6.4 kΩ          12.8 kΩ
         25.6 kΩ          51.2 kΩ           102 kΩ           205 kΩ
          410 kΩ           819 kΩ          1.64 MΩ          3.28 MΩ
 
 V/R = (4x4)
             0 A           500 µA           500 µA           375 µA
          250 µA           156 µA          93.7 µA          54.7 µA
         31.2 µA          17.6 µA          9.77 µA          5.37 µA
         2.93 µA          1.59 µA           854 nA           458 nA
 
 
 
 
 
 Comments:
 
 The only tricky part about all this was getting opIndex working with 
 compile time variadic arguments. I have not yet gotten DMD to inline the 
 generated indexing functions, but it should be doable as they are all 
 trivial. I have not been able to pursue this due to a DMD inlining bug 
 that in some cases generates "xx is not an lvalue" when compiling with 
 -inline.
 
 The main issue that remains is op_r operators that have to work a bit 
 differently from non _r operators(*).
 
 I see this all more as a proof of concept than anything else. If anyone 
 is interested in the sources, I will find a way to put them up somewhere.
 
 Future extensions would be writing Matrix types, and possibly hook them 
 up with a BLAS library. Anyone know if there exists BLAS implementations 
 that supports 80-bit reals?
 
 Other things I want to look at are adding stride support to the range() 
 type and investigate the possibilities of generating expression templates.
 
 *) Currently, if you have an:
 
 opAdd(T)(T x) {...}
 
 you can't also have an:
 
 opAdd_r(T)(T x) {...}
 
 since this will create ambiguous overloads. I find myself wanting to 
 define opAdd_r for all cases where there are no regular non _r operator, 
 but there is (AFAIK) no way to do that.
 
 I would go as far as suggesting a change to D: If both a matching opX 
 and a matching opX_r overload exists, the opX overload is chosen. This 
 will demote the opX_r overloads a bit and is consistent with the way 
 commutative operators work, where if for example no a.opAdd(b) exists, 
 b.opAdd(a) is called.
 
 /Oskar

Feb 16 2007
next sibling parent reply Neal Becker <ndbecker2 gmail.com> writes:
I think it makes a lot of sense to build arrays from 2 components.  One
component is the storage.  The second component is the array, which is an
interpretation of the storage.  I think this simplifies slicing and
projections.
Feb 17 2007
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Neal Becker wrote:
 I think it makes a lot of sense to build arrays from 2 components.  One
 component is the storage.  The second component is the array, which is an
 interpretation of the storage.  I think this simplifies slicing and
 projections.

You mean exactly how Oskar and I are doing it? Or you mean a more flexible notion of storage? I agree it would be nice to be able to use different storage back ends besides strided memory. Diagonal, Triangular, Sparse, Block, Banded, etc. would all be nice. The Matrix Template Library did something like that, though I'd say with mixed results: http://www.osl.iu.edu/research/mtl/ The high-level idea is nice, but the end result was overly complicated IMHO, both internally and externally. And it depended critically on good optimization from the compiler to get decent performance since everything was in C++ (no tie-in with optimized Fortran back-ends like ATLAS). --bb
Feb 18 2007
prev sibling parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Bill Baxter skrev:
 Oskar Linde wrote:
 I have been implementing a strided multidimensional array type and am 
 interested in comments.

 It is basically designed after Norbert Nemec's suggestion:
 http://homepages.uni-regensburg.de/~nen10015/documents/D-multidimarray.html 

 +/- some things.

 The basic array type looks like this:

 struct Array(T,int dims = 1) {
     T* ptr;
     size_t[dims] range;
     ptrdiff_t[dims] stride;

     const dimensions = dims;
     alias T ElementType;

     ...
 }

Hey Oskar, I've been working on implementing something similar. I'd be interested to see some parts of your implementation. Mine's at http://www.dsource.org/projects/multiarray

Hi Bill, I'm sorry for the late reply. I've had a quite hectic last few months and not had time to dig into this. I have now at least extracted my old multiarray code and turned it into something that compiles (and converted it to use the new D variadic templates instead of the old variadic.d hack): http://www.csc.kth.se/~ol/multiarray.tar.gz the multiarray module is sci.matrix.multiarray. Everything is in a very crude shape.
 The main difference between our approaches seems to be that I've made 
 'dims' non-const (and therefore not a template parameter).
 
 I thought that would be better since sometimes it's handy to reshape a 
 1-d array into an equivalent 2d Nx1 or 3-d Nx1x1.  And you have the 
 potential to do broadcasting more easily that way too (E.g. multiply an 
 N dim 1-d array times an Nx5 array just by duplicating the one axis 
 implicitly).  The down side is that range and stride then become dynamic 
 arrays.  

 I guess with your approach you can create a new view with the 
 right # of dims when needed.  

Yes. This is very powerful. Given a 10x10x10 array arr, arr[0,all,all] will be a 10x10 array, not a 1x10x10 array or anything else. And that is all deduced at compile time.
 On the other hand that probably means you 
 need more templated member functions, parameterized by Array dimension.

Possibly.
 Another tradeoff is that I can also have indexing and slicing return 
 arrays of different dimension, but I can't have indexing return a single 
 element (because the return type of opIndex is Array not float and you 
 can't have two different return types).  Instead I overloaded opCall for 
 that.  So M(4,3) is the value of element 4,3, but M[4,3] is a 1-element 
 view.

In addition to being able to have full indexing return a single element vs partial indexing returning slices I think the greatest advantage of having the dimensionality as a compile time constant is the improved compile time consistency checks. I believe you will extremely rarely dynamically adjust the dimensionality of a multidimensional array. You are almost always required to know the dimensionality at compile time anyway. You can always make slices of different dimensionality. Having a dimension with a stride of 0, you could even duplicate the data over a dimension such as [1,1,1,1,... (10000 times) ...,1] only occupying a single unit of memory. Reshaping has to be done explicitly (which I see as a good thing). In general, I think this falls back on the same conclusion as the discussion regarding intrinsic vector operations and small static arrays being value types a while ago. Low dimensionalities will most of the time turn out to be static, while larger dimensions are dynamic. The dimensionality of a multidimensional array will typically be static and of a low order (For example, I've personally never encountered a case needing more than 6 dimensions).
 I also made it a class in the end, though I started out with it being a 
 struct.  Partly that decision was motivated by the idea that generally I 
 don't really want to pass 6 or more 32-bit numbers around by value, plus 
 the presence of pointer member data, so using a class seemed reasonable. 
  Having to remember to say 'new' everywhere has been a pain, though.

The biggest disadvantage I see is that data shuffling (such as in a loop assigning slices of one array to slices of another one) will result in the creation of lots of temporary heap allocated classes instead of, when using a struct, a small constant stack memory usage. One option could possibly be to make slicing return proxy structs that (in the future implicitly) converted to class instances. Note that the by-value passing doesn't necessarily mean any reduction in performance. Being careful about making functions not changing the shape of the array take the array as an inout (future const/readonly ref) while those that need must make the same data-copy when dup-ing the class anyway. Another option to avoid passing lots of data by value could possibly be to make a container struct containing a pointer to another struct containing all the data.
 I have mine hooked up with some basic BLAS routines -- the routines for 
 VV, MV, and MM multiplication, and the simplest least squares and linear 
 system solvers.  

That sounds really interesting. I will definitely give it a look. /Oskar
Mar 16 2007
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Oskar Linde wrote:
 Bill Baxter skrev:

 Hi Bill,
 
 I'm sorry for the late reply. I've had a quite hectic last few months 
 and not had time to dig into this. I have now at least extracted my old 
 multiarray code and turned it into something that compiles (and 
 converted it to use the new D variadic templates instead of the old 
 variadic.d hack):
 
 http://www.csc.kth.se/~ol/multiarray.tar.gz

Great! Thanks for putting this online.
 the multiarray module is sci.matrix.multiarray. Everything is in a very 
 crude shape.
 
 The main difference between our approaches seems to be that I've made 
 'dims' non-const (and therefore not a template parameter).

 I thought that would be better since sometimes it's handy to reshape a 
 1-d array into an equivalent 2d Nx1 or 3-d Nx1x1.  And you have the 
 potential to do broadcasting more easily that way too (E.g. multiply 
 an N dim 1-d array times an Nx5 array just by duplicating the one axis 
 implicitly).  The down side is that range and stride then become 
 dynamic arrays.  

 I guess with your approach you can create a new view with the right # 
 of dims when needed.  

Yes. This is very powerful. Given a 10x10x10 array arr, arr[0,all,all] will be a 10x10 array, not a 1x10x10 array or anything else. And that is all deduced at compile time.

Hmm, how do you accomplish that? I'll take a look at the code but it seems like you need a lot of partial specializations for the 0 case. Any opIndex call that has a zero as one of its arguments needs to have a special implementation because it's returning a different type. But I thought specialization and IFTI were mutually exclusive.
 On the other hand that probably means you need more templated member 
 functions, parameterized by Array dimension.

Possibly.
 Another tradeoff is that I can also have indexing and slicing return 
 arrays of different dimension, but I can't have indexing return a 
 single element (because the return type of opIndex is Array not float 
 and you can't have two different return types).  Instead I overloaded 
 opCall for that.  So M(4,3) is the value of element 4,3, but M[4,3] is 
 a 1-element view.

In addition to being able to have full indexing return a single element vs partial indexing returning slices I think the greatest advantage of having the dimensionality as a compile time constant is the improved compile time consistency checks. I believe you will extremely rarely dynamically adjust the dimensionality of a multidimensional array. You are almost always required to know the dimensionality at compile time anyway. You can always make slices of different dimensionality. Having a dimension with a stride of 0, you could even duplicate the data over a dimension such as [1,1,1,1,... (10000 times) ...,1] only occupying a single unit of memory. Reshaping has to be done explicitly (which I see as a good thing).

What about allowing flexible expanding of dimensions? I'm basically trying to make a pseudo-clone of numpy (www.numpy.org), and one thing it does a lot is "broadcasting". http://www.scipy.org/EricsBroadcastingDoc http://www.onlamp.com/pub/a/python/2000/09/27/numerically.html
 In general, I think this falls back on the same conclusion as the 
 discussion regarding intrinsic vector operations and small static arrays 
 being value types a while ago. Low dimensionalities will most of the 
 time turn out to be static, while larger dimensions are dynamic. The 
 dimensionality of a multidimensional array will typically be static and 
 of a low order (For example, I've personally never encountered a case 
 needing more than 6 dimensions).

Yeh, there is a limit. Numpy is supposed to be dimension agnostic, but there is a hard-coded MAX_DIMS in the C code that implements the backend. I think it's set to something like 32.
 I also made it a class in the end, though I started out with it being 
 a struct.  Partly that decision was motivated by the idea that 
 generally I don't really want to pass 6 or more 32-bit numbers around 
 by value, plus the presence of pointer member data, so using a class 
 seemed reasonable.  Having to remember to say 'new' everywhere has 
 been a pain, though.

The biggest disadvantage I see is that data shuffling (such as in a loop assigning slices of one array to slices of another one) will result in the creation of lots of temporary heap allocated classes instead of, when using a struct, a small constant stack memory usage.

Yep. But basically since I was porting from Python code my thinking was "Python already does all those allocations, and millions more -- a D implementation can only be faster" :-) I think the thing that really got me to switch to class was that there were some IFTI bugs that only surfaced when using structs and which went away when using a class. They're fixed now, so maybe it's time to go back to a struct. *Especially* if we get some new constref goodness.
 One option could possibly be to make slicing return proxy structs that 
 (in the future implicitly) converted to class instances.
 
 Note that the by-value passing doesn't necessarily mean any reduction in 
 performance. Being careful about making functions not changing the shape 
 of the array take the array as an inout (future const/readonly ref) 
 while those that need must make the same data-copy when dup-ing the 
 class anyway.

Yeh, I hate to make things 'inout' when I really want const reference behavior. I'm going to go an rewrite a bunch of stuff as soon as we get const ref. Another nice thing about going the class route with current D, is that you can have a function which potentially returns the input unmodified. For instance a function that makes sure the input is at least 2D, and if so just returns the input as-is, otherwise it creates a new view. With structs that'll turn into create a new view no matter what. On the other hand, you need those kinds of functions in Numpy a lot because you tend to write functions that take an "A" where "A" could be a python array, or a python tuple, or a numpy array of any dimension, or any other sequence type. Using numpy.asarray_2d(A) at the top of your functions is in many ways a substitute for not having static typing. It's hard to say without actually going through the exercise. I don't have much experience using big flexible array packages in statically typed languages like C++. My experience is with the array handling in Matlab and Numpy, both of which are dynamic. Maybe static dimension is the way to go with a statically typed language.
 Another option to avoid passing lots of data by value could possibly be 
 to make a container struct containing a pointer to another struct 
 containing all the data.

Sounds like significantly more complexity for not so much benefit.
 I have mine hooked up with some basic BLAS routines -- the routines 
 for VV, MV, and MM multiplication, and the simplest least squares and 
 linear system solvers.  

That sounds really interesting. I will definitely give it a look.

--bb
Mar 16 2007