www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Proposal for multidimensional arrays

reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Hi there,

I just finished the first version of my proposal for multidimensional arrays
in D:

http://homepages.uni-regensburg.de/~nen10015/documents/D-multidimarray.html

I believe, the core concept is so absolutely beautiful that you just have to
agree. (Sorry, but I think I've just fallen in love with it while working
on the details...)

Of course, many of the details may still need some refinement. Please:
before you bring any ideas for changes, make sure you read the comments in
the text. I tried to write down the rationals for most of the possibly
controversial decisions I had to make.

Ciao,
Nobbi
May 09 2004
next sibling parent reply Ben Hinkle <bhinkle4 juno.com> writes:
Norbert Nemec wrote:

 Hi there,
 
 I just finished the first version of my proposal for multidimensional
 arrays in D:
 

 
 I believe, the core concept is so absolutely beautiful that you just have
 to agree. (Sorry, but I think I've just fallen in love with it while
 working on the details...)
 
 Of course, many of the details may still need some refinement. Please:
 before you bring any ideas for changes, make sure you read the comments in
 the text. I tried to write down the rationals for most of the possibly
 controversial decisions I had to make.
 
 Ciao,
 Nobbi

Very nicely written! Native support of 2d and n-d arrays would be nice, but just as important is a good interface to BLAS, LAPACK and other Fortran code. IIRC, BLAS can accept row-oriented data but LAPACK is column-oriented only. The fact that Fortran indexing is 1-based is probably not going to be a problem but transposing all those matrices to use LAPACK is a problem - though I haven't timed anything to know for sure. Some more mundane questions/comments: - How about having n-d arrays be column oriented? Since your proposal uses a new syntax it wouldn't be confusing to have column-oriented storage. - Why remove dynamic arrays? Assigning to length (the note about assigning to "size" confused me - did you mean "length"?) is pretty useful. - N-D arrays don't have to have any way of getting the ranges. Perhaps you could add a read-only property .length or .range or something that returns a dynamic array of uints. - Is is possible to reshape an n-d array or to perform the n-d equivalent to "slicing a pointer": char* ptr; char[] arr = ptr[0..10]; -Ben
May 09 2004
parent Norbert Nemec <Norbert.Nemec gmx.de> writes:
Ben Hinkle wrote:
 Very nicely written! Native support of 2d and n-d arrays would be nice,
 but just as important is a good interface to BLAS, LAPACK and other
 Fortran code. IIRC, BLAS can accept row-oriented data but LAPACK is
 column-oriented only. The fact that Fortran indexing is 1-based is
 probably not going to be a problem but transposing all those matrices to
 use LAPACK is a problem - though I haven't timed anything to know for
 sure.

 Some more mundane questions/comments:
 - How about having n-d arrays be column oriented? Since your proposal uses
 a new syntax it wouldn't be confusing to have column-oriented storage.

Actually, I had this in mind from the very beginning. The proposal already contains everything necessary to handle Fortran-arrays with the same ease as C-arrays, I just forgot to mention it explicitely. A few additional tiny features might make it more convenient, but I haven't made up my mind what they would have to look like.
 - Why remove dynamic arrays? Assigning to length (the note about assigning
 to "size" confused me - did you mean "length"?) is pretty useful.

They are not removed, just renamed. The length of an array (of course, length, sorry about talking about "size" there) never was really "dynamic". Assigning to .length just obscured this fact. It is much clearer to either use the extended .dup(int newlength) property if you want a new array with the data copied, or use slicing if the new array reference should point to the same location in memory. I just promote calling the former dynamic arrays what they really are: "array references", i.e. pointers with additional shape information.
 - N-D arrays don't have to have any way of getting the ranges. Perhaps you
 could add a read-only property .length or .range or something that returns
 a dynamic array of uints.

Of course, that has to be added. Anyhow, it should, of course, return a fixed size array, since the number of dimensions is known at compile time.
 - Is is possible to reshape an n-d array or to perform the n-d equivalent
 to "slicing a pointer":
  char* ptr;
  char[] arr = ptr[0..10];

Might be an idea. Free reshaping beyond the capabilities of slicing and transposing should then be done by first converting to a pointer and then back to an array with that "slicing of pointers" idea of yours. Actually, I have not yet thought of any low-level interface at all. Of course one should be able to access all the internas of arrays in some way. It would be unsafe, and it should be made clear that it should be avoided, but it should still be possible in a comfortable, portable way. I'll try to work in the details you mention ASAP. Ciao, Nobbi
May 09 2004
prev sibling parent reply Ben Hinkle <bhinkle4 juno.com> writes:
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8Bit

Reading your proposal I started wondering about your earlier attempts to
write Matrix/ND-array templates. What were the main issues? Was it syntax?
Was it performance concerns?

I've attached two files of my experiments for a day. One has some templates
and the other has an example. I think this basic approach was suggested in
a post on the now-defunct newsgroup. Assuming the templates can be inlined
it seems like a nice set of templates would work out. I haven't tried any
experiments, though. Did you try something like this?

-Ben
May 09 2004
parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Ben Hinkle wrote:

 Reading your proposal I started wondering about your earlier attempts to
 write Matrix/ND-array templates. What were the main issues? Was it syntax?
 Was it performance concerns?

Both: I could either have gone for pretty syntax, sacrificing performance and clearness of the concepts, or going for performance and clearness, the resulting syntax would have been ugly like something. In neither case, the array templates would have been a good demonstration for what I have in mind for the language, and in neither case, they would have been really useful in practice.
 I've attached two files of my experiments for a day. One has some
 templates and the other has an example. I think this basic approach was
 suggested in a post on the now-defunct newsgroup. Assuming the templates
 can be inlined it seems like a nice set of templates would work out. I
 haven't tried any experiments, though. Did you try something like this?

I did something similar and even got a few steps beyond, working with strides, but once I entered the subject of slicing and partial indexing, everything just became a complete mess, and I realized that I spent so much time trying to find workarounds for current shortcomings of the language that the basic concepts got more and more washed out. My resolution: The language is not yet flexible enough to implement my vision of comfortable and efficient arrays in the library, and I judged that instead of trying to get Walter to include five tiny changes, it might be easier and more worthwhile to go for the big thing... B.t.w: I saw that "reshape" function in your code. I thought about this myself, but came to the conclusion that, in my concept of arrays with strides, this would not work out too well. In general, reshaping would only work for continuous arrays, and even then, it would be an unsafe operation that should rather be done by casting to a pointer and back again.
May 09 2004
next sibling parent reply Ben Hinkle <bhinkle4 juno.com> writes:
Norbert Nemec wrote:

 Ben Hinkle wrote:
 
 Reading your proposal I started wondering about your earlier attempts to
 write Matrix/ND-array templates. What were the main issues? Was it
 syntax? Was it performance concerns?

Both: I could either have gone for pretty syntax, sacrificing performance and clearness of the concepts, or going for performance and clearness, the resulting syntax would have been ugly like something. In neither case, the array templates would have been a good demonstration for what I have in mind for the language, and in neither case, they would have been really useful in practice.

Most of the time the strides are 1 so building in the more general case can hurt performance of the common case. Also n-d arrays can have complex storage implementations (think sparse arrays, triangular arrays, etc) so building something into the language will only be practical up to a certain point. At some point the library will have to take over.
 I've attached two files of my experiments for a day. One has some
 templates and the other has an example. I think this basic approach was
 suggested in a post on the now-defunct newsgroup. Assuming the templates
 can be inlined it seems like a nice set of templates would work out. I
 haven't tried any experiments, though. Did you try something like this?

I did something similar and even got a few steps beyond, working with strides, but once I entered the subject of slicing and partial indexing, everything just became a complete mess, and I realized that I spent so much time trying to find workarounds for current shortcomings of the language that the basic concepts got more and more washed out. My resolution: The language is not yet flexible enough to implement my vision of comfortable and efficient arrays in the library, and I judged that instead of trying to get Walter to include five tiny changes, it might be easier and more worthwhile to go for the big thing...

Slicing with a stride or slicing by arbitrary arrays (meaning the indices don't have to be equally spaced or even distinct) came up for dynamic arrays back on the old newsgroup but I can't remember Walter's reaction - I don't think he posted any. A number of people expressed interest in more powerful slicing for 1D arrays.
 B.t.w: I saw that "reshape" function in your code. I thought about this
 myself, but came to the conclusion that, in my concept of arrays with
 strides, this would not work out too well. In general, reshaping would
 only work for continuous arrays, and even then, it would be an unsafe
 operation that should rather be done by casting to a pointer and back
 again.

I agree that reshaping wouldn't be a member function of subarrays or if it was it would extract the subarray before reshaping. I don't think it is "unsafe", though, since you can't change the number of elements (not my little reshape didn't check for this but it should). It is very handy in MATLAB. It is a pity that my reshape can only change the dimension lengths and not the number of dimensions - it would be great to have functions to squeeze out empty dimensions (the equivalent to MATLAB's "squeeze") or to flatten an n-d array into a vector (the equivalent to MATLAB's (:) notation). -Ben
May 10 2004
parent Norbert Nemec <Norbert.Nemec gmx.de> writes:
Ben Hinkle wrote:

 Most of the time the strides are 1 so building in the more general case
 can hurt performance of the common case.

True. My thought on this was: For multidimensional arrays, all dimensions but the last one (in C-storage alignment) need strides != 1 in any case. In your implementations, these strides are calculated from the ranges of the lower dimensions. It does not change the access time, if, instead, the stride is saved in a separate field. The array reference gets a bit bigger, but usually, this does not mean much overhead. For 1-d arrays, I believe, an unstrided, "lightweight" array type makes sense. For more dimensions, the gains gets smaller and smaller, especially, since the timecritical inner loop can easily be optimized by the compiler for the special case of stride 1.
 Also n-d arrays can have complex
 storage implementations (think sparse arrays, triangular arrays, etc) so
 building something into the language will only be practical up to a
 certain point. At some point the library will have to take over.

True. I really believe that as much as possible should be put into the language. Anyhow: rectangular arrays should clearly be a feature of the language itself. The only reason that I went for striding as well was, that the step from multidimensional arrays to fully strided arrays was so surprisingly small.
 Slicing with a stride or slicing by arbitrary arrays (meaning the indices
 don't have to be equally spaced or even distinct) came up for dynamic
 arrays back on the old newsgroup but I can't remember Walter's reaction -
 I don't think he posted any. A number of people expressed interest in more
 powerful slicing for 1D arrays.

My opinion would be: striding with varying steps is clearly far too complex to be put into the language. Striding of 1D-arrays only would be too complex as well. Striding for ND-arrays comes naturally at minimal additional cost, so if we want ND-arrays, we would be crazy to ignore the power we can get by introducing full striding.
 B.t.w: I saw that "reshape" function in your code. I thought about this
 myself, but came to the conclusion that, in my concept of arrays with
 strides, this would not work out too well. In general, reshaping would
 only work for continuous arrays, and even then, it would be an unsafe
 operation that should rather be done by casting to a pointer and back
 again.

I agree that reshaping wouldn't be a member function of subarrays or if it was it would extract the subarray before reshaping. I don't think it is "unsafe", though, since you can't change the number of elements (not my little reshape didn't check for this but it should). It is very handy in MATLAB. It is a pity that my reshape can only change the dimension lengths and not the number of dimensions - it would be great to have functions to squeeze out empty dimensions (the equivalent to MATLAB's "squeeze") or to flatten an n-d array into a vector (the equivalent to MATLAB's (:) notation).

That is true for arrays that are stored continuously. As soon as you slice a rectangular array (even without explicit strides) you will get an array stored in a non-continuous area. Reshaping this without .dup-ing it first would only be possible in very special cases. In MATLAB, of course, there is no problem, since array contents are just copied around intelligently.
May 10 2004
prev sibling parent reply Ben Hinkle <bhinkle4 juno.com> writes:
Norbert Nemec wrote:

 Ben Hinkle wrote:
 
 Reading your proposal I started wondering about your earlier attempts to
 write Matrix/ND-array templates. What were the main issues? Was it
 syntax? Was it performance concerns?

Both: I could either have gone for pretty syntax, sacrificing performance and clearness of the concepts, or going for performance and clearness, the resulting syntax would have been ugly like something. In neither case, the array templates would have been a good demonstration for what I have in mind for the language, and in neither case, they would have been really useful in practice.

The only thing D lacks now in terms of syntax from your proposal is (aside from the mtype[[N]]) the step in slicing. If A is an nd array then it should be possible to write templates for A[i][j][k] A[i_0..i_1][j][k] A[i][j_0..j_1][k] A[i_0..i_1][j_0..j_1][k] etc and to add slicing step something like A[i_0..i_1].by(i_step)[j][k] A[i][j_0..j_1].by(j_step)[k] etc doesn't seem too bad. Or maybe just make a function A.slice(i_0,i_1,i_step)[j][k] A[i].slice(j_0,j_1,j_step)[k] Are the templates really that nasty? The suggestion about opIndex and opIndexAssign makes alot of sense to me (and, Walter, why isn't there any opSliceAssign to overload slice assignment?) and it would be pretty easy to convince Walter to act on it. If D had opCallAssign then something like A(i)(j)(k) = 10 A(i_0,i_1)(j)(k) = 10 A(i_0,i_1,i_step)(j)(k) = 10 would look pretty usable and be extendable to more non-trivial indexing expressions like int[N] perm; ... fill perm with a permutation of 0..N-1 A(perm,j,k) // permute the 1st dimension of A Maybe with enough opFooAssign overloads we'd be all set :-) -Ben
May 11 2004
parent Norbert Nemec <Norbert.Nemec gmx.de> writes:
Ben Hinkle wrote:

 Norbert Nemec wrote:
 
 Ben Hinkle wrote:
 
 Reading your proposal I started wondering about your earlier attempts to
 write Matrix/ND-array templates. What were the main issues? Was it
 syntax? Was it performance concerns?

Both: I could either have gone for pretty syntax, sacrificing performance and clearness of the concepts, or going for performance and clearness, the resulting syntax would have been ugly like something. In neither case, the array templates would have been a good demonstration for what I have in mind for the language, and in neither case, they would have been really useful in practice.

The only thing D lacks now in terms of syntax from your proposal is (aside from the mtype[[N]]) the step in slicing. If A is an nd array then it should be possible to write templates for A[i][j][k] A[i_0..i_1][j][k] A[i][j_0..j_1][k] A[i_0..i_1][j_0..j_1][k]

The first one would work, because [i] could reduce the dimesionality by one. For slicing, this will not work any more: [i_0..i_1] does not reduce the dimensionality of the array, so [j] would again index the first dimension. I've spent some thought on this, but I came to the conclusion that it is not a good idea mixing [i][j] and [i,j]. It is a way to do a crude hack at the moment, but you run into problems very quickly if you want to go beyond the very basic indexing.
 Are the templates really that nasty?

No, but as I said: you can either go for a clean internal structure and clumsy syntax for the user or try to get a comfortable syntax and end up with messy and unmaintainable internals. It is the same in C++, even though, with references, implicit instantiations and other features, I find it a little bit more flexible if you want to implement a syntax that really is comforable to the user. (But then, don't even try to understand or debug the library internals...)
 opSliceAssign

For that, you would first have to say, what a range expression actually is. The .. is not an operator, but it only exists within brackets. opSlice should therefore be something like opSlice(int min0,int max0,int stride0,int min1,int max1,int stride1,...) Might be an idea in the future, but I think we should not decide on this now, as long as it is not clear what happens with respect to multidimensional arrays in the language. If my proposal is stalled and it is decided that something like this should never get into the language, then it might make sense to think about little tweaks that allow to do the same thing in the library. Otherwise, it seems much more reasonable to design the details of opIndexAssign and opSlice in a way that fits into the big picture.
May 11 2004