www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Proposal: Multidimensional opSlice solution

reply Norbert Nemec <Norbert Nemec-online.de> writes:
Hi there,

in implementing multi-dimensional arrays, the current way of overloading 
the slicing operator does not scale up.

Currently, there is opIndex working for an arbitrary number of indices, 
but opSlice works only for one dimension. Ultimately, it should be 
possible to allow slicing for more than one dimension, and mixing it 
with indexing for e.g.:
	A[4,7..8,7,2..5]
So far, no clean solution for overloading this has been suggested.

Looking at Python, there is a range operator a:b or a:b:c that takes two 
or three integer and returns a 'slice' object. I.e. a builtin type. The 
__getitem__ method (Python's equivalent to opIndex) then simply receives 
either integers or 'slice' object for each dimension. The range operator 
only exists within indexing expressions.

The more D-ish equivalent of this solution is another opXXX method. How 
about the following:

----------------

* The slicing operation obj[] is overloaded by
	obj.opIndex()

* The slicing operation obj[a..b] is overloaded by
	obj.opIndex(obj.opRange(a,b))

* Within an indexing expression on an object 'obj', an expression of the 
form 'a..b' is transformed into a call
	obj.opRange(a,b)
furthermore, an expression of the form 'a..b:c' is transformed into
	obj.opRange(a,b,c)
to where 'c' is the stride.

* Alternatively, template functions of the form
	obj.opRange(int N)(a,b)
	obj.opRange(int N)(a,b,c)
are tried, with N set to the dimension number within the indexing 
expression. It is an error for both opRange(a,b,c) and
opRange(int N)(a,b,c) to exist in the same user defined type.
For example
	obj[a..b,d..e:f]
would become
	obj.opIndex(obj.opRange!(0)(a,b),obj.opRange!(1)(d,e,f))

* The opRange function may return an object of arbitrary type with is 
passed on to opIndex (or opOpIndex or opIndexAssign) to perform the 
actual slicing on the object. Proper overloading of these functions 
allows arbitrary mixing of slicing and indexing within the same expression.

* All op*Slice* operators become obsolete.

----------------

I am not sure whether opRange is the best name, as it clashes with the 
range concept in the libraries. I would have liked to reuse the name 
opSlice which would become unused by this change. However - the 
transition to this completely different meaning would be rather painful. 
Alternative name suggestions are welcome.

Greetings,
Norbert
Mar 08 2010
parent reply Don <nospam nospam.com> writes:
Norbert Nemec wrote:
 Hi there,
 
 in implementing multi-dimensional arrays, the current way of overloading 
 the slicing operator does not scale up.
 
 Currently, there is opIndex working for an arbitrary number of indices, 
 but opSlice works only for one dimension. Ultimately, it should be 
 possible to allow slicing for more than one dimension, and mixing it 
 with indexing for e.g.:
     A[4,7..8,7,2..5]
 So far, no clean solution for overloading this has been suggested.

A solution was suggested while you were away. You don't need a new opRange operator, a simple tuple struct like: struct Slice(T) { T from; T to; } in std.object is enough. Note that: A[4, Slice(7,8), 7, Slice(2,5)] will work with the existing compiler. So it's just a tiny syntax sugar issue. And another simple possibility is to turn 7..8 into int[2][7,8]. However, Andrei argued that slicing of multidimensional arrays is so rarely used that syntax sugar is not necessary. Thus, it's not in D2.
Mar 08 2010
next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2010-03-08 09:55:53 -0500, Don <nospam nospam.com> said:

 However, Andrei argued that slicing of multidimensional arrays is so 
 rarely used that syntax sugar is not necessary. Thus, it's not in D2.

And that's sad, because there are many other uses for a Slice struct. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Mar 08 2010
prev sibling next sibling parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Don wrote:
 Norbert Nemec wrote:
 Hi there,

 in implementing multi-dimensional arrays, the current way of 
 overloading the slicing operator does not scale up.

 Currently, there is opIndex working for an arbitrary number of 
 indices, but opSlice works only for one dimension. Ultimately, it 
 should be possible to allow slicing for more than one dimension, and 
 mixing it with indexing for e.g.:
     A[4,7..8,7,2..5]
 So far, no clean solution for overloading this has been suggested.

A solution was suggested while you were away. You don't need a new opRange operator, a simple tuple struct like: struct Slice(T) { T from; T to; } in std.object is enough. Note that: A[4, Slice(7,8), 7, Slice(2,5)] will work with the existing compiler. So it's just a tiny syntax sugar issue.

I know this solution. It is exactly the "syntax sugar" issue that I see as the problem here: The compiler would need to be aware of the data type Slice. It would therefore have to be something like a "builtin" type. If I am not mistaken, the language definition so far never makes use of "builtin" struct types.
 And another simple possibility is to turn 7..8 into int[2][7,8].

That solution would unnecessarily get in to way of implementing indexing by an array of indices: auto A = MyArray(["zero","one","two","three"]); assert( A[ [2,3,0] ] == MyArray(["two","three","zero"]) );
 However, Andrei argued that slicing of multidimensional arrays is so 
 rarely used that syntax sugar is not necessary. Thus, it's not in D2.

Outside of numerics, multidimensional arrays are indeed rarely used. In numerical programming, however, they are an essential ingredient. My ultimate goal is to multidimensional arrays in D as comfortable as in Fortran, Matlab or Python/NumPy in order to make D a real competitor in that market.
Mar 08 2010
parent reply Don <nospam nospam.com> writes:
Norbert Nemec wrote:
 Don wrote:
 Norbert Nemec wrote:
 Hi there,

 in implementing multi-dimensional arrays, the current way of 
 overloading the slicing operator does not scale up.

 Currently, there is opIndex working for an arbitrary number of 
 indices, but opSlice works only for one dimension. Ultimately, it 
 should be possible to allow slicing for more than one dimension, and 
 mixing it with indexing for e.g.:
     A[4,7..8,7,2..5]
 So far, no clean solution for overloading this has been suggested.

A solution was suggested while you were away. You don't need a new opRange operator, a simple tuple struct like: struct Slice(T) { T from; T to; } in std.object is enough. Note that: A[4, Slice(7,8), 7, Slice(2,5)] will work with the existing compiler. So it's just a tiny syntax sugar issue.

I know this solution. It is exactly the "syntax sugar" issue that I see as the problem here: The compiler would need to be aware of the data type Slice. It would therefore have to be something like a "builtin" type. If I am not mistaken, the language definition so far never makes use of "builtin" struct types.

You are mistaken <g>. object, TypeInfo, AssociativeArray, ... Complex will be added to that list, too.
 And another simple possibility is to turn 7..8 into int[2][7,8].

That solution would unnecessarily get in to way of implementing indexing by an array of indices: auto A = MyArray(["zero","one","two","three"]); assert( A[ [2,3,0] ] == MyArray(["two","three","zero"]) );

But that syntax doesn't allow slices. It's only an issue if a dimension can be BOTH a slice T..T, AND T[2], and they are not equivalent. I tried to come up with a plausible scenario where it was really a problem, but failed. It requires that both T and T[2] are valid indices for the same dimension, AND that slicing is supported. But this is all a bit academic since it's not going to happen. Maybe if you'd been around a few months back, things would be different.
 However, Andrei argued that slicing of multidimensional arrays is so 
 rarely used that syntax sugar is not necessary. Thus, it's not in D2.

Outside of numerics, multidimensional arrays are indeed rarely used. In numerical programming, however, they are an essential ingredient. My ultimate goal is to multidimensional arrays in D as comfortable as in Fortran, Matlab or Python/NumPy in order to make D a real competitor in that market.

Agreed. Multidimensional slicing is a costly operation, though, so I think the absence of syntax sugar is tolerable. I see syntax sugar for slicing as much less important than multidimensional $, where the alternative is really quite ugly. So I pushed hard for opDollar.
Mar 08 2010
next sibling parent reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Don wrote:
 Norbert Nemec wrote:
 Don wrote:
 Norbert Nemec wrote:
 Hi there,

 in implementing multi-dimensional arrays, the current way of 
 overloading the slicing operator does not scale up.

 Currently, there is opIndex working for an arbitrary number of 
 indices, but opSlice works only for one dimension. Ultimately, it 
 should be possible to allow slicing for more than one dimension, and 
 mixing it with indexing for e.g.:
     A[4,7..8,7,2..5]
 So far, no clean solution for overloading this has been suggested.

A solution was suggested while you were away. You don't need a new opRange operator, a simple tuple struct like: struct Slice(T) { T from; T to; } in std.object is enough. Note that: A[4, Slice(7,8), 7, Slice(2,5)] will work with the existing compiler. So it's just a tiny syntax sugar issue.

I know this solution. It is exactly the "syntax sugar" issue that I see as the problem here: The compiler would need to be aware of the data type Slice. It would therefore have to be something like a "builtin" type. If I am not mistaken, the language definition so far never makes use of "builtin" struct types.

You are mistaken <g>. object, TypeInfo, AssociativeArray, ... Complex will be added to that list, too.

It will? For complex literals, then? -Lars
Mar 08 2010
parent Don <nospam nospam.com> writes:
Lars T. Kyllingstad wrote:
 Don wrote:
 Norbert Nemec wrote:
 Don wrote:
 Norbert Nemec wrote:
 Hi there,

 in implementing multi-dimensional arrays, the current way of 
 overloading the slicing operator does not scale up.

 Currently, there is opIndex working for an arbitrary number of 
 indices, but opSlice works only for one dimension. Ultimately, it 
 should be possible to allow slicing for more than one dimension, 
 and mixing it with indexing for e.g.:
     A[4,7..8,7,2..5]
 So far, no clean solution for overloading this has been suggested.

A solution was suggested while you were away. You don't need a new opRange operator, a simple tuple struct like: struct Slice(T) { T from; T to; } in std.object is enough. Note that: A[4, Slice(7,8), 7, Slice(2,5)] will work with the existing compiler. So it's just a tiny syntax sugar issue.

I know this solution. It is exactly the "syntax sugar" issue that I see as the problem here: The compiler would need to be aware of the data type Slice. It would therefore have to be something like a "builtin" type. If I am not mistaken, the language definition so far never makes use of "builtin" struct types.

You are mistaken <g>. object, TypeInfo, AssociativeArray, ... Complex will be added to that list, too.

It will? For complex literals, then? -Lars

need them any more.
Mar 08 2010
prev sibling parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Don wrote:
 Norbert Nemec wrote:
 The compiler would need to be aware of the data type Slice. It would 
 therefore have to be something like a "builtin" type. If I am not 
 mistaken, the language definition so far never makes use of "builtin" 
 struct types.

You are mistaken <g>. object, TypeInfo, AssociativeArray, ... Complex will be added to that list, too.

Indeed? If that is so, a builtin Slice type as syntactic sugar would be just as good as my proposal. Just note that the slice operator "a..b" should also allow strided slicing "a..b:c"
     auto A = MyArray(["zero","one","two","three"]);
     assert( A[ [2,3,0] ] == MyArray(["two","three","zero"]) );

But that syntax doesn't allow slices. It's only an issue if a dimension can be BOTH a slice T..T, AND T[2], and they are not equivalent. I tried to come up with a plausible scenario where it was really a problem, but failed. It requires that both T and T[2] are valid indices for the same dimension, AND that slicing is supported.

Which is not only feasible but implemented and widely used in Fortran, Matlab and Python/NumPy: a[5] --> returns a single element a[5..8] --> returns a slice [ a[5],a[6],a[7] ] a[[5,3,1,4,6]] --> returns a list of elements selected by the list of indices [a[5],a[3],a[1],a[4],a[6]] In the last case, the list of indices can of course have just two entries, so a[[5,10]] == [ a[5],a[10] ]
 But this is all a bit academic since it's not going to happen. Maybe if 
 you'd been around a few months back, things would be different.

Yea, seems like I missed some important decisions. May indeed be too late for some ideas. Still - not giving up quite yet.
 Agreed. Multidimensional slicing is a costly operation, though, so I 
 think the absence of syntax sugar is tolerable.

Costly? Not at all! Just a bit of integer arithmetic that would need to be done anyway to access the data. Most numerical code consist of many nested loops containing very few operations. Strided multidimensional slicing operations in combination with array expressions allow to express this equivalently in very few lines that can be optimized very effectively. As for the necessity of syntactic sugar, consider one typical line that I picked from my own numerical Python code: Python/NumPy: off = (y[:-1,:]*x[1:,:] - y[1:,:]*x[:-1,:]) / (x[1:,:]-x[:-1,:]) D no slicing for(int i=0;i<off.shape[0]-1;i++) for(int j=0;j<off.shape[1];j++) off[i,j] = (y[i,j]*x[i+1,j]-y[i+1,j]*x[i,j]) / (x[i+1,j]-x[i,j]) D currently possible: off[] = (y[slice(0,$-1),slice(0,$)]*x[slice(1,$),slice(0,$)] - y[slice(1,$),slice(0,$)]*x[slice(0,$-1),slice(0,$)]) / (x[slice(1,$),slice(0,$)] - x[slice(0,$-1),slice(0,$)]); D with syntactic sugar: off[] = (y[0..$-1,0..$]*x[1..$,0..$] - y[1..$,0..$]*x[0..$-1,0..$]) / (x[1..$,0..$] - x[0..$-1,0..$]); D with defaults 0 and $ for missing boundaries: off[] = (y[..$-1,..]*x[1..,..] - y[1..,..]*x[..$-1,..]) / (x[1..,..] - x[..$-1,..]); Admitted, the last case does not work quite as nicely with ".." as it does with Python's ":". Still, the point should be clear.
 I see syntax sugar for slicing as much less important than 
 multidimensional $, where the alternative is really quite ugly. So I 
 pushed hard for opDollar.

Thanks for that. It is indeed essential.
Mar 08 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Norbert Nemec:
 Just note that the slice operator "a..b" 
 should also allow strided slicing "a..b:c"

I have asked for this ages ago :o) But at that time there was no laziness and ranges in Phobos, so that was not very useful.
 Yea, seems like I missed some important decisions. May indeed be too 
 late for some ideas. Still - not giving up quite yet.

D won't be frozen forever, eventually few new things will be added, for example in D3. For example probably some of the limits of CTFE will be removed. Even C language and Fortran keep having changes every ten years or so. What will be hard to do are changes/removals. It's important to tell apart additive changes (like the ones you ask, like using 0 and $ as default bounds when they are missing, or adding a stride) from breaking changes (like replacing .. with a : ). I didn't understand this essential difference until too much late.
 Admitted, the last case does not work quite as nicely with ".." as it 
 does with Python's ":". Still, the point should be clear.

I have never understood why Walter has adopted .. for slices instead of the better : I'd like to know why. Bye, bearophile
Mar 08 2010
next sibling parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
bearophile wrote:
 D won't be frozen forever, eventually few new things will be added, for
example in D3. For example probably some of the limits of CTFE will be removed.
Even C language and Fortran keep having changes every ten years or so.
 What will be hard to do are changes/removals. It's important to tell apart
additive changes (like the ones you ask, like using 0 and $ as default bounds
when they are missing, or adding a stride) from breaking changes (like
replacing .. with a : ). I didn't understand this essential difference until
too much late.

I experienced this crucial difficulty with my very first request, five years ago when I pleaded to replace the names "ireal" and "creal". When no one seemed to catch the deep irony of these names, I learned to accept it as a bit of mathematical humor.
 Admitted, the last case does not work quite as nicely with ".." as it 
 does with Python's ":". Still, the point should be clear.

I have never understood why Walter has adopted .. for slices instead of the better : I'd like to know why.

I guess in one dimension, ".." does indeed look more intuitive than ":". It just does not scale up as nicely to higher dimensions. The fundamental issue that I am only gradually beginning to understand is that multidimensional arrays are really much more a niche feature than they seem. Outside of numerics, hardly anybody actually finds them particularly useful.
Mar 08 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Norbert Nemec:
 The fundamental issue that I am only gradually beginning to understand 
 is that multidimensional arrays are really much more a niche feature 
 than they seem. Outside of numerics, hardly anybody actually finds them 
 particularly useful.

I don't agree, 2D arrays are useful in many other situations, for example in games to represent a 2D field of characteristics, etc, etc. But I agree that heavy usage of slices and strides and nD arrays even more complex than your example is common in numerics only: off = (y[:-1,:]*x[1:,:] - y[1:,:]*x[:-1,:]) / (x[1:,:]-x[:-1,:]) D has a chance to become important for the people doing numerical/array processing, so even if this is a niche purpose, it can be an important niche for D. The introduction of the improved $ by Don means that I am not the only one that shares this idea. Bye, bearophile
Mar 09 2010
prev sibling parent reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 03/08/2010 08:49 PM, bearophile wrote:
 Admitted, the last case does not work quite as nicely with ".." as it
 does with Python's ":". Still, the point should be clear.

I have never understood why Walter has adopted .. for slices instead of the better : I'd like to know why. Bye, bearophile

Ternary ?:, I suppose.
Mar 08 2010
next sibling parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Ellery Newcomer wrote:
 On 03/08/2010 08:49 PM, bearophile wrote:
 Admitted, the last case does not work quite as nicely with ".." as it
 does with Python's ":". Still, the point should be clear.

I have never understood why Walter has adopted .. for slices instead of the better : I'd like to know why. Bye, bearophile

Ternary ?:, I suppose.

Why not simply split up the ternary a?b:c into a nested expression a?(b:c) The binary ":" could simply be an expression that may only appear in special contexts: on the right side of binary "?" expressions or within indexing expressions.
Mar 09 2010
parent reply Fawzi Mohamed <fmohamed mac.com> writes:
On 2010-03-09 09:47:17 +0100, Norbert Nemec <Norbert Nemec-online.de> said:

 Ellery Newcomer wrote:
 On 03/08/2010 08:49 PM, bearophile wrote:
 
 Admitted, the last case does not work quite as nicely with ".." as it
 does with Python's ":". Still, the point should be clear.

I have never understood why Walter has adopted .. for slices instead of the better : I'd like to know why. Bye, bearophile

Ternary ?:, I suppose.

Why not simply split up the ternary a?b:c into a nested expression a?(b:c) The binary ":" could simply be an expression that may only appear in special contexts: on the right side of binary "?" expressions or within indexing expressions.

I don't see why the obvious solution from..to..step would have problems, so that one needs the ":", but maybe I did not think enough about the implications. Anyway I find .. more self descriptive for slices than :. Anyway at the moment there is no syntactic sugar for that. You said that multidimensional arrays are a niche, well t might well be, but still there are some implementations in D. In particular there is my NArray implementation in blip http://dsource.org/projects/blip . Using it your example off = (y[:-1,:]*x[1:,:] - y[1:,:]*x[:-1,:]) / (x[1:,:]-x[:-1,:]) becomes auto off=(y[Range(0,-2)]*x[Range(1,-1)]-y[Range(1,-1)]*x[Range(0,-2)])/(x[Range(1,-1)]-x[Range(0,-1)]); not ideal, but not so terrible either. The Range structure negative indexes are from the end, and are inclusive (thus Range(0,-1) means up to the end, the whole range). This removes problems with making $ work correctly. NArrays basically never copy and have a reasonable performance. As they are based on a big flat storage, you can easily write very efficient code, and use already written libraries, for example blas and lapack are used by default for several operations. You can use scope to guarantee collection (well not yet with all compilers it works as it should). All operations can have the target array, so that you can avoid the allocation of temporary arrays. Thus for example auto off=(y[Range(0,-2)]*x[Range(1,-1)]-y[Range(1,-1)]*x[Range(0,-2)]); off /= (x[Range(1,-1)]-x[Range(0,-1)]); would already be a bit better (one temporary less). Probably a pool for reusing temporaries would be good, but I did not do it. Also there are no expression templates ( la blitz++), these work well for small expressions, but when you have to follow several read contexts it becomes unclear if it is really a win, so optimizing them can be very complicated. There are optimized versions of commonly used operations. Parallelization has been prepared, but not yet done (it is a rewrite of the ploop iteration away). I think that the library is quite complete from the user point of view, and quite usable... Fawzi
Mar 09 2010
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Fawzi Mohamed:
 I don't see why the obvious solution from..to..step would have 
 problems, so that one needs the ":", but maybe I did not think enough 
 about the implications. Anyway I find .. more self descriptive for 
 slices than :.

If .. is more descriptive for a range of indexes (and I agree, despite it needs two chars instead of one) then to..step is not right, because to..step is not a range. So from..to:step seems more descriptive than from..to..step :-) Bye, bearophile
Mar 09 2010
prev sibling parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Fawzi Mohamed wrote:
 On 2010-03-09 09:47:17 +0100, Norbert Nemec <Norbert Nemec-online.de> said:
 
 Ellery Newcomer wrote:
 On 03/08/2010 08:49 PM, bearophile wrote:
 Admitted, the last case does not work quite as nicely with ".." as it
 does with Python's ":". Still, the point should be clear.

I have never understood why Walter has adopted .. for slices instead of the better : I'd like to know why. Bye, bearophile

Ternary ?:, I suppose.

Why not simply split up the ternary a?b:c into a nested expression a?(b:c) The binary ":" could simply be an expression that may only appear in special contexts: on the right side of binary "?" expressions or within indexing expressions.

I don't see why the obvious solution from..to..step would have problems, so that one needs the ":", but maybe I did not think enough about the implications. Anyway I find .. more self descriptive for slices than :.

The ".." vs. ":" issue is really more cosmetics. It makes a bit of difference in readability when full slices are used in multiple dimensions. Simply imagine A[..,..,..,..] vs. A[:,:,:,:]
 Anyway at the moment there is no syntactic sugar for that.
 You said that multidimensional arrays are a niche, well t might well be, 
 but still there are some implementations in D.
 In particular there is my NArray implementation in blip 
 http://dsource.org/projects/blip .

Indeed, there is a number of implementations. Will be interesting to do a systematic comparison. Ultimately, I would like to combine the strengths of all these different implementations into one library that has the potential to become standard. Multidimensional arrays are the essence of the interfaces of most numerical libraries. Having several incompatible standards is a major roadblock for acceptance in the numerical community.
 Using it your example
 
     off = (y[:-1,:]*x[1:,:] - y[1:,:]*x[:-1,:]) / (x[1:,:]-x[:-1,:])
 
 becomes
 
     auto 
 off=(y[Range(0,-2)]*x[Range(1,-1)]-y[Range(1,-1)]*x[Range(0,-2)])/(x[Range(1,-
)]-x[Range(0,-1)]); 
 
 
 not
 ideal, but not so terrible either.

The full slicing indices for the second dimension are missing. You could of course make those implicit, but then what happens if you need that expression to work on swapped dimensions?
 The Range structure negative indexes are from the end, and are inclusive 
 (thus Range(0,-1) means up to the end, the whole range).
 This removes problems with making $ work correctly.

The negative sign solution is nice for scripting languages like Matlab or Python. In a compiled language it leads to unnecessary runtime overhead because every indexing operation needs to be checked for the sign.
 I think that the library is quite complete from the user point of view, 
 and quite usable...

Good to know. I will have a closer look at it.
Mar 09 2010
next sibling parent Fawzi Mohamed <fmohamed mac.com> writes:
On 2010-03-09 12:06:05 +0100, Norbert Nemec <Norbert Nemec-online.de> said:

 Fawzi Mohamed wrote:
 On 2010-03-09 09:47:17 +0100, Norbert Nemec <Norbert Nemec-online.de> said:
 
 Ellery Newcomer wrote:
 On 03/08/2010 08:49 PM, bearophile wrote:
 
 Admitted, the last case does not work quite as nicely with ".." as it
 does with Python's ":". Still, the point should be clear.

I have never understood why Walter has adopted .. for slices instead of the better : I'd like to know why. Bye, bearophile

Ternary ?:, I suppose.

Why not simply split up the ternary a?b:c into a nested expression a?(b:c) The binary ":" could simply be an expression that may only appear in special contexts: on the right side of binary "?" expressions or within indexing expressions.

I don't see why the obvious solution from..to..step would have problems, so that one needs the ":", but maybe I did not think enough about the implications. Anyway I find .. more self descriptive for slices than :.

The ".." vs. ":" issue is really more cosmetics. It makes a bit of difference in readability when full slices are used in multiple dimensions. Simply imagine A[..,..,..,..] vs. A[:,:,:,:]

I agree that for the full slice it looks uglier
 Anyway at the moment there is no syntactic sugar for that.
 You said that multidimensional arrays are a niche, well t might well 
 be, but still there are some implementations in D.
 In particular there is my NArray implementation in blip 
 http://dsource.org/projects/blip .

Indeed, there is a number of implementations. Will be interesting to do a systematic comparison. Ultimately, I would like to combine the strengths of all these different implementations into one library that has the potential to become standard. Multidimensional arrays are the essence of the interfaces of most numerical libraries. Having several incompatible standards is a major roadblock for acceptance in the numerical community.

As far as I know for D1.0 and large multidimensional arrays, really usable libraries are just multiarray by braxissimo (that is a kind of abandoned experiment, as he focused on matrixes and vectors), and then my library. Matrix libraries are more (and my library supports dense matrixes/vectors), but really multidimensional libraries, I am not aware of other serious contenders, please share the others...
 Using it your example
 
     off = (y[:-1,:]*x[1:,:] - y[1:,:]*x[:-1,:]) / (x[1:,:]-x[:-1,:])
 
 becomes
 
     auto 
 off=(y[Range(0,-2)]*x[Range(1,-1)]-y[Range(1,-1)]*x[Range(0,-2)])/(x[Range(1,-1)]-x[Range(0,-1)]);


not ideal,
 
 but not so terrible either.

The full slicing indices for the second dimension are missing. You could of course make those implicit, but then what happens if you need that expression to work on swapped dimensions?

Indeed missing= implicit, to swap dimensions you have to use Range(-1) (the full range from 0 to the last).
 The Range structure negative indexes are from the end, and are 
 inclusive (thus Range(0,-1) means up to the end, the whole range).
 This removes problems with making $ work correctly.

The negative sign solution is nice for scripting languages like Matlab or Python. In a compiled language it leads to unnecessary runtime overhead because every indexing operation needs to be checked for the sign.

Indeed I don't accept negative indexes for indexing, just for Ranges (where the overhead should be acceptable).
 I think that the library is quite complete from the user point of view, 
 and quite usable...

Good to know. I will have a closer look at it.

let me know if you have problems/comment about it (also in the IRC if you want). ciao Fawzi
Mar 09 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Norbert Nemec wrote:
 Multidimensional arrays are the 
 essence of the interfaces of most numerical libraries. Having several 
 incompatible standards is a major roadblock for acceptance in the 
 numerical community.

I'm not so sure about that. I was very enthusiastic about defining an infrastructure of arbitary-dimensional arrays, and did a little research about it. It turns out the applicability of N-dimensional arrays falls off a cliff when N > 3. In turn, this is because high-dimensional space are weird - an N-dimensional space is not just like a 3-dimensional one, only with more dimensions; it's a downright weird beast. Andrei
Mar 09 2010
next sibling parent Norbert Nemec <Norbert Nemec-online.de> writes:
Andrei Alexandrescu wrote:
 Norbert Nemec wrote:
 Multidimensional arrays are the essence of the interfaces of most 
 numerical libraries. Having several incompatible standards is a major 
 roadblock for acceptance in the numerical community.

I'm not so sure about that. I was very enthusiastic about defining an infrastructure of arbitary-dimensional arrays, and did a little research about it. It turns out the applicability of N-dimensional arrays falls off a cliff when N > 3. In turn, this is because high-dimensional space are weird - an N-dimensional space is not just like a 3-dimensional one, only with more dimensions; it's a downright weird beast.

N-dimensional arrays are not tied to N-dimensional spaces. If you view an array as data on a space-grid, then I agree that high dimensionality is awkward. I did work in high-energy physics, where four-dimensional space-time grids are commonly used for calculations. With the cost scaling in the fourth power of the resolution, this is extremely costly. Grid methods for N>4 are indeed rarely used. Array dimensions, however, do not need to refer to space dimension. I, for example, often handle electron spins as additional indices of range two. Or number of particles in the system, and index for the various configurations of the system that need to be handled. There are unlimited possibilities what an index may mean. Fortran 90 limits the number of dimensions to seven and in our current code, we actually were hit by this limit once when adding another dimension would actually have been the cleanest solution. Of course, these are rare cases and probably indicate a questionable design to begin with, but as scientific software grows, one often just wants to add a new feature in the simplest and cleanest way. Having unnecessary limitations can easily get in the way.
Mar 09 2010
prev sibling parent Fawzi Mohamed <fmohamed mac.com> writes:
On 2010-03-09 13:35:36 +0100, Fawzi Mohamed <fawzi gmx.ch> said:

 
 On 9-mar-10, at 13:21, Andrei Alexandrescu wrote:
 
 Norbert Nemec wrote:
 Multidimensional arrays are the essence of the interfaces of most  
 numerical libraries. Having several incompatible standards is a  major 
 roadblock for acceptance in the numerical community.

I'm not so sure about that. I was very enthusiastic about defining an infrastructure of arbitary-dimensional arrays, and did a little research about it. It turns out the applicability of N-dimensional arrays falls off a cliff when N > 3. In turn, this is because high- dimensional space are weird - an N-dimensional space is not just like a 3-dimensional one, only with more dimensions; it's a downright weird beast.

yes and for dense storage, and may operations, the cost is exponential in N, so that one normally doesn't really want to go beyond 3. Still I did need up to 3, and well if you go with dense storage, up to 3 or up to N is more or less the same amount f code if you use templates... But for non dense storage it gets messy really quick because design choices really influence the api that one can use efficiently. Fawzi

By the way on the whole I think that D2.0 improves several things in the language, there are some design choices that I don't share so much (shared, increased use of thread local storage,...), but for sure it will make library implementations of multidimensional array better: * immutability can be put to good use for shape and strides, * maybe a struct implementation might be usable together with a good memory management (I have tried a couple of times to switch to a structure, but the drawbacks were more that the wins in D1.0, especially as you could not ensure deallocation of resources) * the latest additions wrt. operator overloading are nice. Now what is still missing for me is a good compiler for x86_64... :)
Mar 09 2010
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Ellery Newcomer:
 Ternary ?:, I suppose.

But they seem acceptable for AA literals: auto aa = [1: 2, 3: 4]; Bye, bearophile
Mar 09 2010
parent Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 03/09/2010 04:49 AM, bearophile wrote:
 Ellery Newcomer:
 Ternary ?:, I suppose.

But they seem acceptable for AA literals: auto aa = [1: 2, 3: 4]; Bye, bearophile

Yeah, on second thought I don't think ?: poses any such problems.
Mar 09 2010
prev sibling parent reply Don <nospam nospam.com> writes:
Norbert Nemec wrote:
 Don wrote:
 Norbert Nemec wrote:
 The compiler would need to be aware of the data type Slice. It would 
 therefore have to be something like a "builtin" type. If I am not 
 mistaken, the language definition so far never makes use of "builtin" 
 struct types.

You are mistaken <g>. object, TypeInfo, AssociativeArray, ... Complex will be added to that list, too.

Indeed? If that is so, a builtin Slice type as syntactic sugar would be just as good as my proposal.

 Just note that the slice operator "a..b" should also allow strided 

That's a lot less clear. I don't want to touch that issue right now. Anyway, you've convinced me that the int[2] option doesn't work for slice information.
 Agreed. Multidimensional slicing is a costly operation, though, so I 
 think the absence of syntax sugar is tolerable.

Costly? Not at all! Just a bit of integer arithmetic that would need to be done anyway to access the data. Most numerical code consist of many nested loops containing very few operations. Strided multidimensional slicing operations in combination with array expressions allow to express this equivalently in very few lines that can be optimized very effectively.

You're correct about what they do, but I think the conclusion is wrong. Multidimensional slices normally result in appallingly inefficient use of caches. Unless you do something extremely clever. So I think the onus is now on library developers to show that it's possible to create a compelling library which makes efficient and intuitive use of multidimensional slices. I think by the time that task is accomplished (which I imagine may take a couple of years), the language will be ready for a minor update. It's only a minor syntax tweak.
Mar 09 2010
parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Don wrote:
 Multidimensional slices normally result in appallingly inefficient use 
 of caches.

Indeed, cache usage is a challenge. My general approach would be fairly conservative: give the user full control over memory layout, but do this as comfortably as possible. Provide good performance for straightforward code but allow the user to tweak the details to improve performance A library function that takes several arrays as input and output should allow arbitrary memory layouts, but it should also specify which memory layout is most efficient. In any case, I think that the expressiveness of multidimensional slices is worth having them even if the performance is not optimal in every case with the first generation of libraries.
Mar 09 2010
parent %fil <fil somewhere.com> writes:
Hi All,

I'm still learning D(2) and was toying around with creating a matrix class and
wanting to overload opSlice for this purpose. As I can only find very limited
documentation on multidimensional arrays in the TDPL book (to my big suprise
given the D should be an attractive language for numerical coding), I stumbled
accross this post looking elsewhere.
What was the result of this discussion in the end? Are there plans to allow
opslice to work for multidimensional arrays or will this be done in Phobos?

Many thanks and sorry to bring us such an old post.

fil


Norbert Nemec Wrote:

 Don wrote:
 Multidimensional slices normally result in appallingly inefficient use 
 of caches.

Indeed, cache usage is a challenge. My general approach would be fairly conservative: give the user full control over memory layout, but do this as comfortably as possible. Provide good performance for straightforward code but allow the user to tweak the details to improve performance A library function that takes several arrays as input and output should allow arbitrary memory layouts, but it should also specify which memory layout is most efficient. In any case, I think that the expressiveness of multidimensional slices is worth having them even if the performance is not optimal in every case with the first generation of libraries.

Jan 18 2011
prev sibling parent Fawzi Mohamed <fawzi gmx.ch> writes:
On 9-mar-10, at 13:21, Andrei Alexandrescu wrote:

 Norbert Nemec wrote:
 Multidimensional arrays are the essence of the interfaces of most  
 numerical libraries. Having several incompatible standards is a  
 major roadblock for acceptance in the numerical community.

I'm not so sure about that. I was very enthusiastic about defining an infrastructure of arbitary-dimensional arrays, and did a little research about it. It turns out the applicability of N-dimensional arrays falls off a cliff when N > 3. In turn, this is because high- dimensional space are weird - an N-dimensional space is not just like a 3-dimensional one, only with more dimensions; it's a downright weird beast.

yes and for dense storage, and may operations, the cost is exponential in N, so that one normally doesn't really want to go beyond 3. Still I did need up to 3, and well if you go with dense storage, up to 3 or up to N is more or less the same amount f code if you use templates... But for non dense storage it gets messy really quick because design choices really influence the api that one can use efficiently. Fawzi
Mar 09 2010