digitalmars.D - Proposal: Multidimensional opSlice solution

Norbert Nemec (50/50) Mar 08 2010 Hi there,

Don (12/23) Mar 08 2010 A solution was suggested while you were away.

Michel Fortin (6/8) Mar 08 2010 And that's sad, because there are many other uses for a Slice struct.
Norbert Nemec (16/41) Mar 08 2010 I know this solution. It is exactly the "syntax sugar" issue that I see

Don (17/63) Mar 08 2010 You are mistaken .

Lars T. Kyllingstad (3/39) Mar 08 2010 It will? For complex literals, then?

Don (3/44) Mar 08 2010 Hmmm. Maybe not. Complex literals will probably just disappear. We don't...

Norbert Nemec (45/71) Mar 08 2010 Indeed? If that is so, a builtin Slice type as syntactic sugar would be

bearophile (8/14) Mar 08 2010 D won't be frozen forever, eventually few new things will be added, for ...

Norbert Nemec (11/18) Mar 08 2010 I experienced this crucial difficulty with my very first request, five

bearophile (7/11) Mar 09 2010 I don't agree, 2D arrays are useful in many other situations, for exampl...

Ellery Newcomer (2/8) Mar 08 2010 Ternary ?:, I suppose.

Norbert Nemec (6/19) Mar 09 2010 Why not simply split up the ternary a?b:c into a nested expression

Fawzi Mohamed (43/65) Mar 09 2010 I don't see why the obvious solution from..to..step would have

bearophile (5/9) Mar 09 2010 If .. is more descriptive for a range of indexes (and I agree, despite i...
Norbert Nemec (21/71) Mar 09 2010 The ".." vs. ":" issue is really more cosmetics. It makes a bit of

Fawzi Mohamed (19/95) Mar 09 2010 As far as I know for D1.0 and large multidimensional arrays, really
Andrei Alexandrescu (8/12) Mar 09 2010 I'm not so sure about that. I was very enthusiastic about defining an

Fawzi Mohamed (8/19) Mar 09 2010 yes and for dense storage, and may operations, the cost is exponential

Fawzi Mohamed (12/37) Mar 09 2010 By the way on the whole I think that D2.0 improves several things in

Norbert Nemec (19/30) Mar 09 2010 N-dimensional arrays are not tied to N-dimensional spaces.

bearophile (5/6) Mar 09 2010 But they seem acceptable for AA literals:

Ellery Newcomer (2/8) Mar 09 2010 Yeah, on second thought I don't think ?: poses any such problems.

Don (13/36) Mar 09 2010 slicing "a..b:c"

Norbert Nemec (11/13) Mar 09 2010 Indeed, cache usage is a challenge. My general approach would be fairly

%fil (6/22) Jan 18 2011 Hi All,

Norbert Nemec <Norbert Nemec-online.de> writes:

Hi there,

in implementing multi-dimensional arrays, the current way of overloading 
the slicing operator does not scale up.

Currently, there is opIndex working for an arbitrary number of indices, 
but opSlice works only for one dimension. Ultimately, it should be 
possible to allow slicing for more than one dimension, and mixing it 
with indexing for e.g.:
	A[4,7..8,7,2..5]
So far, no clean solution for overloading this has been suggested.

Looking at Python, there is a range operator a:b or a:b:c that takes two 
or three integer and returns a 'slice' object. I.e. a builtin type. The 
__getitem__ method (Python's equivalent to opIndex) then simply receives 
either integers or 'slice' object for each dimension. The range operator 
only exists within indexing expressions.

The more D-ish equivalent of this solution is another opXXX method. How 
about the following:

----------------

* The slicing operation obj[] is overloaded by
	obj.opIndex()

* The slicing operation obj[a..b] is overloaded by
	obj.opIndex(obj.opRange(a,b))

* Within an indexing expression on an object 'obj', an expression of the 
form 'a..b' is transformed into a call
	obj.opRange(a,b)
furthermore, an expression of the form 'a..b:c' is transformed into
	obj.opRange(a,b,c)
to where 'c' is the stride.

* Alternatively, template functions of the form
	obj.opRange(int N)(a,b)
	obj.opRange(int N)(a,b,c)
are tried, with N set to the dimension number within the indexing 
expression. It is an error for both opRange(a,b,c) and
opRange(int N)(a,b,c) to exist in the same user defined type.
For example
	obj[a..b,d..e:f]
would become
	obj.opIndex(obj.opRange!(0)(a,b),obj.opRange!(1)(d,e,f))

* The opRange function may return an object of arbitrary type with is 
passed on to opIndex (or opOpIndex or opIndexAssign) to perform the 
actual slicing on the object. Proper overloading of these functions 
allows arbitrary mixing of slicing and indexing within the same expression.

* All op*Slice* operators become obsolete.

----------------

I am not sure whether opRange is the best name, as it clashes with the 
range concept in the libraries. I would have liked to reuse the name 
opSlice which would become unused by this change. However - the 
transition to this completely different meaning would be rather painful. 
Alternative name suggestions are welcome.

Greetings,
Norbert

Mar 08 2010

Don <nospam nospam.com> writes:

Norbert Nemec wrote:
 Hi there,
 
 in implementing multi-dimensional arrays, the current way of overloading 
 the slicing operator does not scale up.
 
 Currently, there is opIndex working for an arbitrary number of indices, 
 but opSlice works only for one dimension. Ultimately, it should be 
 possible to allow slicing for more than one dimension, and mixing it 
 with indexing for e.g.:
     A[4,7..8,7,2..5]
 So far, no clean solution for overloading this has been suggested.

A solution was suggested while you were away.
You don't need a new opRange operator, a simple tuple struct like:
struct Slice(T) { T from; T to; }
in std.object is enough.
Note that:
A[4, Slice(7,8), 7, Slice(2,5)]

will work with the existing compiler. So it's just a tiny syntax sugar 
issue.
And another simple possibility is to turn 7..8 into int[2][7,8].

However, Andrei argued that slicing of multidimensional arrays is so 
rarely used that syntax sugar is not necessary. Thus, it's not in D2.

Mar 08 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-03-08 09:55:53 -0500, Don <nospam nospam.com> said:

 However, Andrei argued that slicing of multidimensional arrays is so 
 rarely used that syntax sugar is not necessary. Thus, it's not in D2.

And that's sad, because there are many other uses for a Slice struct.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Mar 08 2010

Norbert Nemec <Norbert Nemec-online.de> writes:

Don wrote:
 Norbert Nemec wrote:
 Hi there,

 in implementing multi-dimensional arrays, the current way of 
 overloading the slicing operator does not scale up.

 Currently, there is opIndex working for an arbitrary number of 
 indices, but opSlice works only for one dimension. Ultimately, it 
 should be possible to allow slicing for more than one dimension, and 
 mixing it with indexing for e.g.:
     A[4,7..8,7,2..5]
 So far, no clean solution for overloading this has been suggested.

 
 A solution was suggested while you were away.
 You don't need a new opRange operator, a simple tuple struct like:
 struct Slice(T) { T from; T to; }
 in std.object is enough.
 Note that:
 A[4, Slice(7,8), 7, Slice(2,5)]
 
 will work with the existing compiler. So it's just a tiny syntax sugar 
 issue.

I know this solution. It is exactly the "syntax sugar" issue that I see 
as the problem here:

The compiler would need to be aware of the data type Slice. It would 
therefore have to be something like a "builtin" type. If I am not 
mistaken, the language definition so far never makes use of "builtin" 
struct types.


 And another simple possibility is to turn 7..8 into int[2][7,8].

That solution would unnecessarily get in to way of implementing indexing 
by an array of indices:

	auto A = MyArray(["zero","one","two","three"]);
	assert( A[ [2,3,0] ] == MyArray(["two","three","zero"]) );


 However, Andrei argued that slicing of multidimensional arrays is so 
 rarely used that syntax sugar is not necessary. Thus, it's not in D2.

Outside of numerics, multidimensional arrays are indeed rarely used. In 
numerical programming, however, they are an essential ingredient. My 
ultimate goal is to multidimensional arrays in D as comfortable as in 
Fortran, Matlab or Python/NumPy in order to make D a real competitor in 
that market.

Mar 08 2010

Don <nospam nospam.com> writes:

Norbert Nemec wrote:
 Don wrote:
 Norbert Nemec wrote:
 Hi there,

 in implementing multi-dimensional arrays, the current way of 
 overloading the slicing operator does not scale up.

 Currently, there is opIndex working for an arbitrary number of 
 indices, but opSlice works only for one dimension. Ultimately, it 
 should be possible to allow slicing for more than one dimension, and 
 mixing it with indexing for e.g.:
     A[4,7..8,7,2..5]
 So far, no clean solution for overloading this has been suggested.

 A solution was suggested while you were away.
 You don't need a new opRange operator, a simple tuple struct like:
 struct Slice(T) { T from; T to; }
 in std.object is enough.
 Note that:
 A[4, Slice(7,8), 7, Slice(2,5)]

 will work with the existing compiler. So it's just a tiny syntax sugar 
 issue.

 
 I know this solution. It is exactly the "syntax sugar" issue that I see 
 as the problem here:
 
 The compiler would need to be aware of the data type Slice. It would 
 therefore have to be something like a "builtin" type. If I am not 
 mistaken, the language definition so far never makes use of "builtin" 
 struct types.

You are mistaken <g>.
object, TypeInfo, AssociativeArray, ...
Complex will be added to that list, too.


 And another simple possibility is to turn 7..8 into int[2][7,8].

 
 That solution would unnecessarily get in to way of implementing indexing 
 by an array of indices:
 
     auto A = MyArray(["zero","one","two","three"]);
     assert( A[ [2,3,0] ] == MyArray(["two","three","zero"]) );

But that syntax doesn't allow slices.

It's only an issue if a dimension can be BOTH a slice T..T, AND T[2], 
and they are not equivalent.
I tried to come up with a plausible scenario where it was really a 
problem, but failed. It requires that both T and T[2] are valid indices 
for the same dimension, AND that slicing is supported.

But this is all a bit academic since it's not going to happen. Maybe if 
you'd been around a few months back, things would be different.

 However, Andrei argued that slicing of multidimensional arrays is so 
 rarely used that syntax sugar is not necessary. Thus, it's not in D2.

 
 Outside of numerics, multidimensional arrays are indeed rarely used. In 
 numerical programming, however, they are an essential ingredient. My 
 ultimate goal is to multidimensional arrays in D as comfortable as in 
 Fortran, Matlab or Python/NumPy in order to make D a real competitor in 
 that market.

Agreed. Multidimensional slicing is a costly operation, though, so I 
think the absence of syntax sugar is tolerable.
I see syntax sugar for slicing as much less important than 
multidimensional $, where the alternative is really quite ugly. So I 
pushed hard for opDollar.

Mar 08 2010

"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:

Don wrote:
 Norbert Nemec wrote:
 Don wrote:
 Norbert Nemec wrote:
 Hi there,

 in implementing multi-dimensional arrays, the current way of 
 overloading the slicing operator does not scale up.

 Currently, there is opIndex working for an arbitrary number of 
 indices, but opSlice works only for one dimension. Ultimately, it 
 should be possible to allow slicing for more than one dimension, and 
 mixing it with indexing for e.g.:
     A[4,7..8,7,2..5]
 So far, no clean solution for overloading this has been suggested.

 A solution was suggested while you were away.
 You don't need a new opRange operator, a simple tuple struct like:
 struct Slice(T) { T from; T to; }
 in std.object is enough.
 Note that:
 A[4, Slice(7,8), 7, Slice(2,5)]

 will work with the existing compiler. So it's just a tiny syntax 
 sugar issue.

 I know this solution. It is exactly the "syntax sugar" issue that I 
 see as the problem here:

 The compiler would need to be aware of the data type Slice. It would 
 therefore have to be something like a "builtin" type. If I am not 
 mistaken, the language definition so far never makes use of "builtin" 
 struct types.

 
 You are mistaken <g>.
 object, TypeInfo, AssociativeArray, ...
 Complex will be added to that list, too.

It will?  For complex literals, then?

-Lars

Mar 08 2010

Don <nospam nospam.com> writes:

Lars T. Kyllingstad wrote:
 Don wrote:
 Norbert Nemec wrote:
 Don wrote:
 Norbert Nemec wrote:
 Hi there,

 in implementing multi-dimensional arrays, the current way of 
 overloading the slicing operator does not scale up.

 Currently, there is opIndex working for an arbitrary number of 
 indices, but opSlice works only for one dimension. Ultimately, it 
 should be possible to allow slicing for more than one dimension, 
 and mixing it with indexing for e.g.:
     A[4,7..8,7,2..5]
 So far, no clean solution for overloading this has been suggested.

 A solution was suggested while you were away.
 You don't need a new opRange operator, a simple tuple struct like:
 struct Slice(T) { T from; T to; }
 in std.object is enough.
 Note that:
 A[4, Slice(7,8), 7, Slice(2,5)]

 will work with the existing compiler. So it's just a tiny syntax 
 sugar issue.

 I know this solution. It is exactly the "syntax sugar" issue that I 
 see as the problem here:

 The compiler would need to be aware of the data type Slice. It would 
 therefore have to be something like a "builtin" type. If I am not 
 mistaken, the language definition so far never makes use of "builtin" 
 struct types.

 You are mistaken <g>.
 object, TypeInfo, AssociativeArray, ...
 Complex will be added to that list, too.

 
 It will?  For complex literals, then?
 
 -Lars

Hmmm. Maybe not. Complex literals will probably just disappear. We don't 
need them any more.

Mar 08 2010

Norbert Nemec <Norbert Nemec-online.de> writes:

Don wrote:
 Norbert Nemec wrote:
 The compiler would need to be aware of the data type Slice. It would 
 therefore have to be something like a "builtin" type. If I am not 
 mistaken, the language definition so far never makes use of "builtin" 
 struct types.

 
 You are mistaken <g>.
 object, TypeInfo, AssociativeArray, ...
 Complex will be added to that list, too.

Indeed? If that is so, a builtin Slice type as syntactic sugar would be 
just as good as my proposal. Just note that the slice operator "a..b" 
should also allow strided slicing "a..b:c"

     auto A = MyArray(["zero","one","two","three"]);
     assert( A[ [2,3,0] ] == MyArray(["two","three","zero"]) );

 
 But that syntax doesn't allow slices.
 
 It's only an issue if a dimension can be BOTH a slice T..T, AND T[2], 
 and they are not equivalent.
 I tried to come up with a plausible scenario where it was really a 
 problem, but failed. It requires that both T and T[2] are valid indices 
 for the same dimension, AND that slicing is supported.

Which is not only feasible but implemented and widely used in Fortran, 
Matlab and Python/NumPy:

	a[5]
	--> returns a single element

	a[5..8]
	--> returns a slice [ a[5],a[6],a[7] ]

	a[[5,3,1,4,6]]
	--> returns a list of elements selected by the list of indices
	    [a[5],a[3],a[1],a[4],a[6]]

In the last case, the list of indices can of course have just two 
entries, so
	a[[5,10]] == [ a[5],a[10] ]


 But this is all a bit academic since it's not going to happen. Maybe if 
 you'd been around a few months back, things would be different.

Yea, seems like I missed some important decisions. May indeed be too 
late for some ideas. Still - not giving up quite yet.


 Agreed. Multidimensional slicing is a costly operation, though, so I 
 think the absence of syntax sugar is tolerable.

Costly? Not at all! Just a bit of integer arithmetic that would need to 
be done anyway to access the data. Most numerical code consist of many 
nested loops containing very few operations. Strided multidimensional 
slicing operations in combination with array expressions allow to 
express this equivalently in very few lines that can be optimized very 
effectively.

As for the necessity of syntactic sugar, consider one typical line that 
I picked from my own numerical Python code:

Python/NumPy:
   off = (y[:-1,:]*x[1:,:] - y[1:,:]*x[:-1,:]) / (x[1:,:]-x[:-1,:])

D no slicing
   for(int i=0;i<off.shape[0]-1;i++)
     for(int j=0;j<off.shape[1];j++)
       off[i,j] = (y[i,j]*x[i+1,j]-y[i+1,j]*x[i,j]) / (x[i+1,j]-x[i,j])

D currently possible:
   off[] = (y[slice(0,$-1),slice(0,$)]*x[slice(1,$),slice(0,$)]
            - y[slice(1,$),slice(0,$)]*x[slice(0,$-1),slice(0,$)]) /
           (x[slice(1,$),slice(0,$)] - x[slice(0,$-1),slice(0,$)]);

D with syntactic sugar:
   off[] = (y[0..$-1,0..$]*x[1..$,0..$] - y[1..$,0..$]*x[0..$-1,0..$]) /
            (x[1..$,0..$] - x[0..$-1,0..$]);

D with defaults 0 and $ for missing boundaries:
   off[] = (y[..$-1,..]*x[1..,..] - y[1..,..]*x[..$-1,..]) /
                       (x[1..,..] - x[..$-1,..]);

Admitted, the last case does not work quite as nicely with ".." as it 
does with Python's ":". Still, the point should be clear.


 I see syntax sugar for slicing as much less important than 
 multidimensional $, where the alternative is really quite ugly. So I 
 pushed hard for opDollar.

Thanks for that. It is indeed essential.

Mar 08 2010

bearophile <bearophileHUGS lycos.com> writes:

Norbert Nemec:
 Just note that the slice operator "a..b" 
 should also allow strided slicing "a..b:c"

I have asked for this ages ago :o) But at that time there was no laziness and
ranges in Phobos, so that was not very useful.


 Yea, seems like I missed some important decisions. May indeed be too 
 late for some ideas. Still - not giving up quite yet.

D won't be frozen forever, eventually few new things will be added, for example
in D3. For example probably some of the limits of CTFE will be removed. Even C
language and Fortran keep having changes every ten years or so.
What will be hard to do are changes/removals. It's important to tell apart
additive changes (like the ones you ask, like using 0 and $ as default bounds
when they are missing, or adding a stride) from breaking changes (like
replacing .. with a : ). I didn't understand this essential difference until
too much late.


 Admitted, the last case does not work quite as nicely with ".." as it 
 does with Python's ":". Still, the point should be clear.

I have never understood why Walter has adopted .. for slices instead of the
better :
I'd like to know why.

Bye,
bearophile

Mar 08 2010

Norbert Nemec <Norbert Nemec-online.de> writes:

bearophile wrote:
 D won't be frozen forever, eventually few new things will be added, for
example in D3. For example probably some of the limits of CTFE will be removed.
Even C language and Fortran keep having changes every ten years or so.
 What will be hard to do are changes/removals. It's important to tell apart
additive changes (like the ones you ask, like using 0 and $ as default bounds
when they are missing, or adding a stride) from breaking changes (like
replacing .. with a : ). I didn't understand this essential difference until
too much late.

I experienced this crucial difficulty with my very first request, five 
years ago when I pleaded to replace the names "ireal" and "creal". When 
no one seemed to catch the deep irony of these names, I learned to 
accept it as a bit of mathematical humor.


 Admitted, the last case does not work quite as nicely with ".." as it 
 does with Python's ":". Still, the point should be clear.

 
 I have never understood why Walter has adopted .. for slices instead of the
better :
 I'd like to know why.

I guess in one dimension, ".." does indeed look more intuitive than ":". 
It just does not scale up as nicely to higher dimensions.

The fundamental issue that I am only gradually beginning to understand 
is that multidimensional arrays are really much more a niche feature 
than they seem. Outside of numerics, hardly anybody actually finds them 
particularly useful.

Mar 08 2010

bearophile <bearophileHUGS lycos.com> writes:

Norbert Nemec:
 The fundamental issue that I am only gradually beginning to understand 
 is that multidimensional arrays are really much more a niche feature 
 than they seem. Outside of numerics, hardly anybody actually finds them 
 particularly useful.

I don't agree, 2D arrays are useful in many other situations, for example in
games to represent a 2D field of characteristics, etc, etc.

But I agree that heavy usage of slices and strides and nD arrays even more
complex than your example is common in numerics only:
off = (y[:-1,:]*x[1:,:] - y[1:,:]*x[:-1,:]) / (x[1:,:]-x[:-1,:])
D has a chance to become important for the people doing numerical/array
processing, so even if this is a niche purpose, it can be an important niche
for D. The introduction of the improved $ by Don means that I am not the only
one that shares this idea.

Bye,
bearophile

Mar 09 2010

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 03/08/2010 08:49 PM, bearophile wrote:
 Admitted, the last case does not work quite as nicely with ".." as it
 does with Python's ":". Still, the point should be clear.

 I have never understood why Walter has adopted .. for slices instead of the
better :
 I'd like to know why.

 Bye,
 bearophile

Ternary ?:, I suppose.

Mar 08 2010

Norbert Nemec <Norbert Nemec-online.de> writes:

Ellery Newcomer wrote:
 On 03/08/2010 08:49 PM, bearophile wrote:
 Admitted, the last case does not work quite as nicely with ".." as it
 does with Python's ":". Still, the point should be clear.

 I have never understood why Walter has adopted .. for slices instead 
 of the better :
 I'd like to know why.

 Bye,
 bearophile

 
 Ternary ?:, I suppose.

Why not simply split up the ternary a?b:c into a nested expression

	a?(b:c)

The binary ":" could simply be an expression that may only appear in 
special contexts: on the right side of binary "?" expressions or within 
indexing expressions.

Mar 09 2010

Fawzi Mohamed <fmohamed mac.com> writes:

On 2010-03-09 09:47:17 +0100, Norbert Nemec <Norbert Nemec-online.de> said:

 Ellery Newcomer wrote:
 On 03/08/2010 08:49 PM, bearophile wrote:
 
 Admitted, the last case does not work quite as nicely with ".." as it
 does with Python's ":". Still, the point should be clear.

 
 I have never understood why Walter has adopted .. for slices instead of 
 the better :
 I'd like to know why.
 
 Bye,
 bearophile

 
 Ternary ?:, I suppose.

 
 Why not simply split up the ternary a?b:c into a nested expression
 
 	a?(b:c)
 
 The binary ":" could simply be an expression that may only appear in 
 special contexts: on the right side of binary "?" expressions or within 
 indexing expressions.

I don't see why the obvious solution from..to..step would have 
problems, so that one needs the ":", but maybe I did not think enough 
about the implications. Anyway I find .. more self descriptive for 
slices than :.

Anyway at the moment there is no syntactic sugar for that.
You said that multidimensional arrays are a niche, well t might well 
be, but still there are some implementations in D.
In particular there is my NArray implementation in blip 
http://dsource.org/projects/blip .
Using it your example

	off = (y[:-1,:]*x[1:,:] - y[1:,:]*x[:-1,:]) / (x[1:,:]-x[:-1,:])

becomes

	auto 
off=(y[Range(0,-2)]*x[Range(1,-1)]-y[Range(1,-1)]*x[Range(0,-2)])/(x[Range(1,-1)]-x[Range(0,-1)]);

not 

ideal, but not so terrible either.
The Range structure negative indexes are from the end, and are 
inclusive (thus Range(0,-1) means up to the end, the whole range).
This removes problems with making $ work correctly.
NArrays basically never copy and have a reasonable performance. As they 
are based on a big flat storage, you can easily write very efficient 
code, and use already written libraries, for example blas and lapack 
are used by default for several operations.
You can use scope to guarantee collection (well not yet with all 
compilers it works as it should).
All operations can have the target array, so that you can avoid the 
allocation of temporary arrays.
Thus for example

	auto off=(y[Range(0,-2)]*x[Range(1,-1)]-y[Range(1,-1)]*x[Range(0,-2)]);
	off /= (x[Range(1,-1)]-x[Range(0,-1)]);

would already be a bit better (one temporary less).
Probably a pool for reusing temporaries would be good, but I did not do 
it. Also there are no expression templates (� la blitz++), these work 
well for small expressions, but when you have to follow several read 
contexts it becomes unclear if it is really a win, so optimizing them 
can be very complicated. There are optimized versions of commonly used 
operations.
Parallelization has been prepared, but not yet done (it is a rewrite of 
the ploop iteration away).
I think that the library is quite complete from the user point of view, 
and quite usable...

Fawzi

Mar 09 2010

bearophile <bearophileHUGS lycos.com> writes:

Fawzi Mohamed:
 I don't see why the obvious solution from..to..step would have 
 problems, so that one needs the ":", but maybe I did not think enough 
 about the implications. Anyway I find .. more self descriptive for 
 slices than :.

If .. is more descriptive for a range of indexes (and I agree, despite it needs
two chars instead of one) then to..step is not right, because to..step is not a
range.
So from..to:step seems more descriptive than from..to..step :-)

Bye,
bearophile

Mar 09 2010

Norbert Nemec <Norbert Nemec-online.de> writes:

Fawzi Mohamed wrote:
 On 2010-03-09 09:47:17 +0100, Norbert Nemec <Norbert Nemec-online.de> said:
 
 Ellery Newcomer wrote:
 On 03/08/2010 08:49 PM, bearophile wrote:
 Admitted, the last case does not work quite as nicely with ".." as it
 does with Python's ":". Still, the point should be clear.

 I have never understood why Walter has adopted .. for slices instead 
 of the better :
 I'd like to know why.

 Bye,
 bearophile

 Ternary ?:, I suppose.

 Why not simply split up the ternary a?b:c into a nested expression

     a?(b:c)

 The binary ":" could simply be an expression that may only appear in 
 special contexts: on the right side of binary "?" expressions or 
 within indexing expressions.

 
 I don't see why the obvious solution from..to..step would have problems, 
 so that one needs the ":", but maybe I did not think enough about the 
 implications. Anyway I find .. more self descriptive for slices than :.

The ".." vs. ":" issue is really more cosmetics. It makes a bit of 
difference in readability when full slices are used in multiple 
dimensions. Simply imagine
	A[..,..,..,..]
vs.
	A[:,:,:,:]


 Anyway at the moment there is no syntactic sugar for that.
 You said that multidimensional arrays are a niche, well t might well be, 
 but still there are some implementations in D.
 In particular there is my NArray implementation in blip 
 http://dsource.org/projects/blip .

Indeed, there is a number of implementations. Will be interesting to do 
a systematic comparison. Ultimately, I would like to combine the 
strengths of all these different implementations into one library that 
has the potential to become standard. Multidimensional arrays are the 
essence of the interfaces of most numerical libraries. Having several 
incompatible standards is a major roadblock for acceptance in the 
numerical community.


 Using it your example
 
     off = (y[:-1,:]*x[1:,:] - y[1:,:]*x[:-1,:]) / (x[1:,:]-x[:-1,:])
 
 becomes
 
     auto 
 off=(y[Range(0,-2)]*x[Range(1,-1)]-y[Range(1,-1)]*x[Range(0,-2)])/(x[Range(1,-
)]-x[Range(0,-1)]); 
 
 
 not
 ideal, but not so terrible either.

The full slicing indices for the second dimension are missing. You could 
of course make those implicit, but then what happens if you need that 
expression to work on swapped dimensions?


 The Range structure negative indexes are from the end, and are inclusive 
 (thus Range(0,-1) means up to the end, the whole range).
 This removes problems with making $ work correctly.

The negative sign solution is nice for scripting languages like Matlab 
or Python. In a compiled language it leads to unnecessary runtime 
overhead because every indexing operation needs to be checked for the sign.


 I think that the library is quite complete from the user point of view, 
 and quite usable...

Good to know. I will have a closer look at it.

Mar 09 2010

Fawzi Mohamed <fmohamed mac.com> writes:

On 2010-03-09 12:06:05 +0100, Norbert Nemec <Norbert Nemec-online.de> said:

 Fawzi Mohamed wrote:
 On 2010-03-09 09:47:17 +0100, Norbert Nemec <Norbert Nemec-online.de> said:
 
 Ellery Newcomer wrote:
 On 03/08/2010 08:49 PM, bearophile wrote:
 
 Admitted, the last case does not work quite as nicely with ".." as it
 does with Python's ":". Still, the point should be clear.

 
 I have never understood why Walter has adopted .. for slices instead of 
 the better :
 I'd like to know why.
 
 Bye,
 bearophile

 
 Ternary ?:, I suppose.

 
 Why not simply split up the ternary a?b:c into a nested expression
 
     a?(b:c)
 
 The binary ":" could simply be an expression that may only appear in 
 special contexts: on the right side of binary "?" expressions or within 
 indexing expressions.

 
 I don't see why the obvious solution from..to..step would have 
 problems, so that one needs the ":", but maybe I did not think enough 
 about the implications. Anyway I find .. more self descriptive for 
 slices than :.

 
 The ".." vs. ":" issue is really more cosmetics. It makes a bit of 
 difference in readability when full slices are used in multiple 
 dimensions. Simply imagine
 	A[..,..,..,..]
 vs.
 	A[:,:,:,:]

I agree that for the full slice it looks uglier

 Anyway at the moment there is no syntactic sugar for that.
 You said that multidimensional arrays are a niche, well t might well 
 be, but still there are some implementations in D.
 In particular there is my NArray implementation in blip 
 http://dsource.org/projects/blip .

 
 Indeed, there is a number of implementations. Will be interesting to do 
 a systematic comparison. Ultimately, I would like to combine the 
 strengths of all these different implementations into one library that 
 has the potential to become standard. Multidimensional arrays are the 
 essence of the interfaces of most numerical libraries. Having several 
 incompatible standards is a major roadblock for acceptance in the 
 numerical community.

As far as I know for D1.0 and large multidimensional arrays, really 
usable libraries are just multiarray by braxissimo (that is a kind of 
abandoned experiment, as he focused on matrixes and vectors), and then 
my library.
Matrix libraries are more (and my library supports dense 
matrixes/vectors), but really multidimensional libraries, I am not 
aware of other serious contenders, please share the others...

 Using it your example
 
     off = (y[:-1,:]*x[1:,:] - y[1:,:]*x[:-1,:]) / (x[1:,:]-x[:-1,:])
 
 becomes
 
     auto 
 off=(y[Range(0,-2)]*x[Range(1,-1)]-y[Range(1,-1)]*x[Range(0,-2)])/(x[Range(1,-1)]-x[Range(0,-1)]);


not
ideal, 
 
 but not so terrible either.

 
 The full slicing indices for the second dimension are missing. You 
 could of course make those implicit, but then what happens if you need 
 that expression to work on swapped dimensions?

Indeed missing= implicit, to swap dimensions you have to use Range(-1) 
(the full range from 0 to the last).

 The Range structure negative indexes are from the end, and are 
 inclusive (thus Range(0,-1) means up to the end, the whole range).
 This removes problems with making $ work correctly.

 
 The negative sign solution is nice for scripting languages like Matlab 
 or Python. In a compiled language it leads to unnecessary runtime 
 overhead because every indexing operation needs to be checked for the 
 sign.

Indeed I don't accept negative indexes for indexing, just for Ranges 
(where the overhead should be acceptable).

 I think that the library is quite complete from the user point of view, 
 and quite usable...

 
 Good to know. I will have a closer look at it.

let me know if you have problems/comment about it (also in the IRC if 
you want).

ciao
Fawzi

Mar 09 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Norbert Nemec wrote:
 Multidimensional arrays are the 
 essence of the interfaces of most numerical libraries. Having several 
 incompatible standards is a major roadblock for acceptance in the 
 numerical community.

I'm not so sure about that. I was very enthusiastic about defining an 
infrastructure of arbitary-dimensional arrays, and did a little research 
about it. It turns out the applicability of N-dimensional arrays falls 
off a cliff when N > 3. In turn, this is because high-dimensional space 
are weird - an N-dimensional space is not just like a 3-dimensional one, 
only with more dimensions; it's a downright weird beast.

Andrei

Mar 09 2010

Fawzi Mohamed <fawzi gmx.ch> writes:

On 9-mar-10, at 13:21, Andrei Alexandrescu wrote:

 Norbert Nemec wrote:
 Multidimensional arrays are the essence of the interfaces of most  
 numerical libraries. Having several incompatible standards is a  
 major roadblock for acceptance in the numerical community.

 I'm not so sure about that. I was very enthusiastic about defining  
 an infrastructure of arbitary-dimensional arrays, and did a little  
 research about it. It turns out the applicability of N-dimensional  
 arrays falls off a cliff when N > 3. In turn, this is because high- 
 dimensional space are weird - an N-dimensional space is not just  
 like a 3-dimensional one, only with more dimensions; it's a  
 downright weird beast.

yes and for dense storage, and may operations, the cost is exponential  
in N, so that one normally doesn't really want to go beyond 3. Still I  
did need up to 3, and well if you go with dense storage, up to 3 or up  
to N is more or less the same amount f code if you use templates...

But for non dense storage it gets messy really quick because design  
choices really influence the api that one can use efficiently.

Fawzi

Mar 09 2010

Fawzi Mohamed <fmohamed mac.com> writes:

On 2010-03-09 13:35:36 +0100, Fawzi Mohamed <fawzi gmx.ch> said:

 
 On 9-mar-10, at 13:21, Andrei Alexandrescu wrote:
 
 Norbert Nemec wrote:
 Multidimensional arrays are the essence of the interfaces of most  
 numerical libraries. Having several incompatible standards is a  major 
 roadblock for acceptance in the numerical community.

 
 I'm not so sure about that. I was very enthusiastic about defining  an 
 infrastructure of arbitary-dimensional arrays, and did a little  
 research about it. It turns out the applicability of N-dimensional  
 arrays falls off a cliff when N > 3. In turn, this is because high- 
 dimensional space are weird - an N-dimensional space is not just  like 
 a 3-dimensional one, only with more dimensions; it's a  downright weird 
 beast.

 
 yes and for dense storage, and may operations, the cost is exponential  
 in N, so that one normally doesn't really want to go beyond 3. Still I  
 did need up to 3, and well if you go with dense storage, up to 3 or up  
 to N is more or less the same amount f code if you use templates...
 
 But for non dense storage it gets messy really quick because design  
 choices really influence the api that one can use efficiently.
 
 Fawzi

By the way on the whole I think that D2.0 improves several things in 
the language, there are some design choices that I don't share so much 
(shared, increased use of thread local storage,...), but for sure it 
will make library implementations of multidimensional array better:
* immutability can be put to good use for shape and strides,
* maybe a struct implementation might be usable together with a good 
memory management (I have tried a couple of times to switch to a 
structure, but the drawbacks were more that the wins in D1.0, 
especially as you could not ensure deallocation of resources)
* the latest additions wrt. operator overloading are nice.

Now what is still missing for me is a good compiler for x86_64... :)

Mar 09 2010

Norbert Nemec <Norbert Nemec-online.de> writes:

Andrei Alexandrescu wrote:
 Norbert Nemec wrote:
 Multidimensional arrays are the essence of the interfaces of most 
 numerical libraries. Having several incompatible standards is a major 
 roadblock for acceptance in the numerical community.

 
 I'm not so sure about that. I was very enthusiastic about defining an 
 infrastructure of arbitary-dimensional arrays, and did a little research 
 about it. It turns out the applicability of N-dimensional arrays falls 
 off a cliff when N > 3. In turn, this is because high-dimensional space 
 are weird - an N-dimensional space is not just like a 3-dimensional one, 
 only with more dimensions; it's a downright weird beast.

N-dimensional arrays are not tied to N-dimensional spaces.

If you view an array as data on a space-grid, then I agree that high 
dimensionality is awkward. I did work in high-energy physics, where 
four-dimensional space-time grids are commonly used for calculations. 
With the cost scaling in the fourth power of the resolution, this is 
extremely costly. Grid methods for N>4 are indeed rarely used.

Array dimensions, however, do not need to refer to space dimension. I, 
for example, often handle electron spins as additional indices of range 
two. Or number of particles in the system, and index for the various 
configurations of the system that need to be handled. There are 
unlimited possibilities what an index may mean.

Fortran 90 limits the number of dimensions to seven and in our current 
code, we actually were hit by this limit once when adding another 
dimension would actually have been the cleanest solution.

Of course, these are rare cases and probably indicate a questionable 
design to begin with, but as scientific software grows, one often just 
wants to add a new feature in the simplest and cleanest way. Having 
unnecessary limitations can easily get in the way.

Mar 09 2010

bearophile <bearophileHUGS lycos.com> writes:

Ellery Newcomer:
 Ternary ?:, I suppose.

But they seem acceptable for AA literals:
auto aa = [1: 2, 3: 4];

Bye,
bearophile

Mar 09 2010

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 03/09/2010 04:49 AM, bearophile wrote:
 Ellery Newcomer:
 Ternary ?:, I suppose.

 But they seem acceptable for AA literals:
 auto aa = [1: 2, 3: 4];

 Bye,
 bearophile

Yeah, on second thought I don't think ?: poses any such problems.

Mar 09 2010

Don <nospam nospam.com> writes:

Norbert Nemec wrote:
 Don wrote:
 Norbert Nemec wrote:
 The compiler would need to be aware of the data type Slice. It would 
 therefore have to be something like a "builtin" type. If I am not 
 mistaken, the language definition so far never makes use of "builtin" 
 struct types.

 You are mistaken <g>.
 object, TypeInfo, AssociativeArray, ...
 Complex will be added to that list, too.

 
 Indeed? If that is so, a builtin Slice type as syntactic sugar would be 
 just as good as my proposal. 

 Just note that the slice operator "a..b" should also allow strided 

slicing "a..b:c"
That's a lot less clear. I don't want to touch that issue right now.

Anyway, you've convinced me that the int[2] option doesn't work for 
slice information.


 Agreed. Multidimensional slicing is a costly operation, though, so I 
 think the absence of syntax sugar is tolerable.

 
 Costly? Not at all! Just a bit of integer arithmetic that would need to 
 be done anyway to access the data. Most numerical code consist of many 
 nested loops containing very few operations. Strided multidimensional 
 slicing operations in combination with array expressions allow to 
 express this equivalently in very few lines that can be optimized very 
 effectively.

You're correct about what they do, but I think the conclusion is wrong. 
Multidimensional slices normally result in appallingly inefficient use 
of caches. Unless you do something extremely clever.
So I think the onus is now on library developers to show that it's 
possible to create a compelling library which makes efficient and 
intuitive use of multidimensional slices. I think by the time that task 
is accomplished (which I imagine may take a couple of years), the 
language will be ready for a minor update. It's only a minor syntax tweak.

Mar 09 2010

Norbert Nemec <Norbert Nemec-online.de> writes:

Don wrote:
 Multidimensional slices normally result in appallingly inefficient use 
 of caches.

Indeed, cache usage is a challenge. My general approach would be fairly 
conservative: give the user full control over memory layout, but do this 
as comfortably as possible. Provide good performance for straightforward 
code but allow the user to tweak the details to improve performance

A library function that takes several arrays as input and output should 
allow arbitrary memory layouts, but it should also specify which memory 
layout is most efficient.

In any case, I think that the expressiveness of multidimensional slices 
is worth having them even if the performance is not optimal in every 
case with the first generation of libraries.

Mar 09 2010

%fil <fil somewhere.com> writes:

Hi All,

I'm still learning D(2) and was toying around with creating a matrix class and
wanting to overload opSlice for this purpose. As I can only find very limited
documentation on multidimensional arrays in the TDPL book (to my big suprise
given the D should be an attractive language for numerical coding), I stumbled
accross this post looking elsewhere.
What was the result of this discussion in the end? Are there plans to allow
opslice to work for multidimensional arrays or will this be done in Phobos?

Many thanks and sorry to bring us such an old post.

fil


Norbert Nemec Wrote:

 Don wrote:
 Multidimensional slices normally result in appallingly inefficient use 
 of caches.

 
 Indeed, cache usage is a challenge. My general approach would be fairly 
 conservative: give the user full control over memory layout, but do this 
 as comfortably as possible. Provide good performance for straightforward 
 code but allow the user to tweak the details to improve performance
 
 A library function that takes several arrays as input and output should 
 allow arbitrary memory layouts, but it should also specify which memory 
 layout is most efficient.
 
 In any case, I think that the expressiveness of multidimensional slices 
 is worth having them even if the performance is not optimal in every 
 case with the first generation of libraries.

Jan 18 2011

D Programming

C/C++ Programming

Other

digitalmars.D - Proposal: Multidimensional opSlice solution