
digitalmars.D - Array Operations: a[] + b[] etc.

reply "John Colvin" <john.loughran.colvin gmail.com> writes:
First things first: I'm not actually sure what the current spec 
for this is; http://dlang.org/arrays.html is not the clearest on 
the subject and seems to rule out a lot of things that I reckon 
should work.

For this post I'm going to use the latest dmd from github. 
Behaviour is sometimes quite different for different versions of 
dmd, let alone gdc or ldc.

e.g.

int[] a = [1,2,3,4];
int[] b = [6,7,8,9];
int[] c;
int[] d = [10];
int[] e = [0,0,0,0];

a[] += b[];       // result [7, 9, 11, 13], as expected.

c = a[] + b[];    // Error: Array operation a[] + b[] not 
implemented.

c[] = a[] + b[];  // result [], is a run-time assert on some 
compiler(s)/versions
d[] = a[] + b[];  // result [7], also a run-time assert for some 
compiler(s)/versions


My vision of how things could work:
c = a[] opBinary b[];
should be legal. It should create a new array that is then 
reference assigned to c.

d[] = a[] opBinary b[];
should be d[i] = a[i] + b[i] for all i in 0..length.
What should the length be? Do we silently truncate to the 
shortest array, or do we run-time assert (like ldc2 does, and as 
dmd did for a while between 2.060 and now)? Currently dmd (and 
gdc) does neither of these reliably, e.g.
d[] = a[] + b[] results in [7],
a[] = d[] + b[] results in [16, 32747, 38805832, 67108873]
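To make the proposed semantics concrete, here's a sketch in current D of the helper that `c = a[] + b[]` could lower to. (`addNew` is a hypothetical name for illustration, not an actual compiler or druntime function.)

```d
// Hypothetical lowering of `c = a[] + b[]`: allocate a fresh array,
// then reuse the existing non-allocating array op to fill it.
T[] addNew(T)(const(T)[] a, const(T)[] b)
{
    assert(a.length == b.length, "array operands must have equal lengths");
    auto c = new T[a.length];
    c[] = a[] + b[];   // existing in-place vector op, no allocation here
    return c;
}

unittest
{
    int[] a = [1, 2, 3, 4];
    int[] b = [6, 7, 8, 9];
    auto c = addNew(a, b);
    assert(c == [7, 9, 11, 13]);
}
```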

Another nice thing I miss from working in IDL, though I'm not 
sure how it would be possible in D:
given a multidimensional array I should be able to slice and 
index along any axis.
for example:
int[4][3] m = [[0,1,2,3],
                [4,5,6,7],
                [8,9,10,11]];
I can index vertically, i.e. m[1] == [4,5,6,7], but there's no 
syntactic sugar for indexing horizontally. Obviously m[][2] just 
gives me the 3rd row, so what could be a nice concise statement 
suddenly requires a manually written loop that the compiler has 
to work its way through, extracting the meaning (see Walter on 
this, here: http://www.drdobbs.com/loop-optimizations/229300270).

A possible approach, heavily tried and tested in numpy and IDL: 
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
http://www.atmos.umd.edu/~gcm/usefuldocs/IDL.html#operations

Use multiple indices within the brackets.
     m[1,2] would be identical to m[1][2], returning 6
     m[0..2,3] would return [3,7]
     m[,2] would give me [2,6,10]
     Alternative syntax could be m[*,2], m[:,2] or we could even 
require m[0..$,2], I don't know how much of a technical challenge 
each of these would be for parsing and lexing.
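As a rough illustration of what can already be emulated in library code today, here's a toy row-major wrapper using D's multi-argument opIndex. (`Matrix` and `column` are hypothetical names for this sketch, not part of the proposal or of any library.)

```d
// Toy 2-D wrapper: m[r, c] indexing via multi-argument opIndex,
// and a column() helper standing in for the proposed m[0..$, c].
struct Matrix(T)
{
    T[] data;            // row-major storage
    size_t rows, cols;

    // m[r, c]
    ref T opIndex(size_t r, size_t c)
    {
        return data[r * cols + c];
    }

    // extract column c as a new array (allocates; a real design
    // would likely return a strided view instead)
    T[] column(size_t c)
    {
        auto result = new T[rows];
        foreach (r; 0 .. rows)
            result[r] = data[r * cols + c];
        return result;
    }
}

unittest
{
    auto m = Matrix!int([0,1,2,3, 4,5,6,7, 8,9,10,11], 3, 4);
    assert(m[1, 2] == 6);             // same element as m[1][2]
    assert(m.column(2) == [2, 6, 10]); // the proposed m[,2]
}
```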

//An example, lets imagine a greyscale image, stored as an array 
of pixel rows:

double[][] img = read_bmp(fn,"grey");

//we want to crop it to some user defined co-ords (x1,y1),(x2,y2):

//Version A, current syntax

auto img_cropped = img[y1..y2].dup;
foreach(ref row; img_cropped) {
     row = row[x1..x2];
}
//3 lines of code for a very simple idea.

//Version B, new syntax

auto img_cropped = img[y1..y2, x1..x2];

//Very simple, easy to read code that is clear in its purpose.

I propose that Version B would be equivalent to A: An independent 
window on the data. Any reassignment of a row (i.e. pointing it 
to somewhere else, not copying new data in) will have no effect 
on the data. This scales naturally to higher dimensions and is in 
agreement with the normal slicing rules: the slice itself is 
independent of the original, but the data inside is shared.
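A small sketch of the shared-data / independent-window behaviour described above, using today's Version A code:

```d
// The window's row table is independent of the original, but the
// pixel data inside it is shared, matching normal slicing rules.
unittest
{
    double[][] img = [[0.0, 1.0, 2.0],
                      [3.0, 4.0, 5.0],
                      [6.0, 7.0, 8.0]];

    auto crop = img[0 .. 2].dup;   // independent array of row slices
    foreach (ref row; crop)
        row = row[1 .. 3];

    crop[0][0] = 42.0;             // element write goes through to img
    assert(img[0][1] == 42.0);

    crop[1] = [9.0, 9.0];          // rebinding a row: img untouched
    assert(img[1] == [3.0, 4.0, 5.0]);
}
```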

I believe this would be a significant improvement to D, 
particularly for image processing and scientific applications.

P.S.
As you can probably tell, I have no experience in compiler 
design! I may be missing something that makes all of this 
impossible/impractical. I also don't think this would have to 
cause any code breakage at all, but again, I could be wrong.

P.P.S.
I think there may be something quite wrong with how the frontend 
understands current array expression syntax... see here: 
http://dpaste.dzfl.pl/f4a931db
Nov 21 2012
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/21/2012 10:02 AM, John Colvin wrote:
 My vision of how things could work:
 c = a[] opBinary b[];
 should be legal. It should create a new array that is then reference assigned
to c.
This is not done because it puts excessive pressure on the garbage collector. Array ops do not allocate memory by design.
Nov 21 2012
next sibling parent reply "David Nadlinger" <see klickverbot.at> writes:
On Wednesday, 21 November 2012 at 18:15:51 UTC, Walter Bright 
wrote:
 On 11/21/2012 10:02 AM, John Colvin wrote:
 My vision of how things could work:
 c = a[] opBinary b[];
 should be legal. It should create a new array that is then 
 reference assigned to c.
This is not done because it puts excessive pressure on the garbage collector. Array ops do not allocate memory by design.
We really need better error messages for this, though – Andrej? ;) David
Nov 21 2012
parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 11/21/12, David Nadlinger <see klickverbot.at> wrote:
 We really need better error messages for this, though – Andrej?
 ;)
Considering how slowly pulls are being merged... I don't think it's worth my time hacking on dmd. Anyway, I have other things I'm working on now.
Nov 21 2012
prev sibling next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 21 November 2012 at 18:15:51 UTC, Walter Bright 
wrote:
 On 11/21/2012 10:02 AM, John Colvin wrote:
 My vision of how things could work:
 c = a[] opBinary b[];
 should be legal. It should create a new array that is then 
 reference assigned to c.
This is not done because it puts excessive pressure on the garbage collector. Array ops do not allocate memory by design.
Fair enough. When you say excessive pressure, is that performance pressure or design pressure?
Nov 21 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/21/12 1:20 PM, John Colvin wrote:
 On Wednesday, 21 November 2012 at 18:15:51 UTC, Walter Bright wrote:
 On 11/21/2012 10:02 AM, John Colvin wrote:
 My vision of how things could work:
 c = a[] opBinary b[];
 should be legal. It should create a new array that is then reference
 assigned to c.
This is not done because it puts excessive pressure on the garbage collector. Array ops do not allocate memory by design.
Fair enough. When you say excessive pressure, is that performance pressure or design pressure?
Performance pressure - the design here is rather easy if efficiency is not a concern. Andrei
Nov 21 2012
parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 21 November 2012 at 18:38:59 UTC, Andrei 
Alexandrescu wrote:
 On 11/21/12 1:20 PM, John Colvin wrote:
 On Wednesday, 21 November 2012 at 18:15:51 UTC, Walter Bright 
 wrote:
 On 11/21/2012 10:02 AM, John Colvin wrote:
 My vision of how things could work:
 c = a[] opBinary b[];
 should be legal. It should create a new array that is then 
 reference
 assigned to c.
This is not done because it puts excessive pressure on the garbage collector. Array ops do not allocate memory by design.
Fair enough. When you say excessive pressure, is that performance pressure or design pressure?
Performance pressure - the design here is rather easy if efficiency is not a concern. Andrei
In what way does it become a performance problem? Apologies for the naive questions, I have nothing more than a passing understanding of how garbage collection works.
Nov 21 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/21/2012 10:41 AM, John Colvin wrote:
 In what way does it become a performance problem?
Allocating memory is always much, much slower than not allocating memory. A design that forces allocating new memory and discarding the old as opposed to reusing existing already allocated memory is going to be far slower. In fact, the allocation/deallocation is going to dominate the performance timings, not the array operation itself. Generally, people who use array operations want them to be really fast.
Nov 21 2012
parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 21 November 2012 at 20:16:59 UTC, Walter Bright 
wrote:
 On 11/21/2012 10:41 AM, John Colvin wrote:
 In what way does it become a performance problem?
Allocating memory is always much, much slower than not allocating memory. A design that forces allocating new memory and discarding the old as opposed to reusing existing already allocated memory is going to be far slower. In fact, the allocation/deallocation is going to dominate the performance timings, not the array operation itself. Generally, people who use array operations want them to be really fast.
Well yes, of course, I thought you meant something more esoteric. I'm not suggesting that we replace the current design, simply extend it. We'd have:

c[] = a[] + b[];

a fast, in-place array operation; the cost of allocation happens earlier in the code. But also:

c = a[] + b[];

a much slower, memory-allocating array operation, pretty much just shorthand for:

c = new T[a.length];
c[] = a[] + b[];

You could argue that hiding an allocation is bad, but I would think it's quite obvious to any programmer that if you add 2 arrays together, you're going to have to create somewhere to put them... Having the shorthand prevents any possible mistakes with the length of the new array and saves a line of code.

Anyway, this is a pretty trivial matter; I'd be more interested in seeing a definitive answer for what the correct behaviour of the statement a[] = b[] + c[] is when the arrays have different lengths.
Nov 22 2012
next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 22 November 2012 at 11:25:31 UTC, John Colvin wrote:
 Anyway, this is a pretty trivial matter, I'd be more interested 
 in seeing a definitive answer for what the correct behaviour 
 for the statement a[] = b[] + c[] is when the arrays have 
 different lengths.
I'd say the same as for "a[] += b[];": an assertion error. You have to compile druntime in non-release to see it though.
Nov 22 2012
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/22/2012 3:25 AM, John Colvin wrote:
 Anyway, this is a pretty trivial matter, I'd be more interested in seeing a
 definitive answer for what the correct behaviour for the statement a[] = b[] +
 c[] is when the arrays have different lengths.
An error.
Nov 22 2012
parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Thursday, 22 November 2012 at 20:58:25 UTC, Walter Bright 
wrote:
 On 11/22/2012 3:25 AM, John Colvin wrote:
 Anyway, this is a pretty trivial matter, I'd be more 
 interested in seeing a
 definitive answer for what the correct behaviour for the 
 statement a[] = b[] +
 c[] is when the arrays have different lengths.
An error.
Is monarch_dodra correct in saying that one would have to compile druntime in non-release to see this error? That would be a pity; couldn't this be implemented somehow so that it would depend on the user code being compiled non-release, not druntime?
Nov 22 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/22/2012 6:11 PM, John Colvin wrote:
 An error.
Is monarch_dodra correct in saying that one would have to compile druntime in non-release to see this error? That would be a pity, couldn't this be implemented somehow so that it would depend on the user code being compiled non-release, not druntime?
I'd have to look at the specific code to see. In any case, it is an error. It takes a runtime check to do it, which can be turned on or off with the -noboundscheck switch.
Nov 22 2012
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 23 November 2012 at 06:41:06 UTC, Walter Bright wrote:
 On 11/22/2012 6:11 PM, John Colvin wrote:
 An error.
Is monarch_dodra correct in saying that one would have to compile druntime in non-release to see this error? That would be a pity, couldn't this be implemented somehow so that it would depend on the user code being compiled non-release, not druntime?
I'd have to look at the specific code to see. In any case, it is an error. It takes a runtime check to do it, which can be turned on or off with the -noboundscheck switch.
I originally opened this some time back, related to opSlice operations not error-ing:

http://d.puremagic.com/issues/show_bug.cgi?id=8650

I've since learned to build druntime as non-release, which "fixes" the problem. I don't know if you plan to change anything about this, but just wanted to point out there's a Bugzilla entry for it.
Nov 22 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 11/22/2012 10:49 PM, monarch_dodra wrote:
 I originally opened this some time back, related to opSlice operations not
 error-ing:
 http://d.puremagic.com/issues/show_bug.cgi?id=8650

 I've since learned to build druntime as non-release, which "fixes" the problem.

 I don't know if you plan to change anything about this, but just wanted to
 point out there's a Bugzilla entry for it.
Thank you.
Nov 23 2012
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 11/22/2012 3:25 AM, John Colvin wrote:
 c[] = a[] + b[];
 fast, in place array operation, the cost of allocation happens earlier in the
code.

 but also
 c = a[] + b[];
 a much slower, memory assigning array operation, pretty much just shorthand for
 c = new T[a.length];
 c[] = a[] + b[];

 You could argue that hiding an allocation is bad, but I would think it's quite
 obvious to any programmer that if you add 2 arrays together, you're going to
 have to create somewhere to put them... Having the shorthand prevents any
 possible mistakes with the length of the new array and saves a line of code.
I'll be bold and predict what will happen if this proposal is implemented: "Array operations in D are cool but are incredibly slow. D sux." Few will notice that the hidden memory allocation can be easily removed, certainly not people casually looking at D to see if they should use it, and the damage will be done.
Nov 22 2012
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
11/23/2012 1:02 AM, Walter Bright writes:

 I'll be bold and predict what will happen if this proposal is implemented:

      "Array operations in D are cool but are incredibly slow. D sux."

 Few will notice that the hidden memory allocation can be easily removed,
 certainly not people casually looking at D to see if they should use it,
 and the damage will be done.
Expanding on it and adding more serious reasoning:

Array ops are supposed to be overhead-free loops transparently leveraging the SIMD parallelism of modern CPUs. No more and no less. It's like auto-vectorization, but it's guaranteed and obvious in form.

Now if array ops did the checking for matching lengths it would slow them down. And that's something you can't turn off when you know the lengths match, as it's a built-in. Ditto for checking whether the left side is already allocated and allocating if not (but that's even worse).

Basically, you can't build the fastest primitive out of something wrapped in safeguards. Doing it the other way around is easy, for example by defining a special wrapper type with a custom opSlice, opSliceAssign etc. that does the checks.

-- 
Dmitry Olshansky
Nov 22 2012
parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Thursday, 22 November 2012 at 21:37:19 UTC, Dmitry Olshansky 
wrote:
 Array ops supposed to be overhead-free loops transparently 
 leveraging SIMD parallelism of modern CPUs. No more and no 
 less. It's like auto-vectorization but it's guaranteed and 
 obvious in the form.
I disagree that array ops are only for speed. I would argue that their primary significance lies in their ability to make code significantly more readable, and more importantly, writeable.

For example, the vector distance between 2 position vectors can be written as:

dv[] = v2[] - v1[]
or
dv = v2[] - v1[]

anyone with an understanding of mathematical vectors instantly understands the general intent of the code.

With documentation something vaguely like this:

"An array is a reference to a chunk of memory that contains a list of data, all of the same type. v[] means the set of elements in the array, while v on its own refers to just the reference. Operations on sets of elements, e.g. dv[] = v2[] - v1[], work element-wise along the arrays {insert mathematical notation and picture of 3 arrays as columns next to each other etc.}. Array operations can be very fast, as they are sometimes lowered directly to cpu vector instructions. However, be aware of situations where a new array has to be created implicitly, e.g. dv = v2[] - v1[]; Let's look at what this really means: we are asking for dv to be set to refer to the vector difference between v2 and v1. Note we said nothing about the current elements of dv; it might not even have any! This means we need to put the result of v2[] - v1[] in a new chunk of memory, which we then set dv to refer to. Allocating new memory takes time, potentially taking a lot longer than the array operation itself, so if you can, avoid it!",

anyone with the most basic programming and mathematical knowledge can write concise code operating on arrays, taking advantage of the potential speedups while being aware of the pitfalls.

In short: vector syntax/array ops is/are great. Concise code that's easy to read and write. They fulfill one of the guiding principles of D: the most obvious code is fast and safe (or if not 100% safe, at least not too error-prone).

More vector syntax capabilities please!
Nov 22 2012
next sibling parent "Robert Jacques" <rjacque2 live.johnshopkins.edu> writes:
On Thu, 22 Nov 2012 20:06:44 -0600, John Colvin  
<john.loughran.colvin gmail.com> wrote:
 On Thursday, 22 November 2012 at 21:37:19 UTC, Dmitry Olshansky wrote:
 Array ops supposed to be overhead-free loops transparently leveraging  
 SIMD parallelism of modern CPUs. No more and no less. It's like  
 auto-vectorization but it's guaranteed and obvious in the form.
I disagree that array ops are only for speed. I would argue that their primary significance lies in their ability to make code significantly more readable, and more importantly, writeable. For example, the vector distance between 2 position vectors can be written as: dv[] = v2[] - v1[] or dv = v2[] - v1[] anyone with an understanding of mathematical vectors instantly understands the general intent of the code. With documentation something vaguely like this: "An array is a reference to a chunk of memory that contains a list of data, all of the same type. v[] means the set of elements in the array, while v on it's own refers to just the reference. Operations on sets of elements e.g. dv[] = v2[] - v1[] work element-wise along the arrays {insert mathematical notation and picture of 3 arrays as columns next to each other etc.}. Array operations can be very fast, as they are sometimes lowered directly to cpu vector instructions. However, be aware of situations where a new array has to be created implicitly, e.g. dv = v2[] - v1[]; Let's look at what this really means: we are asking for dv to be set to refer to the vector difference between v2 and v1. Note we said nothing about the current elements of dv, it might not even have any! This means we need to put the result of v2[] - v1] in a new chunk of memory, which we then set dv to refer to. Allocating new memory takes time, potentially taking a lot longer than the array operation itself, so if you can, avoid it!", anyone with the most basic programming and mathematical knowledge can write concise code operating on arrays, taking advantage of the potential speedups while being aware of the pitfalls. In short: Vector syntax/array ops is/are great. Concise code that's easy to read and write. They fulfill one of the guiding principles of D: the most obvious code is fast and safe (or if not 100% safe, at least not too error-prone). More vector syntax capabilities please!
While I think implicit allocation is a good idea in the case of variable initialization, i.e.:

auto dv = v2[] - v1[];

as a general statement, i.e. dv = v2[] - v1[];, it could just as easily be a typo and result in a silent and hard-to-find performance bug.

// An alternative syntax for variable initialization by an array operation expression:
auto dv[] = v2[] - v1[];
Nov 22 2012
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
11/23/2012 6:06 AM, John Colvin writes:
 On Thursday, 22 November 2012 at 21:37:19 UTC, Dmitry Olshansky wrote:
 Array ops supposed to be overhead-free loops transparently leveraging
 SIMD parallelism of modern CPUs. No more and no less. It's like
 auto-vectorization but it's guaranteed and obvious in the form.
 I disagree that array ops are only for speed.
Well that and intuitive syntax.
 I would argue that their primary significance lies in their ability to
 make code significantly more readable, and more importantly, writeable.
 For example, the vector distance between 2 position vectors can be
 written as:
 dv[] = v2[] - v1[]
 or
 dv = v2[] - v1[]
 anyone with an understanding of mathematical vectors instantly
 understands the general intent of the code.
Mathematical sense doesn't take into account that arrays occupy memory, and more generally the cost of operations. Also:

dv = v2 - v1

is plenty obvious, thus structs + operator overloading cover the usability side of this problem. Operating on raw arrays directly as N-dimensional vectors is fine, but it hardly helps maintainability/readability as the program grows over time.
 With documentation something vaguely like this:
 "An array is a reference to a chunk of memory that contains a list of
 data, all of the same type. v[] means the set of elements in the array,
 while v on it's own refers to just the reference. Operations on sets of
 elements e.g. dv[] = v2[] - v1[] work element-wise along the arrays
 {insert mathematical notation and picture of 3 arrays as columns next to
 each other etc.}.
So far so good, but I'd rather not use 'list' to define an array, nor 'set' for its elements. Semantically v[] means a slice of the whole array - nothing more and nothing less.
 Array operations can be very fast, as they are sometimes lowered
 directly to cpu vector instructions. However, be aware of situations
 where a new array has to be created implicitly, e.g. dv = v2[] - v1[];
 Let's look at what this really means: we are asking for dv to be set to
 refer to the vector difference between v2 and v1. Note we said nothing
 about the current elements of dv, it might not even have any! This means
 we need to put the result of v2[] - v1] in a new chunk of memory, which
 we then set dv to refer to. Allocating new memory takes time,
 potentially taking a lot longer than the array operation itself, so if
 you can, avoid it!",
IMHO I'd shoot this kind of documentation on sight. "There is a fast tool, but here is our peculiar set of rules that makes certain constructs slow as a pig. So, watch out!" Isn't that convenient?
 anyone with the most basic programming and mathematical knowledge can
 write concise code operating on arrays, taking advantage of the
 potential speedups while being aware of the pitfalls.
People typically are not aware as long as it seems to work.
 In short:
 Vector syntax/array ops is/are great. Concise code that's easy to read
 and write. They fulfill one of the guiding principles of D: the most
 obvious code is fast and safe (or if not 100% safe, at least not too
 error-prone).
This change fits a scripting language more than a systems one. For me

a[] = b[] + c[];

implies:

a[0..$] = b[0..$] + c[0..$]

so it's obvious that the lengths had better match and that 'a' must be preallocated.
 More vector syntax capabilities please!
It would have been nice to be able to write things like:

a[] = min(b[], c[]);

where min is a regular function. But again I don't see the pressing need:
- if speed is the concern, then an 'arbitrary function' can't be sped up much by hardware;
- if flexibility, then range-style operations are far more flexible.

-- 
Dmitry Olshansky
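For the record, a range-style element-wise min is already expressible with std.algorithm and std.range; a sketch of that "far more flexible" alternative, not a proposed built-in:

```d
import std.algorithm : copy, map, min;
import std.range : zip;

unittest
{
    int[] a = new int[4];      // preallocated destination
    int[] b = [1, 7, 3, 9];
    int[] c = [4, 2, 8, 5];

    // lazily pair up b and c, take the min of each pair, copy into a
    zip(b, c).map!(t => min(t[0], t[1])).copy(a);
    assert(a == [1, 2, 3, 5]);
}
```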
Nov 23 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 11/23/2012 7:58 AM, Dmitry Olshansky wrote:
 anyone with the most basic programming and mathematical knowledge can
 write concise code operating on arrays, taking advantage of the
 potential speedups while being aware of the pitfalls.
People typically are not aware as long as it seems to work.
As an example, bearophile is an experienced programmer. He just posted two loops, one using pointers and another using arrays, and was mystified why the array version was slower. He even posted the assembler output, where it was pretty obvious (to me, anyway) that he had array bounds checking turned on in the array version, which will slow it down. So yes, it's a problem when subtle changes in code can result in significant slowdowns, and yes, even experienced programmers get caught by that.
Nov 23 2012
prev sibling parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Wednesday, 21 November 2012 at 18:15:51 UTC, Walter Bright 
wrote:
 On 11/21/2012 10:02 AM, John Colvin wrote:
 My vision of how things could work:
 c = a[] opBinary b[];
 should be legal. It should create a new array that is then 
 reference assigned to c.
This is not done because it puts excessive pressure on the garbage collector. Array ops do not allocate memory by design.
But if they wanted it anyways, could implement it as a struct... Here's a rough build... Should be fairly obvious what's happening.

struct AllocatingVectorArray(T)
{
    T[] data;
    alias data this;
    alias AllocatingVectorArray AVA;

    // forces slice operations for vector format only
    static struct AVASlice
    {
        T[] data;
        alias data this;

        this(T[] rhs) { data = rhs; }

        AVA opBinary(string op)(const AVASlice rhs)
        {
            assert(rhs.length == data.length,
                   "Lengths don't match, cannot use vector operations");
            AVA var;
            var.data = data.dup;
            mixin("var[] " ~ op ~ "= rhs[];");
            return var;
        }
    }

    this(T[] rhs) { data = rhs; }

    ref AVA opAssign(T[] rhs)
    {
        data = rhs;
        return this;
    }

    AVASlice opSlice()
    {
        return AVASlice(this);
    }
}

unittest
{
    alias AllocatingVectorArray!int AVAint;
    AVAint a = [1,2,3,4];
    AVAint b = [5,6,7,8];
    AVAint c;

    // c = a + b;  // not allowed, 'not implemented error'
    // assert(c == [6,8,10,12]);

    c = a[] + b[];    // known vector syntax
    assert(c == [6,8,10,12]);

    c[] = a[] + b[];  // more obvious what's happening
    assert(c == [6,8,10,12]);
}
Nov 23 2012
prev sibling parent reply Mike Wey <mike-wey example.com> writes:
On 11/21/2012 07:02 PM, John Colvin wrote:
 //An example, lets imagine a greyscale image, stored as an array of
 pixel rows:

 double[][] img = read_bmp(fn,"grey");

 //we want to crop it to some user defined co-ords (x1,y1),(x2,y2):

 //Version A, current syntax

 auto img_cropped = img[y1..y2].dup;
 foreach(ref row; img_cropped) {
      row = row[x1..x2];
 }
 //3 lines of code for a very simple idea.

 //Version B, new syntax

 auto img_cropped = img[y1..y2, x1..x2];

 //Very simple, easy to read code that is clear in its purpose.

 I propose that Version B would be equivalent to A: An independent window
 on the data. Any reassignment of a row (i.e. pointing it to somewhere
 else, not copying new data in) will have no effect on the data. This
 scales naturally to higher dimensions and is in agreement with the
 normal slicing rules: the slice itself is independent of the original,
 but the data inside is shared.

 I believe this would be a significant improvement to D, particularly for
 image processing and scientific applications.
If you want to use this syntax with images, DMagick's ImageView might be interesting: http://dmagick.mikewey.eu/docs/ImageView.html -- Mike Wey
Nov 21 2012
parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 21 November 2012 at 19:40:25 UTC, Mike Wey wrote:
 If you want to use this syntax with images, DMagick's ImageView 
 might be interesting:
 http://dmagick.mikewey.eu/docs/ImageView.html
I like it :) From what I can see it provides exactly what I'm talking about for 2D. I haven't looked at the implementation in detail, but do you think such an approach could be scaled up to arbitrary N-dimensional arrays?
Nov 22 2012
next sibling parent "Robert Jacques" <rjacque2 live.johnshopkins.edu> writes:
On Thu, 22 Nov 2012 06:10:04 -0600, John Colvin  
<john.loughran.colvin gmail.com> wrote:

 On Wednesday, 21 November 2012 at 19:40:25 UTC, Mike Wey wrote:
 If you want to use this syntax with images, DMagick's ImageView might  
 be interesting:
 http://dmagick.mikewey.eu/docs/ImageView.html
I like it :) From what I can see it provides exactly what i'm talking about for 2D. I haven't looked at the implementation in detail, but do you think that such an approach could be scaled up to arbitrary N-dimensional arrays?
Yes and no. Basically, like an array, an ImageView is a thick pointer, and as the dimensions increase the pointer gets thicker by 1-2 words per dimension. And each indexing or slicing operation has to create a temporary with this framework, which leads to stack churn as the dimensions get large.

Another syntax that can be used until we get true multi-dimensional slicing is opIndex with int[2] arguments, i.e.:

view[[4,40],[5,50]] = new Color("red");
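The int[2]-argument trick can be sketched like this. (`View` here is a hypothetical type for illustration, not DMagick's actual ImageView.)

```d
// Rectangular assignment via opIndexAssign taking [start, end)
// pairs as static int[2] arguments: v[[y1,y2], [x1,x2]] = value;
struct View
{
    int[][] pixels;

    void opIndexAssign(int value, int[2] ys, int[2] xs)
    {
        foreach (y; ys[0] .. ys[1])
            pixels[y][xs[0] .. xs[1]] = value;  // built-in slice assign
    }
}

unittest
{
    auto v = View(new int[][](4, 4));
    v[[1, 3], [0, 2]] = 7;   // rows 1..3, columns 0..2
    assert(v.pixels[0] == [0, 0, 0, 0]);
    assert(v.pixels[1] == [7, 7, 0, 0]);
    assert(v.pixels[2] == [7, 7, 0, 0]);
}
```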
Nov 22 2012
prev sibling parent Mike Wey <mike-wey example.com> writes:
On 11/22/2012 01:10 PM, John Colvin wrote:
 On Wednesday, 21 November 2012 at 19:40:25 UTC, Mike Wey wrote:
 If you want to use this syntax with images, DMagick's ImageView might
 be interesting:
 http://dmagick.mikewey.eu/docs/ImageView.html
I like it :) From what I can see it provides exactly what i'm talking about for 2D. I haven't looked at the implementation in detail, but do you think that such an approach could be scaled up to arbitrary N-dimensional arrays?
Every dimension has its own type, so it won't scale well to a lot of dimensions. When slicing, every dimension would create a temporary.

-- 
Mike Wey
Nov 22 2012