
digitalmars.D - opSlice as lvalue

reply Derek Parnell <derek psych.ward> writes:
Hi all,
I'm trying to implement a structure in which array copying would be useful.
I expected to be able to overload this operation with a variety of
opSlice(). 

We can have opIndex() as both lvalue and rvalue, but this doesn't seem to
apply for opSlice.

Is there a good reason for this?

I'm now doing this operation via a roll-your-own function, but overloading
would be a bit more consistent.

-- 
Derek
Melbourne, Australia
7/Jul/04 12:36:43 PM
Jul 06 2004
parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
I don't know about Walter's reasoning in that respect, but I can state my
position:

I think that even the opSlice that is there should be used with care. As I
see it, there is no way to cleanly extend it to multiple dimensions, and
furthermore it collides badly with the concept of vectorization that is
hopefully coming into the language in the future. opIndex is not a problem,
as the concept of indexing itself already breaks vectorization.
Implementing slicing via opSlice, though, would always mean creating a
temporary copy of the data, which kills performance.

Maybe some version of the existing read-access opSlice could still make
sense in the future. Slice assignment though would - if overloadable at all
- need a completely new concept.



Derek Parnell wrote:

 Hi all,
 I'm trying to implement a structure in which array copying would be
 useful. I expected to be able to overload this operation with a variety of
 opSlice().
 
 We can have opIndex() as both lvalue and rvalue, but this doesn't seem to
 apply for opSlice.
 
 Is there a good reason for this?
 
 I'm now doing this operation via a roll-your-own function, but overloading
 would be a bit more consistent.
 
Jul 07 2004
next sibling parent reply Sam McCall <tunah.d tunah.net> writes:
Norbert Nemec wrote:
 I don't know about Walter's reasoning in that respect, but I can state my
 position:
 
 I think that even the opSlice that is there should be used with care. As I
 see it, there is no way to cleanly extend it to multiple dimensions 
Foo opSlice(int start1, int end1, int start2, int end2, ...)
foo[start1..end1][start2..end2]
 furthermore it collides badly with the concept of vectorization that is
 hopefully coming into the language in the future. opIndex is not a problem,
 as the concept of indexing itself already breaks vectorization.
Surely being able to do something at a speed that's adequate for most applications (as evidenced by the fact that it works fine now, without vectorisation) is better than not being able to do it at all?
 Implementing slicing via opSlice, though, would always mean to create a
 temporary copy of the data which is killing performance.
Nope, if there's some underlying array, create a new object that wraps a slice of that. Or create a new object implementing the same interface, and tell it the bounds, like List.subList in java. (D still doesn't have inner classes, but they can be faked).
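Sam's no-copy wrapper idea can be sketched concretely (Python here for a runnable illustration, since the 2004-era D semantics are hard to reproduce; the SliceView name and its methods are hypothetical, standing in for a D class exposing opIndex/opIndexAssign):

```python
class SliceView:
    """A read/write window onto an underlying sequence; no data is copied."""
    def __init__(self, data, start, stop):
        self.data = data          # reference to the underlying storage
        self.start = start
        self.stop = stop

    def __len__(self):
        return self.stop - self.start

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self.data[self.start + i]

    def __setitem__(self, i, value):
        if not 0 <= i < len(self):
            raise IndexError(i)
        self.data[self.start + i] = value   # writes go through to the original

data = [10, 20, 30, 40, 50]
view = SliceView(data, 1, 4)    # window over data[1..4], nothing copied
view[0] = 99                    # mutates the underlying array
```

Because the view holds a reference rather than a copy, constructing it is O(1) and writes through the view are visible in the original, much like D's native array slices.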
 Maybe some version of the existing read-access opSlice could still make
 sense in the future. Slice assignment though would - if overloadable at all
 - need a completely new concept.
I disagree. opSlice works well in almost all situations now. Yes, multidimensional extensions would be nice. The fact that opIndex is assignable but opSlice isn't is a nasty wart; if slicing is to be allowed (and it should be, to keep parity with native types[1]) then slice assignment should also be allowed.

Sam

[1] Speaking of which, delete foo[bar] for associative arrays should be overloadable (and probably renamed, but that's another story...)
 
 
 
 Derek Parnell wrote:
 
 
Hi all,
I'm trying to implement a structure in which array copying would be
useful. I expected to be able to overload this operation with a variety of
opSlice().

We can have opIndex() as both lvalue and rvalue, but this doesn't seem to
apply for opSlice.

Is there a good reason for this?

I'm now doing this operation via a roll-your-own function, but overloading
would be a bit more consistent.
Jul 07 2004
next sibling parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Sam McCall wrote:

 Norbert Nemec wrote:
 I don't know about Walter's reasoning in that respect, but I can state my
 position:
 
 I think that even the opSlice that is there should be used with care. As
 I see it, there is no way to cleanly extend it to multiple dimensions
Foo opSlice(int start1, int end1, int start2, int end2, ...)
foo[start1..end1][start2..end2]
The expression foo[start1..end1][start2..end2] does not do what you expect. It is parsed as (foo[start1..end1])[start2..end2], where the slice resulting from the first expression is sliced again. If you try to handle a nested array like this, you will slice the outermost dimension twice. There is no way to slice the inner dimension of nested arrays.

For true multidimensional arrays, as they will hopefully be part of the language sometime in the future, the slicing expression would look like:

        foo[start1..end1,start2..end2]

Defining opSlice the way you do would certainly be possible, but it would get rather ugly once you consider that you can also do partial slicing:

        foo[idx1,start2..end2]

returning a one-dimensional array, which is different from

        foo[idx1..idx1+1,start2..end2]

which returns a two-dimensional array with .range[0]==1. If that is not ugly enough for you, then consider what happens when you introduce striding as well:

        foo[start1..end1:stride1,start2..end2:stride2]
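Norbert's parsing point is easy to reproduce with nested sequences (Python shown here for a runnable illustration; chained slicing behaves the same way on D's arrays of arrays): chaining two slices only ever cuts the outer dimension.

```python
m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# Hoped for: a 2D sub-block over rows and columns.
# Actual: (m[0:2])[1:2] slices the outer list twice, never the rows.
twice = m[0:2][1:2]
assert twice == [[4, 5, 6]]   # still whole rows, not a sub-block
```

To cut the inner dimension you need a genuinely multidimensional index of the foo[a..b, c..d] kind, which is exactly what chained opSlice calls cannot express.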
 furthermore it collides badly with the concept of vectorization that is
 hopefully coming into the language in the future. opIndex is not a
 problem, as the concept of indexing itself already breaks vectorization.
Surely being able to do something at a speed that's adequate for most applications (as evidenced by the fact that it works fine now, without vectorisation) is better than not being able to do it at all?
It is not just a matter of performance but a matter of concept: what does an assignment to a slice actually mean? If the overloaded slicing has different semantics than the vectorizable slicing of arrays, then it is questionable whether it is a good idea to have it. Currently I doubt that the semantics of the vectorizable array slicing could be captured by any kind of opSlice function.
 Implementing slicing via opSlice, though, would always mean to create a
 temporary copy of the data which is killing performance.
Nope, if there's some underlying array, create a new object that wraps a slice of that. Or create a new object implementing the same interface, and tell it the bounds, like List.subList in java. (D still doesn't have inner classes, but they can be faked).
True, that might be possible.
 Maybe some version of the existing read-access opSlice could still make
 sense in the future. Slice assignment though would - if overloadable at
 all - need a completely new concept.
I disagree. opSlice works well in almost all situations now. Yes, multidimensional extensions would be nice. The fact that opIndex is assignable but opSlice isn't is a nasty wart; if slicing is to be allowed (and it should be, to keep parity with native types[1]) then slice assignment should also be allowed.
I don't think it will be possible to keep that parity. At least, any solution I can think of would be too ugly to put into D.
Jul 07 2004
parent Sam McCall <tunah.d tunah.net> writes:
Norbert Nemec wrote:

 Sam McCall wrote:
 
 
Norbert Nemec wrote:

I don't know about Walter's reasoning in that respect, but I can state my
position:

I think that even the opSlice that is there should be used with care. As
I see it, there is no way to cleanly extend it to multiple dimensions
Foo opSlice(int start1, int end1, int start2, int end2, ...)
foo[start1..end1][start2..end2]
The expression foo[start1..end1][start2..end2] does not do what you expect. It is parsed as (foo[start1..end1])[start2..end2] where the slice resulting from the first expression is sliced again. If you try to handle a nested array like this, you will slice the outermost dimension twice.
Apologies, I did mean to type foo[start1..end1,start2..end2]
 For true multidimensional arrays as they will hopefully be part of the
 language sometime in the future, the slicing expression would look like:
         foo[start1..end1,start2..end2]
 Defining opSlice the way you do it would certainly be possible, but it would
 get rather ugly once you consider that you can also do partial slicing
         foo[idx1,start2..end2]
 returning a one-dimensional array, which is different from
         foo[idx1..idx1+1,start2..end2]
 which returns a two-dimensional array with .range[0]==1.
This is interesting, you've obviously thought about this a lot more than I have. What if there were two allowed signatures for opSlice:

        T opSlice(start1, end1, ...);
        T opSlice(start1, end1, ..., bool[N] single);

For a "full" slice (all indices are ranges), the first would be preferred, else fall back to the second with single all true. Partial slices would only be allowed to use the second one. In the second one, N would have to match the number of indices, and if single[i] was true then endI==startI (or startI+1, whatever), and the intent is that that dimension is being indexed, not sliced. This is going to be a bit tricky to code for, but I think it fits with "simple things simple, hard things possible".
 If that is not ugly enough for you than consider what happens when you
 introduce striding as well:
         foo[start1..end1:stride1,start2..end2:stride2]
Personally, I don't see this getting into slice notation, vectorisation or no. AFAICS, this is advanced/rare functionality and while I'm sure it's useful in many cases, it doesn't warrant an overload. As you say, things get ugly.
 It is not just a matter of performance but a matter of concept: what does an
 assignment to a slice actually mean? If the overloaded slicing has
 different semantics than the vectorizable slicing of arrays, then it is
 questionable whether it is a good idea to have it.
The semantics won't be exactly the same, just the same way that overriding addition for sets (union) isn't going to have the same semantics as overriding addition for numbers. While I agree that overloads of an operator should be /analogous/ (in the way iostreams aren't), I don't think a rule that says we're completely tied to what the language does is fair. I may be misunderstanding you here.
 Currently I doubt that
 the semantics of the vectorizable array slicing could be captured by any
 kind of opSlice function.
I'm not sure what semantics you're referring to; again, I might have missed the point entirely.

a) The "high performance" bit: no, a class with overloads won't be as fast as the native data structure. I don't think this negates the utility.

b) Slice assignment elements being reorderable: if this is imposed on native slices, it can be imposed as a condition of overloading opSlice. If the indexer of the result of the slice is inlinable (maybe if not?) then the thing should be vectorisable in a similar (but more complex) way to arrays.

c) Slice assignment overwriting the original: this should be a guideline to be ignored at one's own peril ;-). Operator overloading can be abused, but we trust the programmer.

d) There being a one-to-one relationship between source cells and destination cells: being able to replace, e.g., a small segment with a long segment in a string is one of the reasons I want opSliceAssign.

e) Something else... help!
 I don't think it will be possible to keep that parity. At least, any
 solution I can think of, would be too ugly to put it into D.
But it would be a pity to give it up. In Java, I sometimes find myself using (native) arrays instead of (library) vectors because the syntax is better and no library can achieve it. I agree that D should remain compatible with powerful optimisations like vectorisation; I don't think overloading slices necessarily breaks it.

I think full slicing is a very useful feature, and even if custom slices _weren't_ vectorisable, I would argue to keep them. If you need maximum performance, then you will always need to be careful about doing/using inefficient things. It would be ideal if opSlice wasn't one of them, but if it has to be, removing it will just mean one less feature you couldn't have used anyway. This doesn't seem like a big win to me.

Sam
Jul 07 2004
prev sibling parent Stephen Waits <steve waits.net> writes:
Sam McCall wrote:
 Surely being able to do something at a speed that's adequate for most 
 applications (as evidenced by the fact that it works fine now, without 
 vectorisation) is better than not being able to do it at all?
But, if we could get good vectorization (in the sense that the language makes it much easier on compilers) with little compromise, wouldn't everyone agree that's best? --Steve
Jul 07 2004
prev sibling parent reply Derek <derek psyc.ward> writes:
On Wed, 07 Jul 2004 09:14:32 +0200, Norbert Nemec wrote:

 I don't know about Walter's reasoning in that respect, but I can state my
 position:
 
 I think that even the opSlice that is there should be used with care. As I
 see it, there is no way to cleanly extend it to multiple dimensions
So what? Single dimension is just fine for my needs. As far as I can see, a slice operation is one that is only done on a vector (a single-dimension array). To me it is just a way of specifying a subset of vector elements.

Once one moves into multi-dimensional array operations, e.g. a matrix: a vector of vectors, we need a new syntax anyway. opSlice can remain the same (a vector slice) and D can have some new opXXX function for 'matrix slicing'.
 and furthermore it collides badly with the concept of vectorization that is
 hopefully coming into the language in the future. opIndex is not a problem,
 as the concept of indexing itself already breaks vectorization.
Hmmmm..."collides"? Are there not two different paradigms at work here? Both just as valid as each other? How is this a collision?
 Implementing slicing via opSlice, though, would always mean to create a
 temporary copy of the data which is killing performance.
Excuse me for being a bit blunt, but "killing performance" is just overly dramatic. Is not "performance" a relative thing... something in the eye of the beholder? If the end user isn't impacted by an application's speed, who cares?

My needs are obviously different to your needs. And that's just fine. I don't tend to write CPU-intensive applications. There is no way that D's slicing operation is "killing" my apps.
 Maybe some version of the existing read-access opSlice could still make
 sense in the future.
Yep! - it makes sense now so why wouldn't it in the future?
 Slice assignment though would - if overloadable at all
 - need a completely new concept.
Yep! But it's not such a totally "new" concept, as it's old hat in Basic and lots of other languages already!

        MID(myData, 4, 5) = "Derek"

or in a new D ...

        myData[4..9] = "Derek";

Not too big a stretch of the imagination, eh? It's just bloody shorthand for ...

        myData[4] = 'D';
        myData[5] = 'e';
        myData[6] = 'r';
        myData[7] = 'e';
        myData[8] = 'k';

or ...

        source = "Derek";
        for (i = 4, j = 0; j < source.length; i++, j++)
            myData[i] = source[j];

and that's it! Nothing fancy. Just a syntax shorthand for updating a bunch of adjacent vector elements.
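Derek's claimed equivalence is mechanical to check (a Python sketch for illustration; the variable names mirror his example):

```python
my_data = list("..........")
my_data[4:9] = "Derek"                 # slice-assignment shorthand

long_hand = list("..........")
source = "Derek"
j = 0
for i in range(4, 4 + len(source)):    # the element-by-element longhand
    long_hand[i] = source[j]
    j += 1

assert my_data == long_hand            # same final contents either way
```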
 Derek Parnell wrote:
 
 Hi all,
 I'm trying to implement a structure in which array copying would be
 useful. I expected to be able to overload this operation with a variety of
 opSlice().
 
 We can have opIndex() as both lvalue and rvalue, but this doesn't seem to
 apply for opSlice.
 
 Is there a good reason for this?
 
 I'm now doing this operation via a roll-your-own function, but overloading
 would be a bit more consistent.
My method currently looks a lot like the longhand above ...

        void opSlice(int i, int j, char[] x)
        {
            for (int k = 0; i < j; i++, k++)
            {
                if ((i >= 0) && (i < Elem.length) && (k < x.length))
                    Elem[i]( cast(int)x[k] );
            }
        }

in anticipation of being able to write ...

        Foo[i..j] = x;

whereas I now write ...

        Foo.opSlice(i, j, x);

--
Derek
Melbourne, Australia
Jul 07 2004
next sibling parent reply Stephen Waits <steve waits.net> writes:
Derek wrote:
 Excuse me for being a bit blunt, but "killing performance" is just overly
 dramatic. Is not "performance" a relative thing...something in the eye of
 the beholder? If the end user isn't impacted by an application's speed, who
 cares? 
Who's really being dramatic here? "killing performance" is the truth.
 My needs are obviously different to your needs. And that's just fine. I
 don't tend to write CPU-intensive applications. There is no way that D's
 slicing operation is "killing" my apps. 
Exactly - your needs. However, there are lots of people who care, or even require, that generated code be as efficient as possible. Remember, this is a general purpose language - not a language just for those who don't care about multi-dimensional array performance. --Steve
Jul 07 2004
parent Derek <derek psyc.ward> writes:
On Wed, 07 Jul 2004 10:37:19 -0700, Stephen Waits wrote:

 Derek wrote:
 Excuse me for being a bit blunt, but "killing performance" is just overly
 dramatic. Is not "performance" a relative thing...something in the eye of
 the beholder? If the end user isn't impacted by an application's speed, who
 cares? 
Who's really being dramatic here? "killing performance" is the truth.
In which case I have a magic machine on my desk, because D is *not* killing its performance. To be more precise, I am not being prevented from using this machine to the extent that I need to by D's executables.
 My needs are obviously different to your needs. And that's just fine. I
 don't tend to write CPU-intensive applications. There is no way that D's
 slicing operation is "killing" my apps. 
Exactly - your needs. However, there are lots of people who care, or even require, that generated code be as efficient as possible.
I'm sorry. I didn't mean to imply that I didn't care about "efficient as possible". Of course a D compiler should produce efficient code. And in addition to that, D's current vector slicing is already efficient enough for me. True, it may not be efficient enough for other people, and thus a different implementation may be required. I have no problems with that either.

The implication, as I read it, in "killing performance" is that *every* instance of the use of D's current slicing is a 'bad thing'. It is that implication I take issue with, for I simply have no evidence of it. In fact I have evidence of the contrary view, namely that my machine is not 'dying' under the load.
 Remember, this is a general purpose language - not a language just for 
 those who don't care about multi-dimensional array performance.
And yet, it sounds like some are asking for special-purpose multi-dimensional array operations. I've been writing business/financial applications for 30 years and have yet to need special-purpose multi-dimensional array operations. It seems that I travel in a different world.

I do not care whether D implements multi-dimensional array operations or not. I neither argue for them nor against them. It has no impact on the types of applications that I need to write. If Walter has time to do it, then fine! It changes my world not one iota.

I was afraid that I wouldn't be very clear, and thus be misunderstood. I was really just asking for a coding shorthand that would 'appear' like a sliced lvalue. That's all.

--
Derek
Melbourne, Australia
Jul 07 2004
prev sibling next sibling parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
The result of my thinking about this mail can be seen in the newly started
thread. Here are just a few comments.

Derek wrote:

 On Wed, 07 Jul 2004 09:14:32 +0200, Norbert Nemec wrote:
 Once one moves into multi-dimensional array operations, e.g. matrix; a
 vector of vectors, we need a new syntax anyway. opSlice can remain the
 same (a vector slice) and D can have some new opXXX function for 'matrix
 slicing'.
No need for that. As you can see in the other message, everything fits nicely into one concept.
 and furthermore it collides badly with the concept of vectorization that is
 hopefully coming into the language in the future. opIndex is not a
 problem, as the concept of indexing itself already breaks vectorization.
Hmmmm..."collides"? Are there not two different paradigms at work here? Both just as valid as each other? How is this a collision?
Forget my words. By now, my concept of vector expression has evolved to a point, where the collision is gone.
 Implementing slicing via opSlice, though, would always mean to create a
 temporary copy of the data which is killing performance.
Excuse me for being a bit blunt, but "killing performance" is just overly dramatic. Is not "performance" a relative thing...something in the eye of the beholder? If the end user isn't impacted by an application's speed, who cares? My needs are obviously different to your needs. And that's just fine. I don't tend to write CPU-intensive applications. There is no way that D's slicing operation is "killing" my apps.
When I talk about performance, I think about high-performance numerics. This is the area where the design of a language is most crucial. C++, for example, is inherently slower than Fortran (unless you pull out some nasty tricks) because the language gets in the way of optimizing. What I am concerned about is avoiding that kind of problem in D. Numerics are important in science (which is my field) but also in computer graphics and gaming. Anyone working in one of these fields will agree that array handling is too important to go for some quick solution.

Of course, I might be a bit overly concerned. It already turns out that the solutions get simpler and simpler the more I think about it all. In the case at hand, I just realized how things could work together so that you have all the flexibility you need for user-defined types, and all the performance you want for native arrays.
 Slice assignment though would - if overloadable at all
 - need a completely new concept.
Yep! But it's not such a totally "new" concept, as it's old hat in Basic and lots of other languages already!

        MID(myData, 4, 5) = "Derek"

or in a new D ...

        myData[4..9] = "Derek";

Not too big a stretch of the imagination, eh?
True.
 Its just bloody shorthand for ...
 
   myData[4] = 'D';
   myData[5] = 'e';
   myData[6] = 'r';
   myData[7] = 'e';
   myData[8] = 'k';
 
 or ...
 
   source = "Derek";
   for(i=4, j=0; j < source.length; i++,j++)
      myData[i] = source[j];
Not true. OK, I confess that this is what the specs say, but here I strongly disagree: the important detail about a vector assignment is that it does not specify the order of the assignments. This is crucial for allowing the compiler to optimize aggressively. If you write a loop, the compiler might be intelligent enough in some cases to realize that the iterations are independent and can be swapped. In general, this cannot always be determined, so the compiler often has to be overly careful, even though it would actually be possible to swap iterations.

Using a vector expression, the programmer indicates clearly that the order does not matter, allowing the compiler to swap things around. For example, the two statements

        myDataA[2..5] = myDataA[3..6];
        myDataB[3..6] = myDataB[2..5];

both give undetermined results. They are legal (the compiler can, in general, not detect the overlap) but it is the programmer's problem that the outcome is undefined. As I said, the specs say something different: following them, the first would be correct, the second garbage. But on this point, the specs should definitely be corrected.

B.t.w.: this would be a case where compiler warnings would be reasonable: if the compiler finds such an overlap, it clearly is a mistake by the programmer, but since the compiler cannot find it in general, it cannot be made an error.
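The order-dependence Norbert describes shows up as soon as the ranges overlap. A Python sketch of the two outcomes, where the explicit loop stands in for an implementation that fixes a left-to-right assignment order:

```python
a = [0, 1, 2, 3, 4, 5]
b = list(a)

# a[3..6] = a[2..5] with a fixed left-to-right loop: the write to a[3]
# clobbers a source element before it is read, so one value propagates.
for i in range(3, 6):
    a[i] = a[i - 1]
assert a == [0, 1, 2, 2, 2, 2]

# The same assignment with copy semantics (source read out first):
b[3:6] = b[2:5]
assert b == [0, 1, 2, 2, 3, 4]
```

The two strategies disagree exactly when the ranges overlap, which is why leaving the order unspecified (and the overlapping case undefined) gives a compiler room to vectorize.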
Jul 07 2004
parent reply Derek <derek psyc.ward> writes:
On Wed, 07 Jul 2004 23:09:27 +0200, Norbert Nemec wrote:


[snip]

 When I talk about performance, I think about high-performance numerics. This
 is the area where the design of a language is most crucial. C++ for example
 is inherently slower than Fortran (unless you pull out some nasty tricks)
 because the language gets in the way of optimizing. What I am concerned
 about, is to avoid that kind of problems in D. Numerics are important in
 science (which is my field) but also in computer graphics and gaming.
 Anyone working in one of these fields will agree that array handling is too
 important to go for some quick solution.
 
Ok, so you are saying that D generates sub-optimal code for these sorts of applications. So don't use D! Use assembler or some specialized programming language. If D is to be a general purpose language, one should expect that it will not specialize in every programming domain. [snip]
    myData[4..9] = "Derek";
 
 Not too big a stretch of the imagination, eh?
True.
 Its just bloody shorthand for ...
 
   myData[4] = 'D';
   myData[5] = 'e';
   myData[6] = 'r';
   myData[7] = 'e';
   myData[8] = 'k';
 
 or ...
 
   source = "Derek";
   for(i=4, j=0; j < source.length; i++,j++)
      myData[i] = source[j];
Not true. OK, I confess that this is what the specs say, but here I strongly disagree:
Here I see a fundamental difference in the way we are approaching this issue. I'm looking at the results, and you are looking at the process. If I were to reword my example above (with a pedantic streak) ... it's just bloody shorthand that ensures that after the operation is complete:

        myData[4] ends up with the value 'D'
        myData[5] ends up with the value 'e'
        myData[6] ends up with the value 'r'
        myData[7] ends up with the value 'e'
        myData[8] ends up with the value 'k'

And I don't really care how the magic happens.
 The important detail about a vector assignment is,
I'm not talking about "vector assignment". I just want a specific subset of elements to end up with the values I need them to be. Vector assignment is one of many methods to achieve that end.
 that it does not specify
 the order of the assignments. This is crucial for allowing the compiler to
 optimize aggressively. If you write a loop, the compiler might be
 intelligent enough in some cases to realize that the iterations are
 independent and can be swapped. In general, this can not always be
 determined, so the compiler often has to be overly careful, even though it
 would actually be possible to swap iterations.
 
 Using a vector expression, the programmer indicates clearly, that the order
 does not matter, allowing the compiler to swap around.
Fine! Whatever! So long as I end up with the values in the right order, I don't care. If I really needed Ferrari fast code, I'd drop into assembler. The Porche speed of D generated code is okay for me.

--
Derek
Melbourne, Australia
Jul 07 2004
next sibling parent Regan Heath <regan netwin.co.nz> writes:
On Thu, 8 Jul 2004 09:26:19 +1000, Derek <derek psyc.ward> wrote:

[snip]
 If I really needed Ferrari fast code, I'd drop into assembler. The
 Porche
 speed of D generated code is okay for me.
[snip] Is a "Porche" a fancy french porch? My porch is attached to my house, and it's not fast at all ;o) Regan -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jul 07 2004
prev sibling next sibling parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Derek wrote:

 On Wed, 07 Jul 2004 23:09:27 +0200, Norbert Nemec wrote:
 Ok, so you are saying that D generates sub-optimal code for these sorts of
 applications. So don't use D! Use assembler or some specialized
 programming language. If D is to be a general purpose language, one should
 expect that it will not specialize in every programming domain.
I think this is just for the D community and Walter to decide: if I felt alone on my quest to bring high-performance numerics into D, I'd give up quite soon. But I have the feeling that there is general interest in this. Obviously numerics is not so exotic after all.

The term "general purpose language" does not mean that the language should not try to excel in any special area. It just means that it should not do so at the cost of the others. The amount of time, effort and complexity put into special areas is then just up to the personal interest of the language developers. In this case: Walter.
 If I really needed Ferrarri fast code, I'd drop into assembler. The Porche
 speed of D generated code is okay for me.
Assembler is not the way to go for really high-performance code. The performance of modern processors depends highly on putting the instructions in the right order. Doing something like this in assembler would give complete spaghetti code. A good compiler for a good language can do much better than any programmer could do in a reasonable amount of time.
Jul 07 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <ccioo4$dj6$1 digitaldaemon.com>, Norbert Nemec says...

Assembler is not the way to go for really high-performance-code. The
performance of modern processors depends highly on putting the instructions
in the right order. Doing something like this in assembler would give
complete spaghetti code. A good compiler for a good language can do much
better than any programmer could do in a reasonable amount of time.
Forgive my language, but I can't figure out how better to express this... Bollocks! I can do better in assembler than any C, C++ or D compiler I have ever met. I think you're going to need some hard evidence to back up the above assertion. Either that, or maybe set a competition: assembler-written-by-human versus assembler-compiled-from-C/C++/D-written-by-human. But what would we get as a prize?

Jill
Jul 08 2004
parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Arcane Jill wrote:

 In article <ccioo4$dj6$1 digitaldaemon.com>, Norbert Nemec says...
 
Assembler is not the way to go for really high-performance-code. The
performance of modern processors depends highly on putting the instructions
in the right order. Doing something like this in assembler would give
complete spaghetti code. A good compiler for a good language can do much
better than any programmer could do in a reasonable amount of time.
Forgive my language, but I can't figure out how better to express this... Bollocks! I can do better in assembler than any C, C++ or D compiler I have ever met. I think you're going to need some hard evidence to back up the above assertion. Either that, or maybe set a competition: assembler-written-by-human versus assembler-compiled-from-C/C++/D-written-by-human. But what would we get as a prize?
I can show you a piece of C code I wrote for a class in High-Performance Computing: red-black relaxation. We started out with a fairly simple algorithm: four nested loops iterating over the same array over and over again. Over the course of one semester, we improved this code step by step by reordering, interlacing and partially unrolling the loops. The algorithm stayed the same; only the ordering of the commands was changed. In the end, the code (initially ~20 simple lines) had grown to several hundred lines of highly complex code, running roughly 10 times as fast. Handling this kind of code in C is complex but possible; in assembler, you could probably forget about it.

A good compiler for a well-designed language would have been able to do all the steps of instruction reordering automatically and intelligently. In C/C++/D, this is not possible yet, because the language is not designed for it. Currently, such languages and compilers are still in the experimental stage (see www.sac-home.org, for example). My intention is to bring the crucial language elements into D now, so that tomorrow's compilers have a chance to optimize. Of course, this should happen without sacrificing the general qualities of D.

If you are interested in seeing the code, I can try to dig it up and bring it to a presentable shape. It will take some time, though.
Jul 08 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <ccivu1$pgc$1 digitaldaemon.com>, Norbert Nemec says...

If you are interested in seeing the code, I can try to dig it up and bring
it to a presentable shape. Will take some time, though.
Well, there's "interested" and there's "having time". Personally I would *love* to take on a head-to-head optimization challenge - my assembler versus your compiled C. But, I am just too busy right now. I have things to do that are more important than my pride.

But one day, when I'm less busy (Free time? What's that?), I'll take you up on that challenge. And to make it interesting, the loser buys the winner a Guinness, yes?

Jill
Jul 08 2004
next sibling parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Arcane Jill wrote:

 In article <ccivu1$pgc$1 digitaldaemon.com>, Norbert Nemec says...
 
If you are interested in seeing the code, I can try to dig it up and bring
it to a presentable shape. Will take some time, though.
Well, there's "interested" and there's "having time". Personally I would *love* to take on a head-to-head optimization challenge - my assembler versus your compiled C. But, I am just too busy right now. I have things to do that are more important than my pride.

But one day, when I'm less busy (Free time? What's that?), I'll take you up on that challenge. And to make it interesting, the loser buys the winner a Guinness, yes?
I think we can save that time by giving some numbers.

The program we were optimizing used a fixed number of floating point operations per run - for a given size of the data, say, 10 billion floating point additions/multiplications, etc. No matter what optimization, these operations had to be done. (Different algorithms might have needed fewer, but the task was to implement this algorithm.)

The theoretical speed limit of a processor can be given in MFLOPS (million floating point operations per second). The processor we were using back then had roughly 1000 MFLOPS maximum performance. Therefore, it is clear that - no matter what - the program will run at least 10 seconds. The only problem is to come as near to that limit as possible.

The original version achieved a performance of ~50 MFLOPS, therefore exploiting roughly 5% of the potential of the processor. This is a typical value for naively written C code, using simple nested loops running over data that is much larger than the cache. 10 billion ops at 50 MFLOPS gives 200 sec running time.

In the course of the semester, we hand-optimized this code up to ~600 MFLOPS, i.e. exploiting ~60% of the processor's capabilities. This is about the limit that can be achieved for data-intensive real-world calculations. (Programs on small data sets that fit into the L1 cache can often get near the theoretical limit, but they are rather academic.) Now, 600 MFLOPS means ~17 seconds run time. 10 seconds of this were spent productively; 7 were wasted somewhere. A hardware profiler showed where the time was lost:

* Integer operations and flow control are negligible. They happen in a separate unit of the CPU, so they are effectively happening in parallel to the productive floating-point operations.
* Very little time was lost by pipeline stalls (the operations were very well interlaced).
* Very little time was lost on branch mispredictions (the loops were so simply structured that the prediction unit of the CPU had no problem).
* The vast amount of the lost time was still spent on cache misses. Even though we did everything to exploit the cache as much as possible, we still had some bottleneck left there.

Rewriting the code in assembler might perhaps give you an increase of a few more percent, but the only way to really go beyond what we already achieved would be to further optimize the locality of the data access, making more use of the cache. And this really means that doing the whole thing in assembler would get more and more complicated.

As a last word about the topic: everything I said here is not special to the algorithm we were looking at; it is the daily bread of numerics. "Numerics" means that there are three possible bottlenecks:

* the floating point unit itself
* pipeline stalls
* data access

The first can only be overcome by mathematics searching for different algorithms. The second one means that fine-grained instruction reordering is important; the third, that instructions should be reordered on a larger scale. None of these problems can be solved by using hand-coded assembler (except in rare, very simple cases). Using plain C cleverly is possible but ugly. In the long run, optimizing compilers for well-designed languages might take this ugly job from the user.
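The timing figures above all follow from one formula, time = operations / rate. A quick sanity check (assuming, as the consistent set of quoted times implies, a workload of 10^10 floating point operations):

```d
import std.stdio;

void main()
{
    immutable double ops   = 1e10;    // assumed total FP operations (10 billion)
    immutable double peak  = 1000e6;  // 1000 MFLOPS theoretical peak
    immutable double naive = 50e6;    // ~5% of peak: the naive nested-loop C
    immutable double tuned = 600e6;   // ~60% of peak: the hand-optimized code

    writeln(ops / peak);   // 10  -> the hard lower bound in seconds
    writeln(ops / naive);  // 200 -> the original running time
    writeln(ops / tuned);  // ~16.7 -> close to the quoted ~17 s
}
```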
Jul 08 2004
parent Stephen Waits <steve waits.net> writes:
Norbert Nemec wrote:
 
 Rewriting the code in assembler might perhaps give you an increase of a few
 more percent, but the only way to really go beyond what we already achieved
 would be to further optimize the locality of the data access making more
 use of the cache. And this really means that doing the whole thing in
 Assembler would get more and more complicated.
You hit the nail on the head here. My experience in game programming has been exactly this. Typically we may have a few, very small, inner loops hand coded in assembler by the time a game is finished.

Cache misses, and hence painfully slow memory access (relatively), always seem to end up as our bottleneck (on a main CPU anyway).

--Steve
Jul 08 2004
prev sibling parent Stephen Waits <steve waits.net> writes:
Arcane Jill wrote:
 
 Well, there's "interested" and there's "having time". Personally I would *love*
 to take on a head-to-head optimization challenge - my assembler versus your
 compiled C. But, I am just too busy right now. I have things to do that are
more
 important than my pride.
It's one thing to code a few small routines in assembler. It's another that the language can deal with it well enough that it really makes a difference. For instance, you may still have temporaries floating around; or, because you coded a few small atomic operations in assembler, you may be giving up the opportunity for reduction and combination of those operations.

Assembler, though it may help, is not the solution to the above-mentioned issues.

--Steve
Jul 08 2004
prev sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <miydn03kpeka$.4mtnnmprlmi5.dlg 40tude.net>, Derek says...

If I were to reword my example above (with a pedantic streak)...

It's just bloody shorthand that ensures that after the operation is complete
... 

   myData[4] ends up with the value 'D'
   myData[5] ends up with the value 'e'
   myData[6] ends up with the value 'r'
   myData[7] ends up with the value 'e'
   myData[8] ends up with the value 'k'

And I don't really care how the magic happens.
Pretty much every language on the planet can do this. D can do this too. The statement in question does actually compile and execute correctly. But what we DON'T have is the ability to overload that expression.

I don't get why adding an opSliceAssign() overload would be a problem though. Why does anyone think it would be a problem?

Arcane Jill
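Such an overload is easy to imagine. The sketch below assumes the compiler would rewrite `a[i..j] = v` into a call like `a.opSliceAssign(v, i, j)`; since that rewrite is only hypothetical here, the method is called explicitly (the `Buffer` type and all names are invented for illustration):

```d
import std.stdio;

struct Buffer
{
    char[] data;

    // Hypothetical hook: if the compiler rewrote  a[i..j] = v  to
    // a.opSliceAssign(v, i, j), this method would implement it.
    void opSliceAssign(const(char)[] v, size_t i, size_t j)
    {
        assert(i <= j && j <= data.length && v.length == j - i);
        data[i .. j] = v[];   // built-in element-wise slice copy
    }
}

void main()
{
    auto b = Buffer("My name is XXXXX".dup);
    // Spelled as a method call, since the rewrite is only imagined here:
    b.opSliceAssign("Derek", 11, 16);
    writeln(b.data);   // My name is Derek
}
```

The body is trivial because the built-in slice copy already does the work; the whole question in this thread is only whether the syntax can be routed to a user-defined method.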
Jul 08 2004
next sibling parent Derek Parnell <derek psych.ward> writes:
On Thu, 8 Jul 2004 07:14:00 +0000 (UTC), Arcane Jill wrote:

 In article <miydn03kpeka$.4mtnnmprlmi5.dlg 40tude.net>, Derek says...
 
If I were to reword my example above (with a pedantic streak)...

It's just bloody shorthand that ensures that after the operation is complete
... 

   myData[4] ends up with the value 'D'
   myData[5] ends up with the value 'e'
   myData[6] ends up with the value 'r'
   myData[7] ends up with the value 'e'
   myData[8] ends up with the value 'k'

And I don't really care how the magic happens.
Pretty much every language on the planet can do this. D can do this too. The statement in question does actually compile and execute correctly. But what we DON'T have is the ability to overload that expression.
That's why I started this thread. I assumed it was overloadable but soon found out otherwise. It's only an operator overload, which means we can use function call syntax instead. That is what I'm doing now, BTW:

   void opSlice(int i, int j, whatever x)
   {
      . . .
   }

and using it thus...

   Foo.opSlice(i, j, x);

instead of the more obvious

   Foo[i..j] = x;
 I don't get why adding an opSliceAssign() overload would be a problem though.
 Why does anyone think it would be a problem?
Beats me!

-- 
Derek
Melbourne, Australia
8/Jul/04 5:28:45 PM
Jul 08 2004
prev sibling next sibling parent Norbert Nemec <Norbert.Nemec gmx.de> writes:
Arcane Jill wrote:

 I don't get why adding an opSliceAssign() overload would be a problem
 though. Why does anyone think it would be a problem?
I just thought it was, but I changed my mind after I found that idea of opPartialIndex. See the other thread about the full concept.
Jul 08 2004
prev sibling next sibling parent Sam McCall <tunah.d tunah.net> writes:
Arcane Jill wrote:

 I don't get why adding an opSliceAssign() overload would be a problem though.
 Why does anyone think it would be a problem?
Agree with you, I don't see any problem at all. The current situation strikes me as a borderline bug.

Sam
Jul 08 2004
prev sibling parent Charles Hixson <charleshixsn earthlink.net> writes:
Arcane Jill wrote:
 In article <miydn03kpeka$.4mtnnmprlmi5.dlg 40tude.net>, Derek says...
 
 
If I were to reword my example above (with a pedantic streak)...
..
Pretty much every language on the planet can do this. D can do this too. The statement in question does actually compile and execute correctly. But what we DON'T have is the ability to overload that expression.

I don't get why adding an opSliceAssign() overload would be a problem though. Why does anyone think it would be a problem?

Arcane Jill
What if you don't have integers as your index? What I'd like is an extension of what you are proposing, plus an extension of what a range is (i.e., allow range to be overloaded for different types of arguments), so that one would be able to define:

Note that here myData would be defined as

   DType[Name[]] myData;

and it would be Name that was required to define what a range meant. And Name would need to have a special access method (or constructor?) that took a string as a value.

This may be a bit of a stretch beyond what we would want to add before D 1.0 comes out, but might be reasonable for consideration for D 2.0.

Access method vs. constructor: I tend to think of this as being most useful when one already has a list that contains the indices that one is attempting to access, so this could just be syntactic as an access method.

P.S.: What I'm REALLY thinking of... sort of, is a B+Tree or B*Tree that's accessed as if it were an array. But I want the tree to have keys of any comparable type, not just strings or integers.
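A rough sketch of the keyed range lookup described above, using an ordinary sorted array in place of a B-tree (all names here are invented; D's built-in associative arrays offer nothing like this, so the "slice" is spelled as a method call):

```d
import std.stdio;

// Toy ordered map: keys kept sorted, so a range of keys selects a
// contiguous run of values -- the property a B+Tree would provide.
struct OrderedMap(V)
{
    string[] keys;   // assumed sorted by the caller
    V[] values;

    // The imagined  myData["Betty".."Bill"]  spelled as a method call.
    V[] byRange(string lo, string hi)   // inclusive on both ends
    {
        V[] result;
        foreach (i, k; keys)
            if (k >= lo && k <= hi)   // strings compare lexicographically
                result ~= values[i];
        return result;
    }
}

void main()
{
    auto m = OrderedMap!int(["Adam", "Betty", "Bill", "Bob", "Carl"],
                            [1, 2, 3, 4, 5]);
    writeln(m.byRange("Betty", "Bill"));   // [2, 3]
}
```

The linear scan stands in for a tree descent; the interface is the point, not the data structure.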
Jul 08 2004
prev sibling parent Charles Hixson <charleshixsn earthlink.net> writes:
Derek wrote:
 On Wed, 07 Jul 2004 09:14:32 +0200, Norbert Nemec wrote:
 
 
I don't know about Walter's reasoning in that respect, but I can state my
position:

I think that even the opSlice that is there should be used with care. As I
see it, there is no way to cleanly extend it to multiple dimensions
So what? Single dimension is just fine for my needs. As far as I can see, a slice operation is one that is only done on a vector (single-dimension array). To me it is just a way of specifying a subset of vector elements. Once one moves into multi-dimensional array operations, e.g. a matrix, a vector of vectors, we need a new syntax anyway. opSlice can remain the same (a vector slice) and D can have some new opXXX function for 'matrix slicing'.

.....

My method currently looks a lot like the longhand above ...

   void opSlice(int i, int j, char[] x)
   {
      for (int k = 0; i < j; i++, k++)
      {
         if ((i >= 0) && (i < Elem.length) && (k < x.length))
            Elem[i]( cast(int)x[k] );
      };
   }

in anticipation of being able to write ...

   Foo[i..j] = x;

whereas I now write ...

   Foo.opSlice(i, j, x);
Perhaps this is primarily an argument asserting that [] should be a more flexibly overloadable operator? Perhaps one with a template? So, e.g., a hash could be indexed with something like:

   x = myHash["Betty".."Bill"]

which would definitely require an overloading of the [] operation, as hashes don't usually have any sensibly defined ordering. (And if you want that, a tree is a better data structure.)

I will admit to a fond remembrance of Fortran multi-dimensional array subscripting, which I still consider superior for most purposes to the C version. But Fortran never dealt with ranges, or user-defined types being used as indexes (or even strings as indexes). So something new is needed to accomplish what is intended.

If tuples were a part of D, I would suggest that array indices be implicit tuples of ranges. I.e., that every index be implicitly converted to a tuple, even if only of one element. And that each tuple member be implicitly a range, even if the start and finish were the same. This would produce a kind of orthogonal simplicity in thinking about what is intended here, and simplify the parsing. Thus one could have:

   Foo[i..j, k..l, m..n] = x

And (using "(" as a tuple marker) this would be understood as:

   Foo[(i..j), (k..l), (m..n)] = x

which would be (approx.) the same as:

   Foo[i..j][k..l][m..n] = x

But note that for this to work a range would need to be a more general object, so that it could be specified how it would act over unexpected types, e.g., over strings. Or names. (We would want Betty..Bill to mean all of the people I'm thinking of between Betty and Bill, inclusive; this means that the strings would need to be references to some particular type of string that had a range operation defined over itself.)

Ranges: integers are a relatively simple case of range, and even there the documentation is quite confusing. I don't understand why x[1..3] should mean (x[1], x[2]). That isn't three elements long.
It doesn't end with the third index of x (which would be x[3]). All I can think of is that in this case we have decided to use 1-based indexing rather than 0-based indexing, which seems quite strange.
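For reference, D's slice bounds are half-open over zero-based indices: the lower bound is included, the upper bound excluded, so x[1..3] has length 3 - 1 = 2. A minimal check:

```d
import std.stdio;

void main()
{
    int[] x = [10, 11, 12, 13, 14];

    // x[1 .. 3] takes the elements at indices 1 and 2: lower bound
    // included, upper bound excluded, length = 3 - 1 = 2.
    writeln(x[1 .. 3]);          // [11, 12]
    writeln(x[1 .. 3].length);   // 2

    // The payoff of the convention: adjacent slices re-assemble the
    // array with no overlap and no off-by-one adjustments.
    writeln((x[0 .. 2] ~ x[2 .. $]) == x);   // true
}
```

The convention is not 1-based indexing; it is the same half-open interval rule that makes `length = upper - lower` hold for every slice.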
Jul 08 2004