
digitalmars.D - opSlice as lvalue

reply Derek Parnell <derek psych.ward> writes:
Hi all,
I'm trying to implement a structure in which array copying would be useful.
I expected to be able to overload this operation with a variety of
opSlice(). 

We can have opIndex() as both lvalue and rvalue, but this doesn't seem to
apply for opSlice.

Is there a good reason for this?

I'm now doing this operation via a roll-your-own function, but overloading
would be a bit more consistent.

-- 
Derek
Melbourne, Australia
7/Jul/04 12:36:43 PM
Jul 06 2004
parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
I don't know about Walter's reasoning in that respect, but I can state my
position:

I think that even the opSlice that is there should be used with care. As I
see it, there is no way to cleanly extend it to multiple dimensions, and
furthermore it collides badly with the concept of vectorization that is
hopefully coming into the language in the future. opIndex is not a problem,
as the concept of indexing itself already breaks vectorization.
Implementing slicing via opSlice, though, would always mean creating a
temporary copy of the data, which kills performance.

Maybe some version of the existing read-access opSlice could still make
sense in the future. Slice assignment though would - if overloadable at all
- need a completely new concept.



Derek Parnell wrote:

 Hi all,
 I'm trying to implement a structure in which array copying would be
 useful. I expected to be able to overload this operation with a variety of
 opSlice().
 
 We can have opIndex() as both lvalue and rvalue, but this doesn't seem to
 apply for opSlice.
 
 Is there a good reason for this?
 
 I'm now doing this operation via a roll-your-own function, but overloading
 would be a bit more consistent.
 
Jul 07 2004
next sibling parent reply Sam McCall <tunah.d tunah.net> writes:
Norbert Nemec wrote:
 I don't know about Walter's reasoning in that respect, but I can state my
 position:
 
 I think that even the opSlice that is there should be used with care. As I
 see it, there is no way to cleanly extend it to multiple dimensions 
Foo opSlice(int start1, int end1, int start2, int end2, ...)
foo[start1..end1][start2..end2]
 furthermore it collides badly with the concept of vectorization that is
 hopefully coming into the language in the future. opIndex is not a problem,
 as the concept of indexing itself already breaks vectorization.
Surely being able to do something at a speed that's adequate for most applications (as evidenced by the fact that it works fine now, without vectorisation) is better than not being able to do it at all?
 Implementing slicing via opSlice, though, would always mean to create a
 temporary copy of the data which is killing performance.
Nope, if there's some underlying array, create a new object that wraps a slice of that. Or create a new object implementing the same interface, and tell it the bounds, like List.subList in java. (D still doesn't have inner classes, but they can be faked).
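Sam's no-copy wrapper idea can be sketched concretely (Python here for a runnable illustration, since the 2004-era D semantics are hard to reproduce; the SliceView name and its methods are hypothetical, standing in for a D class exposing opIndex/opIndexAssign):

```python
class SliceView:
    """A read/write window onto an underlying sequence; no data is copied."""
    def __init__(self, data, start, stop):
        self.data = data          # reference to the underlying storage
        self.start = start
        self.stop = stop

    def __len__(self):
        return self.stop - self.start

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self.data[self.start + i]

    def __setitem__(self, i, value):
        if not 0 <= i < len(self):
            raise IndexError(i)
        self.data[self.start + i] = value   # writes go through to the original

data = [10, 20, 30, 40, 50]
view = SliceView(data, 1, 4)    # window over data[1..4], nothing copied
view[0] = 99                    # mutates the underlying array
```

Because the view holds a reference rather than a copy, constructing it is O(1) and writes through the view are visible in the original, much like D's native array slices.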
 Maybe some version of the existing read-access opSlice could still make
 sense in the future. Slice assignment though would - if overloadable at all
 - need a completely new concept.
I disagree. opSlice works well in almost all situations now. Yes, multidimensional extensions would be nice. The fact that opIndex is assignable but opSlice isn't is a nasty wart; if slicing is to be allowed (and it should be, to keep parity with native types[1]) then slice assignment should also be allowed.

Sam

[1] Speaking of which, delete foo[bar] for associative arrays should be overloadable (and probably renamed, but that's another story...)
 
 
 
 Derek Parnell wrote:
 
 
Hi all,
I'm trying to implement a structure in which array copying would be
useful. I expected to be able to overload this operation with a variety of
opSlice().

We can have opIndex() as both lvalue and rvalue, but this doesn't seem to
apply for opSlice.

Is there a good reason for this?

I'm now doing this operation via a roll-your-own function, but overloading
would be a bit more consistent.
Jul 07 2004
next sibling parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Sam McCall wrote:

 Norbert Nemec wrote:
 I don't know about Walter's reasoning in that respect, but I can state my
 position:
 
 I think that even the opSlice that is there should be used with care. As
 I see it, there is no way to cleanly extend it to multiple dimensions
Foo opSlice(int start1, int end1, int start2, int end2, ...)
foo[start1..end1][start2..end2]
The expression foo[start1..end1][start2..end2] does not do what you expect. It is parsed as (foo[start1..end1])[start2..end2], where the slice resulting from the first expression is sliced again. If you try to handle a nested array like this, you will slice the outermost dimension twice. There is no way to slice the inner dimension of nested arrays.

For true multidimensional arrays, as they will hopefully be part of the language sometime in the future, the slicing expression would look like:

        foo[start1..end1,start2..end2]

Defining opSlice the way you do would certainly be possible, but it would get rather ugly once you consider that you can also do partial slicing:

        foo[idx1,start2..end2]

returning a one-dimensional array, which is different from

        foo[idx1..idx1+1,start2..end2]

which returns a two-dimensional array with .range[0]==1. If that is not ugly enough for you, then consider what happens when you introduce striding as well:

        foo[start1..end1:stride1,start2..end2:stride2]
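Norbert's parsing point is easy to reproduce with nested sequences (Python shown here for a runnable illustration; chained slicing behaves the same way on D's arrays of arrays): chaining two slices only ever cuts the outer dimension.

```python
m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# Hoped for: a 2D sub-block over rows and columns.
# Actual: (m[0:2])[1:2] slices the outer list twice, never the rows.
twice = m[0:2][1:2]
assert twice == [[4, 5, 6]]   # still whole rows, not a sub-block
```

To cut the inner dimension you need a genuinely multidimensional index of the foo[a..b, c..d] kind, which is exactly what chained opSlice calls cannot express.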
 furthermore it collides badly with the concept of vectorization that is
 hopefully coming into the language in the future. opIndex is not a
 problem, as the concept of indexing itself already breaks vectorization.
Surely being able to do something at a speed that's adequate for most applications (as evidenced by the fact that it works fine now, without vectorisation) is better than not being able to do it at all?
It is not just a matter of performance but a matter of concept: what does an assignment to a slice actually mean? If the overloaded slicing has different semantics than the vectorizable slicing of arrays, then it is questionable whether it is a good idea to have it. Currently I doubt that the semantics of the vectorizable array slicing could be captured by any kind of opSlice function.
 Implementing slicing via opSlice, though, would always mean to create a
 temporary copy of the data which is killing performance.
Nope, if there's some underlying array, create a new object that wraps a slice of that. Or create a new object implementing the same interface, and tell it the bounds, like List.subList in java. (D still doesn't have inner classes, but they can be faked).
True, that might be possible.
 Maybe some version of the existing read-access opSlice could still make
 sense in the future. Slice assignment though would - if overloadable at
 all - need a completely new concept.
I disagree. opSlice works well in almost all situations now. Yes, multidimensional extensions would be nice. The fact that opIndex is assignable but opSlice isn't is a nasty wart; if slicing is to be allowed (and it should be, to keep parity with native types[1]) then slice assignment should also be allowed.
I don't think it will be possible to keep that parity. At least, any solution I can think of would be too ugly to put into D.
Jul 07 2004
parent Sam McCall <tunah.d tunah.net> writes:
Norbert Nemec wrote:

 Sam McCall wrote:
 
 
Norbert Nemec wrote:

I don't know about Walter's reasoning in that respect, but I can state my
position:

I think that even the opSlice that is there should be used with care. As
I see it, there is no way to cleanly extend it to multiple dimensions
Foo opSlice(int start1, int end1, int start2, int end2, ...)
foo[start1..end1][start2..end2]
The expression foo[start1..end1][start2..end2] does not do what you expect. It is parsed as (foo[start1..end1])[start2..end2] where the slice resulting from the first expression is sliced again. If you try to handle a nested array like this, you will slice the outermost dimension twice.
Apologies, I did mean to type foo[start1..end1,start2..end2]
 For true multidimensional arrays as they will hopefully be part of the
 language sometime in the future, the slicing expression would look like:
         foo[start1..end1,start2..end2]
 Defining opSlice the way you do it would certainly be possible, but it would
 get rather ugly once you consider that you can also do partial slicing
         foo[idx1,start2..end2]
 returning a one-dimensional array, which is different from
         foo[idx1..idx1+1,start2..end2]
 which returns a two-dimensional array with .range[0]==1.
This is interesting, you've obviously thought about this a lot more than I have. What if there were two allowed signatures for opSlice:

        T opSlice(start1, end1, ...);
        T opSlice(start1, end1, ..., bool[N] single);

For a "full" slice (all indices are ranges), the first would be preferred, else fall back to the second with single all true. Partial slices would only be allowed to use the second one. In the second one, N would have to match the number of indices, and if single[i] was true then endI==startI (or startI+1, whatever), and the intent is that that dimension is being indexed, not sliced. This is going to be a bit tricky to code for, but I think it fits with "simple things simple, hard things possible".
 If that is not ugly enough for you than consider what happens when you
 introduce striding as well:
         foo[start1..end1:stride1,start2..end2:stride2]
Personally, I don't see this getting into slice notation, vectorisation or no. AFAICS, this is advanced/rare functionality and while I'm sure it's useful in many cases, it doesn't warrant an overload. As you say, things get ugly.
 It is not just a matter of performance but a matter of concept: what does an
 assignment to a slice actually mean? If the overloaded slicing has
 different semantics than the vectorizable slicing of arrays, then it is
 questionable whether it is a good idea to have it.
The semantics won't be exactly the same, just the same way that overriding addition for sets (union) isn't going to have the same semantics as overriding addition for numbers. While I agree that overloads of an operator should be /analogous/ (in the way iostreams aren't), I don't think a rule that says we're completely tied to what the language does is fair. I may be misunderstanding you here.
 Currently I doubt that
 the semantics of the vectorizable array slicing could be captured by any
 kind of opSlice function.
I'm not sure what semantics you're referring to; again, I might have missed the point entirely.

a) The "high performance" bit: no, a class with overloads won't be as fast as the native data structure. I don't think this negates the utility.

b) Slice assignment elements being reorderable: if this is imposed on native slices, it can be imposed as a condition of overloading opSlice. If the indexer of the result of the slice is inlinable (maybe if not?) then the thing should be vectorisable in a similar (but more complex) way to arrays.

c) Slice assignment overwriting the original: this should be a guideline to be ignored at one's own peril ;-). Operator overloading can be abused, but we trust the programmer.

d) There being a one-to-one relationship between source cells and destination cells: being able to replace, e.g., a small segment with a long segment in a string is one of the reasons I want opSliceAssign.

e) Something else... help!
 I don't think it will be possible to keep that parity. At least, any
 solution I can think of, would be too ugly to put it into D.
But it would be a pity to give it up. In Java, I sometimes find myself using (native) arrays instead of (library) vectors because the syntax is better and no library can achieve it. I agree that D should remain compatible with powerful optimisations like vectorisation; I don't think overloading slices necessarily breaks it.

I think full slicing is a very useful feature, and even if custom slices _weren't_ vectorisable, I would argue to keep them. If you need maximum performance, then you will always need to be careful about doing/using inefficient things. It would be ideal if opSlice wasn't one of them, but if it has to be, removing it will just mean one less feature you couldn't have used anyway. This doesn't seem like a big win to me.

Sam
Jul 07 2004
prev sibling parent Stephen Waits <steve waits.net> writes:
Sam McCall wrote:
 Surely being able to do something at a speed that's adequate for most 
 applications (as evidenced by the fact that it works fine now, without 
 vectorisation) is better than not being able to do it at all?
But, if we could get good vectorization (in the sense that the language makes it much easier on compilers) with little compromise, wouldn't everyone agree that's best? --Steve
Jul 07 2004
prev sibling parent reply Derek <derek psyc.ward> writes:
On Wed, 07 Jul 2004 09:14:32 +0200, Norbert Nemec wrote:

 I don't know about Walter's reasoning in that respect, but I can state my
 position:
 
 I think that even the opSlice that is there should be used with care. As I
 see it, there is no way to cleanly extend it to multiple dimensions
So what? Single dimension is just fine for my needs. As far as I can see, a slice operation is one that is only done on a vector (a single-dimension array). To me it is just a way of specifying a subset of vector elements.

Once one moves into multi-dimensional array operations, e.g. a matrix: a vector of vectors, we need a new syntax anyway. opSlice can remain the same (a vector slice) and D can have some new opXXX function for 'matrix slicing'.
 and furthermore it collides badly with the concept of vectorization that is
 hopefully coming into the language in the future. opIndex is not a problem,
 as the concept of indexing itself already breaks vectorization.
Hmmmm..."collides"? Are there not two different paradigms at work here? Both just as valid as each other? How is this a collision?
 Implementing slicing via opSlice, though, would always mean to create a
 temporary copy of the data which is killing performance.
Excuse me for being a bit blunt, but "killing performance" is just overly dramatic. Is not "performance" a relative thing... something in the eye of the beholder? If the end user isn't impacted by an application's speed, who cares?

My needs are obviously different to your needs. And that's just fine. I don't tend to write CPU-intensive applications. There is no way that D's slicing operation is "killing" my apps.
 Maybe some version of the existing read-access opSlice could still make
 sense in the future.
Yep! - it makes sense now so why wouldn't it in the future?
 Slice assignment though would - if overloadable at all
 - need a completely new concept.
Yep! But it's not such a totally "new" concept, as it's old hat in Basic and lots of other languages already!

        MID(myData, 4, 5) = "Derek"

or in a new D ...

        myData[4..9] = "Derek";

Not too big a stretch of the imagination, eh? It's just bloody shorthand for ...

        myData[4] = 'D';
        myData[5] = 'e';
        myData[6] = 'r';
        myData[7] = 'e';
        myData[8] = 'k';

or ...

        source = "Derek";
        for (i = 4, j = 0; j < source.length; i++, j++)
            myData[i] = source[j];

and that's it! Nothing fancy. Just a syntax shorthand for updating a bunch of adjacent vector elements.
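Derek's claimed equivalence is mechanical to check (a Python sketch for illustration; the variable names mirror his example):

```python
my_data = list("..........")
my_data[4:9] = "Derek"                 # slice-assignment shorthand

long_hand = list("..........")
source = "Derek"
j = 0
for i in range(4, 4 + len(source)):    # the element-by-element longhand
    long_hand[i] = source[j]
    j += 1

assert my_data == long_hand            # same final contents either way
```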
 Derek Parnell wrote:
 
 Hi all,
 I'm trying to implement a structure in which array copying would be
 useful. I expected to be able to overload this operation with a variety of
 opSlice().
 
 We can have opIndex() as both lvalue and rvalue, but this doesn't seem to
 apply for opSlice.
 
 Is there a good reason for this?
 
 I'm now doing this operation via a roll-your-own function, but overloading
 would be a bit more consistent.
My method currently looks a lot like the longhand above ...

        void opSlice(int i, int j, char[] x)
        {
            for (int k = 0; i < j; i++, k++)
            {
                if ((i >= 0) && (i < Elem.length) && (k < x.length))
                    Elem[i]( cast(int)x[k] );
            }
        }

in anticipation of being able to write ...

        Foo[i..j] = x;

whereas I now write ...

        Foo.opSlice(i, j, x);

--
Derek
Melbourne, Australia
Jul 07 2004
next sibling parent reply Stephen Waits <steve waits.net> writes:
Derek wrote:
 Excuse me for being a bit blunt, but "killing performance" is just overly
 dramatic. Is not "performance" a relative thing...something in the eye of
 the beholder? If the end user isn't impacted by an application's speed, who
 cares? 
Who's really being dramatic here? "killing performance" is the truth.
 My needs are obviously different to your needs. And that's just fine. I
 don't tend to write CPU-intensive applications. There is no way that D's
 slicing operation is "killing" my apps. 
Exactly - your needs. However, there are lots of people who care, or even require, that generated code be as efficient as possible. Remember, this is a general purpose language - not a language just for those who don't care about multi-dimensional array performance. --Steve
Jul 07 2004
parent Derek <derek psyc.ward> writes:
On Wed, 07 Jul 2004 10:37:19 -0700, Stephen Waits wrote:

 Derek wrote:
 Excuse me for being a bit blunt, but "killing performance" is just overly
 dramatic. Is not "performance" a relative thing...something in the eye of
 the beholder? If the end user isn't impacted by an application's speed, who
 cares? 
Who's really being dramatic here? "killing performance" is the truth.
In which case I have a magic machine on my desk, because D is *not* killing its performance. To be more precise, I am not being prevented from using this machine to the extent that I need to by D's executables.
 My needs are obviously different to your needs. And that's just fine. I
 don't tend to write CPU-intensive applications. There is no way that D's
 slicing operation is "killing" my apps. 
Exactly - your needs. However, there are lots of people who care, or even require, that generated code be as efficient as possible.
I'm sorry. I didn't mean to imply that I didn't care about "efficient as possible". Of course a D compiler should produce efficient code. And in addition to that, D's current vector slicing is already efficient enough for me. True, it may not be efficient enough for other people, and thus a different implementation may be required. I have no problems with that either.

The implication, as I read it, in "killing performance" is that *every* instance of the use of D's current slicing is a 'bad thing'. It is that implication I take issue with, for I simply have no evidence of it. In fact I have evidence of the contrary view, namely that my machine is not 'dying' under the load.
 Remember, this is a general purpose language - not a language just for 
 those who don't care about multi-dimensional array performance.
And yet, it sounds like some are asking for special-purpose multi-dimensional array operations. I've been writing business/financial applications for 30 years and have yet to need special-purpose multi-dimensional array operations. It seems that I travel in a different world.

I do not care whether D implements multi-dimensional array operations or not. I neither argue for them nor against them. It has no impact on the types of applications that I need to write. If Walter has time to do it, then fine! It changes my world not one iota.

I was afraid that I wouldn't be very clear, and thus be misunderstood. I was really just asking for a coding shorthand that would 'appear' like a sliced lvalue. That's all.

--
Derek
Melbourne, Australia
Jul 07 2004
prev sibling next sibling parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
The result of my thinking about this mail can be seen in the newly started
thread. Here are just a few comments.

Derek wrote:

 On Wed, 07 Jul 2004 09:14:32 +0200, Norbert Nemec wrote:
 Once one moves into multi-dimensional array operations, e.g. matrix; a
 vector of vectors, we need a new syntax anyway. opSlice can remain the
 same (a vector slice) and D can have some new opXXX function for 'matrix
 slicing'.
No need for that. As you can see in the other message, everything fits nicely into one concept.
 and furthermore it collides badly with the concept of vectorization that is
 hopefully coming into the language in the future. opIndex is not a
 problem, as the concept of indexing itself already breaks vectorization.
Hmmmm..."collides"? Are there not two different paradigms at work here? Both just as valid as each other? How is this a collision?
Forget my words. By now, my concept of vector expression has evolved to a point, where the collision is gone.
 Implementing slicing via opSlice, though, would always mean to create a
 temporary copy of the data which is killing performance.
Excuse me for being a bit blunt, but "killing performance" is just overly dramatic. Is not "performance" a relative thing...something in the eye of the beholder? If the end user isn't impacted by an application's speed, who cares? My needs are obviously different to your needs. And that's just fine. I don't tend to write CPU-intensive applications. There is no way that D's slicing operation is "killing" my apps.
When I talk about performance, I think about high-performance numerics. This is the area where the design of a language is most crucial. C++, for example, is inherently slower than Fortran (unless you pull out some nasty tricks) because the language gets in the way of optimizing. What I am concerned about is avoiding that kind of problem in D. Numerics are important in science (which is my field) but also in computer graphics and gaming. Anyone working in one of these fields will agree that array handling is too important to go for some quick solution.

Of course, I might be a bit overly concerned. It already turns out that the solutions get simpler and simpler the more I think about it all. In the case at hand, I just realized how things could work together so that you have all the flexibility you need for user-defined types, and all the performance you want for native arrays.
 Slice assignment though would - if overloadable at all
 - need a completely new concept.
Yep! But it's not such a totally "new" concept, as it's old hat in Basic and lots of other languages already!

        MID(myData, 4, 5) = "Derek"

or in a new D ...

        myData[4..9] = "Derek";

Not too big a stretch of the imagination, eh?
True.
 Its just bloody shorthand for ...
 
   myData[4] = 'D';
   myData[5] = 'e';
   myData[6] = 'r';
   myData[7] = 'e';
   myData[8] = 'k';
 
 or ...
 
   source = "Derek";
   for(i=4, j=0; j < source.length; i++,j++)
      myData[i] = source[j];
Not true. OK, I confess that this is what the specs say, but here I strongly disagree: the important detail about a vector assignment is that it does not specify the order of the assignments. This is crucial for allowing the compiler to optimize aggressively. If you write a loop, the compiler might be intelligent enough in some cases to realize that the iterations are independent and can be swapped. In general, this cannot always be determined, so the compiler often has to be overly careful, even though it would actually be possible to swap iterations.

Using a vector expression, the programmer indicates clearly that the order does not matter, allowing the compiler to swap things around. For example, the two statements

        myDataA[2..5] = myDataA[3..6];
        myDataB[3..6] = myDataB[2..5];

both give undetermined results. They are legal (the compiler can, in general, not detect the overlap) but it is the programmer's problem that the outcome is undefined. As I said, the specs say something different: following them, the first would be correct, the second garbage. But on this point, the specs should definitely be corrected.

B.t.w.: this would be a case where compiler warnings would be reasonable: if the compiler finds such an overlap, it clearly is a mistake by the programmer, but since the compiler cannot find it in general, it cannot be made an error.
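The order-dependence Norbert describes shows up as soon as the ranges overlap. A Python sketch of the two outcomes, where the explicit loop stands in for an implementation that fixes a left-to-right assignment order:

```python
a = [0, 1, 2, 3, 4, 5]
b = list(a)

# a[3..6] = a[2..5] with a fixed left-to-right loop: the write to a[3]
# clobbers a source element before it is read, so one value propagates.
for i in range(3, 6):
    a[i] = a[i - 1]
assert a == [0, 1, 2, 2, 2, 2]

# The same assignment with copy semantics (source read out first):
b[3:6] = b[2:5]
assert b == [0, 1, 2, 2, 3, 4]
```

The two strategies disagree exactly when the ranges overlap, which is why leaving the order unspecified (and the overlapping case undefined) gives a compiler room to vectorize.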
Jul 07 2004
parent reply Derek <derek psyc.ward> writes:
On Wed, 07 Jul 2004 23:09:27 +0200, Norbert Nemec wrote:


[snip]

 When I talk about performance, I think about high-performance numerics. This
 is the area where the design of a language is most crucial. C++ for example
 is inherently slower than Fortran (unless you pull out some nasty tricks)
 because the language gets in the way of optimizing. What I am concerned
 about, is to avoid that kind of problems in D. Numerics are important in
 science (which is my field) but also in computer graphics and gaming.
 Anyone working in one of these fields will agree that array handling is too
 important to go for some quick solution.
 
Ok, so you are saying that D generates sub-optimal code for these sorts of applications. So don't use D! Use assembler or some specialized programming language. If D is to be a general purpose language, one should expect that it will not specialize in every programming domain. [snip]
    myData[4..9] = "Derek";
 
 Not too big a stretch of the imagination, eh?
True.
 Its just bloody shorthand for ...
 
   myData[4] = 'D';
   myData[5] = 'e';
   myData[6] = 'r';
   myData[7] = 'e';
   myData[8] = 'k';
 
 or ...
 
   source = "Derek";
   for(i=4, j=0; j < source.length; i++,j++)
      myData[i] = source[j];
Not true. OK, I confess that this is what the specs say, but here I strongly disagree:
Here I see a fundamental difference in the way we are approaching this issue. I'm looking at the results, and you are looking at the process. If I were to reword my example above (with a pedantic streak) ... it's just bloody shorthand that ensures that after the operation is complete:

        myData[4] ends up with the value 'D'
        myData[5] ends up with the value 'e'
        myData[6] ends up with the value 'r'
        myData[7] ends up with the value 'e'
        myData[8] ends up with the value 'k'

And I don't really care how the magic happens.
 The important detail about a vector assignment is,
I'm not talking about "vector assignment". I just want a specific subset of elements to end up with the values I need them to be. Vector assignment is one of many methods to achieve that end.
 that it does not specify
 the order of the assignments. This is crucial for allowing the compiler to
 optimize aggressively. If you write a loop, the compiler might be
 intelligent enough in some cases to realize that the iterations are
 independent and can be swapped. In general, this can not always be
 determined, so the compiler often has to be overly careful, even though it
 would actually be possible to swap iterations.
 
 Using a vector expression, the programmer indicates clearly, that the order
 does not matter, allowing the compiler to swap around.
Fine! Whatever! So long as I end up with the values in the right order, I don't care. If I really needed Ferrari fast code, I'd drop into assembler. The Porche speed of D generated code is okay for me.

--
Derek
Melbourne, Australia
Jul 07 2004
next sibling parent Regan Heath <regan netwin.co.nz> writes:
On Thu, 8 Jul 2004 09:26:19 +1000, Derek <derek psyc.ward> wrote:

[snip]
 If I really needed Ferrari fast code, I'd drop into assembler. The
 Porche
 speed of D generated code is okay for me.
[snip] Is a "Porche" a fancy french porch? My porch is attached to my house, and it's not fast at all ;o) Regan -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jul 07 2004
prev sibling next sibling parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Derek wrote:

 On Wed, 07 Jul 2004 23:09:27 +0200, Norbert Nemec wrote:
 Ok, so you are saying that D generates sub-optimal code for these sorts of
 applications. So don't use D! Use assembler or some specialized
 programming language. If D is to be a general purpose language, one should
 expect that it will not specialize in every programming domain.
I think this is just for the D community and Walter to decide: if I felt alone on my quest to bring high-performance numerics into D, I'd give up quite soon. But I have the feeling that there is general interest in this. Obviously numerics is not so exotic after all.

The term "general purpose language" does not mean that the language should not try to excel in any special area. It just means that it should not do so at the cost of the others. The amount of time, effort and complexity put into special areas is then just up to the personal interest of the language developers. In this case: Walter.
 If I really needed Ferrarri fast code, I'd drop into assembler. The Porche
 speed of D generated code is okay for me.
Assembler is not the way to go for really high-performance code. The performance of modern processors depends highly on putting the instructions in the right order. Doing something like this in assembler would give complete spaghetti code. A good compiler for a good language can do much better than any programmer could do in a reasonable amount of time.
Jul 07 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <ccioo4$dj6$1 digitaldaemon.com>, Norbert Nemec says...

Assembler is not the way to go for really high-performance-code. The
performance of modern processors depends highly on putting the instructions
in the right order. Doing something like this in assembler would give
complete spaghetti code. A good compiler for a good language can do much
better than any programmer could do in a reasonable amount of time.
Forgive my language, but I can't figure out how better to express this... Bollocks! I can do better in assembler than any C, C++ or D compiler I have ever met. I think you're going to need some hard evidence to back up the above assertion. Either that, or maybe set a competition: assembler-written-by-human versus assembler-compiled-from-C/C++/D-written-by-human. But what would we get as a prize?

Jill
Jul 08 2004
parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Arcane Jill wrote:

 In article <ccioo4$dj6$1 digitaldaemon.com>, Norbert Nemec says...
 
Assembler is not the way to go for really high-performance-code. The
performance of modern processors depends highly on putting the instructions
in the right order. Doing something like this in assembler would give
complete spaghetti code. A good compiler for a good language can do much
better than any programmer could do in a reasonable amount of time.
Forgive my language, but I can't figure out how better to express this... Bollocks! I can do better in assembler than any C, C++ or D compiler I have ever met. I think you're going to need some hard evidence to back up the above assertion. Either that, or maybe set a competition: assembler-written-by-human versus assembler-compiled-from-C/C++/D-written-by-human. But what would we get as a prize?
I can show you a piece of C code I wrote for a class in High-Performance Computing: red-black relaxation. We started out with a fairly simple algorithm: four nested loops iterating over the same array over and over again. Over the course of one semester, we improved this code step by step by reordering, interlacing and partially unrolling the loops. The algorithm stayed the same; only the ordering of the commands was changed. In the end, the code (initially ~20 simple lines) had grown to several hundred lines of highly complex code, running roughly 10 times as fast. Handling this kind of code in C is complex but possible; in assembler, you could probably forget about it.

A good compiler for a well-designed language would have been able to do all the steps of instruction reordering automatically and intelligently. In C/C++/D, this is not possible yet, because the language is not designed for it. Currently, such languages and compilers are still in the experimental stage (see www.sac-home.org, for example). My intention is to bring the crucial language elements into D now, so that tomorrow's compilers have a chance to optimize. Of course, this should happen without sacrificing the general qualities of D.

If you are interested in seeing the code, I can try to dig it up and bring it to a presentable shape. It will take some time, though.
Jul 08 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <ccivu1$pgc$1 digitaldaemon.com>, Norbert Nemec says...

If you are interested in seeing the code, I can try to dig it up and bring
it to a presentable shape. Will take some time, though.
Well, there's "interested" and there's "having time". Personally I would *love* to take on a head-to-head optimization challenge - my assembler versus your compiled C. But, I am just too busy right now. I have things to do that are more important than my pride.

But one day, when I'm less busy (Free time? What's that?), I'll take you up on that challenge. And to make it interesting, the loser buys the winner a Guinness, yes?

Jill
Jul 08 2004
next sibling parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Arcane Jill wrote:

 In article <ccivu1$pgc$1 digitaldaemon.com>, Norbert Nemec says...
 
If you are interested in seeing the code, I can try to dig it up and bring
it to a presentable shape. Will take some time, though.
Well, there's "interested" and there's "having time". Personally I would *love* to take on a head-to-head optimization challenge - my assembler versus your compiled C. But, I am just too busy right now. I have things to do that are more important than my pride.

But one day, when I'm less busy (Free time? What's that?), I'll take you up on that challenge. And to make it interesting, the loser buys the winner a Guinness, yes?
I think we can save that time by giving some numbers.

The program we were optimizing used a fixed number of floating point operations per run - for a given size of the data, say, 10 billion floating point additions/multiplications, etc. No matter what optimization, these operations had to be done. (Different algorithms might have needed fewer, but the task was to implement this algorithm.)

The theoretical speed limit of a processor can be given in MFLOPS (million floating point operations per second). The processor we were using back then had roughly 1000 MFLOPS maximum performance. Therefore, it is clear that - no matter what - the program will run at least 10 seconds. The only problem is to come as near to that limit as possible.

The original version achieved a performance of ~50 MFLOPS, therefore exploiting roughly 5% of the potential of the processor. This is a typical value for naively written C code, using simple nested loops running over data that is much larger than the cache. 10 billion ops at 50 MFLOPS gives 200 sec running time.

In the course of the semester, we hand-optimized this code up to ~600 MFLOPS, i.e. exploiting ~60% of the processor's capabilities. This is about the limit that can be achieved for data-intensive real-world calculations. (Programs on small data sets that fit into the L1 cache can often get near the theoretical limit, but they are rather academic.) Now, 600 MFLOPS means ~17 seconds run time. 10 seconds of this were spent productively; 7 were wasted somewhere. A hardware profiler showed where the time was lost:

* Integer operations and flow control are negligible. They happen in a separate unit of the CPU, so they are effectively happening in parallel to the productive floating-point operations.
* Very little time was lost by pipeline stalls (the operations were very well interlaced).
* Very little time was lost on branch mispredictions (the loops were so simply structured that the prediction unit of the CPU had no problem).
* The vast amount of the lost time was still spent on cache misses. Even though we did everything to exploit the cache as much as possible, we still had some bottleneck left there.

Rewriting the code in assembler might perhaps give you an increase of a few more percent, but the only way to really go beyond what we already achieved would be to further optimize the locality of the data access, making more use of the cache. And this really means that doing the whole thing in assembler would get more and more complicated.

As a last word about the topic: everything I said here is not special to the algorithm we were looking at; it is the daily bread of numerics. "Numerics" means that there are three possible bottlenecks:

* the floating point unit itself
* pipeline stalls
* data access

The first can only be overcome by mathematics searching for different algorithms. The second one means that fine-grained instruction reordering is important; the third, that instructions should be reordered on a larger scale. None of these problems can be solved by using hand-coded assembler (except in rare, very simple cases). Using plain C cleverly is possible but ugly. In the long run, optimizing compilers for well-designed languages might take this ugly job from the user.
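The timing figures above all follow from one formula, time = operations / rate. A quick sanity check (assuming, as the consistent set of quoted times implies, a workload of 10^10 floating point operations):

```d
import std.stdio;

void main()
{
    immutable double ops   = 1e10;    // assumed total FP operations (10 billion)
    immutable double peak  = 1000e6;  // 1000 MFLOPS theoretical peak
    immutable double naive = 50e6;    // ~5% of peak: the naive nested-loop C
    immutable double tuned = 600e6;   // ~60% of peak: the hand-optimized code

    writeln(ops / peak);   // 10  -> the hard lower bound in seconds
    writeln(ops / naive);  // 200 -> the original running time
    writeln(ops / tuned);  // ~16.7 -> close to the quoted ~17 s
}
```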
Jul 08 2004
parent Stephen Waits <steve waits.net> writes:
Norbert Nemec wrote:
 
 Rewriting the code in assembler might perhaps give you an increase of a few
 more percent, but the only way to really go beyond what we already achieved
 would be to further optimize the locality of the data access making more
 use of the cache. And this really means that doing the whole thing in
 Assembler would get more and more complicated.
You hit the nail on the head here. My experience in game programming has been exactly this. Typically we may have a few, very small, inner loops hand coded in assembler by the time a game is finished.

Cache misses, and hence painfully slow memory access (relatively), always seem to end up as our bottleneck (on a main CPU anyway).

--Steve
Jul 08 2004
prev sibling parent Stephen Waits <steve waits.net> writes:
Arcane Jill wrote:
 
 Well, there's "interested" and there's "having time". Personally I would *love*
 to take on a head-to-head optimization challenge - my assembler versus your
 compiled C. But, I am just too busy right now. I have things to do that are
more
 important than my pride.
It's one thing to code a few small routines in assembler. It's another that the language can deal with it well enough that it really makes a difference. For instance, you may still have temporaries floating around; or, because you coded a few small atomic operations in assembler, you may be giving up the opportunity for reduction and combination of those operations.

Assembler, though it may help, is not the solution to the above-mentioned issues.

--Steve
Jul 08 2004
prev sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <miydn03kpeka$.4mtnnmprlmi5.dlg 40tude.net>, Derek says...

If I were to reword my example above (with a pedantic streak)...

It's just bloody shorthand that ensures that after the operation is complete
... 

   myData[4] ends up with the value 'D'
   myData[5] ends up with the value 'e'
   myData[6] ends up with the value 'r'
   myData[7] ends up with the value 'e'
   myData[8] ends up with the value 'k'

And I don't really care how the magic happens.
Pretty much every language on the planet can do this. D can do this too. The statement in question does actually compile and execute correctly. But what we DON'T have is the ability to overload that expression.

I don't get why adding an opSliceAssign() overload would be a problem though. Why does anyone think it would be a problem?

Arcane Jill
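Such an overload is easy to imagine. The sketch below assumes the compiler would rewrite `a[i..j] = v` into a call like `a.opSliceAssign(v, i, j)`; since that rewrite is only hypothetical here, the method is called explicitly (the `Buffer` type and all names are invented for illustration):

```d
import std.stdio;

struct Buffer
{
    char[] data;

    // Hypothetical hook: if the compiler rewrote  a[i..j] = v  to
    // a.opSliceAssign(v, i, j), this method would implement it.
    void opSliceAssign(const(char)[] v, size_t i, size_t j)
    {
        assert(i <= j && j <= data.length && v.length == j - i);
        data[i .. j] = v[];   // built-in element-wise slice copy
    }
}

void main()
{
    auto b = Buffer("My name is XXXXX".dup);
    // Spelled as a method call, since the rewrite is only imagined here:
    b.opSliceAssign("Derek", 11, 16);
    writeln(b.data);   // My name is Derek
}
```

The body is trivial because the built-in slice copy already does the work; the whole question in this thread is only whether the syntax can be routed to a user-defined method.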
Jul 08 2004
next sibling parent Derek Parnell <derek psych.ward> writes:
On Thu, 8 Jul 2004 07:14:00 +0000 (UTC), Arcane Jill wrote:

 In article <miydn03kpeka$.4mtnnmprlmi5.dlg 40tude.net>, Derek says...
 
If I were to reword my example above (with a pedantic streak)...

It's just bloody shorthand that ensures that after the operation is complete
... 

   myData[4] ends up with the value 'D'
   myData[5] ends up with the value 'e'
   myData[6] ends up with the value 'r'
   myData[7] ends up with the value 'e'
   myData[8] ends up with the value 'k'

And I don't really care how the magic happens.
Pretty much every language on the planet can do this. D can do this too. The statement in question does actually compile and execute correctly. But what we DON'T have is the ability to overload that expression.
That's why I started this thread. I assumed it was overloadable but soon found out otherwise. It's only an operator overload, which means we can use function call syntax instead. That is what I'm doing now, BTW:

   void opSlice(int i, int j, whatever x)
   {
      . . .
   }

and using it thus...

   Foo.opSlice(i, j, x);

instead of the more obvious

   Foo[i..j] = x;
 I don't get why adding an opSliceAssign() overload would be a problem though.
 Why does anyone think it would be a problem?
Beats me!

-- 
Derek
Melbourne, Australia
8/Jul/04 5:28:45 PM
Jul 08 2004
prev sibling next sibling parent Norbert Nemec <Norbert.Nemec gmx.de> writes:
Arcane Jill wrote:

 I don't get why adding an opSliceAssign() overload would be a problem
 though. Why does anyone think it would be a problem?
I just thought it was, but I changed my mind after I found that idea of opPartialIndex. See the other thread about the full concept.
Jul 08 2004
prev sibling next sibling parent Sam McCall <tunah.d tunah.net> writes:
Arcane Jill wrote:

 I don't get why adding an opSliceAssign() overload would be a problem though.
 Why does anyone think it would be a problem?
Agree with you, I don't see any problem at all. The current situation strikes me as a borderline bug.

Sam
Jul 08 2004
prev sibling parent Charles Hixson <charleshixsn earthlink.net> writes:
Arcane Jill wrote:
 In article <miydn03kpeka$.4mtnnmprlmi5.dlg 40tude.net>, Derek says...
 
 
If I were to reword my example above (with a pedantic streak)...
..
Pretty much every language on the planet can do this. D can do this too. The statement in question does actually compile and execute correctly. But what we DON'T have is the ability to overload that expression.

I don't get why adding an opSliceAssign() overload would be a problem though. Why does anyone think it would be a problem?

Arcane Jill
What if you don't have integers as your index? What I'd like is an extension of what you are proposing, plus an extension of what a range is (i.e., allow range to be overloaded for different types of arguments), so that one would be able to define:

Note that here myData would be defined as

   DType[Name[]] myData;

and it would be Name that was required to define what a range meant. And Name would need to have a special access method (or constructor?) that took a string as a value.

This may be a bit of a stretch beyond what we would want to add before D 1.0 comes out, but might be reasonable for consideration for D 2.0.

Access method vs. constructor: I tend to think of this as being most useful when one already has a list that contains the indices that one is attempting to access, so this could just be syntactic as an access method.

P.S.: What I'm REALLY thinking of... sort of, is a B+Tree or B*Tree that's accessed as if it were an array. But I want the tree to have keys of any comparable type, not just strings or integers.
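A rough sketch of the keyed range lookup described above, using an ordinary sorted array in place of a B-tree (all names here are invented; D's built-in associative arrays offer nothing like this, so the "slice" is spelled as a method call):

```d
import std.stdio;

// Toy ordered map: keys kept sorted, so a range of keys selects a
// contiguous run of values -- the property a B+Tree would provide.
struct OrderedMap(V)
{
    string[] keys;   // assumed sorted by the caller
    V[] values;

    // The imagined  myData["Betty".."Bill"]  spelled as a method call.
    V[] byRange(string lo, string hi)   // inclusive on both ends
    {
        V[] result;
        foreach (i, k; keys)
            if (k >= lo && k <= hi)   // strings compare lexicographically
                result ~= values[i];
        return result;
    }
}

void main()
{
    auto m = OrderedMap!int(["Adam", "Betty", "Bill", "Bob", "Carl"],
                            [1, 2, 3, 4, 5]);
    writeln(m.byRange("Betty", "Bill"));   // [2, 3]
}
```

The linear scan stands in for a tree descent; the interface is the point, not the data structure.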
Jul 08 2004
prev sibling parent Charles Hixson <charleshixsn earthlink.net> writes:
Derek wrote:
 On Wed, 07 Jul 2004 09:14:32 +0200, Norbert Nemec wrote:
 
 
I don't know about Walter's reasoning in that respect, but I can state my
position:

I think that even the opSlice that is there should be used with care. As I
see it, there is no way to cleanly extend it to multiple dimensions
So what? Single dimension is just fine for my needs. As far as I can see, a slice operation is one that is only done on a vector (single-dimension array). To me it is just a way of specifying a subset of vector elements. Once one moves into multi-dimensional array operations, e.g. a matrix, a vector of vectors, we need a new syntax anyway. opSlice can remain the same (a vector slice) and D can have some new opXXX function for 'matrix slicing'.

.....

My method currently looks a lot like the longhand above ...

   void opSlice(int i, int j, char[] x)
   {
      for (int k = 0; i < j; i++, k++)
      {
         if ((i >= 0) && (i < Elem.length) && (k < x.length))
            Elem[i]( cast(int)x[k] );
      };
   }

in anticipation of being able to write ...

   Foo[i..j] = x;

whereas I now write ...

   Foo.opSlice(i, j, x);
Perhaps this is primarily an argument asserting that [] should be a more flexibly overloadable operator? Perhaps one with a template? So, e.g., a hash could be indexed with something like:

   x = myHash["Betty".."Bill"]

which would definitely require an overloading of the [] operation, as hashes don't usually have any sensibly defined ordering. (And if you want that, a tree is a better data structure.)

I will admit to a fond remembrance of Fortran multi-dimensional array subscripting, which I still consider superior for most purposes to the C version. But Fortran never dealt with ranges, or user-defined types being used as indexes (or even strings as indexes). So something new is needed to accomplish what is intended.

If tuples were a part of D, I would suggest that array indices be implicit tuples of ranges. I.e., that every index be implicitly converted to a tuple, even if only of one element. And that each tuple member be implicitly a range, even if the start and finish were the same. This would produce a kind of orthogonal simplicity in thinking about what is intended here, and simplify the parsing. Thus one could have:

   Foo[i..j, k..l, m..n] = x

And (using "(" as a tuple marker) this would be understood as:

   Foo[(i..j), (k..l), (m..n)] = x

which would be (approx.) the same as:

   Foo[i..j][k..l][m..n] = x

But note that for this to work a range would need to be a more general object, so that it could be specified how it would act over unexpected types, e.g., over strings. Or names. (We would want Betty..Bill to mean all of the people I'm thinking of between Betty and Bill, inclusive; this means that the strings would need to be references to some particular type of string that had a range operation defined over itself.)

Ranges: integers are a relatively simple case of range, and even there the documentation is quite confusing. I don't understand why x[1..3] should mean (x[1], x[2]). That isn't three elements long.
It doesn't end with the third index of x (which would be x[3]). All I can think of is that in this case we have decided to use 1-based indexing rather than 0-based indexing, which seems quite strange.
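For reference, D's slice bounds are half-open over zero-based indices: the lower bound is included, the upper bound excluded, so x[1..3] has length 3 - 1 = 2. A minimal check:

```d
import std.stdio;

void main()
{
    int[] x = [10, 11, 12, 13, 14];

    // x[1 .. 3] takes the elements at indices 1 and 2: lower bound
    // included, upper bound excluded, length = 3 - 1 = 2.
    writeln(x[1 .. 3]);          // [11, 12]
    writeln(x[1 .. 3].length);   // 2

    // The payoff of the convention: adjacent slices re-assemble the
    // array with no overlap and no off-by-one adjustments.
    writeln((x[0 .. 2] ~ x[2 .. $]) == x);   // true
}
```

The convention is not 1-based indexing; it is the same half-open interval rule that makes `length = upper - lower` hold for every slice.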
Jul 08 2004