
digitalmars.D.announce - My ACCU 2016 keynote video available online

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Uses D for examples, showcases Design by Introspection, and rediscovers 
a fast partition routine. It was quite well received. 
https://www.youtube.com/watch?v=AxnotgLql0k

Andrei
May 16 2016
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/16/2016 6:46 AM, Andrei Alexandrescu wrote:
 Uses D for examples, showcases Design by Introspection, and rediscovers a fast
 partition routine. It was quite well received.
 https://www.youtube.com/watch?v=AxnotgLql0k
https://www.reddit.com/r/programming/comments/4jlkhv/accu_2016_keynote_by_andrei_alexandrescu/
May 16 2016
prev sibling next sibling parent Nordlöw <per.nordlow gmail.com> writes:
On Monday, 16 May 2016 at 13:46:11 UTC, Andrei Alexandrescu wrote:
 Uses D for examples, showcases Design by Introspection, and 
 rediscovers a fast partition routine. It was quite well 
 received. https://www.youtube.com/watch?v=AxnotgLql0k

 Andrei
Great! Your talks are always pushedFront in my view queue 😊 Did you manage to recruit any new D lieutenants 😉
May 16 2016
prev sibling next sibling parent reply QAston <qaston gmail.com> writes:
On Monday, 16 May 2016 at 13:46:11 UTC, Andrei Alexandrescu wrote:
 Uses D for examples, showcases Design by Introspection, and 
 rediscovers a fast partition routine. It was quite well 
 received. https://www.youtube.com/watch?v=AxnotgLql0k

 Andrei
Funny, useful, advertises the best parts of D very well.
May 16 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/16/2016 05:45 PM, QAston wrote:
 On Monday, 16 May 2016 at 13:46:11 UTC, Andrei Alexandrescu wrote:
 Uses D for examples, showcases Design by Introspection, and
 rediscovers a fast partition routine. It was quite well received.
 https://www.youtube.com/watch?v=AxnotgLql0k

 Andrei
Funny, useful, advertises the best parts of D very well.
Thanks. And of course on reddit there's the Slaves' Chorus from Nabucco chiming in right on cue about how it could be done in C++. -- Andrei
May 17 2016
prev sibling next sibling parent reply Manu via Digitalmars-d-announce <digitalmars-d-announce puremagic.com> writes:
On 16 May 2016 at 23:46, Andrei Alexandrescu via
Digitalmars-d-announce <digitalmars-d-announce puremagic.com> wrote:
 Uses D for examples, showcases Design by Introspection, and rediscovers a
 fast partition routine. It was quite well received.
 https://www.youtube.com/watch?v=AxnotgLql0k

 Andrei
This isn't the one you said you were going to "destroy concepts" is it? At dconf, you mentioned a talk for release on some date I can't remember, and I got the impression you were going to show how C++'s concepts proposal was broken, and ideally, propose how we can nail it in D?
May 18 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/18/16 7:42 AM, Manu via Digitalmars-d-announce wrote:
 On 16 May 2016 at 23:46, Andrei Alexandrescu via
 Digitalmars-d-announce <digitalmars-d-announce puremagic.com> wrote:
 Uses D for examples, showcases Design by Introspection, and rediscovers a
 fast partition routine. It was quite well received.
 https://www.youtube.com/watch?v=AxnotgLql0k

 Andrei
This isn't the one you said you were going to "destroy concepts" is it? At dconf, you mentioned a talk for release on some date I can't remember, and I got the impression you were going to show how C++'s concepts proposal was broken, and ideally, propose how we can nail it in D?
That's the one - sorry to disappoint :o). -- Andrei
May 19 2016
parent reply Manu via Digitalmars-d-announce <digitalmars-d-announce puremagic.com> writes:
On 19 May 2016 at 22:10, Andrei Alexandrescu via
Digitalmars-d-announce <digitalmars-d-announce puremagic.com> wrote:
 On 5/18/16 7:42 AM, Manu via Digitalmars-d-announce wrote:
 On 16 May 2016 at 23:46, Andrei Alexandrescu via
 Digitalmars-d-announce <digitalmars-d-announce puremagic.com> wrote:
 Uses D for examples, showcases Design by Introspection, and rediscovers a
 fast partition routine. It was quite well received.
 https://www.youtube.com/watch?v=AxnotgLql0k

 Andrei
This isn't the one you said you were going to "destroy concepts" is it? At dconf, you mentioned a talk for release on some date I can't remember, and I got the impression you were going to show how C++'s concepts proposal was broken, and ideally, propose how we can nail it in D?
That's the one - sorry to disappoint :o). -- Andrei
Ah. Okay, well while this is a very interesting talk, I was indeed hoping you were going to make a D concepts proposal... do you have such a thing in mind, or are you against concepts in D?
May 19 2016
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/19/2016 11:50 PM, Manu via Digitalmars-d-announce wrote:
 Ah. Okay, well while this is a very interesting talk, I was indeed
 hoping you were going to make a D concepts proposal... do you have
 such a thing in mind, or are you against concepts in D?
D has constraints. No point in adding concepts.
May 20 2016
next sibling parent reply "H. S. Teoh via Digitalmars-d-announce" writes:
On Fri, May 20, 2016 at 01:26:31AM -0700, Walter Bright via
Digitalmars-d-announce wrote:
 On 5/19/2016 11:50 PM, Manu via Digitalmars-d-announce wrote:
 Ah. Okay, well while this is a very interesting talk, I was indeed
 hoping you were going to make a D concepts proposal... do you have
 such a thing in mind, or are you against concepts in D?
D has constraints. No point in adding concepts.
Constraints have been a headache when it comes to generating user-friendly diagnostics.

Not to mention inconsistency in what exactly is being tested for: if you want to check whether something is an input range, do you use is(typeof(R.empty)), etc., or should you use __traits(compiles, R.init.empty), or is it is(typeof((R r){r.empty;})), or any of the 15 or so slightly different ways of testing for the existence and type of some aggregate member, all subtly different from each other? Subtly different as in, for instance, testing for is(typeof((){R r; bool x = r.empty;})) is different from is(typeof((R r){bool x = r.empty;})), because the former doesn't work with an R that closes over a local scope, whereas the latter does.

And none of this has even begun addressing the issue of human-readable diagnostics. Whereas if D had concepts, it would have been a simple matter of defining the prototypical range with struct-like syntax and calling it a day.

(I still think sig constraints are awesome, though. Just not to the point of replacing concepts. I don't see them as being mutually exclusive. In fact, they would complement each other quite nicely.)

T

-- 
Not all rumours are as misleading as this one.
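To make the divergence concrete, here is a small self-contained sketch (hasEmptyA, hasEmptyB and makeRange are made-up names, for illustration only) of how two of those idioms can disagree on a Voldemort range that closes over a local:

---
import std.stdio : writeln;

// Two of the "subtly different" idioms as compile-time predicates
// (hypothetical helper names):
enum hasEmptyA(R) = is(typeof( () { R r; bool x = r.empty; } )); // default-constructs an R
enum hasEmptyB(R) = is(typeof( (R r) { bool x = r.empty; } ));   // only takes an R parameter

auto makeRange(int n)
{
    // a Voldemort range that closes over the local n
    struct Local
    {
        int i;
        @property bool empty() const { return i >= n; }
        @property int front() const { return i; }
        void popFront() { ++i; }
    }
    return Local();
}

void main()
{
    alias R = typeof(makeRange(3));
    writeln(hasEmptyA!R); // expected false: `R r;` needs the enclosing frame
    writeln(hasEmptyB!R); // expected true
}
---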
May 20 2016
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/20/2016 09:47 AM, H. S. Teoh via Digitalmars-d-announce wrote:
 Constraints have been a headache when it comes to generating
 user-friendly diagnostics.
This is a matter that can be addressed tactically. Every time we discuss this, a couple of good ideas come up, they don't get worked on, then they get forgotten, and a couple months later the discussion is reopened anew. This is inefficient. Who would want to sit down and work on this? Andrei
May 20 2016
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/20/2016 6:47 AM, H. S. Teoh via Digitalmars-d-announce wrote:
 Not to mention inconsistency in what exactly
 is being tested for: if you want to check if something is an input
 range, do you use is(typeof(R.empty)), etc., or should you use
 __traits(compiles, R.init.empty), or is it is(typeof((R r){r.empty;})),
 or any of the 15 or so slightly different ways of testing for the
 existence and type of some aggregate member, all subtly different from
 each other? Subtly different as in, for instance, testing for
 is(typeof((){R r; bool x = r.empty;})) is different from is(typeof(R
 r){bool x = r.empty;}), because the former doesn't work with R that has
 parameters closing over a local scope, whereas the latter does.
That is not a problem with constraints; it's a result of a dozen people adding constraints in an uncoordinated manner. I.e., it's a library problem.
 Whereas if D had concepts, it would have been a simple
 matter of defining the prototypical range with struct-like syntax and
 calling it a day.
That really isn't good enough. Constraints can address behavior and relationships; concepts do not.
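For instance, a constraint can tie the two type parameters together and check an arbitrary compile-time predicate, which a purely per-type structural check cannot express (pairwiseLess is a made-up name, just a sketch):

---
import std.range : isInputRange, ElementType, zip;
import std.algorithm.searching : all;

// The constraint states a relationship *between* A and B, not just
// a requirement on each type in isolation.
bool pairwiseLess(A, B)(A a, B b)
    if (isInputRange!A && isInputRange!B
        && is(typeof(ElementType!A.init < ElementType!B.init) : bool))
{
    return zip(a, b).all!(t => t[0] < t[1]);
}

unittest
{
    assert(pairwiseLess([1, 2, 3], [2.0, 3.0, 4.0])); // int vs. double elements: accepted
    // pairwiseLess([1], ["a"]) is rejected at compile time by the constraint
}
---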
May 20 2016
parent Leontien Smaal <LeontienSmaal inbound.plus> writes:
On Friday, 20 May 2016 at 19:34:11 UTC, Walter Bright wrote:
 Constraints can address behavior and relationships, concepts do 
 not.
Wow, TIL. That's so clear once said! There have been several discussions here and even one Phobos PR that proposes a kind of concepts, but I didn't realize before that the two things are different.

The problem I see in D is that the constraints, since they prevent outputting a good message, are doing both (in a way):

---
void foo(T)(T t)
if (constraint)
{
    // cannot have the message if the constraint fails...
    static assert((checkConcept!T).ok, (checkConcept!T).message);
}
---

At the language level it would work:

---
void foo(T)(T t)
Concept(CheckerTemplate) // use this to output a smart message
if (constraint)
{
}
---

But really, without changing much, what I'd like to see is a DMD feature that would parse and evaluate the constraints to output a message such as:

---
void foo(T)(T t)
if ((a || b) && (a || b))
{
}
---

 error: (a || b) is false

instead of throwing the whole constraint text in the output.
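A sketch of what such a checker can already look like today in library code (checkInputRange and its members are hypothetical names, in the spirit of the checkConcept above):

---
// Hypothetical checker template: runs the individual requirements one by one
// and exposes both a verdict and a human-readable message for static assert.
template checkInputRange(R)
{
    static if (!is(typeof((R r) { bool x = r.empty; })))
        enum message = R.stringof ~ " has no usable `empty`";
    else static if (!is(typeof((R r) { auto x = r.front; })))
        enum message = R.stringof ~ " has no usable `front`";
    else static if (!is(typeof((R r) { r.popFront(); })))
        enum message = R.stringof ~ " has no usable `popFront`";
    else
        enum message = "";
    enum ok = message.length == 0;
}

void foo(T)(T t)
{
    // reports e.g. "int has no usable `empty`" instead of the whole constraint text
    static assert(checkInputRange!T.ok, checkInputRange!T.message);
}
---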
May 21 2016
prev sibling parent reply Manu via Digitalmars-d-announce <digitalmars-d-announce puremagic.com> writes:
On 20 May 2016 at 18:26, Walter Bright via Digitalmars-d-announce
<digitalmars-d-announce puremagic.com> wrote:
 On 5/19/2016 11:50 PM, Manu via Digitalmars-d-announce wrote:
 Ah. Okay, well while this is a very interesting talk, I was indeed
 hoping you were going to make a D concepts proposal... do you have
 such a thing in mind, or are you against concepts in D?
D has constraints. No point in adding concepts.
I really struggle to agree. Constraints are a good first-step in that direction, but they're unwieldy, produce the worst looking function signatures (read: documentation) of literally any language ever conceived, relatively awkward error feedback, and very quickly get out of hand if you have many variations of possible constraints.
May 21 2016
next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Saturday, 21 May 2016 at 08:45:45 UTC, Manu wrote:
 Constraints are a good first-step in that direction, but 
 they're unwieldy, produce the worst looking function signatures 
 (read: documentation) of literally any language ever conceived, 
 relatively awkward error feedback, and very quickly get out of 
 hand if you have many variations of possible constraints.
Constraints are enough for simple matters. As soon as they are used to distinguish between many overloads with complicated relationships they become slightly crude. However, this can be worked around by using static ifs and static assert(0)s. I find myself just using one variadic template with a lot of static if branches.
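A minimal sketch of that pattern (process and its cases are made-up examples): one catch-all template whose "overload resolution" happens inside with static if, ending in a static assert(0) that doubles as the error message:

---
import std.traits : isNumeric, isSomeString;

// One catch-all template instead of several constrained overloads
// (process and its branches are hypothetical).
auto process(Args...)(Args args)
{
    static if (Args.length == 1 && isNumeric!(Args[0]))
        return args[0] * 2;
    else static if (Args.length == 1 && isSomeString!(Args[0]))
        return args[0] ~ args[0];
    else
        static assert(0, "process: unsupported arguments " ~ Args.stringof);
}

unittest
{
    assert(process(21) == 42);
    assert(process("ab") == "abab");
}
---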
May 21 2016
parent Manu via Digitalmars-d-announce <digitalmars-d-announce puremagic.com> writes:
On 21 May 2016 at 19:55, Stefan Koch via Digitalmars-d-announce
<digitalmars-d-announce puremagic.com> wrote:
 On Saturday, 21 May 2016 at 08:45:45 UTC, Manu wrote:
 Constraints are a good first-step in that direction, but they're unwieldy,
 produce the worst looking function signatures (read: documentation) of
 literally any language ever conceived, relatively awkward error feedback,
 and very quickly get out of hand if you have many variations of possible
 constraints.
Constraints are enough for simple matters. As soon as they are used to distinguish between many overloads with complicated relationships they become slightly crude. However, this can be worked around by using static ifs and static assert(0)s. I find myself just using one variadic template with a lot of static if branches.
I also find myself taking that route when I run into cases where there are numerous constraint combinations. It's better in some ways, but worse in others. I think it simplifies APIs, but at the same time, it removes information from the API and takes away a lot of documentation.
May 21 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/21/2016 04:45 AM, Manu via Digitalmars-d-announce wrote:
 On 20 May 2016 at 18:26, Walter Bright via Digitalmars-d-announce
 <digitalmars-d-announce puremagic.com> wrote:
 On 5/19/2016 11:50 PM, Manu via Digitalmars-d-announce wrote:
 Ah. Okay, well while this is a very interesting talk, I was indeed
 hoping you were going to make a D concepts proposal... do you have
 such a thing in mind, or are you against concepts in D?
D has constraints. No point in adding concepts.
I really struggle to agree. Constraints are a good first-step in that direction, but they're unwieldy, produce the worst looking function signatures (read: documentation) of literally any language ever conceived, relatively awkward error feedback, and very quickly get out of hand if you have many variations of possible constraints.
I guess a lot more detail would be necessary here. A bunch of good folks (at least better than me) have worked for over a decade on C++ concepts and three (three!) proposals later it's still unclear whether they're a good idea. -- Andrei
May 21 2016
parent reply Manu via Digitalmars-d-announce <digitalmars-d-announce puremagic.com> writes:
On 21 May 2016 at 23:20, Andrei Alexandrescu via
Digitalmars-d-announce <digitalmars-d-announce puremagic.com> wrote:
 On 05/21/2016 04:45 AM, Manu via Digitalmars-d-announce wrote:
 On 20 May 2016 at 18:26, Walter Bright via Digitalmars-d-announce
 <digitalmars-d-announce puremagic.com> wrote:
 On 5/19/2016 11:50 PM, Manu via Digitalmars-d-announce wrote:
 Ah. Okay, well while this is a very interesting talk, I was indeed
 hoping you were going to make a D concepts proposal... do you have
 such a thing in mind, or are you against concepts in D?
D has constraints. No point in adding concepts.
I really struggle to agree. Constraints are a good first-step in that direction, but they're unwieldy, produce the worst looking function signatures (read: documentation) of literally any language ever conceived, relatively awkward error feedback, and very quickly get out of hand if you have many variations of possible constraints.
I guess a lot more detail would be necessary here. A bunch of good folks (at least better than me) have worked for over a decade on C++ concepts and three (three!) proposals later it's still unclear whether they're a good idea. -- Andrei
I agree it's not clear to me either. I haven't seen any proposals for D. Have there been any?

Constraints obviously work, but I do constantly find myself wishing there were something a little bit closer to the type system. I generally like where C++ is going. I think part of the problem with C++, as always, is that they are in a constant struggle to bolt things on to an ancient language with so many existing spiky angular edge cases that it always seems to become awkward.

I think my biggest gripe with constraints in D, though, is that they are quite pervasive when you start to produce generic APIs, and the documentation becomes a massive problem. Every single person (seriously) I've ever introduced to D has struggled with the Phobos docs as soon as constraints are presented. Many of us have raised this many times, and we've had a lot of discussion and various experiments with improving the documentation wrt existing constraints. I'm not sure we've found a good documentation solution here, and I'm not even confident there's a good solution there. I just don't think it's a great way to express the problem.

Concepts, or something like it, feels a lot more intuitive to me, and certainly lends itself much more nicely to documentation and API presentation.
May 21 2016
parent Atila Neves <atila.neves gmail.com> writes:
On Saturday, 21 May 2016 at 13:51:11 UTC, Manu wrote:
 On 21 May 2016 at 23:20, Andrei Alexandrescu via 
 Digitalmars-d-announce <digitalmars-d-announce puremagic.com> 
 wrote:
 On 05/21/2016 04:45 AM, Manu via Digitalmars-d-announce wrote:
 [...]
I guess a lot more detail would be necessary here. A bunch of good folks (at least better than me) have worked for over a decade on C++ concepts and three (three!) proposals later it's still unclear whether they're a good idea. -- Andrei
I agree it's not clear to me either. I haven't seen any proposals for D. Have there been any?
Not quite the same but: https://github.com/dlang/phobos/pull/3677 https://wiki.dlang.org/DIP84#Implementation Atila
May 22 2016
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/20/2016 02:50 AM, Manu via Digitalmars-d-announce wrote:
 Ah. Okay, well while this is a very interesting talk, I was indeed
 hoping you were going to make a D concepts proposal... do you have
 such a thing in mind, or are you against concepts in D?
I'm not "against" anything. I think we can improve contracts in many ways before we reach the conclusion that we need an overlapping language feature. I'd rather have one well executed feature rather than two half-done features. -- Andrei
May 20 2016
prev sibling next sibling parent reply Jens Müller <jensdotkdotmueller gmx.de> writes:
On Monday, 16 May 2016 at 13:46:11 UTC, Andrei Alexandrescu wrote:
 Uses D for examples, showcases Design by Introspection, and 
 rediscovers a fast partition routine. It was quite well 
 received. https://www.youtube.com/watch?v=AxnotgLql0k

 Andrei
Nice presentation.

The code applying the sentinel optimization assumes mutability of the input. That needs to be checked for. That's fine for partition, because that is assumed to be in-place, but for other algorithms it's not so obvious. It's sad that the optimization works only for non-const input. It is in conflict with the advice to make input const if the function doesn't change it, which makes the optimization less likely to be applicable. One might relax the const requirement to mean "the input is identical at the return of the function to what it was at the beginning", but that's a different story, I guess. Coming up with another implementation might also work, using chain or so. But typically the sentinel optimization assumes mutability.

I didn't get the idea behind sentinels for the sparse dot product. I picked the smallest of the last elements (so you need bidirectional ranges) and fix up as needed. For gdc I get a speedup (baseline over new implementation) of 1.2 in the best case and >1.0 in the worst case. On average it's about 1.1, I would say. I expected more. How would you approach sentinels with the sparse dot product? Can you elaborate the idea from the video? I didn't get it.

The baseline (dot1 in the graphs) is the straightforward version:

---
size_t i, j = 0;
double sum = 0;
while (i < a.length && j < b.length)
{
    if (a[i].index < b[j].index) i++;
    else if (a[i].index > b[j].index) j++;
    else
    {
        assert(a[i].index == b[j].index);
        sum += a[i].value * b[j].value;
        i++;
        j++;
    }
}
return sum;
---

BTW the effects vary greatly for different compilers. For example, with dmd the optimized version is slowest and the baseline is best. Weird. With gdc the optimized version is best, and gdc's code is always faster than dmd's code. With ldc it's really strange: slower than dmd. I assume I'm doing something wrong here.

Used compiler flags:

dmd v2.071.0
-wi -dw -g -O -inline -release -noboundscheck

gdc (crosstool-NG 203be35 - 20160205-2.066.1-e95a735b97) 5.2.0
-Wall -g -O3 -fomit-frame-pointer -finline-functions -frelease -fno-bounds-check -ffast-math

ldc (0.15.2-beta2) based on DMD v2.066.1 and LLVM 3.6.1
-wi -dw -g -O3 -enable-inlining -release -boundscheck=off

Am I missing some flags?

I uploaded my plots.
- running time: https://www.scribd.com/doc/312951947/Running-Time
- speed up: https://www.scribd.com/doc/312951964/Speedup

*Disclaimer* I hope most of this makes sense, but take it with a grain of salt.

Jens

PS It seems the mailing list interface does not work. I cannot send replies anymore via mail. I wrote Brad Roberts but no answer yet.
May 19 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/19/16 4:12 AM, Jens Müller wrote:
 The code applying the sentinel optimization assumes mutability of
 the input. That needs to be checked for.
Indeed. As I mentioned after discussing find, I didn't worry about those checks assuming they were obvious.
 That's fine for partition
 because that is assumed to be in-place. But for other algorithms it's
 not so obvious. It's sad that the optimization works only for
 non-const input.
Several optimizations only apply to mutable data. Others apply to immutable data. It's the way things go.
 I didn't get the idea behind sentinels for sparse dot product. I picked the
 smallest of the last elements (so you need bidirectional ranges) and fix
 up as
 needed. For gdc I get a speedup (baseline over new implementation) of
 1.2 in
 best case and >1.0 in worst case. On average it's about 1.1 I would say. I
 expected more. How would you approach sentinels with the sparse dot
 product. Can
 you elaborate the idea from the video? I didn't get it.
That's the idea - to only need to bounds check on one of the three possibilities. What test data did you use? 10%-20% win on dot product is significant because for many algorithms dot product is a kernel and dominates everything else. For those any win goes straight to the bottom line.
 The base line (dot1 in the graphs) is the straightforward version

 ---
 size_t i,j = 0;
 double sum = 0;
 while (i < a.length && j < b.length)
 {
      if (a[i].index < b[j].index) i++;
      else if (a[i].index > b[j].index) j++;
      else
      {
          assert(a[i].index == b[j].index);
          sum += a[i].value * b[j].value;
          i++;
          j++;
      }
 }
 return sum;
 ---
That does redundant checking. There's a better baseline:

---
if (a.length == 0 || b.length == 0)
    return 0;
const amax = a.length - 1, bmax = b.length - 1;
size_t i, j = 0;
double sum = 0;
for (;;)
{
    if (a[i].index < b[j].index) {
        if (i++ == amax) break;
    }
    else if (a[i].index > b[j].index) {
        bumpJ: if (j++ == bmax) break;
    }
    else
    {
        assert(a[i].index == b[j].index);
        sum += a[i].value * b[j].value;
        if (i++ == amax) break;
        goto bumpJ;
    }
}
return sum;
---

Then if you add the sentinel you only need the bounds tests in the third case.
 BTW the effects vary greatly for different compilers.
 For example with dmd the optimized version is slowest. The baseline is
 best. Weird. With gdc the optimized is best and gdc's code is always
 faster than dmd's code. With ldc it's really strange. Slower than dmd. I
 assume I'm doing something wrong here.

 Used compiler flags
 dmd v2.071.0
 -wi -dw -g -O -inline -release -noboundscheck
 gdc (crosstool-NG 203be35 - 20160205-2.066.1-e95a735b97) 5.2.0
 -Wall  -g -O3 -fomit-frame-pointer -finline-functions -frelease
 -fno-bounds-check -ffast-math
 ldc (0.15.2-beta2) based on DMD v2.066.1 and LLVM 3.6.1
 -wi -dw -g -O3 -enable-inlining -release -boundscheck=off

 Am I missing some flags?
These look reasonable.
 I uploaded my plots.
 - running time https://www.scribd.com/doc/312951947/Running-Time
 - speed up https://www.scribd.com/doc/312951964/Speedup
What is dot2? Could you please redo the experiments with the modified code as well? Thanks! Andrei
May 19 2016
next sibling parent reply Jens Müller <jensdotkdotmueller gmx.de> writes:
On Thursday, 19 May 2016 at 12:04:31 UTC, Andrei Alexandrescu 
wrote:
 On 5/19/16 4:12 AM, Jens Müller wrote:
 What test data did you use?
An instance for benchmarking is generated as follows, given nnz, the combined number of non-zero indices in the input vectors a and b:

---
auto lengthA = uniform!"[]"(0, nnz, gen);
auto lengthB = nnz - lengthA;
auto a = iota(0, nnz).randomSample(lengthA, gen).map!(i => Pair(i, 10)).array();
auto b = iota(0, nnz).randomSample(lengthB, gen).map!(i => Pair(i, 10)).array();
---

So I take a random sample of (0, ..., nnz) for each input. Any better idea? I've seen that people generate vectors with larger gaps.
 10%-20% win on dot product is significant because for many 
 algorithms dot product is a kernel and dominates everything 
 else. For those any win goes straight to the bottom line.
Sure. Still I wasn't sure whether I got the idea from your talk. So maybe there is/was more.
 The base line (dot1 in the graphs) is the straightforward 
 version

 ---
 size_t i,j = 0;
 double sum = 0;
 while (i < a.length && j < b.length)
 {
      if (a[i].index < b[j].index) i++;
      else if (a[i].index > b[j].index) j++;
      else
      {
          assert(a[i].index == b[j].index);
          sum += a[i].value * b[j].value;
          i++;
          j++;
      }
 }
 return sum;
 ---
That does redundant checking. There's a better baseline:

---
if (a.length == 0 || b.length == 0)
    return 0;
const amax = a.length - 1, bmax = b.length - 1;
size_t i, j = 0;
double sum = 0;
for (;;)
{
    if (a[i].index < b[j].index) {
        if (i++ == amax) break;
    }
    else if (a[i].index > b[j].index) {
        bumpJ: if (j++ == bmax) break;
    }
    else
    {
        assert(a[i].index == b[j].index);
        sum += a[i].value * b[j].value;
        if (i++ == amax) break;
        goto bumpJ;
    }
}
return sum;
---
I'll check that.
 Then if you add the sentinel you only need the bounds tests in 
 the third case.
I'll post the sentinel code later. Probably there is something to improve there as well.
 BTW the effects vary greatly for different compilers.
 For example with dmd the optimized version is slowest. The 
 baseline is
 best. Weird. With gdc the optimized is best and gdc's code is 
 always
 faster than dmd's code. With ldc it's really strange. Slower 
 than dmd. I
 assume I'm doing something wrong here.

 Used compiler flags
 dmd v2.071.0
 -wi -dw -g -O -inline -release -noboundscheck
 gdc (crosstool-NG 203be35 - 20160205-2.066.1-e95a735b97) 5.2.0
 -Wall  -g -O3 -fomit-frame-pointer -finline-functions -frelease
 -fno-bounds-check -ffast-math
 ldc (0.15.2-beta2) based on DMD v2.066.1 and LLVM 3.6.1
 -wi -dw -g -O3 -enable-inlining -release -boundscheck=off

 Am I missing some flags?
These look reasonable.
But ldc looks so bad. Any comments from ldc users or developers? Because I see this in many other measurements as well. I would love to have another compiler producing code as efficient as gdc's. For example, what's the equivalent of gdc's -ffast-math in ldc?
 I uploaded my plots.
 - running time 
 https://www.scribd.com/doc/312951947/Running-Time
 - speed up https://www.scribd.com/doc/312951964/Speedup
What is dot2? Could you please redo the experiments with the modified code as well?
dot2 is an optimization for jumping over gaps more quickly, replacing the first two if statements with while statements. My benchmark data has no large gaps, though, and interestingly it does make things worse.

Jens
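For reference, a hypothetical sketch of such a gap-skipping variant (the exact dot2 code may differ) looks roughly like:

---
// dot2 (roughly): same baseline, but runs of non-matching indices are
// skipped with inner while loops; this only pays off when gaps are large.
size_t i = 0, j = 0;
double sum = 0;
while (i < a.length && j < b.length)
{
    while (i < a.length && a[i].index < b[j].index) i++;
    if (i == a.length) break;
    while (j < b.length && b[j].index < a[i].index) j++;
    if (j == b.length) break;
    if (a[i].index == b[j].index)
    {
        sum += a[i].value * b[j].value;
        i++;
        j++;
    }
}
return sum;
---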
May 19 2016
next sibling parent reply David Nadlinger <code klickverbot.at> writes:
On Thursday, 19 May 2016 at 12:54:48 UTC, Jens Müller wrote:
 But ldc looks so bad.
 Any comments from ldc users or developers? Because I see this 
 in many other measurements as well.
This definitely does not match up with my experience. Particularly if you see this in many measurements, there might be something iffy with the way you test. Could you please post a runnable snippet that demonstrates the issue? In general, could you please directly post any performance issues to the LDC issue tracker on GitHub? We are quite interested in them, but I only happened to come across this post by chance. — David
May 22 2016
parent Jens Mueller <jensdotkdotmueller gmx.de> writes:
On Sunday, 22 May 2016 at 21:16:03 UTC, David Nadlinger wrote:
 On Thursday, 19 May 2016 at 12:54:48 UTC, Jens Müller wrote:
 But ldc looks so bad.
 Any comments from ldc users or developers? Because I see this 
 in many other measurements as well.
This definitely does not match up with my experience. Particularly if you see this in many measurements, there might be something iffy with the way you test. Could you please post a runnable snippet that demonstrates the issue?
No need to. Since you said it didn't match your experience, I checked. It turns out I was using last year's ldc (0.15.2-beta2). Version 0.16.0 behaves similarly. But for version 0.17.1 I get the following plots; ldc looks much better. I'll make a note to check the fast-math version once that's released. But the code is integer heavy.

https://www.scribd.com/doc/313524674/Running-Time
https://www.scribd.com/doc/313524678/Speedup
 In general, could you please directly post any performance 
 issues to the LDC issue tracker on GitHub? We are quite 
 interested in them, but I only happened to come across this 
 post by chance.
Sorry about that. Next time. Jens
May 23 2016
prev sibling parent Johan Engelen <j j.nl> writes:
On Thursday, 19 May 2016 at 12:54:48 UTC, Jens Müller wrote:
 For example what's equivalent to gdc's -ffast-math in ldc.
This is: https://github.com/ldc-developers/ldc/pull/1472 Working on performance improvements is a lot of fun. Please feed us with code that doesn't run as fast as it should! :) Johan
May 22 2016
prev sibling parent reply Jens Müller <jensdotkdotmueller gmx.de> writes:
On Thursday, 19 May 2016 at 12:04:31 UTC, Andrei Alexandrescu 
wrote:
 On 5/19/16 4:12 AM, Jens Müller wrote:

 ---
 if (a.length == 0 || b.length == 0)
     return 0;
 const amax = a.length - 1, bmax = b.length - 1;
 size_t i,j = 0;
 double sum = 0;
 for (;;)
 {
     if (a[i].index < b[j].index) {
         if (i++ == amax) break;
     }
     else if (a[i].index > b[j].index) {
         bumpJ: if (j++ == bmax) break;
     }
     else
     {
         assert(a[i].index == b[j].index);
         sum += a[i].value * b[j].value;
         if (i++ == amax) break;
         goto bumpJ;
     }
 }
 return sum;
 ---

 Then if you add the sentinel you only need the bounds tests in 
 the third case.
I'm not seeing it. Let me explain. Consider the input a = [1] and b = [2, 3] (I only write the indices). The smallest back index is 1, i.e., a.back is the chosen sentinel. Now I assume that we set b.back to a.back, restoring it after the loop. In the case a[i].index < b[j].index I have to check whether a[i].index == a.back.index to break, because otherwise i is incremented (since a[i].index = 1 and b[j].index = 2, for i = 0 and j = 0 respectively). In the last case I only check a[i].index == a.back.index, since this implies b[j].index == a.back.index. So in sum I have two bounds tests. But I think this is not what you are thinking of. This does not look right.

Here are the plots for the implementations.
https://www.scribd.com/doc/313204510/Running-Time
https://www.scribd.com/doc/313204526/Speedup

dot1 is my baseline, which is indeed worse than your baseline (dot2), but only on gdc. I chose dot2 as the baseline for computing the speedup. dot3 is the sentinel version. I removed the code to optimize for large gaps because it is only confusing. I may generate some benchmark data with larger gaps later to see whether it is worthwhile for such data. It looks much more regular now (ldc is still strange).

Jens
May 19 2016
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/19/2016 05:36 PM, Jens Müller wrote:
 I'm not seeing it. Let me explain.
 Consider the input a = [1] and b = [2, 3] (I only write the indices).
 The smallest back index is 1, i.e., a.back is the chosen sentinel.
Nonono, you stamp the largest index over the smaller index. So you overwrite a = [3] and you leave b = [2, 3] as it is. Now you know that you're multiplying two correct sparse vectors in which _definitely_ the last elements have equal indexes. So the inner loop is:

---
if (a[i].idx < b[j].idx) {
    i++; // no check needed
}
else if (a[i].idx > b[j].idx) {
    j++; // no check needed
}
else {
    // check needed
    r += a[i].val * b[j].val;
    if (i == imax || j == jmax) break;
    ++i;
    ++j;
}
---

At the end you need a fixup to make sure you account for the last index that you overwrote (which of course you need to save first). Makes sense?

Andrei
May 19 2016
parent reply Jens Müller <jensdotkdotmueller gmx.de> writes:
On Thursday, 19 May 2016 at 22:02:53 UTC, Andrei Alexandrescu 
wrote:
 On 05/19/2016 05:36 PM, Jens Müller wrote:
 I'm not seeing it. Let me explain.
 Consider the input a = [1] and b = [2, 3] (I only write the 
 indices).
 The smallest back index is 1, i.e., a.back is the chosen 
 sentinel.
Nonono, you stamp the largest index over the smaller index. So you overwrite a = [3] and you leave b = [2, 3] as it is. Now you know that you're multiplying two correct sparse vectors in which _definitely_ the last elements have equal indexes. So the inner loop is:

---
if (a[i].idx < b[j].idx) {
    i++; // no check needed
}
else if (a[i].idx > b[j].idx) {
    j++; // no check needed
}
else {
    // check needed
    r += a[i].val * b[j].val;
    if (i == imax || j == jmax) break;
    ++i;
    ++j;
}
---

At the end you need a fixup to make sure you account for the last index that you overwrote (which of course you need to save first). Makes sense?
What if you stomped over an index in a that has an equal index in b (it could be anywhere in b)? After the loop finishes you restore the index in a, but how do you account for the values of the stomped-over index if needed? For instance, test it on a = [2], b = [2, 3]. Note the 2 in b could be anywhere.

I think you can check for

if (a[i].idx == sentinelIdx) break;

instead of

if (i == imax || j == jmax) break;

Jens
May 19 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/19/2016 06:50 PM, Jens Müller wrote:
 What if you stomped over an index in a that has as an equal index in b
 (it could be anywhere in b).
Hmmm, you're right. So that doesn't work, or at least not efficiently (the fixup would entail a binary search in b). How about this idea: arrange things such that a.back.idx <= b.back.idx (i.e. swap them if not so). Then stomp b.back.idx with size_t.max. Your kernel is:

---
if (a[i].idx < b[j].idx) {
    if (i == imax) break; // check needed
    i++;
}
else if (a[i].idx > b[j].idx) {
    j++; // no check needed
}
else {
    // no check needed
    r += a[i++].val * b[j++].val;
}
---

The fixup needs only to account for the case a.back.idx == b.back.idx. In fact I like this a lot better than the others because it works really fast for identical vectors (taking the norm) or similar vectors (often the case). Works?

Andrei
May 20 2016
parent reply Jens Müller <jensdotkdotmueller gmx.de> writes:
On Friday, 20 May 2016 at 14:14:18 UTC, Andrei Alexandrescu wrote:
 On 05/19/2016 06:50 PM, Jens Müller wrote:
 What if you stomped over an index in a that has as an equal 
 index in b
 (it could be anywhere in b).
Hmmm, you're right. So that doesn't work, or at least not efficiently (the fixup would entail a binary search in b). How about this idea: arrange things such that a.back.idx <= b.back.idx (i.e. swap them if not so). Then stomp b.back.idx with size_t.max. Your kernel is:

---
if (a[i].idx < b[j].idx) {
    if (i == imax) break; // check needed
    i++;
}
else if (a[i].idx > b[j].idx) {
    j++; // no check needed
}
else {
    // no check needed
    r += a[i++].val * b[j++].val;
}
---

The fixup needs only to account for the case a.back.idx == b.back.idx. In fact I like this a lot better than the others because it works really fast for identical vectors (taking the norm) or similar vectors (often the case). Works?
No it doesn't work because you need to break in the last case. Consider the case when the last element of a is equal to an element in b. Next iteration you overrun a. But optimizing for similar vectors is interesting. Jens
May 20 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/20/16 2:13 PM, Jens Müller wrote:
 No it doesn't work because you need to break in the last case. Consider
 the case when the last element of a is equal to an element in b. Next
 iteration you overrun a.
I'm not that Bright :o). So you'd need one more test, but you still save the other test and the test on one of the three branches, so 2 out of 4. -- Andrei
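Putting the pieces of this sub-thread together, a sketch of the whole routine could look like the following (dotSentinel and Pair are illustrative names; indices are assumed strictly increasing, and the input is assumed mutable as discussed earlier):

---
import std.algorithm.mutation : swap;

struct Pair { size_t index; double value; }

// Sketch: sparse dot product with a sentinel. Temporarily stomps the last
// index of the vector with the larger back index; restored before returning.
double dotSentinel(Pair[] a, Pair[] b)
{
    if (a.length == 0 || b.length == 0) return 0;
    // arrange things such that a.back.index <= b.back.index
    if (a[$ - 1].index > b[$ - 1].index) swap(a, b);
    immutable saved = b[$ - 1].index;
    b[$ - 1].index = size_t.max;          // the sentinel
    scope (exit) b[$ - 1].index = saved;  // restore the caller's data

    immutable imax = a.length - 1;
    size_t i = 0, j = 0;
    double sum = 0;
    for (;;)
    {
        if (a[i].index < b[j].index)
        {
            if (i == imax) break;   // check needed
            ++i;
        }
        else if (a[i].index > b[j].index)
        {
            ++j;                    // no check needed: the sentinel stops j
        }
        else
        {
            sum += a[i].value * b[j].value;
            if (i == imax) break;   // the extra test discussed above
            ++i;
            ++j;
        }
    }
    // fixup for the overwritten last element of b (indices assumed unique)
    if (a[$ - 1].index == saved)
        sum += a[$ - 1].value * b[$ - 1].value;
    return sum;
}

unittest
{
    assert(dotSentinel([Pair(2, 2.0)], [Pair(2, 3.0), Pair(3, 1.0)]) == 6.0);
    assert(dotSentinel([Pair(1, 2.0)], [Pair(2, 3.0), Pair(3, 1.0)]) == 0.0);
}
---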
May 20 2016
parent Jens Müller <jensdotkdotmueller gmx.de> writes:
On Friday, 20 May 2016 at 20:04:35 UTC, Andrei Alexandrescu wrote:
 On 5/20/16 2:13 PM, Jens Müller wrote:
 No it doesn't work because you need to break in the last case. 
 Consider
 the case when the last element of a is equal to an element in 
 b. Next
 iteration you overrun a.
I'm not that Bright :o).
Me neither.
 So you'd need one more test, but you still save the other test 
 and the test on one of the three branches, so 2 out of 4. --
Yes. The current version reduces it from 4 to 2. I've read that people use SSE to speed it up. Maybe I'll consider this later.

If we want to improve on similar vectors (which is a great idea) we just reorder the cases, I guess. Just did it. It improves on my test data, but on nearly dissimilar input I expect it to be worse. When doing these optimizations it always depends on the expected input, which is a pity when optimizing library functions. These must work in all cases and should only be less efficient for a very good reason.

I'm looking forward to Fastware.

Jens
May 20 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/19/2016 05:36 PM, Jens Müller wrote:
 I removed the code to optimize for large gaps. Because it is only
 confusing. I may generate some benchmark data with larger gaps later to
 see whether it is worthwhile for such data.
For skipping large gaps quickly, check galloping search (google for it, we also have it in phobos). -- Andrei
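For example (a rough sketch; the exact call sites are up to the user), std.range's SortedRange supports a galloping search policy:

---
import std.range : assumeSorted, SearchPolicy;
import std.stdio : writeln;

void main()
{
    // indices of a sparse vector, sorted ascending
    auto idx = assumeSorted([2, 3, 9, 10, 11, 12, 40, 41]);

    // galloping: probe 1, 2, 4, 8, ... elements ahead, then binary search
    // inside the last interval; cheap when the target is near the front
    auto below = idx.lowerBound!(SearchPolicy.gallop)(40);
    writeln(below.length); // 6 -> the next candidate lives at position 6
}
---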
May 19 2016
parent Jens Müller <jensdotkdotmueller gmx.de> writes:
On Thursday, 19 May 2016 at 22:04:56 UTC, Andrei Alexandrescu 
wrote:
 On 05/19/2016 05:36 PM, Jens Müller wrote:
 I removed the code to optimize for large gaps. Because it is 
 only
 confusing. I may generate some benchmark data with larger gaps 
 later to
 see whether it is worthwhile for such data.
For skipping large gaps quickly, check galloping search (google for it, we also have it in phobos). -- Andrei
Sure. I've already seen this. It's nice. But you have to include it in the sparse dot product (or list intersection) algorithm somehow. Then you require random access, and galloping is only beneficial if the gaps are large. As a library writer this is a difficult position, because it easily turns into over-engineering. Optimally one just exposes the primitives and the user plugs them together, ideally without too many knobs per algorithm.

Jens
May 19 2016
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/16/16 9:46 AM, Andrei Alexandrescu wrote:
 Uses D for examples, showcases Design by Introspection, and rediscovers
 a fast partition routine. It was quite well received.
 https://www.youtube.com/watch?v=AxnotgLql0k
This talk was a big gambit and it seems to have worked well. Per https://www.youtube.com/channel/UCJhay24LTpO1s4bIZxuIqKw/videos?sort=p&view=0&flow=grid, "There's Treasure Everywhere" is the most watched talk of the ACCU 2016 conference, with 5276 views, by a large margin (next 1874, median 339).

Andrei
May 19 2016