
digitalmars.D - Re: memset and related things

reply bearophile <bearophileHUGS lycos.com> writes:
Don:

 It'll be interesting to see what the priorities are now -- 
 maybe this stuff is of more interest now.

Probably removing bugs is more important still :-) For example your work has changed a little how compile-time functions can be used in D.
 BTW the AMD manual for K7 (or might be K6 optimisation manual? don't 
 exactly remember) goes into great detail about both memcpy() and 
 memset(). Turns out there's about five different cases.

In the meantime Deewiant has told me that on 64 bit glibc's memset is better, and on more modern CPUs the timings are different (and on 64 bit my first version may not work; maybe the second one is better. I have not tested it on 64 bit LDC yet). I'm just a newbie at this stuff, while the people who write the memset of 64-bit glibc are experts. Bye and thank you, bearophile
Sep 20 2009
next sibling parent reply language_fan <foo bar.com.invalid> writes:
Sun, 20 Sep 2009 16:09:50 -0400, bearophile thusly wrote:

 I'm just a newbie on this stuff, while
 people that write the memset of 64bit glibc are expert.

I have been wondering why people here often complain that Java is slower than D. Every time I see your benchmarks, Java is actually doing just fine; sometimes it's even faster than D. Are your benchmarks somehow flawed every time Java beats D in performance?
Sep 20 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
language_fan:

 Every time I see your benchmarks, Java is actually doing just 
 fine, sometimes it's even faster than D. Are your benchmarks somehow 
 flawed every time Java wins D in performance?

I don't fully understand your question. The main purpose of some of the benchmarks I've done in the last few months is to find performance bugs (or sometimes just bugs) in LDC or LLVM (I have stopped doing something similar for DMD because people seem less interested in improving its performance). Now I often use code originally written in C++ and Java because I've seen that in most situations LLVM is able to produce good binaries from very C-like code (with few exceptions).

My benchmarks aren't chosen randomly; I naturally focus on things that are slower in D, so sometimes you can see Java "win". I usually discard the code where Java turns out slower :-) Installing and using a debug build of the JavaVM to see the assembly its JIT produces is not handy, but I sometimes do it. Once in a while you can find some surprises there.

Recently I have found a nice single-file C++ program that heavily uses templates and partial specialization, but I am having problems translating it to good D to test templates in LDC because I don't know enough C++ yet. It finds the optimal solution to the 15 puzzle, and it's quite efficient: http://codepad.org/96ATkuhx Help/suggestions are welcome :-) Bye, bearophile
Sep 20 2009
next sibling parent reply downs <default_357-line yahoo.de> writes:
language_fan wrote:
 Sun, 20 Sep 2009 18:16:37 -0400, bearophile thusly wrote:
 
 My benchmarks aren't chosen randomly, I naturally focus on things that
 are slower in D, so sometimes you can see Java to "win". I usually
 discard the code where Java results slower :-)

I have seen people many times mention that Java is in general orders of magnitude slower than D, no matter what kind of algorithms you run on both environments. This is because of the VM - nothing on a VM can run faster than native code, they say.

Wow. I actually haven't seen that argument in a long time - it seems pretty much debunked nowadays. (Personally, for me it comes down to: Java will always have a certain delay on startup, and its thoroughly object-oriented design forces heap access that is often unnecessary for solving the problem; plus the single-paradigm model is woefully constraining in comparison to more flexible languages.)
 I personally use a 
 lot of heap memory allocation in my work, and so far Java has not only 
 been safer (and provides decent stack traces and no compiler bugs), but 
 also faster - each time.

Yes, heap allocation is faster in Java. There's so much of it they pretty much had no choice but to tune it to hell and back :)
 If you decide to hide the bad results
 (for D), it will only reinforce the misinformation.

Um, he said he hides the bad results _for Java_.
Sep 21 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
downs:

and its thoroughly object-oriented design forces heap access that is often
unnecessary for solving the problem;<

In D/C# you can use an array of structs, which sometimes allows for better performance; in Java, to avoid the decrease in performance caused by using an array of references, you may need to use several parallel arrays of the single fields of the struct. But the cache behaviour of such parallel arrays is different: sometimes it's better and sometimes it's worse than the array of structs. If you have to access several fields at once of each struct of the array, then using the array of structs is generally faster than using the parallel arrays. If you have to scan only one or a few fields, then using parallel arrays is faster because less data comes through the caches.

Some time ago someone created a "parallel array" struct here to allow for such layout choices at compile time. A VirtualMachine in theory can dynamically change such struct/array layout at runtime according to the access patterns to the data :-) I think similar dynamism in data structures may be done in the next generation of VM-based languages (you can implement such smart data structures in D too, but the data structure has to be mostly opaque and you have to forbid taking addresses of any data, because it can be moved and shuffled around).
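The two layouts contrasted above can be sketched in D; the names here are illustrative, not from any library, and the comments describe the memory-traffic trade-off rather than guaranteed numbers:

```d
import std.stdio;

struct Point { double x, y, z; }

// Array of structs (AoS): each element's fields sit together in memory.
double sumXAoS(const Point[] pts)
{
    double s = 0;
    foreach (ref p; pts)
        s += p.x; // each iteration pulls a whole 24-byte Point through the cache
    return s;
}

// Parallel arrays (SoA): one plain array per field.
double sumXSoA(const double[] xs)
{
    double s = 0;
    foreach (x; xs)
        s += x; // streams only the x data: roughly a third of the memory traffic
    return s;
}

void main()
{
    Point[] aos = [Point(1, 0, 0), Point(2, 0, 0)];
    double[] xs = [1, 2];
    writeln(sumXAoS(aos), " ", sumXSoA(xs)); // both compute the same sum of x
}
```

When several fields of each element are needed at once (say, computing distances), the AoS loop touches memory it was going to need anyway; when only one field is scanned, the SoA loop moves far less data, which is exactly the trade-off described above.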
Um, he said he hides the bad results _for Java_.<

Right. I generally don't like to hide things, and I show all the data I have collected. Sometimes I don't show benchmarks where Java looks slow because they can be boring and because I am not currently interested in improving the JavaVM. By focusing on where LDC/LLVM do badly, I can try to improve them. Bye, bearophile
Sep 21 2009
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
downs wrote:
 language_fan wrote:
 Sun, 20 Sep 2009 18:16:37 -0400, bearophile thusly wrote:

 My benchmarks aren't chosen randomly, I naturally focus on things that
 are slower in D, so sometimes you can see Java to "win". I usually
 discard the code where Java results slower :-)

 I have seen people many times mention that Java is in general orders of magnitude slower than D, no matter what kind of algorithms you run on both environments. This is because of the VM - nothing on a VM can run faster than native code, they say.

Wow. I actually haven't seen that argument in a long time - it seems pretty much debunked nowadays. (Personally, for me it comes down to: Java will always have a certain delay on startup, and its thoroughly object-oriented design forces heap access that is often unnecessary for solving the problem; plus the single-paradigm model is woefully constraining in comparison to more flexible languages.)
 I personally use a 
 lot of heap memory allocation in my work, and so far Java has not only 
 been safer (and provides decent stack traces and no compiler bugs), but 
 also faster - each time.

Yes, heap allocation is faster in Java. There's so much of it they pretty much had no choice but to tune it to hell and back :)
 If you decide to hide the bad results
 (for D), it will only reinforce the misinformation.

Um, he said he hides the bad results _for Java_.

I guess that reinforces some other misinformation :o). Andrei
Sep 21 2009
prev sibling next sibling parent reply Don <nospam nospam.com> writes:
language_fan wrote:
 Sun, 20 Sep 2009 18:16:37 -0400, bearophile thusly wrote:
 
 My benchmarks aren't chosen randomly, I naturally focus on things that
 are slower in D, so sometimes you can see Java to "win". I usually
 discard the code where Java results slower :-)

I have seen people many times mention that Java is in general orders of magnitude slower than D, no matter what kind of algorithms you run on both environments.

I think you're creating a straw man. I don't recall anyone saying that. Certainly not "orders of magnitude" in general. In general, when D is faster, it'll be because more coding techniques are available. For example, because you're not forced to use OOP all the time.
 This is because of the VM - nothing on a VM can run 
 faster than native code, they say.  If you decide to hide the bad results
 (for D), it will only reinforce the misinformation. I personally use a 
 lot of heap memory allocation in my work, and so far Java has not only 
 been safer (and provides decent stack traces and no compiler bugs), but 
 also faster - each time.

On this ng, it seems to be universally acknowledged that Java's GC is much better than D's, and that Java's tool chain is orders of magnitude more mature than D's. But in the cases where Java beats D, it will almost always be because of the GC; LDC probably only gets beaten on the GC.

Having looked at the DMD optimiser, I'm a bit surprised that it's competitive at all (especially in floating point). There is so much low-hanging fruit, it's practically an orchard <g>. It's hardly been touched since the mid-90's, while Java's had 15 years of massive commercial development during that time.
Sep 21 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Don wrote:
 Having looked at the DMD optimiser, I'm a bit surprised that it's 
 competitive at all (especially in floating point). There is so much 
 low-hanging fruit, it's practically an orchard <g>.

I believe that's because it has reached the point of diminishing returns.
Sep 23 2009
parent reply Don <nospam nospam.com> writes:
Walter Bright wrote:
 Don wrote:
 Having looked at the DMD optimiser, I'm a bit surprised that it's 
 competitive at all (especially in floating point). There is so much 
 low-hanging fruit, it's practically an orchard <g>.

I believe that's because it has reached the point of diminishing returns.

I presume you're talking about integer optimisation, because that's definitely not the case for DMD's floating point. Eg, there's a comment in the code somewhere saying that CSE (common subexpression) optimisations are not done for floating point because the code generator can't cope with it. It does loops particularly poorly: it never seems to keep a variable on the FP stack, so it keeps loading and saving them. The potential speedups are enormous.
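A minimal loop of the kind described above, where the quality of the code generator shows up directly:

```d
// If the code generator keeps `sum` on the FP stack (or in an XMM
// register), the loop body is one load of the array element and one
// add per iteration. A weak optimiser instead spills `sum` to memory
// and reloads it on every pass, greatly increasing memory traffic.
double sumArray(const double[] a)
{
    double sum = 0.0;
    foreach (x; a)
        sum += x;
    return sum;
}
```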
Sep 23 2009
parent Walter Bright <newshound1 digitalmars.com> writes:
Don wrote:
 I presume you're talking about integer optimisation, because that's 
 definitely not the case for DMD's floating point. Eg, there's a comment 
 in the code somewhere saying that CSE (common subexpression) 
 optimisations are not done for floating point because the code generator 
 can't cope with it. It does loops particularly poorly: it never seems to 
 keep a variable on the FP stack, so it keeps loading and saving them. The 
 potential speedups are enormous.

I'll agree with you on that assessment.
Sep 23 2009
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
language_fan wrote:
 I have seen people many times mention that Java is in general orders of 
 magnitude slower than D, no matter what kind of algorithms you run on 
 both environments. This is because of the VM - nothing on a VM can run 
 faster than native code, they say. If you decide to hide the bad results 
 (for D), it will only reinforce the misinformation. I personally use a 
 lot of heap memory allocation in my work, and so far Java has not only 
 been safer (and provides decent stack traces and no compiler bugs), but 
 also faster - each time.

If you simply time a heap allocation in D and Java, Java will probably be faster - not because of anything inherently faster about Java, but because Sun has poured billions of dollars into trying to make their heap allocator faster. However, a typical D program does far fewer allocations than the Java equivalent, for the simple reason that D supports stack allocated and embedded value aggregates while Java requires them to be on the heap. It is the much reduced need for the heap to be fast that is an advantage for D.

Java does do some escape analysis to try and allocate heap objects on the stack instead, but I don't know how effective this is, and even that won't help if you try to embed a value aggregate into a class:

struct S { int x, y, z; }

class C
{
     S v;   // in Java this would require a second heap allocation
}

In other words, one of the most effective techniques for speeding up heap allocation costs is to design the data structures so they require fewer allocations!
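The difference in allocation counts can be made concrete in D; `CIndirect` below is a hypothetical Java-style layout written for contrast, not something any library requires:

```d
struct S { int x, y, z; }

// D-style: S is embedded by value, so `new CEmbedded` is a single
// allocation holding both the class header and S's three ints.
class CEmbedded { S v; }

// Java-style layout (hypothetical): only a reference can be stored,
// so populating the field costs a second, separate heap allocation
// plus a pointer chase on every access.
class CIndirect { S* p; }

void main()
{
    auto a = new CEmbedded;   // one allocation
    a.v.x = 1;                // direct access into the same memory block

    auto b = new CIndirect;   // one allocation...
    b.p = new S(2, 3, 4);     // ...plus a second one
    assert(a.v.x == 1 && b.p.y == 3);
}
```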
Sep 23 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

 Java does do some escape analysis to try and allocate heap objects on 
 the stack instead, but I don't know how effective this is, and even that 
 won't help if you try to embed a value aggregate into a class:
 
 struct S { int x, y, z; }
 
 class C
 {
      S v;   // in Java this would require a second heap allocation
 }

I have discussed with LDC devs an improvement of that idea (it was discussed in this newsgroup too):

class C1 { int x, y, z; }

class C2 { scope C1 c; }

This idea raises a few problems, but I think they can be solved. You can special-case the management of such objects scope-allocated inside other objects. Or you can just add an invisible field to that C2 class, the reference to c. Bye, bearophile
Sep 23 2009
parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
bearophile wrote:
 Walter Bright:
 
 Java does do some escape analysis to try and allocate heap objects on 
 the stack instead, but I don't know how effective this is, and even that 
 won't help if you try to embed a value aggregate into a class:

 struct S { int x, y, z; }

 class C
 {
      S v;   // in Java this would require a second heap allocation
 }

I have discussed with LDC devs an improvement of that idea (it was discussed in this newsgroup too):

class C1 { int x, y, z; }

class C2 { scope C1 c; }

This idea raises a few problems, but I think they can be solved. You can special-case the management of such objects scope-allocated inside other objects. Or you can just add an invisible field to that C2 class, the reference to c. Bye, bearophile

I would love to see such a feature added in D; it would make object composition much more convenient and faster.

You'd just need the compiler to force a call into C1's constructor within C2's and generate a call to C1's destructor in the epilog of C2's. I can't see how implementing that idea could be very hard, other than the parser-related support.

No need to store a reference within the object; that would still require another heap allocation, killing the need for scope in the first place.
Sep 23 2009
next sibling parent Jeremie Pelletier <jeremiep gmail.com> writes:
Jarrett Billingsley wrote:
 On Wed, Sep 23, 2009 at 10:52 AM, Jeremie Pelletier <jeremiep gmail.com> wrote:
 I would love to see such a feature added in D, would make object composition
 much more convenient and faster.

 You'd just need the compiler to force a call into C1's constructor within
 C2's and generate a call to C1's destructor in the epilog of C2's. I can't
 see how implementing idea that could be any harder, other than the parser
 related support.

You are really hung up on parsing, aren't you ;) Parsing is a tiny, tiny, tiny fraction of the complexity of implementing a language, and most language features do not require any change to it. Really.

Everything begins at parsing! You'd need to parse the scope lexeme in that context, then you'd need to adjust the appropriate parse node to contain the scope flag, which would require additional semantic analysis and finally a few changes in the IR generation. That's what I implied by parser "related" support. But yeah, I love parsing :)
Sep 23 2009
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Jeremie Pelletier:

 No need to store a reference within the object, that would still require 
 another heap allocation killing the need for scope in the first place.

Probably you have not understood what I meant. I surely didn't mean that it needs another heap allocation (the purpose of scope is to avoid that). Such an extra invisible reference field among that object's attributes is not necessary, but it makes things simpler. If you add such a reference, then inside the outer object you have the inner object followed by the reference to the inner object, both inlined (so no other heap allocations are necessary). So you can keep all the usual semantics of D objects and avoid most special-casing. For example you can assign a different object to such a reference:

class C1 { int x, y, z; }

class C2 { scope C1 c1; }

void main() {
    auto c2 = new C2;
    auto c = new C1;
    c2.c1 = c; // <--- see here
}

Now the memory of c1 inside c2 is not reachable anymore, so the destructor of that inner object can be called by the GC, but of course its actual memory will not be freed until c2 is deallocated. Bye, bearophile
Sep 23 2009
prev sibling next sibling parent language_fan <foo bar.com.invalid> writes:
Sun, 20 Sep 2009 18:16:37 -0400, bearophile thusly wrote:

 My benchmarks aren't chosen randomly, I naturally focus on things that
 are slower in D, so sometimes you can see Java to "win". I usually
 discard the code where Java results slower :-)

I have seen people many times mention that Java is in general orders of magnitude slower than D, no matter what kind of algorithms you run on both environments. This is because of the VM - nothing on a VM can run faster than native code, they say. If you decide to hide the bad results (for D), it will only reinforce the misinformation. I personally use a lot of heap memory allocation in my work, and so far Java has not only been safer (and provides decent stack traces and no compiler bugs), but also faster - each time.
Sep 21 2009
prev sibling parent Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Wed, Sep 23, 2009 at 10:52 AM, Jeremie Pelletier <jeremiep gmail.com> wrote:
 I would love to see such a feature added in D, would make object composition
 much more convenient and faster.

 You'd just need the compiler to force a call into C1's constructor within
 C2's and generate a call to C1's destructor in the epilog of C2's. I can't
 see how implementing idea that could be any harder, other than the parser
 related support.

You are really hung up on parsing, aren't you ;) Parsing is a tiny, tiny, tiny fraction of the complexity of implementing a language, and most language features do not require any change to it. Really.
Sep 23 2009
prev sibling parent reply Don <nospam nospam.com> writes:
bearophile wrote:
 Don:
 
 It'll be interesting to see what the priorities are now -- 
 maybe this stuff is of more interest now.

Probably removing bugs is more important still :-) For example your work has changed a little how compile-time functions can be used in D.
 BTW the AMD manual for K7 (or might be K6 optimisation manual? don't 
 exactly remember) goes into great detail about both memcpy() and 
 memset(). Turns out there's about five different cases.

In the meantime Deewiant has told me that on 64 bit glibc's memset is better, and on more modern CPUs the timings are different (and on 64 bit my first version may not work; maybe the second one is better. I have not tested it on 64 bit LDC yet). I'm just a newbie at this stuff, while the people who write the memset of 64-bit glibc are experts.

Really, memset() _should_ be optimal in all cases. On almost all compilers, it's not optimal, and on many (such as DMD) there's a _lot_ of room for improvement. So I consider this to be a C standard library implementation issue, rather than a language weakness.
Sep 21 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Don wrote:
 bearophile wrote:
 Don:

 It'll be interesting to see what the priorities are now -- maybe this 
 stuff is of more interest now.

Probably removing bugs is more important still :-) For example your work has changed a little how compile-time functions can be used in D.
 BTW the AMD manual for K7 (or might be K6 optimisation manual? don't 
 exactly remember) goes into great detail about both memcpy() and 
 memset(). Turns out there's about five different cases.

In the meantime Deewiant has told me that on 64 bit glibc's memset is better, and on more modern CPUs the timings are different (and on 64 bit my first version may not work; maybe the second one is better. I have not tested it on 64 bit LDC yet). I'm just a newbie at this stuff, while the people who write the memset of 64-bit glibc are experts.

Really, memset() _should_ be optimal in all cases. On almost all compilers, it's not optimal, and on many (such as DMD) there's a _lot_ of room for improvement. So I consider this to be a C standard library implementation issue, rather than a language weakness.

Don, I suggest the following. std.algorithm has a routine called fill(range, value) which semantically subsumes memset. I suggest you specialize fill() for contiguous memory ranges of primitive types (which shouldn't be hard with std.traits), and then optimize the heck out of it. You could do the same with copy(), also in std.algorithm, to implement a super-duper memcpy() routine.

If you go this route, people can uniformly use high-level algorithms that specialize themselves whenever applicable. Andrei
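A rough sketch of the kind of specialization suggested here. This is not Phobos code: the name `fillFast`, the template constraint, and the zero-fill fast path are all illustrative assumptions.

```d
import std.traits : isIntegral, isScalarType;

// Hypothetical specialized fill for contiguous arrays of scalars.
void fillFast(T)(T[] range, T value)
    if (isScalarType!T)
{
    static if (isIntegral!T)
    {
        if (value == 0)
        {
            // All-zero integral fills can defer to memset, which the
            // C library is free to optimize per CPU. Floating point
            // is excluded: memset would turn a requested -0.0 into +0.0.
            import core.stdc.string : memset;
            memset(range.ptr, 0, range.length * T.sizeof);
            return;
        }
    }
    // Generic fallback: element-by-element assignment.
    foreach (ref e; range)
        e = value;
}
```

Usage would look like `auto a = new int[256]; fillFast(a, 0);`, which takes the memset path, while `fillFast(a, 7)` falls back to the loop. Callers keep one uniform high-level name and the specialization happens underneath, which is the point of Andrei's suggestion.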
Sep 21 2009