digitalmars.D - Importance of memory organization for speed

Bill Cox <bill billrocks.org> writes:
Hi, all.

Waaay back, there was a short discussion of optimizing memory layout for speed.
I've written a simple benchmark that traverses large graphs, one written in
very carefully memory-optimized C, the other using C++/STL.  The C version is
15X faster, and uses 2X less memory on my Ubuntu x64 Core Duo laptop.
Cachegrind shows the C version has a 16.7X lower L2 cache miss rate, which
accounts for the speed difference.

So, I'll just post again the importance of keeping memory layout abstract, and
hidden from the user.  More and more, speed for memory intensive applications
is all about cache performance.  Benchmarks can be found in the
examples/graph_benchmark directory of svn for the datadraw project:

svn co https://datadraw.svn.sourceforge.net/svnroot/datadraw/trunk datadraw

Best regards,
Bill
Jun 09 2008
renoX <renosky free.fr> writes:
Bill Cox wrote:
 Hi, all.
 
 Waaay back, there was a short discussion of optimizing memory layout
 for speed.  I've written a simple benchmark that traverses large
 graphs, one written in very carefully memory optimized C, the other
 using C++/STL.  The C version is 15X faster, and uses 2X less memory
 on my Ubuntu x64 Core Duo laptop.  Cachegrind shows the C version has
 a 16.7X lower L2 cache miss rate, which accounts for the speed
 difference.
 
 So, I'll just post again the importance of keeping memory layout
 abstract, and hidden from the user.

Uh? What you just did was use your knowledge of the memory layout in C to speed up your app, so it's the *opposite* of having the memory layout hidden from the user! I don't get your point here.

Regards,
renoX
Jun 13 2008
Nick B <nick.barbalich gmail.com> writes:
Hi there,

Does anyone know how to measure L1 and L2 cache performance using D and Tango, or is the _only_ way to do this to use Valgrind?

Regards,
Nick B
Jun 14 2008
Russell Lewis <webmaster villagersonline.com> writes:
renoX wrote:
Uh? What you just did was use your knowledge of the memory layout in C to speed up your app, so it's the *opposite* of having the memory layout hidden from the user! I don't get your point here.

In a perfect world, a compiler can perform deep optimizations, similar to hand-tuning your program. But it can't do so if you have already halfway specified the memory layout. So in that perfect world, you actually want to *underspecify* your program, so that the compiler can work miracles. However, if your compiler isn't as good as that, then hand-tuning is the better option.

An interesting observation is that for straight-line code (constrained within a single function), hand-tuned C (or, better yet, assembler) used to be much faster than what any compiler could produce. Nowadays, compilers generally produce code that is as good as (if not better than) what assembly experts write.

I would suspect that 20 years from now, our compilers will rework the memory layout just like they currently rework the ordering of operations in our functions. But I don't think that we're there yet.
Jun 14 2008