
digitalmars.D - D speed compared to C++

reply Matthew Allen <matt.allen removeme.creativelifestyles.com> writes:
I am looking to use D for programming a high-speed vision application which was
previously written in C/C++. I have done some arbitrary speed tests and am
finding that C/C++ seems to be faster than D by a factor of about 3. I have
done some simple loop tests that increment a float value by some number, and
also some memory allocation/deallocation loops, and C/C++ seems to come out
on top each time. Is D meant to be faster or as fast as C/C++, and if so, how
can I optimize the code? I am using -inline, -O, and -release.

An example of a simple loop test I ran is as follows:

DWORD start = timeGetTime();
	int i,j,k;
	float dx=0;
    for(i=0; i<1000;i++)
        for(j=0; j<1000;j++)
            for(k=0; k<10; k++)
                {
                     dx++;
                }
    DWORD end = timeGetTime();

In C++ I used ints and doubles. The C++ came back with a time of 15ms, and D
came back with 45ms.
Mar 18 2008
next sibling parent reply BCS <BCS pathlink.com> writes:
Matthew Allen wrote:
 
 An example of a simple loop test I ran is as follows:
 
 DWORD start = timeGetTime();
 	int i,j,k;
 	float dx=0;
     for(i=0; i<1000;i++)
         for(j=0; j<1000;j++)
             for(k=0; k<10; k++)
                 {
                      dx++;
                 }
     DWORD end = timeGetTime();
 
 In C++ int and doubles. The C++ came back with a time of 15ms, and D came back
with 45ms.

First of all, what C++ compiler? The best for testing would be DMC, as that removes the back-end differences. Second, how many test runs was that over? Third, try it with doubles (64-bit reals) in both programs, as the different conversions might be making a difference.

Another thing that might mask some stuff is start-up time. Try running the test loops in another loop and spit out sequential times. I have seen large (2x - 3x) differences in the first run of a test vs. later runs. This would also avoid random variables like the test code spanning a page boundary in one case and not in the other.

If you have done these things already then I don't know what's happening. /My/ next step would be to start looking at the ASM, but then again I'm known to be crazy.
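For illustration, the "run the test loops in another loop and spit out sequential times" idea might look like the sketch below. It uses portable std::chrono instead of the Windows timeGetTime from the original post, and all function names are mine:

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

// One pass of the poster's loop; returning the result keeps the
// optimizer from deleting the (otherwise effect-free) loop entirely.
float one_pass() {
    float dx = 0;
    for (int i = 0; i < 1000; i++)
        for (int j = 0; j < 1000; j++)
            for (int k = 0; k < 10; k++)
                dx++;
    return dx;
}

// Time several sequential passes, so first-run effects (cold caches,
// page faults, lazy runtime setup) show up as an outlier in pass 0.
std::vector<double> time_passes(int n) {
    std::vector<double> ms;
    for (int run = 0; run < n; ++run) {
        auto t0 = std::chrono::steady_clock::now();
        volatile float sink = one_pass();   // volatile: the result must be stored
        (void)sink;
        auto t1 = std::chrono::steady_clock::now();
        double elapsed = std::chrono::duration<double, std::milli>(t1 - t0).count();
        ms.push_back(elapsed);
        std::printf("pass %d: %.2f ms\n", run, elapsed);
    }
    return ms;
}
```

If the first printed pass is consistently 2x-3x slower than the rest, the difference is warm-up, not code generation.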
Mar 18 2008
parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from BCS (BCS pathlink.com)'s article
 Matthew Allen wrote:
 An example of a simple loop test I ran is as follows:

 DWORD start = timeGetTime();
 	int i,j,k;
 	float dx=0;
     for(i=0; i<1000;i++)
         for(j=0; j<1000;j++)
             for(k=0; k<10; k++)
                 {
                      dx++;
                 }
     DWORD end = timeGetTime();

 In C++ int and doubles. The C++ came back with a time of 15ms, and D came back
with 45ms.

that removes the back end differences. Second how many test runs was that over? third, try it with doubles (64bit reals) in both programs as the different conversions might be making a difference. Another thing that might mask some stuff is start up time. Try running the test loops in another loop and spit out sequential times. I have seen large (2x - 3x) differences in the first run of a test vs. later runs.

D apps also have more going on in the application initialization phase than C++ apps. For a real apples-to-apples comparison, you might want to consider using Tango with the "stub" GC plugged in. That just calls malloc/free and has no initialization cost, at the expense of no actual garbage collection. I'll have to check whether the stub GC compiles with the latest Tango--it's been a while since I used it.

Sean
Mar 18 2008
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Sean Kelly wrote:
 == Quote from BCS (BCS pathlink.com)'s article
 DWORD start = timeGetTime();
 	int i,j,k;
 	float dx=0;
     for(i=0; i<1000;i++)
         for(j=0; j<1000;j++)
             for(k=0; k<10; k++)
                 {
                      dx++;
                 }
     DWORD end = timeGetTime();



 
 D apps also have more going on in the application initialization phase than
C++ apps.  For a real
 apples-apples comparison, you might want to consider using Tango with the
"stub" GC plugged in.
 That just calls malloc/free and has no initialization cost, at the expense of
no actual garbage collection.
 I'll have to check whether the stub GC compiles with the latest Tango--it's
been a while since I used it.

How is the startup time relevant, when he appears to be measuring in-process?
Mar 18 2008
next sibling parent BCS <BCS pathlink.com> writes:
Frits van Bommel wrote:
 DWORD start = timeGetTime();
     int i,j,k;
     float dx=0;
     for(i=0; i<1000;i++)
         for(j=0; j<1000;j++)
             for(k=0; k<10; k++)
                 {
                      dx++;
                 }
     DWORD end = timeGetTime();



How is the startup time relevant, when he appears to be measuring in-process?

The GC time is not, but cache priming and such can make a difference. I have actually worked on code like the above and seen a consistent and significant drop in the second pass's time.
Mar 18 2008
prev sibling parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from Frits van Bommel (fvbommel REMwOVExCAPSs.nl)'s article
 Sean Kelly wrote:
 == Quote from BCS (BCS pathlink.com)'s article
 DWORD start = timeGetTime();
 	int i,j,k;
 	float dx=0;
     for(i=0; i<1000;i++)
         for(j=0; j<1000;j++)
             for(k=0; k<10; k++)
                 {
                      dx++;
                 }
     DWORD end = timeGetTime();



 D apps also have more going on in the application initialization phase than
C++ apps.  For a real
 apples-apples comparison, you might want to consider using Tango with the
"stub" GC plugged in.
 That just calls malloc/free and has no initialization cost, at the expense of
no actual garbage


 I'll have to check whether the stub GC compiles with the latest Tango--it's
been a while since I


 How is the startup time relevant, when he appears to be measuring
 in-process?

It's not. I only mentioned it because BCS mentioned startup time.

Sean
Mar 18 2008
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from Matthew Allen (matt.allen removeme.creativelifestyles.com)'s
article
 I am looking to use D for programming a high speed vision application which
was previously written in C/C++. I have done some arbitrary speed tests and am
finding that C/C++ seems to be faster than D by a factor of about 3. I have
done some simple loop tests that increment a float value by some number and
also some memory allocation/deallocation loops and C/C++ seems to come out on
top each time. Is D meant to be faster or as fast as C/C++ and if so how can I
optimize the code. I am using -inline, -O, and -release.
 An example of a simple loop test I ran is as follows:
 DWORD start = timeGetTime();
 	int i,j,k;
 	float dx=0;
     for(i=0; i<1000;i++)
         for(j=0; j<1000;j++)
             for(k=0; k<10; k++)
                 {
                      dx++;
                 }
     DWORD end = timeGetTime();
 In C++ int and doubles. The C++ came back with a time of 15ms, and D came back
with 45ms.

Are these tests with DMD vs. DMC, or GDC vs. GCC? If you're using different compilers for the C++ and D tests then you're really testing the code generator and optimizer more than the language. D code generated by DMD, for example, is notoriously slow at floating point operations, while the same code is much faster with GDC. This is an artifact of the Digital Mars back-end rather than the language itself.

Sean
Mar 18 2008
prev sibling next sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Matthew Allen wrote:
 I am looking to use D for programming a high speed vision application which
was previously written in C/C++. I have done some arbitary speed tests and am
finding that C/C++ seems to be faster than D by a magnitude of about 3 times. I
have done some simple loop tests that increment a float value by some number
and also some memory allocation/deallocation loops and C/C++ seems to come out
on top each time. Is D meant to be faster or as fast as C/C++ and if so how can
I optimize the code. I am using -inline, -O, and -release. 
 
 An example of a simple loop test I ran is as follows:
 
 DWORD start = timeGetTime();
 	int i,j,k;
 	float dx=0;
     for(i=0; i<1000;i++)
         for(j=0; j<1000;j++)
             for(k=0; k<10; k++)
                 {
                      dx++;
                 }
     DWORD end = timeGetTime();
 
 In C++ int and doubles. The C++ came back with a time of 15ms, and D came back
with 45ms.

That's not a useful benchmark. G++ completely optimizes away the loop, leaving you timing how fast an empty piece of code runs...

However, after adding 'printf("%d", dx)' the generated code for D and C++ is virtually identical, as are the timings. At least on my machine and with my compilers (gdc and g++ on 64-bit Ubuntu).

If you're seeing different results it may just be a difference between your C++ and your D compiler; especially if they're not g++ and gdc or dmc and dmd, i.e. if they don't share the same backend.
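As a sketch of the dead-code-elimination point: a loop whose result is never observed can legally be deleted by the optimizer, so a sink is needed. Two common ones are shown below (using %f, the correct printf specifier for a floating-point value, rather than the %d from the post); the function names are mine:

```cpp
#include <cstdio>

// The benchmark body: without a sink, an optimizer may remove this
// loop entirely and the "benchmark" times an empty block.
float timed_loop() {
    float dx = 0;
    for (int i = 0; i < 1000; i++)
        for (int j = 0; j < 1000; j++)
            for (int k = 0; k < 10; k++)
                dx++;
    return dx;
}

// Sink #1: make the result observable via I/O. A float passed through
// varargs is promoted to double, so %f is the right specifier.
void report(float dx) {
    std::printf("%f\n", (double)dx);
}

// Sink #2: force a store through a volatile object.
void keep(float dx) {
    volatile float sink = dx;
    (void)sink;
}
```

Either sink is enough; printf additionally costs a call per report, which is why a cheap dummy function (as BCS suggests below) is the lighter-weight check.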
Mar 18 2008
next sibling parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Frits van Bommel wrote:
 Matthew Allen wrote:
     float dx=0;


 However, after adding 'printf("%d", dx)' the generated code for D and 

Oops, that shouldn't be "%d", should it? Well, it doesn't matter because I just put that in to keep the compiler from completely optimizing out the loop but that does explain why I get such weird output :).
Mar 18 2008
prev sibling parent reply BCS <BCS pathlink.com> writes:
Frits van Bommel wrote:

 However, after adding 'printf("%d", dx)' the generated code for D and 
 C++ is virtually identical, as are the timings.

printf is kind of a heavyweight function. How does it compare with some dummy function?
Mar 18 2008
parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
BCS wrote:
 Frits van Bommel wrote:
 
 However, after adding 'printf("%d", dx)' the generated code for D and 
 C++ is virtually identical, as are the timings.

printf is kinda a heavy weight function. how does it compare with some dummy function?

It doesn't seem to make any difference, unless you count executable size. (I passed 'dx' to a separately-compiled empty C function instead of printf)
Mar 18 2008
prev sibling next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Matthew Allen wrote:
 I am looking to use D for programming a high speed vision application
 which was previously written in C/C++. I have done some arbitary
 speed tests and am finding that C/C++ seems to be faster than D by a
 magnitude of about 3 times. I have done some simple loop tests that
 increment a float value by some number and also some memory
 allocation/deallocation loops and C/C++ seems to come out on top each
 time. Is D meant to be faster or as fast as C/C++ and if so how can I
 optimize the code. I am using -inline, -O, and -release.
 
 An example of a simple loop test I ran is as follows:
 
 DWORD start = timeGetTime(); int i,j,k; float dx=0; for(i=0;
 i<1000;i++) for(j=0; j<1000;j++) for(k=0; k<10; k++) { dx++; } DWORD
 end = timeGetTime();
 
 In C++ int and doubles. The C++ came back with a time of 15ms, and D
 came back with 45ms.

Loop unrolling could be a big issue here. DMD doesn't do loop unrolling, but that is not a language issue at all, it's an optimizer issue. It's easy enough to check - get the assembler output of the loop from your compiler and post it here.
Mar 18 2008
parent reply Matthew Allen <mallen creativelifestyles.removeme.com> writes:
Walter Bright Wrote:

 Matthew Allen wrote:
 I am looking to use D for programming a high speed vision application
 which was previously written in C/C++. I have done some arbitary
 speed tests and am finding that C/C++ seems to be faster than D by a
 magnitude of about 3 times. I have done some simple loop tests that
 increment a float value by some number and also some memory
 allocation/deallocation loops and C/C++ seems to come out on top each
 time. Is D meant to be faster or as fast as C/C++ and if so how can I
 optimize the code. I am using -inline, -O, and -release.
 
 An example of a simple loop test I ran is as follows:
 
 DWORD start = timeGetTime(); int i,j,k; float dx=0; for(i=0;
 i<1000;i++) for(j=0; j<1000;j++) for(k=0; k<10; k++) { dx++; } DWORD
 end = timeGetTime();
 
 In C++ int and doubles. The C++ came back with a time of 15ms, and D
 came back with 45ms.

Loop unrolling could be a big issue here. DMD doesn't do loop unrolling, but that is not a language issue at all, it's an optimizer issue. It's easy enough to check - get the assembler output of the loop from your compiler and post it here.

Walter you are right. That was the issue. I added a function call into the loop and D came out a lot faster than C++. Thanks for a great language!!!
Mar 19 2008
next sibling parent "Koroskin Denis" <2korden+dmd gmail.com> writes:
On Thu, 20 Mar 2008 00:44:02 +0300, Matthew Allen
<mallen creativelifestyles.removeme.com> wrote:

 Walter Bright Wrote:

 Matthew Allen wrote:
 I am looking to use D for programming a high speed vision application
 which was previously written in C/C++. I have done some arbitrary
 speed tests and am finding that C/C++ seems to be faster than D by a
 factor of about 3. I have done some simple loop tests that
 increment a float value by some number and also some memory
 allocation/deallocation loops and C/C++ seems to come out on top each
 time. Is D meant to be faster or as fast as C/C++ and if so how can I
 optimize the code. I am using -inline, -O, and -release.

 An example of a simple loop test I ran is as follows:

 DWORD start = timeGetTime(); int i,j,k; float dx=0; for(i=0;
 i<1000;i++) for(j=0; j<1000;j++) for(k=0; k<10; k++) { dx++; } DWORD
 end = timeGetTime();

 In C++ int and doubles. The C++ came back with a time of 15ms, and D
 came back with 45ms.

 Loop unrolling could be a big issue here. DMD doesn't do loop unrolling,
 but that is not a language issue at all, it's an optimizer issue. It's
 easy enough to check - get the assembler output of the loop from your
 compiler and post it here.

 Walter you are right. That was the issue. I added a function call into
 the loop and D came out a lot faster than C++.

 Thanks for a great language!!!

Obviously, you cannot say that D integer increment is faster or slower than that of C++; it just doesn't depend on the language design. You could compare DMD to GCC, and that would make about as much sense as comparing GCC with, say, ICC or DMC. The difference is a matter of the optimization techniques used by the compilers, not the languages. Surely, higher-level language design has some impact on general performance, like the GC or constant-time array slicing, but not on low-level variable increments, loops, etc. General D performance will only increase as more vendors produce commercial D compilers.
Mar 20 2008
prev sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Matthew Allen wrote:
 Walter you are right. That was the issue. I added a function call
 into the loop and D came out a lot faster than C++.
 
 Thanks for a great language!!!

You're welcome. To compare D vs C++ as languages, rather than the optimizers or code generators, the most straightforward way is to benchmark dmd vs dmc, and gdc vs gcc.
Mar 21 2008
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Matthew Allen, the benchmarks on the Shootout site are flawed, but they are
probably less flawed than your benchmarks, so I suggest you take a look at
them. You can download them and try them on your PC (for example using an
Intel compiler for C++, etc.).

Bye,
bearophile
Mar 18 2008
next sibling parent reply Dan <murpsoft hotmail.com> writes:
1) Using a float or double as an incrementor in a tight loop is a bad idea.
Most compilers optimize it out where possible; so do Agner Fog and Paul
Hsieh. They know why this is true better than I do.

2) Most compilers optimize stuff out if it's not directly affecting output or
external functions or arguments.  This is usually done on a per-function level.
 A better optimizer would do it for the whole program.

3) Startup for D is slower even for hello world because D statically links the
entirety of phobos and the GC even if you don't ever use them.  This equates to
about 80kb of bloat - so it's still dramatically better than Java or C#, but
still not "correct".

4) If the GC does a collection cycle, it'll bump the time complexity.  This
will happen pseudo-randomly.

~~~

If you really want to improve performance on C or C++, do it by profiling your
program, and optimize parts where it matters how fast you go.

- simplify
- remove unnecessary loops
- hoist stuff out of loops as much as possible
- iterate or recurse in ways that ease cache miss penalties
- iterate instead of recurse as much as possible
- reduce if/else if/else || && as much as possible
- multiply by inverse instead of divide where possible
- reduce calls to the OS where it's sensible

If you need to go further, learn assembler.  D's inline one ain't half bad.
You can do things in assembler that you can't do in HLLs.  Things like ror,
rcl, sete, cmovcc, prefetchnta, clever XMMX usage and such.

/me is looking forward to when XMMX has byte-array functionality.   Could
outperform all x86-32 string stuff by an order of magnitude.
Mar 18 2008
next sibling parent reply "Saaa" <empty needmail.com> writes:
 If you really want to improve performance on C or C++, do it by profiling 
 your program, and optimize parts where it matters how fast you go.

 - simplify
 - remove unnecessary loops
 - hoist stuff out of loops as much as possible
 - iterate or recurse in ways that ease cache miss penalties

 - iterate instead of recurse as much as possible
 - reduce if/else if/else || && as much as possible

else?
 - multiply by inverse instead of divide where possible
 - reduce calls to the OS where it's sensible

- don't allocate and then delete if you need a ~same amount of memory afterwards.
Mar 18 2008
next sibling parent Paul Findlay <r.lph50+d gmail.com> writes:
Saaa wrote:
 - reduce if/else if/else || && as much as possible

else?

So the processor's branch prediction can stay consistent, or so it doesn't have to guess. "The elimination of branching is an important concern with today's deeply pipelined processor architectures." Check out some of the stuff on: http://www.azillionmonkeys.com/qed/optimize.html - Paul
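As a sketch of what "eliminating branching" can mean in practice, here is a branchy sum next to a branchless rewrite that replaces the conditional jump with a mask. The function names are mine:

```cpp
#include <cstdint>

// Branchy version: the CPU must predict the comparison every iteration,
// and data-dependent branches are what predictors handle worst.
int64_t sum_if_even_branchy(const int* a, int n) {
    int64_t s = 0;
    for (int i = 0; i < n; ++i)
        if (a[i] % 2 == 0) s += a[i];
    return s;
}

// Branchless version: build an all-ones/all-zeros mask from the
// condition and AND it with the value, so the loop body contains no
// conditional jump at all.
int64_t sum_if_even_branchless(const int* a, int n) {
    int64_t s = 0;
    for (int i = 0; i < n; ++i) {
        int keep = -((a[i] & 1) == 0);   // -1 (all ones) if even, 0 if odd
        s += a[i] & keep;
    }
    return s;
}
```

Whether the branchless form actually wins depends on how predictable the data is; on well-predicted branches the branchy form can be just as fast.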
Mar 19 2008
prev sibling parent reply Georg Wrede <georg nospam.org> writes:
Saaa wrote:
 If you really want to improve performance on C or C++, do it by
 profiling your program, and optimize parts where it matters how
 fast you go.


To Saaa: it's pretty late here now, but I'll try to address some of this briefly.

In general (I have to start with this; and this post, as every post in a NG, should be for a broader audience than the one nominally replied to), improving performance only means one thing: have the computer do less to achieve the goal.
 - simplify


Any time you look at your code from two weeks ago, you'll probably find the same thing could be done more easily, with less code lines, with less effort for the computer.
 - remove unnecessary loops


For example, summing up the integers from 1 to 100 doesn't mean that you'd do the obvious loop. Thinking more about it gives Sum = 1+100 + 2+99 ... and further refining it gives 50 * (1+100) = 5050. (Courtesy of Gauss (http://en.wikipedia.org/wiki/Carl_Friedrich_Gauss), ca. 1785, ca. 7 years old!)
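The loop and the Gauss closed form can be put side by side; a tiny sketch (names mine):

```cpp
// The obvious loop: n additions.
long sum_loop(int n) {
    long s = 0;
    for (int i = 1; i <= n; ++i)
        s += i;
    return s;
}

// Gauss's closed form: pair first with last, second with second-to-last,
// etc. -- n/2 pairs each summing to n+1, so n*(n+1)/2 in constant time.
long sum_gauss(int n) {
    return (long)n * (n + 1) / 2;
}
```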
 - hoist stuff out of loops as much as possible


for (int i = 0; i < 100; i++) { if (i % 3 == 0) sum += i; } converted to for (int i = 0; i < 100; i += 3) { sum += i; } // ok, stupid example, AND it's late here...
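A compilable version of this branch-removal idea, assuming the intent is summing the multiples of 3 below some limit (the original post admits its version was untested; function names are mine):

```cpp
// Test inside the loop: 100 iterations, 100 modulo operations, 100 branches.
int sum_multiples_of_3_branchy(int limit) {
    int sum = 0;
    for (int i = 0; i < limit; ++i)
        if (i % 3 == 0) sum += i;
    return sum;
}

// Test hoisted into the loop stride: every third i satisfies i % 3 == 0,
// so step by 3 directly -- a third of the iterations and no branch.
int sum_multiples_of_3_strided(int limit) {
    int sum = 0;
    for (int i = 0; i < limit; i += 3)
        sum += i;
    return sum;
}
```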
 - iterate or recurse in ways that ease cache miss penalties


Modern processors have caches ranging in size from hundreds of kilobytes to some megabytes, usually with a separate cache for instructions and another for data. Program code, for example, is fetched into the cache from memory around where the current instruction is located. Thus, the instructions following the current one are in the cache, which the processor can access much faster than real memory.

Now, if you have your code so that, say, a loop you're executing contains subroutine or function calls that actually reside far apart in memory, then it might be that the processor doesn't manage to keep all of them in the cache. One way to ease this is to see to it that the needed functions reside close to each other in RAM. One way of trying to do this is to have them next to each other in the source code. (Please, again, understand that it's late here, etc., so I'm not accurate here, more like conveying the general idea.)

The same goes for data. Suppose you're Chloe [in "24", the TV series], and the Boss gives you 25 minutes to filter out which of our 2 million suspects have made any of 100 million recent cellular calls. You'd have to write a D(!!!!) program to get it done, right? Now, matching 2 million in an array against 100 million in a stream (yes, you'd do it against a stream) would be the first tack. But 25 minutes wouldn't be enough. So, you write your program so that first the 2 million suspects are sorted in order of "suspectability", right? The extra time used for this is more than countered for when the actual routine is run, because now most of the "suspects" appear regularly in the comparisons, and therefore "stay in the cache". (Again, it's late here, but you get the idea.)
 - iterate instead of recurse as much as possible


Any textbook on programming and recursion shows you an example of doing the Fibonacci numbers. Just write the code as recursive and as iterative (both from the textbook) and time the results. The difference is appalling.
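The textbook pair looks like this; the recursive version does an exponential number of calls while the iterative one does a linear number of additions (names mine):

```cpp
#include <cstdint>

// Naive recursion: fib_rec(n) recomputes the same subproblems over and
// over, giving roughly phi^n calls (phi ~ 1.618).
uint64_t fib_rec(int n) {
    return n < 2 ? (uint64_t)n : fib_rec(n - 1) + fib_rec(n - 2);
}

// Iterative version: carry the last two values forward, n additions total.
uint64_t fib_iter(int n) {
    uint64_t a = 0, b = 1;
    for (int i = 0; i < n; ++i) {
        uint64_t next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```

Already around n = 40 the recursive version takes seconds while the iterative one is instantaneous.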
 - reduce if/else if/else || && as much as possible

Is this because of the obvious `less calculations is better`, or something else?

Well, any single calculation that you do _within_ the loop is calculated as many times as the loop is run. In _most_ cases, what you'd have inside the loop at first thought can pretty easily be transported outside the loop. (Especially with D, where you can have compile-time things calculated automatically; but also with other languages, quite a lot of the stuff you'd initially have inside the loop can either be calculated once beforehand, or the whole thing converted to calculating other (simpler) things, with some function "rectifying" the end result either before the loop or after it. See the for-loop example above.)
 - multiply by inverse instead of divide where possible


Division is poison for computers. It's also poison for math coprocessors. (Hey, those of us that are old enough to have had to do division of large floating point numbers with pencil and paper at school, know intuitively that multiplication is a piece of cake, compared. Who here would venture to do 123455.2525 / 4.7110211 on only pencil and paper??) Now, if instead, one could choose to do 123455.2525 * 0.212268206568 (whether with paper-and-pencil, or with the math coprocessor), the result would be obtained _much_ faster. And with a _lot_ less head ache.
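When every element is divided by the same constant, the division can be done once, as a reciprocal, and the loop reduced to multiplies. A sketch (names mine; note that with floating point the two forms can differ in the last bits, so this is only valid when that tolerance is acceptable):

```cpp
// One divide per element: the expensive form.
void scale_div(double* a, int n, double d) {
    for (int i = 0; i < n; ++i)
        a[i] /= d;
}

// One divide total, hoisted out of the loop; the loop itself only
// multiplies, which is several times cheaper on most FPUs.
void scale_mul(double* a, int n, double d) {
    double inv = 1.0 / d;
    for (int i = 0; i < n; ++i)
        a[i] *= inv;
}
```

With divisors that are powers of two (as below) the reciprocal is exact and the results are bit-identical.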
 - reduce calls to the OS where it's sensible


(This is ridiculous, but I hope it gets the idea across.) Suppose an idiot, Ivan, has to write a program that sums up the sizes of all files on a hard disk. He might first create a function that lists all the files in a directory with separate OS queries, then edit it to be recursive, then make a list of all the files found with this method, then one by one ask the operating system for the sizes of these files.

Some operating systems have a single call that returns a list of the file names and (among other things) the file sizes and types. Wouldn't it be faster to use this, sum up the sizes, and then do the same for any of the entries that turn out to be directories?

(Comment about reducing calls to the OS: it's not as clear-cut as the above example might suggest. Not all of us use Windows, where "_any_" OS call is a "waste of time". So one should know, or seek wisdom in the OS docs (or even source code, where available), before making judgements on the efficacy of particular tactics.)
 - don't allocate and then delete is you need a ~same amount of memory
  afterwards.

(Ignoring the typo.) New-ing and delete-ing are expensive operations (time-wise). Now, especially if you know up front that you will need to do a lot of both, it might be smart to first consider (or have your app evaluate, or even guess) how many of these you might need at most at the same time. Then it would be smart to allocate an array containing, at that maximum count, whatever it is you need to allocate (be it integers, structs, objects, or whatever). Then you could write a function myNew(object_or_whatever) that, instead of allocating with new, would just look at the array and find the first empty slot in it. Similarly, with delete, you could have a function that frees the particular slot in this array.

While this doesn't sound much faster than using new and delete (because I'm too tired now to explain it properly), this is a tried and proven method of making the program run _much_ faster.

---
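A minimal sketch of that slot-array idea, as a fixed-size free-list pool: all memory is grabbed up front, and "allocating" is just popping a free slot index, so there is no new/delete in the hot path. The struct and its member names are illustrative, not anyone's actual API:

```cpp
#include <cstddef>
#include <vector>

// Fixed-capacity pool of doubles (could be any struct/object type).
struct Pool {
    std::vector<double> slots;       // pre-allocated storage, sized once
    std::vector<size_t> free_list;   // indices of currently unused slots

    explicit Pool(size_t n) : slots(n) {
        // Push indices in reverse so slot 0 is handed out first.
        for (size_t i = 0; i < n; ++i)
            free_list.push_back(n - 1 - i);
    }

    // The myNew() analogue: O(1), no heap allocation.
    double* acquire() {
        if (free_list.empty()) return nullptr;   // pool exhausted
        size_t i = free_list.back();
        free_list.pop_back();
        return &slots[i];
    }

    // The delete analogue: compute the slot index and mark it free.
    void release(double* p) {
        free_list.push_back((size_t)(p - slots.data()));
    }
};
```

The trade-off is a fixed upper bound chosen in advance; the win is that acquire/release are a couple of vector operations instead of a trip through the general-purpose allocator.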
 If you really want to improve performance on C or C++, do it by
 profiling your program, and optimize parts where it matters how
 fast you go.


(Quoted again, from the top of this post.) I'd rewrite the above quote, to: If you really want to improve performance, do it by profiling. And, improve only where the profiling shows you're slow. In any language.
Mar 25 2008
parent reply lutger <lutger.blijdestijn gmail.com> writes:
Georg Wrede wrote:
...
 Now, if you have your code so that, say, a loop you're executing,
 contains subroutine or function calls that actually reside far apart in
 memory, then it might be that the processor doesn't understand to keep
 all of them in the cache. One way to ease this is to see to it that the
 needed functions reside close to each other in ram. One way of trying to
 dot this is to have them next to each other in the source code. ...

Another way is to use the profiler built into dmd. It can generate a def file with the optimal link order gathered from empirical results: http://www.digitalmars.com/ctg/trace.html It's only for win32 however.
Mar 26 2008
parent bearophile <bearophileHUGS lycos.com> writes:
lutger:
 Another way is to use the profiler built into dmd. It can generate a def
 file with the optimal link order gathered from empirical results:
 http://www.digitalmars.com/ctg/trace.html 
 It's only for win32 however.

I think the DMD docs deserve to include such a thing too, then.

Bye,
bearophile
Mar 26 2008
prev sibling parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Dan" <murpsoft hotmail.com> wrote in message 
news:frpnmc$24u0$1 digitalmars.com...

 3) Startup for D is slower even for hello world because D statically links 
 the entirety of phobos and the GC even if you don't ever use them.  This 
 equates to about 80kb of bloat - so it's still dramatically better than 
 Java or C#, but still not "correct".

If it linked in all of phobos, your programs would start at around 1MB. It doesn't link in all of phobos.
Mar 18 2008
prev sibling parent "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Wed, 19 Mar 2008 02:45:00 +0200, Dan <murpsoft hotmail.com> wrote:

 4) If the GC does a collection cycle, it'll bump the time complexity.  This
will happen pseudo-randomly.

Garbage collection can only happen on a memory allocation, turning off the GC will have no effect here (except, perhaps, the initialization time, which isn't what we're measuring here anyway). -- Best regards, Vladimir mailto:thecybershadow gmail.com
Mar 18 2008
prev sibling next sibling parent David Ferenczi <raggae ferenczi.net> writes:
Watch out with the release flag; I experienced strange behaviour.

See: http://d.puremagic.com/issues/show_bug.cgi?id=797
Mar 19 2008
prev sibling parent reply Matthew Allen <mallen removeme.creativelifestyles.com> writes:
Matthew Allen Wrote:

 I am looking to use D for programming a high speed vision application which
was previously written in C/C++. I have done some arbitary speed tests and am
finding that C/C++ seems to be faster than D by a magnitude of about 3 times. I
have done some simple loop tests that increment a float value by some number
and also some memory allocation/deallocation loops and C/C++ seems to come out
on top each time. Is D meant to be faster or as fast as C/C++ and if so how can
I optimize the code. I am using -inline, -O, and -release. 
 
 An example of a simple loop test I ran is as follows:
 
 DWORD start = timeGetTime();
 	int i,j,k;
 	float dx=0;
     for(i=0; i<1000;i++)
         for(j=0; j<1000;j++)
             for(k=0; k<10; k++)
                 {
                      dx++;
                 }
     DWORD end = timeGetTime();
 
 In C++ int and doubles. The C++ came back with a time of 15ms, and D came back
with 45ms.

I am testing DMD 1.0 against the MSVC6 compiler. On DMD I used -O and -inline; on MSVC I used -O2. Taking in the discussion, I tried a few more tests and found that D is faster in certain circumstances, so I guess that the speed is down to compiler optimization. Also of note is that these tests were run in GUI applications, not console applications. Running in simple console applications, D came out on top in all tests.

Here is a summary of what I tried. Each test was run 100 times, average given.

double Add(double a, double b) { return a + b; }

DWORD start = timeGetTime();
int i, j, k;
double dx = 0;
for (i = 0; i < 1000; i++)
    for (j = 0; j < 1000; j++)
        for (k = 0; k < 10; k++)
        {
            dx++;                 // test 1 - simple increment on dx
            dx = Add(i+0.5, j);   // test 2 - function to change dx
            dx += Add(i+0.5, j);  // test 3 - function increment on dx
        }
DWORD end = timeGetTime();

For test 1: DMD [42ms]  MSVC6 [15ms]
For test 2: DMD [9ms]   MSVC6 [100ms]
For test 3: DMD [42ms]  MSVC6 [109ms]
Mar 19 2008
parent lutger <lutger.blijdestijn gmail.com> writes:
For benchmarks that operate on a higher, language level see this comparison
of xml libraries, D comes on top with tango's implementation: 
http://dotnot.org/blog/index.php

Here are the slides of the presentation where the underlying ideas were
discussed:
http://s3.amazonaws.com/dconf2007/Kris_Array_Slicing.pdf
Mar 19 2008