
digitalmars.D - D performance compared with other languages

reply "christopher diggins" <cdiggins users.sourceforge.net> writes:
I have written an object oriented Heapsort in D at
http://www.heron-language.com/benchmarks/heapsort-d.html to compare the
performance of object oriented code in several languages. The results were
somewhat disappointing, when compared with C++, Delphi, Java, and Heron. I
was hoping that there might be some suggestions from members of this group
about how I might improve the code I wrote while keeping the overall
algorithm and design the same.

Please note that the program is a port of the program at
http://www.heron-language.com/benchmarks/heapsort-heron.html and is
discussed further at http://www.heron-language.com/benchmarks/index.html

TIA
-- 
Christopher Diggins
http://www.cdiggins.com
http://www.heron-language.com
May 11 2004
next sibling parent Andy Friesen <andy ikagames.com> writes:
christopher diggins wrote:
 I have written an object oriented Heapsort in D at
 http://www.heron-language.com/benchmarks/heapsort-d.html to compare the
 performance of object oriented code in several languages. The results were
 somewhat disappointing, when compared with C++, Delphi, Java, and Heron. I
 was hoping that there might be some suggestions from members of this group
 about how I might improve the code I wrote while keeping the overall
 algorithm and design the same.
 
 Please note that the program is a port of the program at
 http://www.heron-language.com/benchmarks/heapsort-heron.html and is
 discussed further at http://www.heron-language.com/benchmarks/index.html

The time decreases significantly if you use both -O and -release. -- andy
May 11 2004
prev sibling next sibling parent reply "Walter" <newshound digitalmars.com> writes:
Make sure to compile with:
    -release -O -inline

"christopher diggins" <cdiggins users.sourceforge.net> wrote in message
news:c7qvo6$1256$1 digitaldaemon.com...
 I have written an object oriented Heapsort in D at
 http://www.heron-language.com/benchmarks/heapsort-d.html to compare the
 performance of object oriented code in several languages. The results were
 somewhat disappointing, when compared with C++, Delphi, Java, and Heron. I
 was hoping that there might be some suggestions from members of this group
 about how I might improve the code I wrote while keeping the overall
 algorithm and design the same.

 Please note that the program is a port of the program at
 http://www.heron-language.com/benchmarks/heapsort-heron.html and is
 discussed further at http://www.heron-language.com/benchmarks/index.html

 TIA
 -- 
 Christopher Diggins
 http://www.cdiggins.com
 http://www.heron-language.com

May 11 2004
parent reply "christopher diggins" <cdiggins users.sourceforge.net> writes:
Thanks for the feedback, Walter. I had only used -O but not -inline
and -release which sped up the execution to 703 msec and the memory usage
dropped to 2MB on my system. I have now updated the web page as a result.
There is still 4 bytes per object that I can't account for. I count 8 bytes
for the vector data, 4 bytes for the pointer to the object, 4 bytes for the
vtable in the object, which only brings me to 16 bytes per object, what is
the other 4 bytes for? Is there a way to allocate 100,000 objects at once
rather than one by one?

-- 
Christopher Diggins
http://www.cdiggins.com
http://www.heron-language.com
May 11 2004
next sibling parent Helmut Leitner <helmut.leitner wikiservice.at> writes:
christopher diggins wrote:
 
 Thanks for the feedback, Walter. I had only used -O but not -inline
 and -release which sped up the execution to 703 msec and the memory usage
 dropped to 2MB on my system. I have now updated the web page as a result.
 There is still 4 bytes per object that I can't account for. I count 8 bytes
 for the vector data, 4 bytes for the pointer to the object, 4 bytes for the
 vtable in the object, which only brings me to 16 bytes per object, what is
 the other 4 bytes for? 

Each interface has a 4 byte/object cost.

-- 
Helmut Leitner    leitner hls.via.at
Graz, Austria     www.hls-software.com
May 11 2004
prev sibling next sibling parent "Ben Hinkle" <bhinkle mathworks.com> writes:
Every D object has a synchronized monitor that is right after the vtable.
That is a pointer to a platform-specific mutex of some sort.
Structs don't have any vtable or monitor so they would be the most efficient
memory-wise but wouldn't support any dynamic binding or synchronization.
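Ben's layout arithmetic can be sketched in C++, where the hidden per-object field is just the vptr; D adds a monitor pointer on top of that, which is where the unexplained 4 bytes (on 32-bit) come from. The type names below are invented for illustration:

```cpp
#include <cstdint>

// A plain aggregate: no virtual functions, no hidden fields.
struct PlainPair {
    int32_t a;
    int32_t b;
};

// Adding any virtual function makes the compiler embed one hidden
// vptr per object. A D class object carries one pointer more than
// this: the monitor slot used by `synchronized`.
struct VirtualPair {
    virtual ~VirtualPair() = default;
    int32_t a;
    int32_t b;
};
```

On a typical 64-bit build `sizeof(VirtualPair)` is 16 versus 8 for `PlainPair`, purely from the vptr plus alignment.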

"christopher diggins" <cdiggins users.sourceforge.net> wrote in message
news:c7r63p$1bv1$1 digitaldaemon.com...
 Thanks for the feedback, Walter. I had only used -O but not -inline
 and -release which sped up the execution to 703 msec and the memory usage
 dropped to 2MB on my system. I have now updated the web page as a result.
 There is still 4 bytes per object that I can't account for. I count 8 bytes
 for the vector data, 4 bytes for the pointer to the object, 4 bytes for the
 vtable in the object, which only brings me to 16 bytes per object, what is
 the other 4 bytes for? Is there a way to allocate 100,000 objects at once
 rather than one by one?

 -- 
 Christopher Diggins
 http://www.cdiggins.com
 http://www.heron-language.com

May 11 2004
prev sibling next sibling parent reply "Walter" <newshound digitalmars.com> writes:
"christopher diggins" <cdiggins users.sourceforge.net> wrote in message
news:c7r63p$1bv1$1 digitaldaemon.com...
 Thanks for the feedback, Walter. I had only used -O but not -inline
 and -release which sped up the execution to 703 msec and the memory usage
 dropped to 2MB on my system. I have now updated the web page as a result.

Thanks! Could you also please hyperlink the "Digital Mars Compiler" to
www.digitalmars.com/d/ ?

You'll also get faster results if you use 'final' on all member functions
that are not overridden (essentially all the functions you don't mark as
'virtual' in the C++ version).
 There is still 4 bytes per object that I can't account for. I count 8 bytes
 for the vector data, 4 bytes for the pointer to the object, 4 bytes for the
 vtable in the object, which only brings me to 16 bytes per object, what is
 the other 4 bytes for?

It's for the monitor.
 Is there a way to allocate 100,000 objects at once
 rather then one by one?

Yes. Implement a new() function for the class, and have it parcel out bits from a preallocated array.
May 11 2004
next sibling parent reply Daniel Horn <hellcatv hotmail.com> writes:
Walter wrote:
 "christopher diggins" <cdiggins users.sourceforge.net> wrote in message
 news:c7r63p$1bv1$1 digitaldaemon.com...
 
Thanks for the feedback, Walter. I had only used -O but not -inline
and -release which sped up the execution to 703 msec and the memory usage
dropped to 2MB on my system. I have now updated the web page as a result.

Thanks! Could you also please hyperlink the "Digital Mars Compiler" to www.digitalmars.com/d/ ? You'll also get faster results if you use 'final' on all member functions that are not overridden (essentially all the functions you don't mark as 'virtual' in the C++ version).
There is still 4 bytes per object that I can't account for. I count 8 bytes
for the vector data, 4 bytes for the pointer to the object, 4 bytes for the
vtable in the object, which only brings me to 16 bytes per object, what is
the other 4 bytes for?

It's for the monitor.
Is there a way to allocate 100,000 objects at once
rather then one by one?

Yes. Implement a new() function for the class, and have it parcel out bits from a preallocated array.

in an array (or you may wish to have the same class in 3 different packed arrays). Clearly you could make some sort of template class that inherits, and make that have a "new" operator that does the parceling... but it's a hack that only lets me have a single stack per template class (what if I want to have many stacks of said objects?). I could have a global "current stack" pointer and set it to my next available stack, but this is beginning to sound more and more contrived. Why not just use a struct then? Do structs support static inheritance (I guess they will with mixins?)
May 11 2004
parent reply "Walter" <newshound digitalmars.com> writes:
"Daniel Horn" <hellcatv hotmail.com> wrote in message
news:c7ra0p$1hr4$1 digitaldaemon.com...
Is there a way to allocate 100,000 objects at once
rather than one by one?

Yes. Implement a new() function for the class, and have it parcel out bits
from a preallocated array.

in an array (or you may wish to have the same class in 3 different packed
arrays)

I do the same sorts of optimizations in C++ when I want performance. Storage allocation is always a ripe target for performance tuning.
 clearly you could make some sort of template class that inherits and
 make that have a "new" operator that does the parceling...
 but it's a hack that only lets me have a single stack per template class
 (what if I want to have many stacks of said objects) I could have a
 global "current stack" pointer and set it to my next available stack,
 but this is beginning to sound more and more contrived.

That's why the general purpose storage allocator is the default solution. But when speed is your master, writing custom allocators can yield big benefits, and that's why D supports doing it.
 why not just do a structure then--

Structs can't inherit from an interface.
 Do structs support static inheritance (I guess they will with mixins?)

Mixins have the potential to do this. Stay tuned!
May 11 2004
parent reply -scooter- <scottm cs.ucla.edu> writes:
Walter wrote:

 I do the same sorts of optimizations in C++ when I want performance. Storage
 allocation is always a ripe target for performance tuning.

One would think so, but an excellent paper from OOPSLA '02 has hard evidence
to the contrary: "Reconsidering Custom Memory Allocation" (Berger et al.).
The paper is at http://www.cs.umass.edu/~emery/pubs/berger-oopsla2002.pdf

The only time it seems that a custom memory allocator wins is in the "bulk
free" case, when an entire heap of objects (like a compiler's AST for a
class <hint>) gets deallocated. Otherwise, custom storage management is
almost a complete waste of time.

I haven't taken Berger's HeapLayers or reaps allocation libraries out for a
spin yet, but it looks interesting.

-scooter
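The "bulk free" case the paper singles out can be sketched as a region (arena) in C++. The class below is a toy illustration, not Berger's HeapLayers API: objects are carved out of chunks, nothing is freed individually, and destroying the region releases everything at once.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Toy region allocator: allocate related objects together, then free
// them all in one shot when the Region goes out of scope.
class Region {
public:
    void* allocate(std::size_t size) {
        // Round up so every allocation stays maximally aligned.
        size = (size + alignof(std::max_align_t) - 1)
             & ~(alignof(std::max_align_t) - 1);
        if (chunks_.empty() || used_ + size > kChunkBytes) {
            chunks_.push_back(std::make_unique<unsigned char[]>(kChunkBytes));
            used_ = 0;
        }
        void* p = chunks_.back().get() + used_;
        used_ += size;
        return p;
    }
    // ~Region() frees every chunk at once -- the bulk free.
private:
    static constexpr std::size_t kChunkBytes = 64 * 1024;
    std::vector<std::unique_ptr<unsigned char[]>> chunks_;
    std::size_t used_ = 0;
};
```

Note there is no per-object `free`; for objects with non-trivial destructors you would also need to run those before the chunks go away.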
May 12 2004
next sibling parent reply Kevin Bealer <Kevin_member pathlink.com> writes:
In article <c7tlva$22e8$1 digitaldaemon.com>, -scooter- says...
Walter wrote:

 I do the same sorts of optimizations in C++ when I want performance. Storage
 allocation is always a ripe target for performance tuning.

One would think so, but an excellent paper from OOPSLA '02 has hard evidence to the contrary: Reconsidering custom memory allocation (Berger, et. al). Paper is at http://www.cs.umass.edu/~emery/pubs/berger-oopsla2002.pdf The only time it seems that a custom memory allocator wins is in the "bulk free" case, when an entire heap of objects (like a compiler's AST for a class <hint>) gets deallocated. Otherwise, custom storage management is almost a complete waste of time. I haven't taken Berger's HeapLayers or reaps allocation libraries out for a spin yet, but it looks interesting. -scooter

In a GC program, you can avoid GC overhead by keeping free lists of anything
you allocate. The garbage collector won't run because you aren't calling
new(). This was a tip for "real time" programming.

Note that you are giving up benefit #1 of a garbage collector: not having to
manage delete[] yourself. You also run the risk of old pointers to supposedly
"new" objects, if an object wasn't really ready to be deleted. On the plus
side, if you are not sure it's garbage, don't put it on the list: the GC will
eventually hunt it down if you are, in fact, leaking.

Kevin
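The free-list tip, sketched in C++ (names and the bound are invented for illustration): recycle objects through your own list so neither the collector nor the general-purpose allocator is involved on the hot path, and cap the list so it can't hog memory.

```cpp
#include <cstddef>
#include <vector>

// Bounded object recycler. acquire() reuses a cached object when one
// is available; release() caches up to MaxFree objects and really
// frees the rest.
template <typename T, std::size_t MaxFree = 1024>
class Recycler {
public:
    ~Recycler() {
        for (T* p : free_) delete p;   // release anything still cached
    }
    T* acquire() {
        if (!free_.empty()) {
            T* p = free_.back();       // reuse a cached object
            free_.pop_back();
            return p;
        }
        return new T();                // cache empty: really allocate
    }
    void release(T* p) {
        if (free_.size() < MaxFree)
            free_.push_back(p);        // keep for reuse
        else
            delete p;                  // over the bound: really free
    }
private:
    std::vector<T*> free_;
};
```

The "old pointers to recycled objects" risk in the post is exactly the hazard here: after `release(p)`, any surviving alias of `p` silently points at whatever `acquire()` hands out next.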
May 12 2004
parent reply Scott Michel <scottm cs.ucla.edu> writes:
Kevin Bealer wrote:
 In a GC program, you can avoid GC overhead by keeping free lists of
 anything you allocate.  The garbage collector won't run because you aren't
 calling new().  This was a tip for "real time" programming.  Note that you
 are avoiding benefit #1 of a garbage collector: not having to manage
 delete[] yourself.  You also run the risk of old pointers to supposedly
 "new" objects, if it wasn't ready to be deleted.  On the plus side, if you
 are not sure it's garbage, don't put it on the list: the GC will
 eventually hunt it down if you are, in fact, leaking.

Any "modern" general-purpose allocator has done this since the late 80s. Preventing free-space fragmentation turns out to be the bigger problem. Besides, if you keep a free list, then you're into having to play "God" with object ctor and dtor. :-)
May 13 2004
parent reply Kevin Bealer <Kevin_member pathlink.com> writes:
In article <c8075s$1ff$1 digitaldaemon.com>, Scott Michel says...
Kevin Bealer wrote:
 In a GC program, you can avoid GC overhead by keeping free lists of
 anything you allocate.  The garbage collector won't run because you aren't
 calling new().  This was a tip for "real time" programming.  Note that you
 are avoiding benefit #1 of a garbage collector: not having to manage
 delete[] yourself.  You also run the risk of old pointers to supposedly
 "new" objects, if it wasn't ready to be deleted.  On the plus side, if you
 are not sure it's garbage, don't put it on the list: the GC will
 eventually hunt it down if you are, in fact, leaking.

Any "modern" general-purpose allocator already does this since the late 80s. Preventing free space fragmentation turns out to be the bigger problem. Besides, if you keep a free list, then you're into having to play "God" with object ctor and dtor. :-)

I've usually heard that phrase when talking about cloning or life/death
issues... I'm not sure what rights are reserved to God when talking about
writing software...

If you mean that the free list can run very large and hog memory, you can of
course provide a size bound. If it gets over N elements, clear the "top"
pointer, or selectively drop elements down to N.

Regarding fragmentation: does the D collector actually copy collect, or does
it need to be conservative? I would suspect that using C pointers and copy
collection are two great tastes that don't taste great together.

Kevin
May 13 2004
parent reply -scooter- <scottm cs.ucla.edu> writes:
Kevin Bealer wrote:

 In article <c8075s$1ff$1 digitaldaemon.com>, Scott Michel says...
 
Any "modern" general-purpose allocator already does this since the late 80s.
Preventing free space fragmentation turns out to be the bigger problem.

Besides, if you keep a free list, then you're into having to play "God" with
object ctor and dtor. :-)

I've usually heard that phrase when talking about cloning or life/death issues... I'm not sure what rights are reserved to God when talking about writing software... If you mean that the free list can run very large and hog memory, you can of course provide a size bound. If it gets over N elements, clear the "top" pointer, or selectively drop elements down to N.

My point was that you become responsible for object lifetimes; you're responsible for their proper birth and death (and possibly inadvertent resurrection... ok, bad pun sequence.) So keeping your own free list means that you might have faster allocation, but you now become responsible for a lot more than just allocation. But as I pointed out, modern mallocs already do this.
 Regarding fragmentation: does the D collector actually copy collect, or does it
 need to be conservative?  I would suspect that using C pointers and copy
 collection are two great tastes that don't taste great together.

Wrong type of fragmentation. :-) This happens when the allocator doesn't have
specifically sized arenas for certain sized allocations (usually powers of 2,
n > 4). If the allocator just tries to find the best fit, breaking up
previously allocated blocks without coalescing adjacent free space, you end
up with a very fragmented memory map.

-scooter
May 17 2004
parent reply Kevin Bealer <Kevin_member pathlink.com> writes:
In article <c8bg4j$1a4b$1 digitaldaemon.com>, -scooter- says...
Kevin Bealer wrote:

 In article <c8075s$1ff$1 digitaldaemon.com>, Scott Michel says...
 
Any "modern" general-purpose allocator already does this since the late 80s.
Preventing free space fragmentation turns out to be the bigger problem.

Besides, if you keep a free list, then you're into having to play "God" with
object ctor and dtor. :-)

I've usually heard that phrase when talking about cloning or life/death issues... I'm not sure what rights are reserved to God when talking about writing software... If you mean that the free list can run very large and hog memory, you can of course provide a size bound. If it gets over N elements, clear the "top" pointer, or selectively drop elements down to N.

My point was that you become responsible for object lifetimes; you're responsible for their proper birth and death (and possibly inadvertent resurrection... ok, bad pun sequence.) So keeping your own free list means that you might have faster allocation, but you now become responsible for a lot more than just allocation. But as I pointed out, modern mallocs already do this.

Okay, that makes sense.
 Regarding fragmentation: does the D collector actually copy collect, or does it
 need to be conservative?  I would suspect that using C pointers and copy
 collection are two great tastes that don't taste great together.

Wrong type of fragmentation. :-) This happens when the allocator doesn't have specifically sized arenas for certain sized allocations (usually powers of 2, n > 4). If the allocator just tries to find the best fit, breaking up previously allocated blocks without coalescing adjacent free space, you end up with a very fragmented memory map.

I was thinking "locality". Okay, this makes sense. Yeah, you can use a large-program allocator that uses a large number of free lists and chops up blocks (completely unsuitable for small, memory-tight programs), or you can use small-piece allocators and tolerate fragmentation. The ugliest way is to tweak the algorithm to use the same-sized object (by making buffers a certain size).

Kevin
-scooter

May 18 2004
parent reply -scooter- <scottm cs.ucla.edu> writes:
Kevin Bealer wrote:
Regarding fragmentation: does the D collector actually copy collect, or does it
need to be conservative?  I would suspect that using C pointers and copy
collection are two great tastes that don't taste great together.

Wrong type of fragmentation. :-) This happens when the allocator doesn't have specifically sized arenas for certain sized allocations (usually powers of 2, n > 4). If the allocator just tries to find the best fit, breaking up previously allocated blocks without coalescing adjacent free space, you end up with a very fragmented memory map.

I was thinking "locality". Okay, this makes sense. Yeah, you can use a large program allocator that uses a large number of free lists and chops up blocks (completely unsuitable for small, memory tight, programs), or you can use small-piece allocators and tolerate fragmentation. The ugliest way is to tweak the algorithm to use the same sized object (by making buffers a certain size).

You're thinking of customized slab allocators, or separate heaps for a
particular object type or for objects of the same size: once you allocate
from the custom heap, you return objects to it and keep track of the free
ones. Yup, modern mallocs (FreeBSD's libc version and Lea's, commonly found
in Linux's glibc and usable under Win32) do this already; that's why they
have arenas. The allocators you're thinking of that have really nasty
allocation algorithms are the old Microsoft C compiler's malloc and the old
Kingsley malloc in past BSDs.

Basically, there's really not much that's exciting about memory allocation.
Berger's paper is probably the most exciting research I've seen in 15 years,
and it solves an interesting problem when you really need custom allocation
or want a region of related objects allocated, initialized, freed and
destroyed as a region. The current GP allocators do a really good job, and
tweaking won't actually buy you much extra performance.
May 19 2004
parent Kevin Bealer <Kevin_member pathlink.com> writes:
In article <c8hb2a$1o63$1 digitaldaemon.com>, -scooter- says...
Kevin Bealer wrote:
Regarding fragmentation: does the D collector actually copy collect, or does it
need to be conservative?  I would suspect that using C pointers and copy
collection are two great tastes that don't taste great together.

Wrong type of fragmentation. :-) This happens when the allocator doesn't have specifically sized arenas for certain sized allocations (usually powers of 2, n > 4). If the allocator just tries to find the best fit, breaking up previously allocated blocks without coalescing adjacent free space, you end up with a very fragmented memory map.

I was thinking "locality". Okay, this makes sense. Yeah, you can use a large program allocator that uses a large number of free lists and chops up blocks (completely unsuitable for small, memory tight, programs), or you can use small-piece allocators and tolerate fragmentation. The ugliest way is to tweak the algorithm to use the same sized object (by making buffers a certain size).

You're thinking of customized slab allocators or separate heaps for a particular object type or objects of the same size. Once you allocated from the custom heap, you return it and keep track of the free objects. Yup, modern mallocs (FreeBSD's libc version and Lea's, commonly found in Linux's glibc and usable under Win32) do this already; that's why they have arenas. The allocators you're thinking about that has really nasty allocation algorithms are the old Microsoft C compiler's malloc and the old Kingsley malloc in past BSDs. Basically, there's really not much that's exciting about memory allocation. Berger's paper is probably the most exciting research I've seen in 15 years and it solves an interesting problem when you really need custom allocation or want a region of related objects allocated, initialized, freed and destroyed as a region. The current GP allocators do a really good job and tweaking won't actually buy you much extra performance.

Which makes perfect sense. I was writing one because I had a neat idea vis a
vis persistent objects and relative pointers. You can create data in a
subheap and mmap it in, providing an instant program "state load" effect; the
relative pointers keep all objects connected. Except, of course, virtual
pointers; also, you need to rewrite every class to use the subheap,
everything on the subheap needs to be connected to a "top" pointer, etc. It
works, it's good, but it's special purpose. A smart language-space
implementation could take advantage of such a design, but it gets messy in
portable-application space.

The other time you might want one is when you want a set of data (i.e. a
linked list) to have really good locality. But the solution to that is a
vector or plex. (By plex, I mean a list-of-small-vectors with a list-like
interface. They "should" have good performance in many common cases; they
were used in a GNU container library that went away when STL came out.)
Vectors are probably good enough for this purpose.

I guess the general philosophy of all that is that we should start treating
heap storage the way we treat FS storage. If the system runs out of main
memory, and all of your data is organized by memory pages, then OS page-out
won't cause thrashing until every page of system memory is in use. Locality
is treated as foremost in disk container algorithm design; swap thrashing
occurs because it is not treated that way in heap container class design. To
a degree, in most cases, always ski in control, etc.

Kevin
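The relative-pointer trick can be sketched in a few lines of C++: store a byte offset from the pointer's own location rather than an absolute address, so a whole memory-mapped block stays internally consistent wherever it lands. (boost::interprocess::offset_ptr is a production version of this idea; the class below is a toy.)

```cpp
#include <cstddef>

// Self-relative pointer: holds the distance from itself to the target,
// so copying/mapping the enclosing block preserves the link.
// Caveat: offset 0 doubles as null, so it can't point at itself.
template <typename T>
struct RelPtr {
    std::ptrdiff_t offset = 0;   // 0 means null

    void set(T* target) {
        offset = target
            ? reinterpret_cast<const char*>(target)
                  - reinterpret_cast<const char*>(this)
            : 0;
    }
    T* get() const {
        if (!offset) return nullptr;
        char* base = const_cast<char*>(reinterpret_cast<const char*>(this));
        return reinterpret_cast<T*>(base + offset);
    }
};
```

Copying a structure that contains such a pointer keeps the link pointing at the copy's own field, which is exactly what makes the mmap "state load" effect work.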
May 20 2004
prev sibling parent "Walter" <newshound digitalmars.com> writes:
"-scooter-" <scottm cs.ucla.edu> wrote in message
news:c7tlva$22e8$1 digitaldaemon.com...
 Walter wrote:

 I do the same sorts of optimizations in C++ when I want performance. Storage
 allocation is always a ripe target for performance tuning.

 One would think so, but an excellent paper from OOPSLA '02 has hard evidence
 to the contrary: Reconsidering custom memory allocation (Berger, et al.).
 Paper is at http://www.cs.umass.edu/~emery/pubs/berger-oopsla2002.pdf

 The only time it seems that a custom memory allocator wins is in the "bulk
 free" case, when an entire heap of objects (like a compiler's AST for a
 class <hint>) gets deallocated. Otherwise, custom storage management is
 almost a complete waste of time.

 I haven't taken Berger's HeapLayers or reaps allocation libraries out for a
 spin yet, but it looks interesting.

The article is right in that sometimes performance optimizations yield unexpected results. Just goes to show one should rely on a good performance profiler when doing optimization.
May 12 2004
prev sibling next sibling parent reply "christopher diggins" <cdiggins users.sourceforge.net> writes:
-- 
Christopher Diggins
http://www.cdiggins.com
http://www.heron-language.com
"Walter" <newshound digitalmars.com> wrote in message
news:c7r9ku$1h7b$1 digitaldaemon.com...
 "christopher diggins" <cdiggins users.sourceforge.net> wrote in message
 news:c7r63p$1bv1$1 digitaldaemon.com...
 Thanks for the feedback, Walter. I had only used -O but not -inline
 and -release which sped up the execution to 703 msec and the memory usage
 dropped to 2MB on my system. I have now updated the web page as a result.

 Thanks! Could you also please hyperlink the "Digital Mars Compiler" to
 www.digitalmars.com/d/ ? You'll also get faster results if you use 'final'
 on all member functions that are not overridden (essentially all the
 functions you don't mark as 'virtual' in the C++ version).

Done and done. The final keyword sped it up even more, from 710 to 610 msec.
 There is still 4 bytes per object that I can't account for. I count 8 bytes
 for the vector data, 4 bytes for the pointer to the object, 4 bytes for the
 vtable in the object, which only brings me to 16 bytes per object, what is
 the other 4 bytes for?

It's for the monitor.

What about the 4 bytes on top of that for the interfaces? What are they for? Any plans on changing this in the future, or adding a compiler switch?
 Is there a way to allocate 100,000 objects at once
 rather then one by one?

Yes. Implement a new() function for the class, and have it parcel out bits from a preallocated array.

I won't be including that in the example then; it seems like too much of a
roundabout way to do things for the scope of the program I included. Thanks
again, Walter!

Christopher Diggins
http://www.heron-language.com
May 11 2004
parent reply "Walter" <newshound digitalmars.com> writes:
"christopher diggins" <cdiggins users.sourceforge.net> wrote in message
news:c7rfbm$1q4l$1 digitaldaemon.com...
 Done and done. The final keywords sped it up even more from 710 to 610
 msec.

 There is still 4 bytes per object that I can't account for. I count 8 bytes
 for the vector data, 4 bytes for the pointer to the object, 4 bytes for the
 vtable in the object, which only brings me to 16 bytes per object, what is
 the other 4 bytes for?

It's for the monitor.

 What about the 4 bytes on top of that for the interfaces what are they
 for?

Each interface derived from has a vptr allocated for it.
 Any plans on changing this in the future, or making a compile switch?

No, it's needed to get reasonable performance out of interfaces.
 Is there a way to allocate 100,000 objects at once
 rather than one by one?

 Yes. Implement a new() function for the class, and have it parcel out bits
 from a preallocated array.

I won't be including that in the example then, it seems like too much of a roundabout way to do things for the scope of the program I included. Thanks again Walter!

You're welcome. You should also note that arrays have a built-in sort property, which uses quicksort. It'll blow away heapsort <g>. Doing that cuts the sort time on my machine from 453 ms to 63 ms. Source attached.
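The "hand the work to the library's tuned sort" advice, as it would look in the benchmark's C++ twin (a sketch; `sortedCopy` is an invented helper, and D's built-in property is the quicksort Walter mentions, while most std::sort implementations are introsort):

```cpp
#include <algorithm>
#include <vector>

// C++ counterpart of D's built-in array .sort property: one call to
// the library's tuned implementation instead of a hand-rolled
// heapsort. Introsort keeps quicksort's constants but falls back to
// heapsort to avoid the O(n^2) worst case.
std::vector<int> sortedCopy(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}
```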
May 11 2004
next sibling parent reply "Jan-Eric Duden" <jeduden whisset.com> writes:
 You're welcome. You should also note that arrays have a built-in sort
 property, which uses quicksort. It'll blow away heapsort <g>. Doing that
 cuts the sort time on my machine from 453 ms to 63 ms. Source attached.

Is the speed difference due to the different algorithms (quicksort vs. heapsort)? Or can the code for the sort property be better optimized than a "normal" implementation of a sort algorithm?
May 12 2004
parent reply "Walter" <newshound digitalmars.com> writes:
"Jan-Eric Duden" <jeduden whisset.com> wrote in message
news:c7sioo$cvh$1 digitaldaemon.com...
 You're welcome. You should also note that arrays have a built-in sort
 property, which uses quicksort. It'll blow away heapsort <g>. Doing that
 cuts the sort time on my machine from 453 ms to 63 ms. Source attached.

Is the speed difference due to the different algorithms (quicksort vs. heapsort)? Or can the code for the sort property be better optimized than a "normal" implementation of a sort algorithm?

I'd have to do testing to make sure, but I'm pretty sure it's due to quicksort being algorithmically superior. One of the reasons I built sort into arrays is because people implement inefficient sorts over and over, and I wanted to make it real easy to use the best algorithm. (A job I had once was to optimize a large program written by others; in poking around in it I found 3 different hacky implementations of bubblesort. Replaced them all with a call to C's qsort(), and I was an instant hero for the speedup <g>.)
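For reference, the C library call in question takes a three-way comparison callback; a minimal, self-contained use (C++ syntax over the same C API):

```cpp
#include <cstdlib>

// Comparison callback in the form qsort() expects: negative, zero, or
// positive. The (x > y) - (x < y) idiom avoids the signed overflow
// that the naive `return x - y;` risks.
int compareInts(const void* a, const void* b) {
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}
```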
May 12 2004
next sibling parent reply "Jan-Eric Duden" <jeduden whisset.com> writes:
 I'd have to do testing to make sure, but I'm pretty sure it's due to
 quicksort being algorithmically superior. One of the reasons I built sort
 into arrays is because people implement inefficient sorts over and over,
 and I wanted to make it real easy to use the best algorithm. (A job I had
 once was to optimize a large program written by others; in poking around in
 it I found 3 different hacky implementations of bubblesort. Replaced them
 all with a call to C's qsort(), and I was an instant hero for the speedup
 <g>.)
I thought qsort and heapsort work at about the same speed. I thought the most
important difference between a heapsort and qsort implementation in C is the
order of the algorithms:
- heapsort has roughly the same performance for all array instantiations:
  O(n*log n)
- qsort is a lot slower, O(n^2), on some instantiations and a lot faster,
  O(n), on almost sorted arrays; the average is of course O(n*log n)
Maybe the difference is that the C qsort implementation is highly optimized
and the heapsort implementation that was used is not?
May 12 2004
parent reply "christopher diggins" <cdiggins users.sourceforge.net> writes:
"Jan-Eric Duden" <jeduden whisset.com> wrote in message
news:c7spim$n4h$1 digitaldaemon.com...
 I'd have to do testing to make sure, but I'm pretty sure it's due to
 quicksort being algorithmically superior. One of the reasons I built sort
 into arrays is because people implement inefficient sorts over and over,
 and I wanted to make it real easy to use the best algorithm. (A job I had
 once was to optimize a large program written by others; in poking around in
 it I found 3 different hacky implementations of bubblesort. Replaced them
 all with a call to C's qsort(), and I was an instant hero for the speedup
 <g>.)

I thought qsort and heapsort work at about the same speed. I thought the most
important difference between a heapsort and qsort implementation in C is the
order of the algorithms: heapsort has roughly the same performance for all
array instantiations, O(n*log n); qsort is a lot slower, O(n^2), on some
instantiations and a lot faster, O(n), on almost sorted arrays; the average
is of course O(n*log n). Maybe the difference is that the C qsort
implementation is highly optimized and the heapsort implementation that was
used is not?

Both quicksort and heapsort are O(n log n) on average, but the constants
involved in quicksort are much better. Quicksort, though, depends a lot on
whether the partitioning subroutine is balanced. The differences are
discussed at length in Introduction to Algorithms by Cormen, Leiserson and
Rivest ( http://theory.lcs.mit.edu/~clr/ )

-- 
Christopher Diggins
http://www.cdiggins.com
http://www.heron-language.com
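For concreteness, here is the heapsort under discussion in compact array form (a plain C++ sketch, not the benchmark's object-oriented version). Both the heapify and extraction phases do O(log n) sifts, which is why its running time barely varies with input order:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Restore the max-heap property for the subtree rooted at `start`,
// treating a[0..end) as the heap.
static void siftDown(std::vector<int>& a, std::size_t start, std::size_t end) {
    std::size_t root = start;
    while (2 * root + 1 < end) {
        std::size_t child = 2 * root + 1;
        if (child + 1 < end && a[child] < a[child + 1])
            ++child;                        // pick the larger child
        if (a[root] >= a[child])
            return;
        std::swap(a[root], a[child]);
        root = child;
    }
}

void heapsort(std::vector<int>& a) {
    std::size_t n = a.size();
    for (std::size_t i = n / 2; i-- > 0; )  // build the max-heap
        siftDown(a, i, n);
    for (std::size_t i = n; i-- > 1; ) {    // repeatedly extract the max
        std::swap(a[0], a[i]);
        siftDown(a, 0, i);
    }
}
```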
May 12 2004
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
christopher diggins wrote:

<snip>
 - qsort is a lot slower O(n^2) on some instanciations and a lot 
 faster on almost sorted arrays O(n). the average is of course: 
 O(n*log n)


Can you give an example of an O(n) case of quicksort?
 Maybe the difference is that the C q-sort implementation is highly
 optimizied and the heapsort implementation that was used is not?

Both quicksort and heapsort are O(n log n) on average

Yes, but in the worst case, heapsort wins O(n log n) to O(n^2).

When benchmarking a sorting algorithm, it's helpful to try a number of
random cases, as well as the control cases of already sorted and already
sorted backwards.

If you're benchmarking algorithms, rather than compilers or languages, you
should try to use the same compiler, the same language and the same compiler
options. As such, it makes limited sense to compare a ready-made C library
function of one algorithm with your own implementation of another. Of
course, if you're comparing languages/compilers, you should compare
implementations of the same algorithm, and set as similar levels of
optimisation as possible (ITMS).

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the
unfortunate victim of intensive mail-bombing at the moment. Please keep
replies on the 'group where everyone may benefit.
May 12 2004
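[Editor's note: Stewart's three control cases can be sketched as a small harness. This is a hypothetical C++ sketch (C++ being the thread's comparison language); the names `make_case` and `time_ms` are mine, not from any post.]

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// The three control inputs Stewart suggests: random, already sorted,
// and already sorted backwards.
enum CaseKind { Random, Sorted, Reverse };

std::vector<int> make_case(std::size_t n, CaseKind kind) {
    std::vector<int> v(n);
    std::mt19937 rng(42);                       // fixed seed: repeatable runs
    for (auto& x : v) x = static_cast<int>(rng() % 100000);
    if (kind != Random) std::sort(v.begin(), v.end());
    if (kind == Reverse) std::reverse(v.begin(), v.end());
    return v;
}

// Time one sort routine on its own copy of the data, in milliseconds.
double time_ms(void (*sort_fn)(std::vector<int>&), std::vector<int> data) {
    auto t0 = std::chrono::steady_clock::now();
    sort_fn(data);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

For example, `time_ms(+[](std::vector<int>& v) { std::sort(v.begin(), v.end()); }, make_case(100000, Sorted))` pits the library sort against the already-sorted control case.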
next sibling parent reply "Jan-Eric Duden" <jeduden whisset.com> writes:
 Can you give an example of an O(n) case of quicksort?

A sorted array. You only get O(n) in a slightly modified version of q-sort
that quits if no swaps occurred.
 Maybe the difference is that the C q-sort implementation is highly
 optimizied and the heapsort implementation that was used is not?

Both quicksort and heapsort are O(n log n) on average

Yes, but in the worst case, heapsort wins O(n log n) to O(n^2).

My question was more related to the constants. I wasn't aware that heapsort is that much slower.
May 12 2004
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Jan-Eric Duden wrote:

Can you give an example of an O(n) case of quicksort?

A sorted array. You only get O(n) in a slightly modified version of q-sort that quits if no swaps occurred.

By 'swaps' do you mean the swapping of an element with the hole into which the key element is to go? How can one element being already in the right place imply that the whole array is sorted? <snip>
 Yep, that's why sometimes heapsort is preferred over quicksort.
 
 My question was more related to the constants. I wasn't aware that heapsort
 is that much slower.

What constants? Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
May 12 2004
parent reply Kevin Bealer <Kevin_member pathlink.com> writes:
In article <c7tbbi$1i8q$1 digitaldaemon.com>, Stewart Gordon says...
Jan-Eric Duden wrote:

Can you give an example of an O(n) case of quicksort?

A sorted array. You only get O(n) in a slightly modified version of q-sort that quits if no swaps occurred.

By 'swaps' do you mean the swapping of an element with the hole into which the key element is to go? How can one element being already in the right place imply that the whole array is sorted? <snip>
 Yep, that's why sometimes heapsort is preferred over quicksort.
 
 My question was more related to the constants. I wasn't aware that heapsort
 is that much slower.

What constants?

O(N), O(N^2), O(N*lnN) and so on describe how the number of operations
required is affected by the size of the input set. This is called "big O
notation". We say a sort takes O(N^2) when the time required is proportional
to the square of the input set size (N). This is called "order of N squared".

The real speed of the algorithm is something like K * N^2. If one algorithm
is twice as fast as another for any value of N, then the slower one is said
to have K=2 while the faster has K=1. K is the constant.

All of this analysis ignores effects like cache locality, because they are
difficult to summarize in any simple and general way. Normally, O(...), or
"order", is the most important criterion. If O(...) is the same for both
designs, then the secondary effects apply. For large data sets, cache and
locality issues are most important, whereas for small data sets, the
"constant" is probably more important. "Small" here probably means "fits, or
almost fits, in memory".

Special data sets can trip up some algorithms; for example, a carelessly
written qsort will take forever on already-sorted data.

Bubble sort is the classic "bad" sorting algorithm. It still has its
defenders, for special cases. Like insertion sort, it probably survives
because it is easy to code for small hand-written linked lists in languages
like C. Often large, old programs in C will have dozens or hundreds of data
structures of the type "small-structs-in-a-short-list"; half of these will
have a ten-line hand-coded insertion sort (shudder). "Don't get out of the
boat".

I always liked "shell sorts" and "natural merge sorts" but no one uses those
any more. Probably because they're really slow... oh well, to each his own I
guess.

Kevin
May 12 2004
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Kevin Bealer wrote:

<snip>
 O(N), O(N^2), O(N*lnN) and so on, describe how the number of operations
required
 is affected by the size of the input set.  This is called "big O notation".
 
 We say a sort takes O(N^2) when the time required is proportional to the square
 of the input set size (N).  This is called "order of N squared".

You seldom get direct proportionality in practice. In fact, it means that
there exists some K for which F(N) < K*N^2, for all N>N0 for some N0. And
if you replace < with >, you get big Omega notation. And if you consider
both together, you get big Theta notation.

I guess that people commonly use big O when they really mean big Theta,
probably because O is on most keyboards/character sets, and considering
that anything Theta(N^2) is O(N^2) anyway.
 The real speed of the algorithm is something like K * N^2.  If one algorithm is
 twice as fast as another for any value of N, then it is said to have K=2 while
 the other has K=1.  K is the constant.

Yes. Though it would depend on such things as the relative costs of comparison and assignment/swapping. <snip>
 Bubble sort is the classic "bad" sorting algorithm.  It still has its
defenders,
 for special cases.  Like insertion sort, it probably survives because it is
easy
 to code for small hand written linked lists in languages like C.

I guess for linked lists, insertion sort is as easy or perhaps easier to
code than bubble sort. Just thinking about it: on a linked list, insertion
sort has the advantage that an insertion is O(1). OTOH, on an array it has
the advantage that a search for the right place is O(log N). I suppose the
constant for the whole alg is smaller for the linked list....

I guess selection sort (or shaker sort, which I hadn't heard of before but
is a perfectly intuitive improvement I had thought up) is equally easy and
thus defendable. It even has its strengths, such as sorting tournament
crosstables....

<snip>
 I always liked "shell sorts" and "natural merge sorts" but noone uses those any
 more.  Probably because they're really slow... oh well to each his own I guess.

Strange - JA's source shows shell sort being marginally quicker than heap sort. Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
May 13 2004
next sibling parent reply J Anderson <REMOVEanderson badmama.com.au> writes:
Stewart Gordon wrote:

 Yes.  Though it would depend on such things as the relative costs of 
 comparison and assignment/swapping.

<snip>

One of the great things about Ada generics is that you could easily replace
the comparison operator and do tests on how many comparisons were actually
performed. Of course you could box all the variables you use in the sort
algorithm, but that's just not the same. I wish D templates could accept
operators/functions as parameters. At the very least it's a good way to
profile that particular aspect of an algorithm.

-- 
-Anderson: http://badmama.com.au/~anderson/
May 13 2004
parent reply "Walter" <newshound digitalmars.com> writes:
"J Anderson" <REMOVEanderson badmama.com.au> wrote in message
news:c7vnbp$2bvs$1 digitaldaemon.com...
  I wish D
 templates could accept operators/functions as parameters.

It can accept functions as parameters - both as function pointers, and as aliases.
May 13 2004
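[Editor's note: the equivalent trick exists in C++, which is the thread's comparison language: a function passed as a non-type template parameter behaves much like a D alias parameter. A sketch of my own, not Phobos code, showing the comparison-counting Anderson asks about; the counter and sort names are illustrative.]

```cpp
#include <cstddef>

// Global comparison counter; incremented by the instrumented predicate.
static std::size_t g_compares = 0;

bool counted_less(int a, int b) { ++g_compares; return a < b; }

// The comparison function is a template parameter, so the compiler can
// inline it -- roughly analogous to a D alias template parameter.
template <bool (*Less)(int, int)>
void insertion_sort(int* a, std::size_t n) {
    for (std::size_t i = 1; i < n; ++i) {
        int key = a[i];
        std::size_t j = i;
        while (j > 0 && Less(key, a[j - 1])) {
            a[j] = a[j - 1];      // shift larger elements right
            --j;
        }
        a[j] = key;
    }
}
```

`insertion_sort<counted_less>(arr, n)` sorts the array and leaves the number of comparisons performed in `g_compares`.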
parent reply J Anderson <REMOVEanderson badmama.com.au> writes:
Walter wrote:

"J Anderson" <REMOVEanderson badmama.com.au> wrote in message
news:c7vnbp$2bvs$1 digitaldaemon.com...
  

 I wish D
templates could accept operators/functions as parameters.
    

It can accept functions as parameters - both as function pointers, and as aliases.

-- -Anderson: http://badmama.com.au/~anderson/
May 14 2004
parent "Walter" <newshound digitalmars.com> writes:
"J Anderson" <REMOVEanderson badmama.com.au> wrote in message
news:c81tqo$2h9g$1 digitaldaemon.com...
 Walter wrote:

"J Anderson" <REMOVEanderson badmama.com.au> wrote in message
news:c7vnbp$2bvs$1 digitaldaemon.com...


 I wish D
templates could accept operators/functions as parameters.

It can accept functions as parameters - both as function pointers, and as aliases.


You can use the equivalent name for the operator functions.
May 14 2004
prev sibling parent reply J Anderson <REMOVEanderson badmama.com.au> writes:
Stewart Gordon wrote:

 I always liked "shell sorts" and "natural merge sorts" but noone uses 
 those any
 more.  Probably because they're really slow... oh well to each his 
 own I guess.

Strange - JA's source shows shell sort being marginally quicker than heap sort. Stewart.

If your talking about the website I provided, did you read that shell
sort was slightly broken?

-- 
-Anderson: http://badmama.com.au/~anderson/
May 13 2004
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
J Anderson wrote:
<snip>
 If your talking

My talking?
 about the website I provided, did you read that shell 
 sort was slightly broken?

No. I still don't. Where's that little bit of info buried then? Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
May 13 2004
parent J Anderson <REMOVEanderson badmama.com.au> writes:
Stewart Gordon wrote:

 J Anderson wrote:
 <snip>

 If your talking

My talking?
 about the website I provided, did you read that shell sort was 
 slightly broken?

No. I still don't. Where's that little bit of info buried then? Stewart.

-- -Anderson: http://badmama.com.au/~anderson/
May 13 2004
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Stewart Gordon wrote:
 
 Yes, but in the worst case, heapsort wins O(n log n) to O(n^2).

But some versions of heapsort need a considerable memory overhead to run. In general, quicksort is the better choice for average sorting situations. People with special needs generally also know how to code for those needs :) Sean
May 12 2004
parent reply J Anderson <REMOVEanderson badmama.com.au> writes:
Sean Kelly wrote:

 Stewart Gordon wrote:

 Yes, but in the worst case, heapsort wins O(n log n) to O(n^2).

But some versions of heapsort need a considerable memory overhead to run. In general, quicksort is the better choice for average sorting situations. People with special needs generally also know how to code for those needs :) Sean

Right! Even sort algorithms like short-circuit bubble sort can be useful if
they are used in the right application. Short-circuit bubble sort is most
useful when you know that most of the array is sorted, because then you get
O(n). With standard quicksort you'd get O(n^2) for a sorted array.

-- 
-Anderson: http://badmama.com.au/~anderson/
May 12 2004
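[Editor's note: the short-circuit variant Anderson describes is just bubble sort with an early exit; a minimal C++ sketch, my own illustration.]

```cpp
#include <cstddef>
#include <utility>

// Bubble sort that quits as soon as a full pass makes no swaps.
// On already-sorted input the first pass swaps nothing, giving O(n).
void bubble_sort_short_circuit(int* a, std::size_t n) {
    bool swapped = true;
    while (swapped && n > 1) {
        swapped = false;
        for (std::size_t i = 1; i < n; ++i) {
            if (a[i] < a[i - 1]) {
                std::swap(a[i], a[i - 1]);
                swapped = true;   // something was still out of order
            }
        }
        --n;                      // largest element is now in place
    }
}
```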
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
J Anderson wrote:

<snip>
 Right!  Even sort algorithms like short-circuit bubble sort can be 
 useful if they are used in right application.  Short-circuit bubble sort 
 is most useful when you know that most-of-the the array is sorted 
 because then you get o(n). With standard quicksort you'd get O(n^2) for 
 a sorted array.

Even better for nearly sorted data is bidirectional short-circuit bubble sort. Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
May 12 2004
parent reply C <qbert atari-soldiers.com> writes:
 Even better for nearly sorted data is bidirectional short-circuit bubble 
 sort.

Hmm, I couldn't find anything on this - do you have some more info?

Charlie

On Wed, 12 May 2004 17:09:30 +0100, Stewart Gordon <smjg_1998 yahoo.com>
wrote:
 J Anderson wrote:

 <snip>
 Right!  Even sort algorithms like short-circuit bubble sort can be 
 useful if they are used in right application.  Short-circuit bubble 
 sort is most useful when you know that most-of-the the array is sorted 
 because then you get o(n). With standard quicksort you'd get O(n^2) for 
 a sorted array.

Even better for nearly sorted data is bidirectional short-circuit bubble sort. Stewart.

-- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
May 12 2004
next sibling parent J Anderson <REMOVEanderson badmama.com.au> writes:
C wrote:

 Even better for nearly sorted data is bidirectional short-circuit 
 bubble sort.

Hmm, I couldn't find anything on this do you have some more info ? Charlie On Wed, 12 May 2004 17:09:30 +0100, Stewart Gordon <smjg_1998 yahoo.com> wrote:
 J Anderson wrote:

 <snip>

 Right!  Even sort algorithms like short-circuit bubble sort can be 
 useful if they are used in right application.  Short-circuit bubble 
 sort is most useful when you know that most-of-the the array is 
 sorted because then you get o(n). With standard quicksort you'd get 
 O(n^2) for a sorted array.

Even better for nearly sorted data is bidirectional short-circuit bubble sort. Stewart.


In a standard bubble sort, elements only bubble up to the top of the
array. In a bidirectional sort elements sink down to the bottom as well.
Of course short-circuit bubble sort still wins in some circumstances.

Here's an interesting page on sorting (includes bidirectional):
http://www.cs.ubc.ca/spider/harrison/Java/sorting-demo.html

BTW: In uni I had to do a project with 18 completely different sort
algorithms. So I really got to know them well.

-- 
-Anderson: http://badmama.com.au/~anderson/
May 12 2004
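[Editor's note: the bidirectional variant (often called cocktail or shaker sort) alternates the direction of the passes; a minimal C++ sketch of my own, keeping the same short-circuit exit.]

```cpp
#include <cstddef>
#include <utility>

// Bidirectional short-circuit bubble sort: a forward pass bubbles the
// largest element up, a backward pass sinks the smallest down, and the
// sort quits once a full sweep makes no swaps.
void cocktail_sort(int* a, std::size_t n) {
    if (n < 2) return;
    std::size_t lo = 0, hi = n - 1;
    bool swapped = true;
    while (swapped && lo < hi) {
        swapped = false;
        for (std::size_t i = lo; i < hi; ++i)          // bubble max up
            if (a[i + 1] < a[i]) { std::swap(a[i], a[i + 1]); swapped = true; }
        --hi;
        for (std::size_t i = hi; i > lo; --i)          // sink min down
            if (a[i] < a[i - 1]) { std::swap(a[i], a[i - 1]); swapped = true; }
        ++lo;
    }
}
```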
prev sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
C wrote:

 Even better for nearly sorted data is bidirectional short-circuit 
 bubble sort.

Hmm, I couldn't find anything on this do you have some more info ?

Several years ago I wrote a simple database application. When adding a
record, it would add it to the end of the file. When deleting one it would
fill the gap with the record from the end. When updating it would just
update in place.

The sort implementation was a bidirectional bubble sort. Hence the records
added since the last sort would bubble up to their correct positions, and
the records moved to fill gaps would bubble down back towards the end.
Updated records (where the change rendered the record out of order) could
bubble in either direction.

The algorithm was also quite suited to the program's rather simplistic
dynamic record loading. At a time it would keep 50 consecutive records in
memory, moving the 'window' as necessary.

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the
unfortunate victim of intensive mail-bombing at the moment. Please keep
replies on the 'group where everyone may benefit.
May 13 2004
prev sibling next sibling parent reply resistor AT nospam DOT mac DOT com <resistor_member pathlink.com> writes:
Here's something interesting for everyone. I took the heapsort.d example,
redid it to be a bit more D-like (used a true array rather than an array
class, used opCmp instead of a Compare method, etc.)

Anyways, I decided to test my version against Walter's QuickSort. Here are
the results (the QuickSort and HeapSort files are identical except one
calls HeapSort(a) and the other calls a.sort):

DMD QuickSort - 93 msecs
DMD HeapSort - 62 msecs

GDC QuickSort - 76 msecs
GDC HeapSort - 62 msecs

The times are more or less constant across runs. Only the GDC HeapSort
varies, and only by an msec or two.

This brings to light two things: first, that the HeapSort is potentially
more viable than the QuickSort. And secondly, that the GDC backend is a
good bit more powerful than the DMD backend. Also, I'm pretty sure that
GDC's libphobos has debugging enabled, so it might be able to go faster!

Comments? Thoughts?

Owen

P.S. I've attached the code for this test, in case I've made a silly
mistake that is favoring the HeapSort.

In article <c7sncs$k3e$1 digitaldaemon.com>, Walter says...
I'd have to do testing to make sure, but I'm pretty sure it's due to
quicksort being algorithmically superior. One of the reasons I built sort
into arrays is because people implement inefficient sorts over and over, and
I wanted to make it real easy to use the best algorithm. (A job I had once
was to optimize a large program written by others; in poking around in it I
found 3 different hacky implementations of bubblesort. Replaced them all
with a call to C's qsort(), and I was an instant hero for the speedup <g>.)

May 12 2004
parent J Anderson <REMOVEanderson badmama.com.au> writes:
resistor AT nospam DOT mac DOT com wrote:

Here's something interesting for everyone.  I took the heapsort.d example, redid
it to be a bit more D-
like (used a true array rather than an array class, used opCmp instead of a
Compare method, etc.)

Anyways, I decided to test my version against Walter's QuickSort.  Here are the
results (the QuickSort 
and HeapSort files are identical except one calls HeapSort(a) and the other
calls a.sort):

DMD QuickSort - 93 msecs
DMD HeapSort - 62 msecs

GDC QuickSort - 76 msecs
GDC HeapSort - 62 msecs

The times are more or less constant across runs.  Only the GDC HeapSort varies,
and it only be an msec 
or two.

This brings to light two things: First, that the HeapSort is potentially more
viable than the QuickSort.  
And secondly, that the GDC backend is a good bit more powerful than the DMD
backend.  Also, I'm 
pretty sure that GDC's libphobos has debugging enabled, so it might be able to
go faster!

Comment? Thoughts?

Owen

P.S. I've attached the code for this test, in case I've made a silly mistake
that is favoring the HeapSort.

In article <c7sncs$k3e$1 digitaldaemon.com>, Walter says...
  

I'd have to do testing to make sure, but I'm pretty sure it's due to
quicksort being algorithmically superior. One of the reasons I built sort
into arrays is because people implement inefficient sorts over and over, and
I wanted to make it real easy to use the best algorithm. (A job I had once
was to optimize a large program written by others; in poking around in it I
found 3 different hacky implementations of bubblesort. Replaced them all
with a call to C's qsort(), and I was an instant hero for the speedup <g>.)
    


It would also be interesting to see how something like radix sort compares. -- -Anderson: http://badmama.com.au/~anderson/
May 12 2004
prev sibling parent resistor AT nospam DOT mac DOT com <resistor_member pathlink.com> writes:
Oops.  Forgot the attachment.  Here it is.

In article <c7sncs$k3e$1 digitaldaemon.com>, Walter says...
I'd have to do testing to make sure, but I'm pretty sure it's due to
quicksort being algorithmically superior. One of the reasons I built sort
into arrays is because people implement inefficient sorts over and over, and
I wanted to make it real easy to use the best algorithm. (A job I had once
was to optimize a large program written by others; in poking around in it I
found 3 different hacky implementations of bubblesort. Replaced them all
with a call to C's qsort(), and I was an instant hero for the speedup <g>.)

May 12 2004
prev sibling parent reply =?iso-8859-1?q?Knud_S=F8rensen?= <knud NetRunner.all-technology.com> writes:
On Tue, 11 May 2004 14:47:58 -0700, Walter wrote:

 
 You're welcome. You should also note that arrays have a built-in sort
 property, which uses quicksort. It'll blow away heapsort <g>. Doing that
 cuts the sort time on my machine from 453 ms to 63 ms. Source attached.
 

Why are you not using introspective sort? http://www.nist.gov/dads/HTML/introspectiveSort.html
May 12 2004
parent "Walter" <newshound digitalmars.com> writes:
"Knud Sørensen" <knud NetRunner.all-technology.com> wrote in message
news:pan.2004.05.12.22.37.18.149835 NetRunner.all-technology.com...
 On Tue, 11 May 2004 14:47:58 -0700, Walter wrote:

 You're welcome. You should also note that arrays have a built-in sort
 property, which uses quicksort. It'll blow away heapsort <g>. Doing that
 cuts the sort time on my machine from 453 ms to 63 ms. Source attached.

Why are you not using introspective sort? http://www.nist.gov/dads/HTML/introspectiveSort.html

Actually, it does just that (switches to heapsort as necessary).
May 12 2004
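[Editor's note: the introspective-sort idea Walter refers to is quicksort with a recursion-depth budget, falling back to heapsort once the depth looks pathological so the worst case stays O(n log n). A C++ sketch of my own, not the Phobos implementation.]

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// Quicksort with a depth budget; past the budget, heapsort the range.
static void intro_impl(std::vector<int>& a, int lo, int hi, int depth) {
    while (hi - lo > 1) {
        if (depth-- == 0) {                       // too deep: switch to heapsort
            std::make_heap(a.begin() + lo, a.begin() + hi);
            std::sort_heap(a.begin() + lo, a.begin() + hi);
            return;
        }
        int pivot = a[lo + (hi - lo) / 2];
        int i = lo, j = hi - 1;
        while (i <= j) {                          // Hoare-style partition
            while (a[i] < pivot) ++i;
            while (pivot < a[j]) --j;
            if (i <= j) std::swap(a[i++], a[j--]);
        }
        intro_impl(a, lo, j + 1, depth);          // recurse on the left part
        lo = i;                                   // iterate on the right part
    }
}

void introsort(std::vector<int>& a) {
    int n = static_cast<int>(a.size());
    if (n > 1) intro_impl(a, 0, n, 2 * static_cast<int>(std::log2(n)));
}
```

The depth budget of roughly 2*log2(n) is the conventional choice: a balanced quicksort never exceeds it, so heapsort only kicks in on adversarial inputs.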
prev sibling parent reply =?ISO-8859-1?Q?J=F6rg_R=FCppel?= <joerg sharky-x.de> writes:
Walter wrote:

 "christopher diggins" <cdiggins users.sourceforge.net> wrote in message
 news:c7r63p$1bv1$1 digitaldaemon.com...
 
Thanks for the feedback, Walter. I had only used -O but not -inline
and -release which sped up the execution to 703 msec and the memory usage
dropped to 2MB on my system. I have now updated the web page as a result.

Thanks! Could you also please hyperlink the "Digital Mars Compiler" to
www.digitalmars.com/d/ ? You'll also get faster results if you use 'final'
on all member functions that are not overridden (essentially all the
functions you don't mark as 'virtual' in the C++ version).

I think I've read in the docs that the D compiler autodetects if a function
should be virtual or not. Why is there such a speed gain from explicitly
marking methods as final, when the compiler can do this implicitly?

Regards,
Jörg
May 12 2004
parent reply "Walter" <newshound digitalmars.com> writes:
"Jörg Rüppel" <joerg sharky-x.de> wrote in message
news:c7u7ru$4dt$1 digitaldaemon.com...
 Walter wrote:

 "christopher diggins" <cdiggins users.sourceforge.net> wrote in message
 news:c7r63p$1bv1$1 digitaldaemon.com...

Thanks for the feedback, Walter. I had only used -O but not -inline
and -release which sped up the execution to 703 msec and the memory usage
dropped to 2MB on my system. I have now updated the web page as a result.

 Thanks! Could you also please hyperlink the "Digital Mars Compiler" to
 www.digitalmars.com/d/ ? You'll also get faster results if you use 'final'
 on all member functions that are not overridden (essentially all the
 functions you don't mark as 'virtual' in the C++ version).

I think I've read in the docs that the D compiler autodetects if a function should be virtual or not. Why is there such a speed gain from explicitly marking methods as final, when the compiler can do this implicitly?

Because the compiler is not sophisticated enough to do it automatically yet.
May 12 2004
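[Editor's note: the effect Walter describes can be shown in C++ terms; a hypothetical illustration of my own (the types echo the benchmark's Vector2D but are not its exact code) of why a final method can be bound statically instead of through the vtable.]

```cpp
// A virtual call through a base pointer normally goes through the
// vtable. Marking the derived class 'final' tells the compiler that no
// further override can exist, so calls on a FastVector2D can be
// devirtualized and inlined. (Hypothetical types for illustration.)
struct Vector2D {
    double x, y;
    Vector2D(double x_, double y_) : x(x_), y(y_) {}
    virtual ~Vector2D() = default;
    virtual double GetMagnitudeSquared() const { return x * x + y * y; }
};

struct FastVector2D final : Vector2D {      // 'final': no subclasses possible
    using Vector2D::Vector2D;
    double GetMagnitudeSquared() const override { return x * x + y * y; }
};

double sum_magnitudes(const FastVector2D* v, int n) {
    double s = 0;
    for (int i = 0; i < n; ++i)
        s += v[i].GetMagnitudeSquared();    // statically bindable call
    return s;
}
```

D's `final` on a member function plays the same role: it closes the method to overriding, which is what licenses the compiler to skip the dynamic dispatch.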
parent =?ISO-8859-1?Q?J=F6rg_R=FCppel?= <joerg sharky-x.de> writes:
I think I've read in the docs that the D compiler autodetects if a
function should be virtual or not. Why is there such a speed gain from
explicitly marking methods as final, when the compiler can do this
implicitly?

Because the compiler is not sophisticated enough to do it automatically yet.

Regards, Jörg
May 13 2004
prev sibling parent reply Kevin Bealer <Kevin_member pathlink.com> writes:
In article <c7r63p$1bv1$1 digitaldaemon.com>, christopher diggins says...
Thanks for the feedback, Walter. I had only used -O but not -inline
and -release which sped up the execution to 703 msec and the memory usage
dropped to 2MB on my system. I have now updated the web page as a result.
There are still 4 bytes per object that I can't account for. I count 8 bytes
for the vector data, 4 bytes for the pointer to the object, 4 bytes for the
vtable pointer in the object, which only brings me to 16 bytes per object;
what is the other 4 bytes for? Is there a way to allocate 100,000 objects at
once rather than one by one?

If I have a container of objects that implement interface Printable(), how does Heron determine which are type Circle and which are type Square, without the use of a vtable? If the language cannot determine which type is which, then you should remove the word "virtual" from the C++ version, and put "final" in the Java (and D?) versions: if the container can only have one type of object, then C++ does not need the virtual keyword, and including it kind of "rigs" the demo, right? Kevin
May 11 2004
parent reply "christopher diggins" <cdiggins users.sourceforge.net> writes:
"Kevin Bealer" <Kevin_member pathlink.com> wrote in message
news:c7rboo$1kqs$1 digitaldaemon.com...
 In article <c7r63p$1bv1$1 digitaldaemon.com>, christopher diggins says...
Thanks for the feedback, Walter. I had only used -O but not -inline
and -release which sped up the execution to 703 msec and the memory usage
dropped to 2MB on my system. I have now updated the web page as a result.
There are still 4 bytes per object that I can't account for. I count 8 bytes
for the vector data, 4 bytes for the pointer to the object, 4 bytes for the
vtable in the object, which only brings me to 16 bytes per object; what is
the other 4 bytes for? Is there a way to allocate 100,000 objects at once
rather than one by one?

If I have a container of objects that implement interface Printable(), how
does Heron determine which are type Circle and which are type Square,
without the use of a vtable?

Interface references contain an extra pointer to a dispatch table. So the dynamic dispatching is done when needed. The compiler ignores the interface references when it can deduce the concrete types.
 If the language cannot determine which type is which, then you should
 remove the word "virtual" from the C++ version, and put "final" in the
 Java (and D?)

I do agree with you about the final keyword and I have done that to the D version, and I am going to do it in the Java version.
 versions: if the container can only have one type of object, then C++ does
 not need the virtual keyword, and including it kind of "rigs" the demo,
 right?

You are correct, but Vector2D is used to demonstrate a class that is run-time polymorphic but happens to be used in a non-polymorphic manner. I could easily add to the demo a modification that forces the issue of polymorphism, but I thought it would complicate things too much. Do you recommend that I change the demo to make this more explicit? -- Christopher Diggins http://www.cdiggins.com http://www.heron-language.com
May 11 2004
parent reply Kevin Bealer <Kevin_member pathlink.com> writes:
In article <c7rhdv$1t4h$1 digitaldaemon.com>, christopher diggins says...
..
You are correct, but Vector2D is used to demonstrate a class that is
run-time polymorphic but happens to be used in a non-polymorphic manner. I
could easily add to the demo a modification that forces the issue of
polymorphism, but I thought it would complicate things too much. Do you
recommend that I change the demo to make this more explicit?

Heron's approach of non-inheritance binding is interesting; I take it there
is a global dispatch table with pointers to all methods appearing in
non-deducible cases?

I think the demo you have is good as far as it goes. But I would be more
interested to know what happens in both cases for both languages: i.e. C++
run-time and compile-time and Heron run-time and compile-time. If Heron can
outperform in *both* cases, that would be much more impressive. But in
performance-oriented code in a language like C++, virtual-ness is usually
used only when compile-time binding is difficult to achieve.

Kevin
May 11 2004
parent reply "christopher diggins" <cdiggins users.sourceforge.net> writes:
"Kevin Bealer" <Kevin_member pathlink.com> wrote in message
news:c7rkjh$2216$1 digitaldaemon.com...
 In article <c7rhdv$1t4h$1 digitaldaemon.com>, christopher diggins says...
 ..
You are correct, but Vector2D is used to demonstrate a class that is
run-time polymorphic but happens to be used in a non-polymorphic manner. I
could easily add to the demo a modification that forces the issue of
polymorphism, but I thought it would complicate things too much. Do you
recommend that I change the demo to make this more explicit?

 Heron's approach of non-inheritance binding is interesting; I take it there
 is a global dispatch table with pointers to all methods appearing in
 non-deducible cases?

Close, there are lots of small tables created.
 I think the demo you have is good as far as it goes. But I would be more
 interested to know what happens in both cases for both languages: i.e. C++
 run-time and compile-time and Heron run-time and compile-time. If Heron can
 outperform in *both* cases, that would be much more impressive.

Good point.
 But in performance-oriented code in a language like C++, virtual-ness is
 usually used only when compile-time binding is difficult to achieve.

In retrospect, I am starting to be of the opinion that the example may in
fact be rigged as you mentioned. There is no way that I can justify a design
that uses ICompare as a class with virtual functions.

-- 
Christopher Diggins
http://www.cdiggins.com
http://www.heron-language.com
May 12 2004
parent reply "Walter" <newshound digitalmars.com> writes:
"christopher diggins" <cdiggins users.sourceforge.net> wrote in message
news:c7t5qi$19oe$1 digitaldaemon.com...
 In retrospect, I am starting to be of the opinion that the example may in
 fact be rigged as you mentioned. There is no way that I can justify a
 design that uses ICompare as a class with virtual functions.

I'd also be careful about the GetMagnitude() function. The time spent in the
two multiplies may swamp other effects. I've seen many benchmark programs
that purport to measure X, when they actually are measuring Y that the
benchmark writer thought was innocuous. Compiler vendors have been known to
exploit these naive benchmarks by optimizing the heck out of Y (which may be
much easier than optimizing X), and letting the benchmark reporter
misinterpret it into saying that X was optimized.

I've even seen crafty vendors create benchmarks specifically to mislead
people into thinking X was being optimized when it was actually Y. (This is
why I tend to not bother creating benchmarks for any but my own use; I like
to use benchmarks written by people not affiliated with DM to avoid even the
appearance of impropriety.)

In your benchmark, I'd check the multiplies; I'd also check to see if maybe
one compiler does a better job of function inlining or register allocation
before concluding that the speed difference is due to virtual dispatch.
Running a disassembler over the generated code and being familiar with
performance assembler coding is essential. The results can be quite
surprising.
May 12 2004
parent "christopher diggins" <cdiggins users.sourceforge.net> writes:
"Walter" <newshound digitalmars.com> wrote in message
news:c7u37p$2m0t$1 digitaldaemon.com...
 "christopher diggins" <cdiggins users.sourceforge.net> wrote in message
 news:c7t5qi$19oe$1 digitaldaemon.com...
 In retrospect, I am starting to be of the opinion that the example may in
 fact be rigged as you mentioned. There is no way that I can justify a
 design that uses ICompare as a class with virtual functions.

 I'd also be careful about the GetMagnitude() function. The time spent in
 the two multiplies may swamp other effects. I've seen many benchmark
 programs that purport to measure X, when they actually are measuring Y
 that the benchmark writer thought was innocuous. Compiler vendors have
 been known to exploit these naive benchmarks by optimizing the heck out of
 Y (which may be much easier than optimizing X), and letting the benchmark
 reporter misinterpret it into saying that X was optimized.

 I've even seen crafty vendors create benchmarks specifically to mislead
 people into thinking X was being optimized when it was actually Y. (This
 is why I tend to not bother creating benchmarks for any but my own use; I
 like to use benchmarks written by people not affiliated with DM to avoid
 even the appearance of impropriety.)

 In your benchmark, I'd check the multiplies; I'd also check to see if
 maybe one compiler does a better job of function inlining or register
 allocation before concluding that the speed difference is due to virtual
 dispatch. Running a disassembler over the generated code and being
 familiar with performance assembler coding is essential. The results can
 be quite surprising.

You make some excellent points. I have written a brand new benchmark, which
only has increments, decrements, == and function calls. Hopefully this is
more accurate. Unfortunately I am too slow and clumsy with assembler to be
able to do any disassembly; hopefully someone else will come along and do it.

-- 
Christopher Diggins
http://www.cdiggins.com
http://www.heron-language.com
May 13 2004
prev sibling next sibling parent Andy Friesen <andy ikagames.com> writes:
christopher diggins wrote:

 I have written an object oriented Heapsort in D at
 http://www.heron-language.com/benchmarks/heapsort-d.html to compare the
 performance of object oriented code in several languages. The results were
 somewhat disappointing, when compared with C++, Delphi, Java, and Heron. I
 was hoping that there might be some suggestions from members of this group
 about how I might improve the code I wrote while keeping the overall
 algorithm and design the same.
 
 Please note that the program is a port of the program at
 http://www.heron-language.com/benchmarks/heapsort-heron.html and is
 discussed further at http://www.heron-language.com/benchmarks/index.html

This is completely tangential, but Microsoft's C++ compiler completes the
heapsort in 235ms. (Athlon XP 1600+ with 512MB of RAM, using /O2)

 -- andy
May 11 2004
prev sibling next sibling parent reply "Matthew" <matthew.hat stlsoft.dot.org> writes:
Why have you not used DMC++ for the Heron and C++? That way the comparison would
be truer

"christopher diggins" <cdiggins users.sourceforge.net> wrote in message
news:c7qvo6$1256$1 digitaldaemon.com...
 I have written an object oriented Heapsort in D at
 http://www.heron-language.com/benchmarks/heapsort-d.html to compare the
 performance of object oriented code in several languages. The results were
 somewhat disappointing, when compared with C++, Delphi, Java, and Heron. I
 was hoping that there might be some suggestions from members of this group
 about how I might improve the code I wrote while keeping the overall
 algorithm and design the same.

 Please note that the program is a port of the program at
 http://www.heron-language.com/benchmarks/heapsort-heron.html and is
 discussed further at http://www.heron-language.com/benchmarks/index.html

 TIA
 -- 
 Christopher Diggins
 http://www.cdiggins.com
 http://www.heron-language.com

May 11 2004
parent reply "christopher diggins" <cdiggins users.sourceforge.net> writes:
"Matthew" <matthew.hat stlsoft.dot.org> wrote in message
news:c7rgk8$1s1l$1 digitaldaemon.com...
 Why have you not used DMC++ for the Heron and C++? That way the comparison
 would be truer

If all that would happen is make the Heron and C++ code run slower (or
faster), I don't really see the point. I am simply using the best compilers I
have at my disposal. I also don't think it would make any difference to the
whole point of the comparison, which is that Heron implements more efficient
object oriented code than the other languages which support object oriented
techniques.

-- 
Christopher Diggins
http://www.cdiggins.com
http://www.heron-language.com
May 11 2004
parent reply "Walter" <newshound digitalmars.com> writes:
"christopher diggins" <cdiggins users.sourceforge.net> wrote in message
news:c7rl11$22nr$1 digitaldaemon.com...
 "Matthew" <matthew.hat stlsoft.dot.org> wrote in message
 news:c7rgk8$1s1l$1 digitaldaemon.com...
 Why have you not used DMC++ for the Heron and C++? That way the comparison
 would be truer

 If all that would happen is make the Heron and C++ code run slower (or
 faster), I don't really see the point. I am simply using the best compilers I
 have at my disposal. I also don't think it would make any difference to the
 whole point of the comparison, which is that Heron implements more efficient
 object oriented code than the other languages which support object oriented
 techniques.

That depends on whether the point is really to compare the inherent
efficiency of the language, not the relative implementation vagaries of
different compilers. DMD and DMC++ share a common optimizer and code
generator.
May 11 2004
parent "christopher diggins" <cdiggins users.sourceforge.net> writes:
"Walter" <newshound digitalmars.com> wrote in message
news:c7rn04$25r5$1 digitaldaemon.com...
 "christopher diggins" <cdiggins users.sourceforge.net> wrote in message
 news:c7rl11$22nr$1 digitaldaemon.com...
 "Matthew" <matthew.hat stlsoft.dot.org> wrote in message
 news:c7rgk8$1s1l$1 digitaldaemon.com...
 Why have you not used DMC++ for the Heron and C++? That way the comparison
 would be truer

 If all that would happen is make the Heron and C++ code run slower (or
 faster), I don't really see the point. I am simply using the best compilers I
 have at my disposal. I also don't think it would make any difference to the
 whole point of the comparison, which is that Heron implements more efficient
 object oriented code than the other languages which support object oriented
 techniques.

 That depends on whether the point is really to compare the inherent
 efficiency of the language, not the relative implementation vagaries of
 different compilers. DMD and DMC++ share a common optimizer and code
 generator.

I am now convinced, and the new example
( http://www.heron-language.com/benchmarks/index.html ) uses only the Digital
Mars compilers.

-- 
Christopher Diggins
http://www.cdiggins.com
http://www.heron-language.com
May 13 2004
prev sibling parent C <qbert atari-soldiers.com> writes:
Impressive speeds on Heron!  I'd like to see some more benchmarks :).

C

On Tue, 11 May 2004 12:41:34 -0400, christopher diggins 
<cdiggins users.sourceforge.net> wrote:

 I have written an object oriented Heapsort in D at
 http://www.heron-language.com/benchmarks/heapsort-d.html to compare the
 performance of object oriented code in several languages. The results 
 were
 somewhat disappointing, when compared with C++, Delphi, Java, and Heron. 
 I
 was hoping that there might be some suggestions from members of this 
 group
 about how I might improve the code I wrote while keeping the overall
 algorithm and design the same.

 Please note that the program is a port of the program at
 http://www.heron-language.com/benchmarks/heapsort-heron.html and is
 discussed further at http://www.heron-language.com/benchmarks/index.html

 TIA

May 11 2004