
digitalmars.D.announce - New malloc() for win32 that should produce faster DMD's and faster

Walter Bright <newshound2 digitalmars.com> writes:
The execrable existing implementation was scrapped, and the new one uses Windows HeapAlloc().

http://ftp.digitalmars.com/snn.lib

This is for testing porpoises, and of course for those that Feel Da Need For Speed.
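For readers unfamiliar with the approach: a C runtime can implement malloc()/free()/realloc() as thin wrappers around the Win32 functions HeapAlloc(), HeapFree() and HeapReAlloc() on the process heap returned by GetProcessHeap(). The following is a minimal sketch under that assumption, not the actual snn.lib code. The names hm_malloc, hm_free and hm_realloc are hypothetical, and the non-Windows fallback exists only so the sketch compiles anywhere.

```c
/* Hypothetical wrappers (hm_malloc, hm_free, hm_realloc) over the Win32
   process heap. A sketch only; this is not the snn.lib source. On
   non-Windows systems it falls back to the C library so it compiles
   and runs anywhere. */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>   /* memset, used in the usage checks */
#include <assert.h>   /* assert, used in the usage checks */

#ifdef _WIN32
#include <windows.h>
#endif

void *hm_malloc(size_t size)
{
#ifdef _WIN32
    /* Without HEAP_GENERATE_EXCEPTIONS, HeapAlloc returns NULL on failure,
       matching malloc's contract. */
    return HeapAlloc(GetProcessHeap(), 0, size);
#else
    return malloc(size);
#endif
}

void hm_free(void *p)
{
    if (p == NULL)  /* free(NULL) must be a no-op */
        return;
#ifdef _WIN32
    /* The BOOL result is ignored here; checking it is subtler than it
       looks on older Windows versions (see the rest of the thread). */
    HeapFree(GetProcessHeap(), 0, p);
#else
    free(p);
#endif
}

void *hm_realloc(void *p, size_t size)
{
    if (p == NULL)  /* realloc(NULL, n) behaves like malloc(n) */
        return hm_malloc(size);
    if (size == 0) {  /* this sketch frees and returns NULL for size 0 */
        hm_free(p);
        return NULL;
    }
#ifdef _WIN32
    /* HeapReAlloc needs a non-NULL block (handled above); on failure it
       returns NULL and leaves the original block untouched. */
    return HeapReAlloc(GetProcessHeap(), 0, p, size);
#else
    return realloc(p, size);
#endif
}
```

The process heap is serialized by Windows unless HEAP_NO_SERIALIZE is passed, so the sketch needs no locking of its own.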
Aug 03 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 8/3/2013 2:55 PM, Walter Bright wrote:
 Feel Da Need For Speed.
So much better than: Feel Da Need For Reduced Elapsed Time :-)
Aug 03 2013
dennis luehring <dl.soluz gmx.net> writes:
On 03.08.2013 23:55, Walter Bright wrote:
 The execrable existing implementation was scrapped, and the new one uses Windows HeapAlloc().

 http://ftp.digitalmars.com/snn.lib

 This is for testing porpoises, and of course for those that Feel Da Need For Speed.
ever tested nedmalloc (http://www.nedprod.com/programs/portable/nedmalloc/) or other malloc allocators?
Aug 03 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 8/3/2013 11:07 PM, dennis luehring wrote:
 ever tested nedmalloc (http://www.nedprod.com/programs/portable/nedmalloc/) or
 other malloc allocators?
No, I haven't.
Aug 04 2013
"Joseph Rushton Wakeling" <joseph.wakeling webdrake.net> writes:
On Sunday, 4 August 2013 at 06:07:54 UTC, dennis luehring wrote:
 ever tested nedmalloc 
 (http://www.nedprod.com/programs/portable/nedmalloc/) or other 
 malloc allocators?
"Windows 7, Linux 3.x, FreeBSD 8, Mac OS X 10.6 all contain state-of-the-art allocators and no third party allocator is likely to significantly improve on them in real world results." So there may be minimal returns from incorporating nedmalloc on modern OS's ... ?
Aug 04 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 8/4/2013 12:19 AM, Joseph Rushton Wakeling wrote:
 On Sunday, 4 August 2013 at 06:07:54 UTC, dennis luehring wrote:
 ever tested nedmalloc (http://www.nedprod.com/programs/portable/nedmalloc/) or
 other malloc allocators?
 "Windows 7, Linux 3.x, FreeBSD 8, Mac OS X 10.6 all contain state-of-the-art allocators and no third party allocator is likely to significantly improve on them in real world results." So there may be minimal returns from incorporating nedmalloc on modern OS's ... ?
As I wrote earlier, Microsoft has enormous incentive to make HeapXXXX as fast as possible, as it will pay dividends for every Microsoft software product and software designed for Windows. I'm sure the engineers there know all about the various strategies available on the intarnets. Why not take advantage of their work?
Aug 04 2013
dennis luehring <dl.soluz gmx.net> writes:
On 04.08.2013 09:35, Walter Bright wrote:
 On 8/4/2013 12:19 AM, Joseph Rushton Wakeling wrote:
 On Sunday, 4 August 2013 at 06:07:54 UTC, dennis luehring wrote:
 ever tested nedmalloc (http://www.nedprod.com/programs/portable/nedmalloc/) or
 other malloc allocators?
 "Windows 7, Linux 3.x, FreeBSD 8, Mac OS X 10.6 all contain state-of-the-art allocators and no third party allocator is likely to significantly improve on them in real world results." So there may be minimal returns from incorporating nedmalloc on modern OS's ... ?
 As I wrote earlier, Microsoft has enormous incentive to make HeapXXXX as fast as possible, as it will pay dividends for every Microsoft software product and software designed for Windows. I'm sure the engineers there know all about the various strategies available on the intarnets. Why not take advantage of their work?
HeapAlloc is a forwarder to RtlHeapAlloc and C++ new does call RtlHeapAlloc directly - would it be better to use this kernel32 api directly? (maybe if used in druntime to reduce dll dependencies)
Aug 04 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 8/4/2013 12:53 AM, dennis luehring wrote:
 HeapAlloc is a forwarder to RtlHeapAlloc and C++ new does call RtlHeapAlloc
 directly - would it be better to use this kernel32 api directly? (maybe if used
 in druntime to reduce dll dependencies)
I can't find any documentation on RtlHeapAlloc.
Aug 04 2013
Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
04.08.2013 11:53, dennis luehring writes:
 On 04.08.2013 09:35, Walter Bright wrote:
 On 8/4/2013 12:19 AM, Joseph Rushton Wakeling wrote:
 On Sunday, 4 August 2013 at 06:07:54 UTC, dennis luehring wrote:
 ever tested nedmalloc
 (http://www.nedprod.com/programs/portable/nedmalloc/) or
 other malloc allocators?
 "Windows 7, Linux 3.x, FreeBSD 8, Mac OS X 10.6 all contain state-of-the-art allocators and no third party allocator is likely to significantly improve on them in real world results." So there may be minimal returns from incorporating nedmalloc on modern OS's ... ?
 As I wrote earlier, Microsoft has enormous incentive to make HeapXXXX as fast as possible, as it will pay dividends for every Microsoft software product and software designed for Windows. I'm sure the engineers there know all about the various strategies available on the intarnets. Why not take advantage of their work?
 HeapAlloc is a forwarder to RtlHeapAlloc and C++ new does call RtlHeapAlloc directly - would it be better to use this kernel32 api directly? (maybe if used in druntime to reduce dll dependencies)
Up to Windows XP (at least), KERNEL32's HeapAlloc function is forwarded to the RtlAllocateHeap [1] function exported by NTDLL, so there is no runtime performance overhead. There is no RtlHeapAlloc function on my Windows XP, and I can't find any information about it on the web. It looks like Windows 6.x stuff or a mistake in the name.

Also note that there are tons of errors caused by such "slightly different" names. As for the "Heap*" functions:
1. Incorrect "RtlAllocHeap" name here [2].
2. Incorrect "HeapFree" function signature: a 4-byte BOOL is returned, but HeapFree is just a wrapper of RtlFreeHeap, which returns a 1-byte BOOLEAN (fixed in Windows 6.x).

[1] http://msdn.microsoft.com/en-us/library/windows/hardware/ff552108(v=vs.85).aspx
[2] http://msdn.microsoft.com/ru-ru/magazine/cc301808(en-us).aspx

--
Денис В. Шеломовский
Denis V. Shelomovskij
Aug 04 2013
dennis luehring <dl.soluz gmx.net> writes:
You're right, it was RtlAllocateHeap.

On 04.08.2013 11:25, Denis Shelomovskij wrote:
 04.08.2013 11:53, dennis luehring writes:
 On 04.08.2013 09:35, Walter Bright wrote:
 On 8/4/2013 12:19 AM, Joseph Rushton Wakeling wrote:
 On Sunday, 4 August 2013 at 06:07:54 UTC, dennis luehring wrote:
 ever tested nedmalloc
 (http://www.nedprod.com/programs/portable/nedmalloc/) or
 other malloc allocators?
 "Windows 7, Linux 3.x, FreeBSD 8, Mac OS X 10.6 all contain state-of-the-art allocators and no third party allocator is likely to significantly improve on them in real world results." So there may be minimal returns from incorporating nedmalloc on modern OS's ... ?
 As I wrote earlier, Microsoft has enormous incentive to make HeapXXXX as fast as possible, as it will pay dividends for every Microsoft software product and software designed for Windows. I'm sure the engineers there know all about the various strategies available on the intarnets. Why not take advantage of their work?
 HeapAlloc is a forwarder to RtlHeapAlloc and C++ new does call RtlHeapAlloc directly - would it be better to use this kernel32 api directly? (maybe if used in druntime to reduce dll dependencies)
 Up to Windows XP (at least) KERNEL32's HeapAlloc function is forwarded to RtlAllocateHeap [1] function exported by NTDLL so there is no runtime performance overhead. There is no RtlHeapAlloc function on my Windows XP and I can't find any information about it on the web. Looks like a Windows 6.x stuff or a mistake in name. Also note there are tons of errors because of such "slightly different" names. If we are talking about "Heap*" functions: 1. Incorrect "RtlAllocHeap" name here [2]. 2. Incorrect "HeapFree" function signature (4-byte BOOL is returned but it is just a wrapper of RtlFreeHeap which returns 1-byte BOOLEAN) (fixed in Windows 6.x). [1] http://msdn.microsoft.com/en-us/library/windows/hardware/ff552108(v=vs.85).aspx [2] http://msdn.microsoft.com/ru-ru/magazine/cc301808(en-us).aspx
Aug 04 2013
Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
04.08.2013 1:55, Walter Bright writes:
 The execrable existing implementation was scrapped, and the new one uses
 Windows HeapAlloc().

 http://ftp.digitalmars.com/snn.lib

 This is for testing porpoises, and of course for those that Feel Da Need
 For Speed.
So I suppose you use `HeapFree` too? Please, be sure you use this Windows API BOOL/BOOLEAN bug workaround:
https://github.com/denis-sh/phobos-additions/blob/e061d1ad282b4793d1c75dfcc20962b99ec842df/unstd/windows/heap.d#L178

--
Денис В. Шеломовский
Denis V. Shelomovskij
Aug 04 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 8/4/2013 2:28 AM, Denis Shelomovskij wrote:
 04.08.2013 1:55, Walter Bright writes:
 The execrable existing implementation was scrapped, and the new one uses
 Windows HeapAlloc().

 http://ftp.digitalmars.com/snn.lib

 This is for testing porpoises, and of course for those that Feel Da Need
 For Speed.
 So I suppose you use `HeapFree` too?
Yes.
 Please, be sure you use this Windows API
 BOOL/BOOLEAN bug workaround:
 https://github.com/denis-sh/phobos-additions/blob/e061d1ad282b4793d1c75dfcc20962b99ec842df/unstd/windows/heap.d#L178
That's good to know, thanks!
Aug 04 2013
dennis luehring <dl.soluz gmx.net> writes:
On 04.08.2013 11:28, Denis Shelomovskij wrote:
 04.08.2013 1:55, Walter Bright writes:
 The execrable existing implementation was scrapped, and the new one uses
 Windows HeapAlloc().

 http://ftp.digitalmars.com/snn.lib

 This is for testing porpoises, and of course for those that Feel Da Need
 For Speed.
 So I suppose you use `HeapFree` too? Please, be sure you use this Windows API BOOL/BOOLEAN bug workaround: https://github.com/denis-sh/phobos-additions/blob/e061d1ad282b4793d1c75dfcc20962b99ec842df/unstd/windows/heap.d#L178
But please do it without two ifs and a GetVersion call on every free.
Aug 05 2013
"Kagamin" <spam here.lot> writes:
On Sunday, 4 August 2013 at 09:28:11 UTC, Denis Shelomovskij 
wrote:
 So I suppose you use `HeapFree` too? Please, be sure you use 
 this Windows API BOOL/BOOLEAN bug workaround:
 https://github.com/denis-sh/phobos-additions/blob/e061d1ad282b4793d1c75dfcc20962b99ec842df/unstd/windows/heap.d#L178
BOOLEAN is either TRUE or FALSE, so it should be ok to check only the least significant byte.
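Kagamin's suggestion can be sketched as a mask on the raw return value. The helper name heap_free_ok is hypothetical, and the sketch assumes, as Denis describes, that on affected Windows versions only the low byte of HeapFree's 4-byte BOOL carries RtlFreeHeap's 1-byte BOOLEAN, while the upper three bytes may be leftover register contents. It simulates the raw value with uint32_t so it runs anywhere.

```c
/* assert.h is only used by the usage checks. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: decide whether HeapFree succeeded, given the raw
   32-bit value it returned. Assumption (from the thread): only the low
   byte holds RtlFreeHeap's BOOLEAN; the upper bytes may be garbage. */
static bool heap_free_ok(uint32_t raw)
{
    return (raw & 0xFFu) != 0;  /* ignore the possibly-garbage upper bytes */
}
```

On Windows this would be called as `heap_free_ok((uint32_t)HeapFree(GetProcessHeap(), 0, p))`, with no per-call GetVersion() check. It is only correct if success is always reported with a nonzero low byte, which is the assumption the next reply questions.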
Aug 05 2013
"Mr. Anonymous" <mailnew4ster gmail.com> writes:
On Monday, 5 August 2013 at 21:42:11 UTC, Kagamin wrote:
 On Sunday, 4 August 2013 at 09:28:11 UTC, Denis Shelomovskij 
 wrote:
 So I suppose you use `HeapFree` too? Please, be sure you use 
 this Windows API BOOL/BOOLEAN bug workaround:
 https://github.com/denis-sh/phobos-additions/blob/e061d1ad282b4793d1c75dfcc20962b99ec842df/unstd/windows/heap.d#L178
 BOOLEAN is either TRUE or FALSE, so it should be ok to check only the least significant byte.
Not in Windows:
typedef BYTE BOOLEAN;
typedef int BOOL;
(c) http://msdn.microsoft.com/en-us/library/windows/desktop/aa383751%28v=vs.85%29.aspx

While ideally it should be TRUE or FALSE, sometimes it isn't. In fact, for functions that return BOOL, MSDN states the following: "If the function succeeds, the return value is nonzero."
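The MSDN wording quoted here implies a practical rule for BOOL-returning Windows functions: compare the result against zero, never against TRUE. A small portable sketch of the difference follows; the MY_-prefixed typedefs are hypothetical local stand-ins mirroring the quoted Windows typedefs, used instead of windows.h so the code compiles anywhere.

```c
/* assert.h is only used by the usage checks. */
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins mirroring the Windows typedefs quoted in this post
   (typedef BYTE BOOLEAN; typedef int BOOL;). windows.h is deliberately
   not included, so the names are prefixed to avoid clashes. */
typedef unsigned char MY_BYTE;
typedef MY_BYTE MY_BOOLEAN;  /* 1 byte */
typedef int MY_BOOL;         /* 4 bytes with Windows compilers */
#define MY_TRUE 1

/* Fragile: assumes success is reported as exactly TRUE (1). */
static bool succeeded_if_true(MY_BOOL r) { return r == MY_TRUE; }

/* Matches the documented contract: any nonzero value means success. */
static bool succeeded_if_nonzero(MY_BOOL r) { return r != 0; }
```

A return value of 2 counts as success under the documented contract but fails the `== TRUE` test, which is why the nonzero check is the safer idiom.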
Aug 05 2013
Richard Webb <richard.webb boldonjames.com> writes:
On 03/08/2013 22:55, Walter Bright wrote:
 The execrable existing implementation was scrapped, and the new one uses
 Windows HeapAlloc().

 http://ftp.digitalmars.com/snn.lib

 This is for testing porpoises, and of course for those that Feel Da Need
 For Speed.
Using the latest DMD and this snn.lib, I'm seeing it take about 11.5 seconds to compile the algorithm unit tests (when I tried it last week, it was taking closer to 17 seconds).

For comparison, the MSVC build takes about 10 seconds on the same machine (Athlon 64X2 6000+).

Keep up the good work :-)
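The figures in this post work out as follows (plain arithmetic on the approximate timings given; nothing here is a new measurement):

```latex
\frac{17\,\mathrm{s}}{11.5\,\mathrm{s}} \approx 1.48\times \text{ speedup}, \qquad
\frac{17 - 11.5}{17} \approx 32\% \text{ less time}, \qquad
\frac{11.5 - 10}{10} = 15\% \text{ slower than the MSVC build}
```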
Aug 05 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 8/5/2013 4:01 AM, Richard Webb wrote:
 Using the latest DMD and this snn.lib, I'm seeing it take about 11.5 seconds to
 compile the algorithm unit tests (when I tried it last week, it was taking
 closer to 17 seconds).

 For comparison, the MSVC build takes about 10 seconds on the same machine
 (Athlon 64X2 6000+).

 Keep up the good work :-)
So I guess the DMC code generator isn't as awful as has been assumed! This is hardly the first time the culprit was a library routine, not the code generator.
Aug 05 2013
dennis luehring <dl.soluz gmx.net> writes:
On 05.08.2013 19:52, Walter Bright wrote:
 On 8/5/2013 4:01 AM, Richard Webb wrote:
 Using the latest DMD and this snn.lib, I'm seeing it take about 11.5 seconds to
 compile the algorithm unit tests (when I tried it last week, it was taking
 closer to 17 seconds).

 For comparison, the MSVC build takes about 10 seconds on the same machine
 (Athlon 64X2 6000+).

 Keep up the good work :-)
 So I guess the DMC code generator isn't as awful as has been assumed! This is hardly the first time the culprit was a library routine, not the code generator.
Don't start the party too early: there are still 1.5 seconds left :)
Aug 05 2013
Richard Webb <richard.webb boldonjames.com> writes:
On 05/08/2013 18:52, Walter Bright wrote:
 This is hardly the first time the culprit was a library routine
It's possible that other library routines are causing some of the remaining difference from the MSVC build (e.g. the profiler suggests that the DMC build spends somewhat more time inside memcpy than the MSVC build).

Not sure if it's down to implementation or optimization, though. Might it be down to intrinsics/inlining and such? (The profile for the DMC build says it's using ~1% of its time inside strlen, and the profile for the MSVC build doesn't mention it at all, which I guess is because it's using an intrinsic version.)
Aug 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 8/6/2013 5:13 AM, Richard Webb wrote:
 It's possible that other library routines are causing some of the remaining
 difference from the MSVC build (e.g. the profiler suggests that the DMC build
 spends somewhat more time inside memcpy than the MSVC build).

 Not sure if it's down to implementation or optimization though - might be down
 to intrinsics/inlining and such? (the profile for the DMC build says it's using
 ~1% of its time inside strlen and the profile for the MSVC build doesn't mention
 it at all, which I guess is because it's using an intrinsic version).
If it's inlined then it won't show up in the profile. And yes, it's possible MSVC has a faster memcpy(). After all, enormous effort has been poured into memcpy().
Aug 06 2013
"Kiith-Sa" <kiithsacmp gmail.com> writes:
On Tuesday, 6 August 2013 at 17:48:57 UTC, Walter Bright wrote:
 On 8/6/2013 5:13 AM, Richard Webb wrote:
 It's possible that other library routines are causing some of the remaining
 difference from the MSVC build (e.g. the profiler suggests that the DMC build
 spends somewhat more time inside memcpy than the MSVC build).

 Not sure if it's down to implementation or optimization though - might be down
 to intrinsics/inlining and such? (the profile for the DMC build says it's using
 ~1% of its time inside strlen and the profile for the MSVC build doesn't mention
 it at all, which I guess is because it's using an intrinsic version).
 If it's inlined then it won't show up in the profile. And yes, it's possible MSVC has a faster memcpy(). After all, enormous effort has been poured into memcpy().
If you use a profiler with line or instruction granularity (like perf on Linux), it will show up. On Windows, that would probably be VTune and CodeAnalyst.
Aug 06 2013
"Kiith-Sa" <kiithsacmp gmail.com> writes:
On Tuesday, 6 August 2013 at 18:38:43 UTC, Kiith-Sa wrote:
 On Tuesday, 6 August 2013 at 17:48:57 UTC, Walter Bright wrote:
 On 8/6/2013 5:13 AM, Richard Webb wrote:
 It's possible that other library routines are causing some of the remaining
 difference from the MSVC build (e.g. the profiler suggests that the DMC build
 spends somewhat more time inside memcpy than the MSVC build).

 Not sure if it's down to implementation or optimization though - might be down
 to intrinsics/inlining and such? (the profile for the DMC build says it's using
 ~1% of its time inside strlen and the profile for the MSVC build doesn't mention
 it at all, which I guess is because it's using an intrinsic version).
 If it's inlined then it won't show up in the profile. And yes, it's possible MSVC has a faster memcpy(). After all, enormous effort has been poured into memcpy().
 If you use a profiler with line or instruction granularity (like perf on Linux), it will show up. On Windows, that would probably be VTune and CodeAnalyst.
(Obviously, as part of the function it was inlined into, but you'll see the time spent on the lines/instructions that came from the inlined function.)
Aug 06 2013
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, August 03, 2013 14:55:29 Walter Bright wrote:
 The execrable existing implementation was scrapped, and the new one uses
 Windows HeapAlloc().
 
 http://ftp.digitalmars.com/snn.lib
 
 This is for testing porpoises, and of course for those that Feel Da Need For
 Speed.
But what if I prefer to test dolphins? ;)

- Jonathan M Davis

P.S. So long, and thanks for all the fish.
Aug 03 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 8/3/2013 3:28 PM, Jonathan M Davis wrote:
 On Saturday, August 03, 2013 14:55:29 Walter Bright wrote:
 This is for testing porpoises, and of course for those that Feel Da Need For
 Speed.
 But what if I prefer to test dolphins? ;)
They all look alike anyway, what's the difference?
Aug 11 2013