
digitalmars.D - Compile-time memory footprint of std.algorithm

reply "Iain Buclaw" <ibuclaw gdcproject.org> writes:
Testing a 2.065 pre-release snapshot against GDC, I see that 
std.algorithm now surpasses 2.1GB of memory consumption when 
compiling unittests.  This is bringing my laptop to its knees 
for a painful 2-3 minutes.

This is time that could be better spent if the unittests were 
simply broken down/split up.
Apr 22 2014
next sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Tue, Apr 22, 2014 at 06:09:11PM +0000, Iain Buclaw via Digitalmars-d wrote:
 Testing a 2.065 pre-release snapshot against GDC. I see that
 std.algorithm now surpasses 2.1GBs of memory consumption when
 compiling unittests.  This is bringing my laptop down to its knees for
 a painful 2/3 minutes.
 
 This is time that could be better spent if the unittests where simply
 broken down/split up.

Didn't we say (many months ago!) that we wanted to split up std.algorithm into more manageable chunks? I see that that hasn't happened yet. :-(

T

-- 
"Real programmers can write assembly code in any language. :-)" -- Larry Wall
Apr 22 2014
prev sibling next sibling parent reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Tuesday, 22 April 2014 at 18:09:12 UTC, Iain Buclaw wrote:
 Testing a 2.065 pre-release snapshot against GDC. I see that 
 std.algorithm now surpasses 2.1GBs of memory consumption when 
 compiling unittests.  This is bringing my laptop down to its 
 knees for a painful 2/3 minutes.

My (ancient) laptop only has 2GB of RAM :-) Has anyone looked into why it is using so much? Is it all the temporary allocations created by CTFE that are never cleaned up?
Apr 22 2014
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
23-Apr-2014 01:00, Iain Buclaw via Digitalmars-d writes:
 On 22 April 2014 21:43, Peter Alexander via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On Tuesday, 22 April 2014 at 18:09:12 UTC, Iain Buclaw wrote:
 Testing a 2.065 pre-release snapshot against GDC. I see that std.algorithm
 now surpasses 2.1GBs of memory consumption when compiling unittests.  This
 is bringing my laptop down to its knees for a painful 2/3 minutes.

My (ancient) laptop only has 2GB of RAM :-) Has anyone looked into why it is using so much? Is it all the temporary allocations created by CTFE that are never cleaned up?

I blame Kenji and all the semanticTiargs and other template-related copying and discarding of memory around the place. :o)

At times I really don't know why we can't just drop in a Boehm GC (the stock one, not homebrew stuff) and be done with it. Speed? There is no point in speed if it leaks that much. -- Dmitry Olshansky
Apr 22 2014
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/22/2014 11:33 PM, Dmitry Olshansky wrote:
 At a times I really don't know why can't we just drop in a Boehm GC (the stock
 one, not homebrew stuff) and be done with it. Speed? There is no point in speed
 if it leaks that much.

I made a build of dmd with a collector in it. It destroyed the speed. Took it out.
Apr 22 2014
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
23-Apr-2014 10:39, Walter Bright writes:
 On 4/22/2014 11:33 PM, Dmitry Olshansky wrote:
 At a times I really don't know why can't we just drop in a Boehm GC
 (the stock
 one, not homebrew stuff) and be done with it. Speed? There is no point
 in speed
 if it leaks that much.

I made a build of dmd with a collector in it. It destroyed the speed. Took it out.

Getting more practical - any chance to use it selectively in CTFE and related stuff that is KNOWN to generate garbage? -- Dmitry Olshansky
Apr 22 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/22/2014 11:56 PM, Dmitry Olshansky wrote:
 Getting more practical - any chance to use it selectively in CTFE and related
 stuff that is KNOWN to generate garbage?

Using it only there would require a rewrite of interpret.c.
Apr 23 2014
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/23/2014 12:20 AM, Kagamin wrote:
 On Wednesday, 23 April 2014 at 06:39:04 UTC, Walter Bright wrote:
 I made a build of dmd with a collector in it. It destroyed the speed. Took it
 out.

Is it because of garbage collections? Then allow people configure collection threshold, say, collect garbage only when the heap is bigger than 16GB.

It's more than that. I invite you to read the article I wrote on Dr. Dobb's a while back about changes to the allocator to improve speed. tl;dr: allocation is a critical speed issue with dmd. Using the bump-pointer method is very fast, and it matters.
Apr 23 2014
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
23-Apr-2014 12:12, Walter Bright writes:
 On 4/23/2014 12:20 AM, Kagamin wrote:
 On Wednesday, 23 April 2014 at 06:39:04 UTC, Walter Bright wrote:
 I made a build of dmd with a collector in it. It destroyed the speed.
 Took it
 out.

Is it because of garbage collections? Then allow people configure collection threshold, say, collect garbage only when the heap is bigger than 16GB.

It's more than that. I invite you to read the article I wrote on DrDobbs a while back about changes to the allocator to improve speed. tl;dr: allocation is a critical speed issue with dmd. Using the bump-pointer method is very fast, and it matters.

This stinks, and it's not even half-serious: a 2x speed increase came from scrapping the old allocator on Win32 altogether and using the plain Heap API. If the prime reason compilation is fast is that we just throw away memory, we must be doing something wrong, very wrong. -- Dmitry Olshansky
Apr 23 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/23/2014 2:00 AM, Dmitry Olshansky wrote:
 If the prime reason compilation is fast is because we just throw away memory,
we
 must be doing something wrong, very wrong.

I've tried adding a collector to DMD with poor results. If you'd like to give it a try as well, please do so. The thing is, I work all day every day on D. I cannot do more. If people want more things done, like redesigning memory allocation in the compiler, redesigning D to do ARC, etc., they'll need to pitch in.
Apr 23 2014
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
23-Apr-2014 21:16, Walter Bright writes:
 On 4/23/2014 2:00 AM, Dmitry Olshansky wrote:
 If the prime reason compilation is fast is because we just throw away
 memory, we
 must be doing something wrong, very wrong.

I've tried adding a collector to DMD with poor results. If you'd like to give it a try as well, please do so.

I'll give it a spin then.
 The thing is, I work all day every day on D. I cannot do more.

That is understood, thanks for your honesty.
 If people
 want more things done, like redesigning memory allocation in the
 compiler, redesigning D to do ARC, etc., they'll need to pitch in.

-- Dmitry Olshansky
Apr 23 2014
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 23/04/14 08:39, Walter Bright wrote:

 I made a build of dmd with a collector in it. It destroyed the speed.
 Took it out.

Isn't that bad advertising for the GC in D? Or does it have something to do with DMD not being designed with a GC in mind? -- /Jacob Carlborg
Apr 23 2014
parent reply Ary Borenszweig <ary esperanto.org.ar> writes:
On 4/24/14, 3:16 AM, Jacob Carlborg wrote:
 On 23/04/14 08:39, Walter Bright wrote:

 I made a build of dmd with a collector in it. It destroyed the speed.
 Took it out.

Isn't that bad advertisement for the GC in D? Or has it something to do with DMD not being designed with a GC in mind?

dmd is written in C++, so the collector must have been Boehm
Apr 24 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/24/2014 7:16 AM, Iain Buclaw via Digitalmars-d wrote:
 On 24 April 2014 12:01, Ary Borenszweig via Digitalmars-d
 It wasn't IIRC.  'Twas in-house GC, no?

It was with the C++ version of the original D collector.
Apr 26 2014
prev sibling parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Dmitry Olshansky"  wrote in message news:lj7mrr$1p5s$1 digitalmars.com... 

 At a times I really don't know why can't we just drop in a Boehm GC (the 
 stock one, not homebrew stuff) and be done with it. Speed? There is no 
 point in speed if it leaks that much.

Or you know, switch to D and use druntime's GC.
Apr 23 2014
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
23-Apr-2014 20:56, Daniel Murphy writes:
 "Dmitry Olshansky"  wrote in message news:lj7mrr$1p5s$1 digitalmars.com...
 At a times I really don't know why can't we just drop in a Boehm GC
 (the stock one, not homebrew stuff) and be done with it. Speed? There
 is no point in speed if it leaks that much.

Or you know, switch to D and use druntime's GC.

Good point. Can't wait to see a D-only codebase. -- Dmitry Olshansky
Apr 23 2014
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
24-Apr-2014 05:12, Marco Leise writes:
 Am Wed, 23 Apr 2014 21:23:17 +0400
 schrieb Dmitry Olshansky <dmitry.olsh gmail.com>:

 23-Apr-2014 20:56, Daniel Murphy writes:
 "Dmitry Olshansky"  wrote in message news:lj7mrr$1p5s$1 digitalmars.com...
 At a times I really don't know why can't we just drop in a Boehm GC
 (the stock one, not homebrew stuff) and be done with it. Speed? There
 is no point in speed if it leaks that much.

Or you know, switch to D and use druntime's GC.

Good point. Can't wait to see D-only codebase.

Hmm. DMD doesn't use a known and tried, imprecise GC because it is a lot slower.

No it doesn't. It used a precursor of D's GC and that turned out to be slow. See Walter's post.
 How is DMD written in D using the druntime
 GC going to help that ?

The GC is that much easier to reach: every enhancement to D's GC becomes instantly available. Wanna make the compiler faster - make D's runtime faster! ;)
 I wondered about this ever since there
 was talk about DDMD. I'm totally expecting compile times to
 multiply by 1.2 or so.

Since memory management is going to stay the same with the GC disabled (at least for starters), I doubt things will change radically. If they do, it'll just highlight perf problems in D's runtime that need work. -- Dmitry Olshansky
Apr 26 2014
prev sibling parent Ary Borenszweig <ary esperanto.org.ar> writes:
On 4/23/14, 1:56 PM, Daniel Murphy wrote:
 "Dmitry Olshansky"  wrote in message news:lj7mrr$1p5s$1 digitalmars.com...
 At a times I really don't know why can't we just drop in a Boehm GC
 (the stock one, not homebrew stuff) and be done with it. Speed? There
 is no point in speed if it leaks that much.

Or you know, switch to D and use druntime's GC.

But that will be slow. Walter's point is that if you introduce a GC it will be slower. Of course, you won't be able to compile big stuff. But developers usually have good machines, so it's not that big a deal.
Apr 23 2014
prev sibling next sibling parent Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 22 April 2014 21:43, Peter Alexander via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On Tuesday, 22 April 2014 at 18:09:12 UTC, Iain Buclaw wrote:
 Testing a 2.065 pre-release snapshot against GDC. I see that std.algorithm
 now surpasses 2.1GBs of memory consumption when compiling unittests.  This
 is bringing my laptop down to its knees for a painful 2/3 minutes.

My (ancient) laptop only has 2GB of RAM :-) Has anyone looked into why it is using so much? Is it all the temporary allocations created by CTFE that are never cleaned up?

I blame Kenji and all the semanticTiargs and other template-related copying and discarding of memory around the place. :o)
Apr 22 2014
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Wednesday, 23 April 2014 at 06:39:04 UTC, Walter Bright wrote:
 I made a build of dmd with a collector in it. It destroyed the 
 speed. Took it out.

Is it because of garbage collections? Then allow people to configure the collection threshold - say, collect garbage only when the heap is bigger than 16GB.
Apr 23 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 23 Apr 2014 02:39:05 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 4/22/2014 11:33 PM, Dmitry Olshansky wrote:
 At a times I really don't know why can't we just drop in a Boehm GC  
 (the stock
 one, not homebrew stuff) and be done with it. Speed? There is no point  
 in speed
 if it leaks that much.

I made a build of dmd with a collector in it. It destroyed the speed. Took it out.

The time it takes to compile a program where the compiler consumes 2G of RAM on a 2G machine is infinite ;) There must be some compromise between slow-but-perfect memory management and invoking the OOM killer. -Steve
Apr 23 2014
prev sibling next sibling parent "Steve Teale" <steve.teale britseyeview.com> writes:
On Wednesday, 23 April 2014 at 17:16:40 UTC, Walter Bright wrote:
 On 4/23/2014 2:00 AM, Dmitry Olshansky wrote:
 If the prime reason compilation is fast is because we just 
 throw away memory, we
 must be doing something wrong, very wrong.

I've tried adding a collector to DMD with poor results. If you'd like to give it a try as well, please do so. The thing is, I work all day every day on D. I cannot do more. If people want more things done, like redesigning memory allocation in the compiler, redesigning D to do ARC, etc., they'll need to pitch in.

Well said Walter!
Apr 23 2014
prev sibling next sibling parent "Messenger" <dont shoot.me> writes:
On Wednesday, 23 April 2014 at 15:46:00 UTC, Steven Schveighoffer 
wrote:
 The time it takes to compile a program where the compiler 
 consumes 2G of ram on a 2G machine is infinite ;)

(nitpick: not necessarily given good swap behaviour!)
Apr 23 2014
prev sibling next sibling parent "Nordlöw" <per.nordlow gmail.com> writes:
 tl;dr: allocation is a critical speed issue with dmd. Using the 
 bump-pointer method is very fast, and it matters.

What about packing DMD structure members such as integers and enums more efficiently? We could start with making enums __attribute__((packed)). Is there any free static/dynamic tool to check for unexercised bits? How does Clang manage to save so much space compared to GCC? Do they pack more tightly or use deallocation? A much higher-hanging fruit is to switch from pointers to 32-bit handles on 64-bit CPUs for referencing tokens, sub-expressions, etc. But I guess that is a big undertaking to make type-safe and may give performance hits.
Apr 23 2014
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Wednesday, 23 April 2014 at 19:54:29 UTC, Nordlöw wrote:
 tl;dr: allocation is a critical speed issue with dmd. Using 
 the bump-pointer method is very fast, and it matters.

What about packing DMD structure members such as integers and enums more efficiently? We could start with making enums __attribute__((packed)). Is there any free static/dynamic tool to check for unexercized bits? How does Clang do to save so much space compared to GCC? Do they pack gentlier or use deallocation? A much higher-hanging fruit is to switch from using pointers to 32-bit handles on 64-bit CPUs to reference tokens, sub-expressions etc. But I guess that is a big undertaking getting type-safe and may give performance hits.

Maybe we should investigate where the memory is going first before planning our attack :-)
Apr 23 2014
prev sibling next sibling parent "Nordlöw" <per.nordlow gmail.com> writes:
 Maybe we should investigate where the memory is going first 
 before planning our attack :-)

I agree. Tool anyone?
Apr 23 2014
prev sibling next sibling parent "Nordlöw" <per.nordlow gmail.com> writes:
 I agree. Tool anyone?

https://stackoverflow.com/questions/23255043/finding-unexercised-bits-of-allocated-data Massif may give some clues.
Apr 23 2014
prev sibling next sibling parent Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 23 April 2014 21:55, "Nordlöw" <digitalmars-d puremagic.com> wrote:
 Maybe we should investigate where the memory is going first before
 planning our attack :-)

I agree. Tool anyone?

I'm using valgrind - may take a while to process and merge them all. I'll post an update in the morning.
Apr 23 2014
prev sibling next sibling parent "Jussi Jumppanen" <jussij zeusedit.com> writes:
On Wednesday, 23 April 2014 at 20:04:09 UTC, Peter Alexander
wrote:

 Maybe we should investigate where the memory is going first 
 before planning our attack :-)

FWIW one hint might be found in the DCD project here: https://github.com/Hackerpilot/DCD/ In that project, compiling the lexer.d file causes a massive increase in compiler memory usage. More details here: https://github.com/Hackerpilot/DCD/issues/93 NOTE: That was DMD running on a 32-bit Windows XP machine.
Apr 23 2014
prev sibling next sibling parent "Brian Schott" <briancschott gmail.com> writes:
On Wednesday, 23 April 2014 at 23:19:20 UTC, Jussi Jumppanen 
wrote:
 In that project compiling the lexer.d file causes a massive
 increase in compiler memory usage.

The code is actually located here: https://github.com/Hackerpilot/Dscanner

If you want to make DMD cry, compile it with "-O -inline -release".
Apr 23 2014
prev sibling next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Wed, 23 Apr 2014 21:23:17 +0400
schrieb Dmitry Olshansky <dmitry.olsh gmail.com>:

 23-Apr-2014 20:56, Daniel Murphy writes:
  "Dmitry Olshansky"  wrote in message news:lj7mrr$1p5s$1 digitalmars.com...


 At a times I really don't know why can't we just drop in a Boehm GC
 (the stock one, not homebrew stuff) and be done with it. Speed? There
 is no point in speed if it leaks that much.

Or you know, switch to D and use druntime's GC.

Good point. Can't wait to see D-only codebase.

Hmm. DMD doesn't use a known and tried, imprecise GC because it is a lot slower. How is DMD written in D using the druntime GC going to help that? I wondered about this ever since there was talk about DDMD. I'm totally expecting compile times to multiply by 1.2 or so.

-- 
Marco
Apr 23 2014
prev sibling next sibling parent Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 23 April 2014 22:24, Iain Buclaw <ibuclaw gdcproject.org> wrote:
 On 23 April 2014 21:55, "Nordlöw" <digitalmars-d puremagic.com> wrote:
 Maybe we should investigate where the memory is going first before
 planning our attack :-)

I agree. Tool anyone?

I'm using valgrind - may take a while to process and merge them all. I'll post an update in the morning.

I was amazed to see some small losses in the glue (that I'll be dealing with), but by and large the worst culprits were all the syntaxCopy'ing done in template semantic analysis. The resultant assembly file emitted by gdc is 83MB in size, so I think it is impossible not to have large memory consumption here. The stats file is 100MB (39k reported leaks) and I'm not sure just what to do with it yet.
Apr 23 2014
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Wednesday, 23 April 2014 at 08:12:42 UTC, Walter Bright wrote:
 tl;dr: allocation is a critical speed issue with dmd. Using the 
 bump-pointer method is very fast, and it matters.

Alternatively, we could switch to a different heap once a size threshold is reached.
Apr 23 2014
prev sibling next sibling parent Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 24 April 2014 12:01, Ary Borenszweig via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 4/24/14, 3:16 AM, Jacob Carlborg wrote:
 On 23/04/14 08:39, Walter Bright wrote:

 I made a build of dmd with a collector in it. It destroyed the speed.
 Took it out.

Isn't that bad advertisement for the GC in D? Or has it something to do with DMD not being designed with a GC in mind?

dmd is written in C++, the collector must have been boehm

It wasn't IIRC. 'Twas in-house GC, no?
Apr 24 2014
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 24 April 2014 at 06:16:05 UTC, Jacob Carlborg wrote:
 On 23/04/14 08:39, Walter Bright wrote:

 I made a build of dmd with a collector in it. It destroyed the 
 speed.
 Took it out.

Isn't that bad advertisement for the GC in D? Or has it something to do with DMD not being designed with a GC in mind?

Well, keep in mind we are comparing using the GC versus "doing nothing". I'd be interested in knowing the speed with *any* memory management model in DMD.
Apr 24 2014
prev sibling next sibling parent Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 22 April 2014 19:09, Iain Buclaw via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 Testing a 2.065 pre-release snapshot against GDC. I see that std.algorithm
 now surpasses 2.1GBs of memory consumption when compiling unittests.  This
 is bringing my laptop down to its knees for a painful 2/3 minutes.

 This is time that could be better spent if the unittests where simply broken
 down/split up.

The final nail in the coffin was when my laptop locked up building Phobos development with dmd. Went out and bought an SSD and replaced my crippled HDD - expecting no further problems in the near future...
Jun 21 2014
prev sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Sat, Jun 21, 2014 at 09:34:35PM +0100, Iain Buclaw via Digitalmars-d wrote:
 On 22 April 2014 19:09, Iain Buclaw via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 Testing a 2.065 pre-release snapshot against GDC. I see that
 std.algorithm now surpasses 2.1GBs of memory consumption when
 compiling unittests.  This is bringing my laptop down to its knees
 for a painful 2/3 minutes.

 This is time that could be better spent if the unittests where
 simply broken down/split up.

The final nail in the coffin was when my laptop locked up building phobos development using dmd. Went out and bought an SSD disk and replaced my crippled HDD drive - expecting no further problems in the near future...

It's long past due for std.algorithm to be broken up. And this isn't the first time problems like this came up, either. I vaguely recall someone working on an algorithms module, potentially splitting up some of the stuff from the current std.algorithm; whatever happened with that?

In fact, splitting std.algorithm has been mentioned so many times that I feel like I should just shut up and submit a PR for it instead. Even if it gets rejected, at least it gets things moving instead of everyone talking about it yet nothing ever coming of it.

T

-- 
Perhaps the most widespread illusion is that if we were in power we would behave very differently from those who now hold it---when, in truth, in order to get power we would have to become very much like them. -- Unknown
Jun 21 2014