
digitalmars.D.learn - Minimize GC memory footprint

reply frame <frame86 live.com> writes:
Is there a way to force the GC to re-use memory in already 
existing pools?

I set maxPoolSize:1 to get pools that can be released more quickly 
once they're no longer in use. This already reduces memory usage 
to 1:3. Sadly, the application creates multiple pools that are not 
necessary in my POV - just fragmented temporary slice data like 
from format(). What can I do to optimize?
Jan 30
next sibling parent reply Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Saturday, 30 January 2021 at 16:42:35 UTC, frame wrote:
 Is there a way to force the GC to re-use memory in already 
 existing pools?

 I set maxPoolSize:1 to gain pools that can be quicker released 
 after there no longer in use. This already reduces memory usage 
 to 1:3. Sadly the application creates multiple pools that are 
 not necessary in my POV - just fragmented temporary slice data 
 like from format(). What can I do to optimize?
Do you want to optimize for reduced memory usage?
Jan 30
parent reply frame <frame86 live.com> writes:
On Saturday, 30 January 2021 at 22:57:41 UTC, Imperatorn wrote:
 On Saturday, 30 January 2021 at 16:42:35 UTC, frame wrote:
 Is there a way to force the GC to re-use memory in already 
 existing pools?

 I set maxPoolSize:1 to gain pools that can be quicker released 
 after there no longer in use. This already reduces memory 
 usage to 1:3. Sadly the application creates multiple pools 
 that are not necessary in my POV - just fragmented temporary 
 slice data like from format(). What can I do to optimize?
Do you want to optimize for reduced memory usage?
Yes, speed is secondary (long running daemon)
Jan 30
parent reply Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Sunday, 31 January 2021 at 04:12:14 UTC, frame wrote:
 On Saturday, 30 January 2021 at 22:57:41 UTC, Imperatorn wrote:
 On Saturday, 30 January 2021 at 16:42:35 UTC, frame wrote:
 Is there a way to force the GC to re-use memory in already 
 existing pools?

 I set maxPoolSize:1 to gain pools that can be quicker 
 released after there no longer in use. This already reduces 
 memory usage to 1:3. Sadly the application creates multiple 
 pools that are not necessary in my POV - just fragmented 
 temporary slice data like from format(). What can I do to 
 optimize?
Do you want to optimize for reduced memory usage?
Yes, speed is secondary (long running daemon)
It says experimental, but it's fine: https://dlang.org/phobos/std_experimental_allocator.html
Jan 31
parent reply frame <frame86 live.com> writes:
On Sunday, 31 January 2021 at 12:14:53 UTC, Imperatorn wrote:

 It says experimental, but it's fine:

 https://dlang.org/phobos/std_experimental_allocator.html
Well, this looks very nice, but I have to deal with the GC as long as I want to use other libraries that rely on it, or even just Phobos.

Conclusion so far (for Windows):

32bit:
- GC just doesn't work at all

64bit:
- Collections are rare. It can be necessary to call GC.collect() manually.
- Scope guards that explicitly clean up / free memory at function exit have no deep impact in most cases.
- If your application should save memory, call GC.minimize() when it's appropriate.

It seems that calling GC.enable() if it's already enabled just disables the automatic GC again? Is this a bug? I cannot reproduce it outside my application yet since it's not clear when the GC starts collecting, but it always shows the same behaviour:
 // GC.enable();
Case A: The app is very kind in memory usage (~20 MB)
 GC.enable();
Case B: The app is consuming huge amount of memory (~900 MB)
 GC.disable();
 GC.enable();
Case A again
 GC.disable();
 GC.enable();
 GC.enable();
Case B again

I also struggle with what the spec's text actually means:
 This function is reentrant, and must be called once for every 
 call to disable before automatic collections are enabled.
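
A minimal sketch of the "collect manually, then minimize" approach from the 64bit conclusions above. The helper name releaseUnusedMemory and the simulated allocation burst are assumptions made for illustration; only GC.collect(), GC.minimize() and GC.stats come from core.memory. How much memory actually goes back to the OS depends on the platform and on fragmentation, so the printed numbers will vary.

```d
import core.memory : GC;
import std.stdio : writefln;

/// Hypothetical helper: run a full collection, then ask the GC to
/// return free physical memory to the OS where it can.
void releaseUnusedMemory()
{
    GC.collect();
    GC.minimize();
}

void main()
{
    foreach (round; 0 .. 3)
    {
        // Simulated burst of short-lived allocations.
        foreach (i; 0 .. 100_000)
        {
            auto tmp = new ubyte[](256);
            tmp[0] = cast(ubyte) i;
        }

        // A long-running daemon could call this at a quiet moment.
        releaseUnusedMemory();
        writefln("round %d: used %d bytes, free %d bytes",
            round, GC.stats.usedSize, GC.stats.freeSize);
    }
}
```

Calling the helper at idle points rather than after every allocation keeps the cost of the full collection out of the hot path.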
Feb 03
parent reply Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Wednesday, 3 February 2021 at 13:37:42 UTC, frame wrote:
 I have to deal with GC as long I want to use other libraries 
 that are relying on it or even just phobos.

 Conclusion so far (for Windows):

 32bit:
 - GC just doesn't work at all
?? Do you mean no collections happen? 32bit GC should just work.
 64bit:
 - Collections are rare. It can be necessary to call 
 GC.collect() manually.
 - Scope guards to explicit clean up / free memory at function 
 exit have no deep impact on most cases.
 - If your application should save memory call GC.minimize() 
 when it's appropriate.


 It seems that calling GC.enable() if it's already enabled just 
 disables the automatic GC again? Is this a bug? I cannot 
 reproduce it outside my application yet since it's not clear 
 when the GC starts collecting, but it always shows the same 
 behaviour:

 // GC.enable();
Case A: The app is very kind in memory usage (~20 MB)
 GC.enable();
Case B: The app is consuming huge amount of memory (~900 MB)
 GC.disable();
 GC.enable();
Case A again
 GC.disable();
 GC.enable();
 GC.enable();
Case B again
That looks like a bug indeed.
 I also have to struggle what the specs' text actually mean:

 This function is reentrant, and must be called once for every 
 call to disable before automatic collections are enabled.
I think it means that you need to make sure that enable() is called as many times as disable() is called before collection can happen automatically. — Bastiaan.
Feb 05
next sibling parent frame <frame86 live.com> writes:
On Friday, 5 February 2021 at 22:46:05 UTC, Bastiaan Veelo wrote:

 I think it means that you need to make sure that enable() is 
 called as many times as disable() is called before collection 
 can happen automatically.

 — Bastiaan.
Thanks, in the meantime I looked into the source:
 struct Gcx
 {
    uint disabled; // turn off collections if >0
 ...
 }
 void enable()
 {
    static void go(Gcx* gcx) nothrow
    {
        assert(gcx.disabled > 0);
        gcx.disabled--;
    }
    runLocked!(go, otherTime, numOthers)(gcx);
 }
 void disable()
 {
    static void go(Gcx* gcx) nothrow
    {
        gcx.disabled++;
    }
    runLocked!(go, otherTime, numOthers)(gcx);
 }
So that explains what's going on. The assertion should kick in to warn about this issue, but it doesn't fire from user code. I assume the runtime is not compiled but just linked, or do I need another compiler switch?
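
Going by the source quoted above, disabled is an unsigned counter, so an enable() without a matching disable() would presumably wrap it around to a large positive value whenever the assertion is compiled out, which would fit the "Case B" behaviour. One way to keep the calls balanced is to pair them in a single place. This is only a sketch; withGCDisabled is a hypothetical helper, not part of druntime.

```d
import core.memory : GC;

/// Hypothetical helper: runs `work` with automatic collections
/// suppressed and guarantees exactly one enable() per disable().
void withGCDisabled(scope void delegate() work)
{
    GC.disable();              // counter goes up by one
    scope (exit) GC.enable();  // counter goes down by one, even on throw
    work();
}

void main()
{
    int calls;

    withGCDisabled({ ++calls; });

    // Nesting is fine because every level restores its own disable().
    withGCDisabled({
        withGCDisabled({ ++calls; });
    });

    assert(calls == 2);
}
```

With this pattern user code never calls GC.enable() directly, so the counter cannot drop below zero.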
Feb 05
prev sibling parent reply frame <frame86 live.com> writes:
On Friday, 5 February 2021 at 22:46:05 UTC, Bastiaan Veelo wrote:

 ?? Do you mean no collections happen? 32bit GC should just work.
No, it doesn't - this code fails on memory allocation and works fine with -m64 switch:

import std.stdio;
import core.memory : GC;

void main() {
    void usage() {
        writefln("Usage: %.2f MiB / collected: %d",
            (cast(double) GC.stats.usedSize) / 1_048_576,
            GC.profileStats.numCollections);
    }

    void foo() {
        string[] s;

        scope (exit) {
            s.length = 0;
        }

        foreach (i; 0 .. 50_000_00) {
            s ~= "a";
        }
    }

    foreach (i; 0 .. uint.max) {
        writefln("Round: %d", i + 1);
        foo();
        GC.collect();
        usage();
    }
}

...
Round: 24
Usage: 1603.57 MiB / collected: 27
Round: 25
Usage: 1691.64 MiB / collected: 28
Round: 26
Usage: 1729.50 MiB / collected: 29
Round: 27

core.exception.OutOfMemoryError src\core\exception.d(647): Memory allocation failed
Feb 05
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 06/02/2021 3:32 PM, frame wrote:
 On Friday, 5 February 2021 at 22:46:05 UTC, Bastiaan Veelo wrote:
 
 ?? Do you mean no collections happen? 32bit GC should just work.
 No, it doesn't - this code fails on memory allocation and works fine 
 with -m64 switch:
 
 import std.stdio;
 import core.memory : GC;
 
 void main() {
      void usage() {
          writefln("Usage: %.2f MiB / collected: %d",
              (cast(double) GC.stats.usedSize) / 1_048_576,
              GC.profileStats.numCollections);
      }
 
      void foo() {
          string[] s;
 
          scope (exit) {
              s.length = 0;
This won't do anything.
          }
 
          foreach (i; 0 .. 50_000_00) {
              s ~= "a";
          }
      }
 
      foreach (i; 0 .. uint.max) {
          writefln("Round: %d", i + 1);
Don't forget to stdout.flush; Otherwise stuff can get caught in the buffer before erroring out.
          foo();
          GC.collect();
          usage();
      }
 }
 
 ...
 Round: 24
 Usage: 1603.57 MiB / collected: 27
 Round: 25
 Usage: 1691.64 MiB / collected: 28
 Round: 26
 Usage: 1729.50 MiB / collected: 29
 Round: 27
 
 core.exception.OutOfMemoryError src\core\exception.d(647): Memory 
 allocation failed
Turn on the precise GC, 32bit is a bit too small of a range and you can get false positives like in this case (at least looks like it).
Feb 06
parent reply Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Saturday, 6 February 2021 at 09:42:38 UTC, rikki cattermole 
wrote:
 On 06/02/2021 3:32 PM, frame wrote:
  [...]
This won't do anything.
  [...]
Don't forget to stdout.flush; Otherwise stuff can get caught in the buffer before erroring out.
 [...]
Turn on the precise GC, 32bit is a bit too small of a range and you can get false positives like in this case (at least looks like it).
For reference, how does one turn on precise GC?
Feb 06
parent reply Siemargl <inqnone gmail.com> writes:
On Saturday, 6 February 2021 at 11:20:18 UTC, Imperatorn wrote:
 On Saturday, 6 February 2021 at 09:42:38 UTC, rikki cattermole 
 wrote:
 On 06/02/2021 3:32 PM, frame wrote:
  [...]
This won't do anything.
  [...]
Don't forget to stdout.flush; Otherwise stuff can get caught in the buffer before erroring out.
 [...]
Turn on the precise GC, 32bit is a bit too small of a range and you can get false positives like in this case (at least looks like it).
For reference, how does one turn on precise GC?
https://dlang.org/spec/garbage.html#gc_config

Strange things happen:
- precise scanning doesn't change the result - OOM in the same round 27
- --DRT-gcopt=help won't show the GC implementation in use, and the cleanup type is not printed either
- maxPoolSize:N doesn't limit the total size of the GC
- in gc:manual mode, GC.collect() is not releasing memory

When I print the free GC memory, it looks like memory is leaking:

writefln("Usage: %.2f MiB (free %.2f MiB) / collected: %d",
    (cast(double) GC.stats.usedSize) / 1_048_576,
    (cast(double) GC.stats.freeSize) / 1_048_576,
    GC.profileStats.numCollections);
stdout.flush();
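
For reference, a small sketch of the two usual ways to select the precise GC described on the linked spec page, assuming a druntime recent enough to support the gc:precise option. On the command line the option is passed to the program itself, e.g. app.exe --DRT-gcopt=gc:precise (app.exe is a placeholder name). Alternatively it can be embedded in the binary through rt_options:

```d
import core.memory : GC;
import std.stdio : writefln;

// Embeds the GC option in the executable, so no command-line switch
// is needed at startup.
extern(C) __gshared string[] rt_options = ["gcopt=gc:precise"];

void main()
{
    auto data = new int[](1_000);  // make sure the GC heap is in use
    data[] = 1;
    GC.collect();

    writefln("used: %d bytes, collections: %d",
        GC.stats.usedSize, GC.profileStats.numCollections);
}
```

As reported in this thread, switching to precise scanning did not change the outcome of the failing 32bit test, so this only shows how the option is set, not a fix.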
Feb 06
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 07/02/2021 12:38 AM, Siemargl wrote:
 On Saturday, 6 February 2021 at 11:20:18 UTC, Imperatorn wrote:
 On Saturday, 6 February 2021 at 09:42:38 UTC, rikki cattermole wrote:
 On 06/02/2021 3:32 PM, frame wrote:
  [...]
This won't do anything.
  [...]
Don't forget to stdout.flush; Otherwise stuff can get caught in the buffer before erroring out.
 [...]
Turn on the precise GC, 32bit is a bit too small of a range and you can get false positives like in this case (at least looks like it).
For reference, how does one turn on precise GC?
 https://dlang.org/spec/garbage.html#gc_config
 
 Strange things happens:
 - precise scanning dont change result - OOM same round 27
Okay, its still seeing something is alive then.
 --DRT-gcopt=help  wont show used  gc implementation, also cleanup type 
 not printed
https://github.com/dlang/druntime/blob/master/src/core/gc/config.d#L36
 - maxPoolSize:N  dont limit total size of GC
Shouldn't change anything, except make OOM happen faster.
 - in gc:manual mode GC.collect() not releasing memory
https://github.com/dlang/druntime/blob/master/src/gc/impl/manual/gc.d#L84
 
 When i print free GC memory, it seems to memory leaking
 writefln("Usage: %.2f MiB (free %.2f MiB) / collected: %d",
      (cast(double) GC.stats.usedSize) / 1_048_576,
      (cast(double) GC.stats.freeSize) / 1_048_576, 
 GC.profileStats.numCollections);
 stdout.flush();
I've compiled and ran it under ldc. Dmd in 32bit mode is certainly doing something that the GC doesn't appreciate. I can reproduce it with -m32mscoff as well. So yeah dmd specific all right.
Feb 06
parent reply frame <frame86 live.com> writes:
On Saturday, 6 February 2021 at 13:30:03 UTC, rikki cattermole 
wrote:

 Okay, its still seeing something is alive then.
That's why I used the scope guard. I know it shouldn't have any effect but I want to give the GC an extra hint ;)
 I've compiled and ran it under ldc. Dmd in 32bit mode is 
 certainly doing something that the GC doesn't appreciate. I can 
 reproduce it with -m32mscoff as well. So yeah dmd specific all 
 right.
The sad story never ends. But seriously are there no runtime tests before releasing the next DMD? The GC is a main feature of D and such things give a bad impression.
Feb 06
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 07/02/2021 4:22 AM, frame wrote:
 On Saturday, 6 February 2021 at 13:30:03 UTC, rikki cattermole wrote:
 
 Okay, its still seeing something is alive then.
That's why I used the scope guard. I know it shouldn't have any effect but I want to give the GC an extra hint ;)
The GC shouldn't be aware of the scope guard. It expands out into a try finally block.
 I've compiled and ran it under ldc. Dmd in 32bit mode is certainly 
 doing something that the GC doesn't appreciate. I can reproduce it 
 with -m32mscoff as well. So yeah dmd specific all right.
The sad story never ends. But seriously are there no runtime tests before releasing the next DMD? The GC is a main feature of D and such things give a bad impression.
Nah, this is old. It is also bad D code.

Allocate up front and then set.

T[] buffer;
buffer.length = 1_000_000;

foreach(i, v; source[0 .. 1_000_000]) {
    buffer[i] = someOp(v);
}

This will be significantly faster, as it won't require allocating more than once and will prevent heap fragmentation which 32bit is severely affected by (hence why precise GC is important for testing on this target).
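
A self-contained version of the snippet above, as a sketch only: the element type int, the doubling body of someOp and the transform wrapper are assumptions made so that the example compiles.

```d
// Hypothetical element-wise operation standing in for someOp.
int someOp(int v) { return v * 2; }

/// Allocates the result once, then fills it in place.
int[] transform(const(int)[] source)
{
    auto buffer = new int[](source.length);  // single up-front allocation
    foreach (i, v; source)
        buffer[i] = someOp(v);
    return buffer;
}

void main()
{
    auto source = new int[](1_000_000);
    foreach (i, ref v; source)
        v = cast(int) i;

    auto result = transform(source);
    assert(result.length == source.length);
    assert(result[$ - 1] == 2 * 999_999);
}
```

Compared with appending in a loop, the array never has to be reallocated while it grows, so no abandoned intermediate blocks are left behind for the GC.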
Feb 06
parent reply frame <frame86 live.com> writes:
On Saturday, 6 February 2021 at 15:45:47 UTC, rikki cattermole 
wrote:

 The GC shouldn't be aware of the scope guard. It expands out 
 into a try finally block.
But .length = 0 should.
 Nah, this is old. It is also bad D code.

 Allocate up front and then set.
I agree but it has to work anyway. New users do not ask about how they can write friendly GC code.
 This will be significantly faster, as it won't require 
 allocating more than once and will prevent heap fragmentation 
 which 32bit is severely affected by (hence why precise GC is 
 important for testing on this target).
Ah, yes. I read about it in another thread. This is a specific problem for any GC that works the same way, not only in D. But if it's a known problem, the default architecture should be 64bit, where things just work. 32bit should be an opt-in then. Default settings should work out of the box. If not - it's bad for the reputation of the language.
Feb 06
next sibling parent reply Mike Parker <aldacron gmail.com> writes:
On Saturday, 6 February 2021 at 17:50:18 UTC, frame wrote:
 On Saturday, 6 February 2021 at 15:45:47 UTC, rikki cattermole 
 wrote:
 Default settings should work out of the box. If not - it's bad 
 for reputation of the language.
Given that 32-bit has been the default on Windows for D's entire lifetime, I don't expect this is so common of an issue. More harmful would be requiring Visual Studio to compile with the default settings. That said, dub already uses -m64 by default on Windows. And the goal is to enable 64-bit in dmd on Windows by default. That's the reason the MinGW-based link libraries and the LDC linker were added to the distribution, so that it will work out of the box. I don't know what the timetable is supposed to be, but at some point after that is rock solid, the switch to 64-bit by default will be made.
Feb 06
next sibling parent Siemargl <inqnone gmail.com> writes:
On Saturday, 6 February 2021 at 19:10:14 UTC, Mike Parker wrote:
 On Saturday, 6 February 2021 at 17:50:18 UTC, frame wrote:
 On Saturday, 6 February 2021 at 15:45:47 UTC, rikki cattermole 
 wrote:
 Default settings should work out of the box. If not - it's bad 
 for reputation of the language.
Given that 32-bit has been the default on Windows for D's entire lifetime, I don't expect this is so common of an issue. More harmful would be requiring Visual Studio to compile with the default settings. That said, dub already uses -m64 by default on Windows. And the goal is to enable 64-bit in dmd on Windows by default. That's the reason the MinGW-based link libraries and the LDC linker were added to the distribution, so that it will work out of the box. I don't know what the timetable is supposed to be, but at some point after that is rock solid, the switch to 64-bit by default will be made.
This example seems to hit an ugly corner case of the AA implementation. 64bit cures the problem, but continuously uses about 500 MB of RAM for this simple example, so it's only patching a sinking ship :-(
Feb 06
prev sibling parent Siemargl <inqnone gmail.com> writes:
On Saturday, 6 February 2021 at 19:10:14 UTC, Mike Parker wrote:
 On Saturday, 6 February 2021 at 17:50:18 UTC, frame wrote:
Sorry, I forgot the memory leak. Or maybe I misunderstand the GC counters. Here is the log:

Usage: 698.46 MiB (free 187.42 MiB) / collected: 14
Round: 12
Usage: 759.72 MiB (free 184.16 MiB) / collected: 15
...
Usage: 802.00 MiB (free 202.88 MiB) / collected: 16
Usage: 871.38 MiB (free 197.50 MiB) / collected: 17
Usage: 935.64 MiB (free 200.24 MiB) / collected: 18
Usage: 995.86 MiB (free 210.02 MiB) / collected: 19
Usage: 1071.70 MiB (free 207.18 MiB) / collected: 20
Usage: 1139.22 MiB (free 215.66 MiB) / collected: 21
Usage: 1214.83 MiB (free 219.05 MiB) / collected: 22
Usage: 1297.88 MiB (free 218.00 MiB) / collected: 23
Usage: 1378.43 MiB (free 222.45 MiB) / collected: 24
Usage: 1416.72 MiB (free 184.16 MiB) / collected: 25
Usage: 1459.00 MiB (free 229.88 MiB) / collected: 26
Usage: 1501.28 MiB (free 187.60 MiB) / collected: 27
Usage: 1543.55 MiB (free 236.32 MiB) / collected: 28
Usage: 1585.83 MiB (free 194.05 MiB) / collected: 29
Usage: 1628.11 MiB (free 245.77 MiB) / collected: 30
Round: 28
Usage: 1670.39 MiB (free 203.49 MiB) / collected: 31
Round: 29

So GC.used is growing, but GC.free is stable
Feb 06
prev sibling parent reply Mike Parker <aldacron gmail.com> writes:
On Saturday, 6 February 2021 at 17:50:18 UTC, frame wrote:

 But .length = 0 should.
What do you expect it to do in this case?
Feb 06
parent reply frame <frame86 live.com> writes:
On Saturday, 6 February 2021 at 19:13:33 UTC, Mike Parker wrote:
 On Saturday, 6 February 2021 at 17:50:18 UTC, frame wrote:

 But .length = 0 should.
What do you expect it to do in this case?
Don't know - some compiler optimization? :D

On Saturday, 6 February 2021 at 19:31:39 UTC, Siemargl wrote:
 64bit heal a problem, but uses about 500Mb of RAM continuosly 
 for this simple example, so its only patching sinkin' ship :-(
Hmmm.. with -m64 it's reporting 80 MB used, 203 MB are really marked as private bytes. Constant. If I use GC.minimize() it goes up and down and sometimes consumes more than 203 MB. Best is 100MB. But it doesn't leak endlessly like the 32bit variant.
Feb 06
parent reply frame <frame86 live.com> writes:
On Saturday, 6 February 2021 at 20:24:00 UTC, frame wrote:

 Hmmm.. with -m64 it's reporting 80 MB used, 203 MB are really 
 marked as private bytes. Constant. If I use GC.minimize() it 
 goes up and down and sometimes consumes more than 203 MB. Best 
 is 100MB. But it doesn't leak endlessly like the 32bit variant.
Update: Thanks to Adam's bug report:

https://issues.dlang.org/show_bug.cgi?id=21550

My poorly delivered example, with foo() modified as follows, now also runs smoothly on 32bit:
void foo() {
    string[] s;

    foreach (i; 0 .. 50_000_00) {
        s ~= "a";
    }
		
    // GC.free(s.ptr);
    GC.free(GC.addrOf(s.ptr));
}
I think the automatic GC is also affected by this issue.
Feb 08
parent reply Siemargl <inqnone gmail.com> writes:
On Tuesday, 9 February 2021 at 04:05:04 UTC, frame wrote:
 On Saturday, 6 February 2021 at 20:24:00 UTC, frame wrote:

 Hmmm.. with -m64 it's reporting 80 MB used, 203 MB are really 
 marked as private bytes. Constant. If I use GC.minimize() it 
 goes up and down and sometimes consumes more than 203 MB. Best 
 is 100MB. But it doesn't leak endlessly like the 32bit variant.
str += "1" added 5 million times. Then I fixed this using StringBuilder, as documented. It works fine.

Next I searched the dlang forums for D's StringBuilder and found this:
https://forum.dlang.org/post/l667ab$cfa$1 digitalmars.com

auto strBuilder = appender!string;
foreach (i; 0 .. 50_000_00) {
    strBuilder.put("a");
}

And it works too, for 32-bit also =)
Consuming about 100MB RAM.
Feb 13
parent reply frame <frame86 live.com> writes:
On Saturday, 13 February 2021 at 17:54:53 UTC, Siemargl wrote:

 And it works too, for 32-bit also =)
 Consuming about 100MB RAM.
Yes, Appender is nice, but I had no control over .data since the real property is private, so I chose that edgy example to find the problem with the GC. As someone mentioned before, in a real application you would choose some pre-allocation like reserve() on Appender instead, which performs better.
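
A short sketch of the reserve() variant mentioned above, using std.array.appender. The wrapper function repeatA and the element count are assumptions chosen for illustration.

```d
import std.array : appender;

/// Builds a string of `n` 'a' characters with one up-front reservation.
string repeatA(size_t n)
{
    auto builder = appender!string();
    builder.reserve(n);       // request capacity for n chars in advance
    foreach (i; 0 .. n)
        builder.put("a");
    return builder.data;
}

void main()
{
    auto s = repeatA(5_000_000);
    assert(s.length == 5_000_000);
}
```

Reserving first means the appender should not need to grow its buffer during the loop, which avoids the chain of reallocations that the s ~= "a" version produces.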
Feb 13
parent Siemargl <inqnone gmail.com> writes:
On Saturday, 13 February 2021 at 19:14:32 UTC, frame wrote:
 On Saturday, 13 February 2021 at 17:54:53 UTC, Siemargl wrote:

 And it works too, for 32-bit also =)
 Consuming about 100MB RAM.
Yes, Appender is nice but I had no control about .data since the real property is private so I chose that edgy example to find the problem with the GC. As someone mentioned before, in a real application you would choose some pre-allocation like reserve() on Appender instead which performs better.
LDC 1.24 is unaffected by this bug, and the x64 target consumes less memory.
Feb 13
prev sibling parent Max Haughton <maxhaton gmail.com> writes:
On Saturday, 30 January 2021 at 16:42:35 UTC, frame wrote:
 Is there a way to force the GC to re-use memory in already 
 existing pools?

 I set maxPoolSize:1 to gain pools that can be quicker released 
 after there no longer in use. This already reduces memory usage 
 to 1:3. Sadly the application creates multiple pools that are 
 not necessary in my POV - just fragmented temporary slice data 
 like from format(). What can I do to optimize?
I can't tell you much about the inner workings of the GC, but maybe take a look at experimental.allocator and see if anything there can help you (specifically that you understand your program better than the GC)
Jan 31