digitalmars.D - Memory issues. GC not giving back memory to OS?

Cristian Becerescu <cristian.becerescu yahoo.com> writes:
Hi!

A little bit of context first:

I was using DPP and I noticed huge amounts of RAM being used.
So I used valgrind massif and found out that 98% of the process’ 
memory (~6GB) was allocated for arrays / Appender with mmap.

I then performed a simple test where I incrementally appended 
2^30 integers (4GB) to a dynamic array (memory measurements are 
the same for Appender).
-> Memory used (peak; increasing towards the end of execution): 
~7GB
-> capacity == 1.107 * size (at the end of the program)

This is a bit odd, because 1.107 * 2^30 is roughly 4.4GB, and the 
peak memory consumption was 7GB. Apparently, the GC can correctly 
collect the memory when manually calling collect() at the end of 
appending, but that memory (we are talking 7 - 4.4 = 2.6GB) is 
never given back to the system. At least this is our intuition 
after making those observations.

I have created a gist with the test code and results (thanks Edi 
for augmenting the test code to profile the GC): 
https://gist.github.com/cbecerescu/e6606a8530c56ae06c52e5b1cd32b31f

Just some notes:
- if reserving 2^30 elements for the array (or Appender) 
beforehand, memory peaks are at 4GB
- C++'s std::vector, without reservation, never gets beyond 4GB 
and has size == capacity at the end
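
A minimal sketch of the kind of test described above, scaled down to 2^20 elements so it runs quickly. This is not the code from the gist; the function name appendAndCountGrowths is made up for illustration. It counts how often the array's capacity changes while appending, with and without a reserve up front, and prints the GC's own statistics at the end.

```d
import core.memory : GC;
import std.stdio : writefln;

/// Hypothetical helper: appends `count` ints one at a time and returns
/// how many times the array's capacity changed along the way.
size_t appendAndCountGrowths(size_t count, bool reserveFirst)
{
    int[] arr;
    if (reserveFirst)
        arr.reserve(count); // one allocation large enough for everything

    size_t growths = 0;
    size_t lastCapacity = arr.capacity;
    foreach (i; 0 .. count)
    {
        arr ~= cast(int) i;
        if (arr.capacity != lastCapacity)
        {
            // The block was either extended in place or reallocated.
            ++growths;
            lastCapacity = arr.capacity;
        }
    }
    return growths;
}

void main()
{
    enum count = 1 << 20; // scaled down from the 2^30 used in the post
    writefln("without reserve: %s capacity changes",
             appendAndCountGrowths(count, false));
    writefln("with reserve:    %s capacity changes",
             appendAndCountGrowths(count, true));

    auto s = GC.stats();
    writefln("GC used: %s bytes, GC free: %s bytes", s.usedSize, s.freeSize);
}
```

The GC statistics only describe what the GC thinks it holds; what the OS reports for the process (for example via massif) may differ.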
Apr 21
Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Tuesday, April 21, 2020 12:31:28 PM MDT Cristian Becerescu via 
Digitalmars-d wrote:
> This is a bit odd, because 1.107 * 2^30 is roughly 4.4GB, and the
> peak memory consumption was 7GB. Apparently, the GC can correctly
> collect the memory when manually calling collect() at the end of
> appending, but that memory (we are talking 7 - 4.4 = 2.6GB) is
> never given back to the system. At least this is our intuition
> after making those observations.

It is my understanding that under normal circumstances, the GC will never return memory to the OS until the program terminates, but rather will just keep it around to reuse when more memory needs to be allocated. However, the documentation for core.memory's GC.minimize says that it will return free memory to the OS. So, if you need memory to be returned to the OS while the program is running, you'll probably need to use that.

- Jonathan M Davis
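
A small sketch of what calling GC.minimize could look like, assuming a collection is run first so that unreachable blocks are actually free. The helper name collectAndMinimize is made up for illustration. How much memory really goes back to the OS is up to the GC implementation and the platform, and a conservative GC may still see a stale reference to the dropped array.

```d
import core.memory : GC;
import std.stdio : writefln;

/// Hypothetical helper: runs a collection, then asks the GC to return
/// free memory to the OS. Returns the GC statistics after both calls.
GC.Stats collectAndMinimize()
{
    GC.collect();  // reclaim unreachable blocks
    GC.minimize(); // hand back whatever free memory the GC is able to release
    return GC.stats();
}

void main()
{
    // Allocate a large GC array, touch it, then drop the only reference.
    auto big = new ubyte[](64 * 1024 * 1024);
    big[0] = 1;
    big = null;

    auto s = collectAndMinimize();
    writefln("after minimize: used=%s bytes, free=%s bytes",
             s.usedSize, s.freeSize);
}
```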
Apr 21
Arafel <er.krali gmail.com> writes:
I had a similar issue some time ago, and found that the memory wouldn't be returned to the OS even after the GC had freed it. I had to call malloc_trim [1] manually; this seems to be a libc / OS issue (I'm exclusively using Linux, so I don't know whether this is also an issue on Windows or Mac). Could this also be happening here?

A.

[1]: http://man7.org/linux/man-pages/man3/malloc_trim.3.html
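
For reference, a sketch of calling malloc_trim from D. The binding is declared by hand here, on the assumption that the program links against glibc; the wrapper name trimCHeap is made up. As far as I know, the D GC gets its pools from the OS directly rather than through malloc, so this would mainly affect memory allocated via the C heap.

```d
import std.stdio : writefln;

version (CRuntime_Glibc)
{
    // Hand-written binding, assuming glibc: int malloc_trim(size_t pad);
    extern (C) int malloc_trim(size_t pad) nothrow @nogc;
}

/// Hypothetical wrapper: asks glibc to release free memory from the C heap.
/// Returns true if glibc reports that some memory was released.
bool trimCHeap() nothrow @nogc
{
    version (CRuntime_Glibc)
        return malloc_trim(0) != 0; // 1 = memory released, 0 = nothing to release
    else
        return false; // not available on this C runtime
}

void main()
{
    writefln("memory released by malloc_trim: %s", trimCHeap());
}
```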
Apr 22
Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/21/20 2:31 PM, Cristian Becerescu wrote:

> I then performed a simple test where I incrementally appended 2^30
> integers (4GB) to a dynamic array (memory measurements are the same for
> Appender).
> -> Memory used (peak; increasing towards the end of execution): ~7GB
> -> capacity == 1.107 * size (at the end of the program)
>
> This is a bit odd, because 1.107 * 2^30 is roughly 4.4GB, and the peak
> memory consumption was 7GB. Apparently, the GC can correctly collect the
> memory when manually calling collect() at the end of appending, but that
> memory (we are talking 7 - 4.4 = 2.6GB) is never given back to the
> system. At least this is our intuition after making those observations.

The GC doesn't automatically give back memory to the OS. And it really can't. There's a GC.minimize function, but that is only going to release memory to the OS that can be released. It highly depends on the implementation and the mechanism the OS gives to access memory. So for example, if all the "free" memory is in the middle of the OS-provided memory segment, then it can't give it back.
 
> I have created a gist with the test code and results (thanks Edi for
> augmenting the test code to profile the GC):
> https://gist.github.com/cbecerescu/e6606a8530c56ae06c52e5b1cd32b31f
>
> Just some notes:
> - if reserving 2^30 elements for the array (or Appender) beforehand,
> memory peaks are at 4GB

Right, because it will never reallocate; it just grows within the original memory block. This is what I'd recommend for something like this.

If you don't reserve, then as it grows, it needs a bigger and bigger segment. And it's not always going to reuse memory that you already used on your way up. Why? Because it can't get a contiguous segment that is free and fits the new requirement. It does try extending in place if it can, but once it can't, the old memory is not usable because the segment is too small to fit your massive data.

But I'd say that the stats you are printing are a bit puzzling. Why does it all of a sudden allow you to collect at the end when it didn't before? It does seem like your output doesn't match your example code. But there are a number of reasons why the GC may not do what you are expecting, including possible bugs in the GC.
> - C++'s std::vector, without reservation, never gets beyond 4GB and has
> size == capacity at the end

C++ frees the original memory immediately when growing. So it's going to be more memory efficient. With a GC, you are never going to match the space efficiency of manually managed memory.

-Steve
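
To illustrate the difference, here is a rough D analogue of a manually managed growable buffer. It is only a sketch: the type ManualIntBuffer is made up, and it uses C's realloc rather than whatever a particular std::vector implementation does. The point is just that the old block goes back to the allocator as soon as the buffer has moved, instead of waiting for a collection.

```d
import core.stdc.stdlib : free, realloc;
import std.stdio : writefln;

/// Hypothetical manually managed int buffer (not a Phobos type).
struct ManualIntBuffer
{
    int* ptr;
    size_t length;
    size_t capacity;

    @disable this(this); // no copies, so the block is freed exactly once

    void put(int value) nothrow @nogc
    {
        if (length == capacity)
        {
            immutable newCap = capacity ? capacity * 2 : 16;
            // realloc either extends the block in place or moves it and
            // frees the old block right away.
            auto p = cast(int*) realloc(ptr, newCap * int.sizeof);
            if (p is null)
                assert(0, "out of memory");
            ptr = p;
            capacity = newCap;
        }
        ptr[length++] = value;
    }

    ~this() nothrow @nogc
    {
        free(ptr); // free(null) is a no-op
    }
}

void main()
{
    ManualIntBuffer buf;
    foreach (i; 0 .. 1000)
        buf.put(i);
    writefln("length=%s capacity=%s", buf.length, buf.capacity);
}
```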
Apr 21
Arun Chandrasekaran <aruncxy gmail.com> writes:
On Tuesday, 21 April 2020 at 20:29:37 UTC, Steven Schveighoffer 
wrote:
 
> C++ frees the original memory immediately when growing. So it's
> going to be more memory efficient. You are never going to match
> a manually managed memory efficiency in terms of space used
> with a GC.

How much of Phobos is betterC compatible?

I encountered the same issues with the GC a couple of years ago and abandoned our plans to migrate from C++ to D for one of our core products. (I'm not encouraging anyone to do the same; do your own analysis and make your own decision.)

To see recent posts about chasing Rust with live, with all this existing baggage... Hmm... Don't know what to say... This might excite a PL theorist/researcher, but not a programmer who can't get his app to work in the most basic form...

Walter, memory efficiency first please, arcane safety later.

--
If you don't have anything nice to say, don't say anything at all.
Apr 22
welkam <wwwelkam gmail.com> writes:
On Wednesday, 22 April 2020 at 07:25:34 UTC, Arun Chandrasekaran 
wrote:
> Walter, memory efficiency first please, arcane safety later.

You can do everything in D that you can do in C++ when it comes to memory management. Also, a good system that tracks pointers can be used to turn GC allocations into malloc/free pairs, and some allocations can be turned into stack allocations (LLVM does some of that). Safety features can be used as performance features with some additional work.
Apr 22
Arun Chandrasekaran <aruncxy gmail.com> writes:
On Wednesday, 22 April 2020 at 13:13:29 UTC, welkam wrote:
> On Wednesday, 22 April 2020 at 07:25:34 UTC, Arun
> Chandrasekaran wrote:
>> Walter, memory efficiency first please, arcane safety later.
>
> You can do everything in D that you can do in C++ when it comes to memory management.

We can do the same with Java as well: use JNI, manual memory management, etc. But will we? So when I say "we can't", it doesn't mean that technically we can't. It is just that the alternatives are better than what's being offered in D.
Apr 22
ikod <geller.garry gmail.com> writes:
IMHO this happens because each time you request a larger contiguous memory region, the runtime has to allocate (or reallocate) a larger piece of memory (at higher addresses), copy the old contents, and then release the old piece of memory. But the old piece of memory can't be released to the OS, as the heap area can be released only from the top.
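
A toy model of this argument, under simplifying assumptions that do not match the real D GC or glibc: the heap only grows upward, the buffer starts at 1 unit and doubles each time, and the old block is freed only after its contents were copied. The names HeapLayout and simulateDoubling are made up for illustration. Under these assumptions the freed space below the live block is always smaller than the next request, so every new block lands on top and nothing below it can go back to the OS.

```d
import std.stdio : writefln;

/// Result of the simulation, in abstract size units.
struct HeapLayout
{
    size_t extent;     // total address range the heap has grown to
    size_t live;       // size of the block still in use at the end
    size_t freedBelow; // freed space stuck below the live block
}

/// Hypothetical model: a buffer that doubles until it reaches `finalSize`,
/// on a heap that can only grow and shrink at its top.
HeapLayout simulateDoubling(size_t finalSize)
{
    auto h = HeapLayout(1, 1, 0);
    while (h.live < finalSize)
    {
        immutable next = h.live * 2;
        // The hole below the live block is smaller than the request,
        // so the new block has to be placed on top of the heap.
        assert(next > h.freedBelow);
        h.extent += next;
        h.freedBelow += h.live; // old block is freed after the copy
        h.live = next;
    }
    return h;
}

void main()
{
    auto h = simulateDoubling(1024);
    // prints: extent=2047 live=1024 freed below=1023
    writefln("extent=%s live=%s freed below=%s", h.extent, h.live, h.freedBelow);
}
```

In this model the heap ends up at roughly twice the size of the live data, and the freed half sits below the live block where it cannot be trimmed from the top.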
Apr 21