www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - A possible leak

reply bearophile <bearophileHUGS lycos.com> writes:
Time ago Jon Harrop has found a memory leak in a F# program running on Mono, he
has reduced the program to a minimal test case. I have translated that code to
D2 and Python2:

-----------------

struct Node {
    int data;
    Node* next;

    this(int data, Node* next) {
        this.data = data;
        this.next = next;
    }
}

void main() {
    Node* lh = new Node(1, null);
    lh.next = lh;

    while (true) {
        Node* lh2 = lh;
        lh2.next = new Node(2, lh2.next);
        lh = lh2.next;
        lh2 = lh;
        lh2.next = lh2.next.next;
    }
}


-----------------

class Node:
    def __init__(self, data, next):
        self.data = data
        self.next = next

def main():
    lh = Node(1, None)
    lh.next = lh

    while True:
        lh2 = lh
        lh2.next = Node(2, lh2.next)
        lh = lh2.next
        lh2 = lh
        lh2.next = lh2.next.next

main()

-----------------

It just creates a circular single-linked list that represents a queue. And then
adds an item and pops another out. The popped out items have the "next" field
that point to the struct itself.

The D version of the code eats more and more RAM, while the Python version runs
at constant memory (probably because CPython GC is a reference count augmented
with a cycle detector, that helps in such situations).

This was an answer on the mono list regarding this GC problem (that the dotnet
GC doesn't share, so that F# program doesn't leak on dotnet):
http://lists.ximian.com/pipermail/mono-list/2009-February/041236.html

This topic may interest Lucarella too.

Can the D GC be improved to avoid such problem, maybe like the CPython GC? And
is such improvement worth it (= useful in practical programs)?

Bye,
bearophile
Oct 01 2009
next sibling parent reply Leandro Lucarella <llucax gmail.com> writes:
bearophile, el  1 de octubre a las 07:23 me escribiste:
 Can the D GC be improved to avoid such problem, maybe like the CPython
 GC? And is such improvement worth it (= useful in practical programs)?
I've tested it with LDC (using classes instead of structs, because D1 doesn't support struct constructors) and it works with a perfectly stable memory usage. -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- A lo que Peperino respondióles: aquel que tenga sabañones que se los moje, aquel que padece calvicie no padece un osito, no es bueno comer lechón en día de gastritis, no mezcleis el vino con la sandía, sacad la basura después de las ocho, en caso de emergencia rompa el vidrio con el martillo, a cien metros desvio por Pavón. -- Peperino Pómoro
Oct 01 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Leandro Lucarella:

 I've tested it with LDC (using classes instead of structs, because D1
 doesn't support struct constructors) and it works with a perfectly stable
 memory usage.
That's a different situation, the compiler is different, the data structure is different, and the runtime too may be a little different. I have tested with LDC with both structs and classes and the two programs don't leak. Bye, bearophile
Oct 01 2009
parent Leandro Lucarella <llucax gmail.com> writes:
bearophile, el  1 de octubre a las 12:39 me escribiste:
 Leandro Lucarella:
 
 I've tested it with LDC (using classes instead of structs, because D1
 doesn't support struct constructors) and it works with a perfectly stable
 memory usage.
That's a different situation, the compiler is different, the data structure is different, and the runtime too may be a little different. I have tested with LDC with both structs and classes and the two programs don't leak.
Sure, the point was: this is not a problem inherently to the D GC, it's just a bug in a particular (version of the) compiler you are testing. -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- Si pudiera acercarme un poco más, hacia vos Te diría que me tiemblan los dos pies, cuando me mirás Si supieras todo lo que me costó, llegar Hoy sabrías que me cuesta respirar, cuando me mirás
Oct 01 2009
prev sibling parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Thu, 01 Oct 2009 14:23:48 +0300, bearophile <bearophileHUGS lycos.com>  
wrote:

 Time ago Jon Harrop has found a memory leak in a F# program running on  
 Mono, he has reduced the program to a minimal test case. I have  
 translated that code to D2 and Python2:
Sorry for the late reply. I wanted to get to the bottom of this, and facilitate others to do the same easily, thus this motivated me to improve Diamond to make that possible. What basically happens here is that the program creates an infinite contiguous linked list. Even though there is no reference to the original node, a stray reference to any of the nodes in the list will prevent any nodes following it to be freed. The program's behavior was inconsistent on my machine (it either leaked or didn't leak memory), because the memory pools were allocated at different memory addresses. *** SHAMELESS PLUG ALERT *** Diamond ( my post-mortem memory debugger - http://dsource.org/projects/diamond ) should allow to diagnose these situations easily, more so with the latest additions. Here's an example of how it's possible to easily analyze this program's memory behavior and find the source of the leak: Firstly, the program was backported to D1 (I don't use D2, thus Diamond doesn't support it ATM). Its execution time is limited, to allow a clean exit. The final source is as follows: ------------------------------------------- import diamond; struct Node { int data; Node* next; } void main() { Node* lh = new Node; lh.data = 1; lh.next = lh; for(int i=0;i<1000000;i++) { Node* lh2 = lh; lh = new Node; lh.data = 2; lh.next = lh2.next; lh2.next = lh; lh2 = lh; lh2.next = lh2.next.next; } } ------------------------------------------- The diamond runtime module was configured with only the MEMLOG and MEMLOG_CRC32 options. The program was compiled and executed; the runtime module created a .mem file which contained a log of memory operations, and full contents for every garbage collect. This file was loaded into the memory log analyzer:
 Diamond Memory Log Analyzer, v0.2
 by Vladimir "CyberShadow" Panteleev, 2008-2009

 Using the most recent memory dump file.
 Loading diamond_2009-09-04_02.25.49.mem...
 1000013 events loaded.
 Using the most recent map file.
 1047 symbols loaded from test1final.map.
 dumps
65023 02:25:49: MEMDUMP ( 256/ 256/ 256) 89503 02:25:49: MEMDUMP ( 256/ 256/ 256) 155041 02:25:49: MEMDUMP ( 512/ 512/ 512) 286115 02:25:49: MEMDUMP ( 1024/ 1024/ 1024) 482725 02:25:50: MEMDUMP ( 1792/ 1792/ 1792) 744871 02:25:51: MEMDUMP ( 2816/ 2816/ 2816)
MEMDUMP events are generated automatically before garbage collects. Judging by the history, the GC failed to free up space starting with the second collect. Let's look at the memory map before and after the first collect:
 nextdump
D65023> map Page map for pool 02090000 - 02190000 ( 256/ 256/ 256 pages): (page size = 0x1000) +10000 +20000 +30000 02090000: 6744444444444444 4444444444444444 4444444444444444 4444444444444444 020D0000: 4444444444444444 4444444444444444 4444444444444444 4444444444444444 02110000: 4444444444444444 4444444444444444 4444444444444444 4444444444444444 02150000: 4444444444444444 4444444444444444 4444444444444444 4444444444444444 D65023> n M65024> map Page map for pool 02090000 - 02190000 ( 162/ 256/ 256 pages): (page size = 0x1000) +10000 +20000 +30000 02090000: 674............. ................ ................ ................ 020D0000: ................ ................ .444444444444444 4444444444444444 02110000: 4444444444444444 4444444444444444 4444444444444444 4444444444444444 02150000: 4444444444444444 4444444444444444 4444444444444444 4444444444444444
We can see the list's nodes filling up all available memory, before trigging a collect (4 - pages containing objects <=16 bytes in size). However, the collect stopped at the page at 020F1000. Let's get a closer look:
 M65024> binmap 020F1000
 Bin map for page 020F1000 - 020F2000:
 (object size = 0x010)            +100             +200             +300
 020F1000: DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD  
 DDDDDDDDDDDDDDDD
 020F1400: DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD D...............  
 ................
 020F1800: ................ ................ ................  
 ................
 020F1C00: ................ ................ ................  
 ................
Note that inside a page, objects are allocated in reverse order (due to how freelists are constructed). We can conclude that the rest of the linked list is not freed due to a stray reference to the node at 020F1600. To find the stray reference, we use the "refs" command:
 M65024> refs 020F1600 020F1610
 00433D2C -> 020F160D - in root range 0042F000 - 00438A00
00433D2C is located within the program's static data segment. Let's see if it's in the map file:
 M65024> symbolat 00433D2C
 00433D2C - ___rtl_criticalsection +30
This symbol is from snn.lib, the standard C library. We can also dump the memory region around that address to examine it:
 M65024> dump 00433D00
 00433D00: FF FF FF FF 00 00 00 00  00 00 00 00 00 00 00 00 | ....
 00433D10: 00 00 00 00 01 16 02 02  03 02 04 18 05 0D 06 09 |      
 ............
 00433D20: 07 0C 08 0C 09 0C 0A 07  0B 08 0C 16 0D 16 0F 02 |  
 ................
 00433D30: 10 0D 11 12 12 02 21 0D  35 02 41 0D 43 02 50 11 |  
 ......!.5.A.C.P.
 00433D40: 52 0D 53 0D 57 16 59 0B  6C 0D 6D 20 70 1C 72 09 | R.S.W.Y.l.m  
 p.r.
 00433D50: 76 16 80 0A 81 0A 82 09  83 16 84 0D 91 29 9E 0D |  
 v............)..
 00433D60: A1 02 A4 0B A7 0D B7 11  CE 02 D7 0B 00 00 00 00 | ............
 00433D70: 7F 13 00 00 25 73 3A 20  00 00 00 00 25 73 0A 00 | ..  %s:      
 %s.
 00433D80: 4E 6F 20 65 72 72 6F 72  00 4F 70 65 72 61 74 69 | No error  
 Operati
 00433D90: 6F 6E 20 6E 6F 74 20 70  65 72 6D 69 74 74 65 64 | on not  
 permitted
 00433DA0: 00 4E 6F 20 73 75 63 68  20 66 69 6C 65 20 6F 72 |  No such  
 file or
 00433DB0: 20 64 69 72 65 63 74 6F  72 79 00 4E 6F 20 73 75 |  directory  
 No su
 00433DC0: 63 68 20 70 72 6F 63 65  73 73 00 49 6E 74 65 72 | ch process  
 Inter
 00433DD0: 72 75 70 74 65 64 20 66  75 6E 63 74 69 6F 6E 20 | rupted  
 function
 00433DE0: 63 61 6C 6C 00 49 6E 70  75 74 2F 6F 75 74 70 75 | call  
 Input/outpu
 00433DF0: 74 20 65 72 72 6F 72 00  4E 6F 20 73 75 63 68 20 | t error No  
 such
=== Conclusion === This problem isn't something that can be easily fixed. Any stray reference to any node in the list will prevent nodes following it from ever being deallocated. It can be avoided by avoiding memory structures that could be vulnerable to one stray pointer causing the entire structure "leaking". For this particular case, the stray reference came from a C library's static data. Can the D compiler be modified to distinguish between C and D object files, and only add D object files' data to the list of GC roots? -- Best regards, Vladimir mailto:thecybershadow gmail.com
Oct 03 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Vladimir Panteleev:

 This problem isn't something that can be easily fixed. Any stray reference  
 to any node in the list will prevent nodes following it from ever being  
 deallocated.
Whoa, a very long and detailed answer :-) Thank you. Do you know why in LDC this program seems to work in constant memory? Bye, bearophile
Oct 03 2009
parent "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Sun, 04 Oct 2009 09:07:59 +0300, bearophile <bearophileHUGS lycos.com>  
wrote:

 Vladimir Panteleev:

 This problem isn't something that can be easily fixed. Any stray  
 reference
 to any node in the list will prevent nodes following it from ever being
 deallocated.
Whoa, a very long and detailed answer :-) Thank you. Do you know why in LDC this program seems to work in constant memory?
My guess would be that it doesn't add the entire static data segment of the standard C library to the list of GC roots, therefore the chances of a stray pointer are much lower. -- Best regards, Vladimir mailto:thecybershadow gmail.com
Oct 04 2009