www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 4358] New: Potential Memory Leaks in std.file.read() ?

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4358

           Summary: Potential Memory Leaks in std.file.read() ?
           Product: D
           Version: D1
          Platform: x86
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: moi vermi.fr



Created an attachment (id=668)
Source file showing problem in memory managment

Hi,
it looks like the Garbage Collector don't free the memory used by
std.file.read().

Here a simple example :

[code]
import std.file;
import std.gc;

void main()
{
    for (int i=0; i<4096; i++)
    {
        void[] buffer = std.file.read("C:\\windows\\System32\\imageres.dll");

        std.gc.fullCollect();
    }
}
[/code]

With this code I get a memory usage of ~200 MB. Now, simply change it with :
[code]
import std.file;
import std.gc;

void main()
{
    for (int i=0; i<4096; i++)
    {
        void[] buffer = std.file.read("C:\\windows\\System32\\imageres.dll");

        std.gc.fullCollect();

        delete buffer;
    }
}
[/code]

The memory usage is now about 20 MB. (the size of the file). It poses me a big
problem with a program I'm writing, mermoy usage is rising until ~600 MB...
Must I add delete everywhere in my code ?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 21 2010
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4358


nfxjfg gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nfxjfg gmail.com



Congratulations, you just found out that D's GC implementation is fucking shit.
There doesn't seem to be a bug in std.file.read. It just uses the GC to
allocate a big chunk of memory, and returns it. There are probably false
pointers. Note that the latest allocated buffer will not be free'd with
fullCollect() (because you still have a reference to the buffer in that
variable), but I assume imageres.dll is much smaller than 200 MB.

Rule of thumb: never allocate big arrays with the GC.

There's a patch in bug 3463 to alleviate the problem, but it seems to be
abandoned.

Though, there is one potential problem in the D2 version of std.file.read. It
contains this line:

auto buf = GC.malloc(size, GC.BlkAttr.NO_SCAN)[0 .. size];

buf is returned as void[] to the user. As far as I understand the recently
introduced changes to array appending, the memory block for an array slice
contains a hidden length field. This is not present in this case, thus random
things may happen if you try to append to the returned array. I do not know if
this actually is a problem.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 21 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4358




It's very strange, I never had this kind of problem before. I must first finish
my software, so I will add delete() in the code, but as soon as I have free
time I will look if I can help with the D :) (I will take a look at GC's
implementation).
To answer you yes, imareges.dll is about 20 MB. I tried with a ~2MB file, and
all worked well, so I guess there a limit size where the GC stop working
correctly. I'll make some try this winter to see if I can make something. I
take a look in the patch, but I need more than 2 minutes to understand the
whole way the memory is managed in D.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 21 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4358




Most likely, the GC _is_ working correctly. It's a conservative GC, and it
treats integers, floats, random binary data the same as actual pointers. The GC
doesn't have enough information to know what is pointer and what not. If an
integer value looks like a pointer to a memory block, it's called a "false
pointer". This may lead to memory leaks.

And the larger a memory block is, the higher the probability that a false
pointer exists and will prevent the memory block from being free'd. At least
that's my theory. Conclusion: use malloc() for really large arrays.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 22 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4358





 Most likely, the GC _is_ working correctly. It's a conservative GC, and it
 treats integers, floats, random binary data the same as actual pointers. The GC
 doesn't have enough information to know what is pointer and what not. If an
 integer value looks like a pointer to a memory block, it's called a "false
 pointer". This may lead to memory leaks.
 
 And the larger a memory block is, the higher the probability that a false
 pointer exists and will prevent the memory block from being free'd. At least
 that's my theory. Conclusion: use malloc() for really large arrays.
Maybe, depend of how the GC scan the stack. I will think about it. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jun 22 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4358


Leandro Lucarella <llucax gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |llucax gmail.com



PDT ---

 Most likely, the GC _is_ working correctly. It's a conservative GC, and it
 treats integers, floats, random binary data the same as actual pointers.
This is false. The GC only treats void[] as a potential source of pointers, which makes sense, really. int[], float[], char[], byte[] are *NOT* scanned for pointers. For a binary buffer that doesn't have pointers in it, you should probably use ubyte[]. If std.file.read() return void[], I'd say that std.file.read() is broken. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jun 22 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4358




PDT ---


 Most likely, the GC _is_ working correctly. It's a conservative GC, and it
 treats integers, floats, random binary data the same as actual pointers.
This is false. The GC only treats void[] as a potential source of pointers, which makes sense, really. int[], float[], char[], byte[] are *NOT* scanned for pointers. For a binary buffer that doesn't have pointers in it, you should probably use ubyte[]. If std.file.read() return void[], I'd say that std.file.read() is broken.
I should add that, even when ubyte[] is not scanned for pointers, you are right about allocating big chunks of memory in the GC could lead to leaks, as the probabilities of having a false pointer pointing to that chunk of data gets really high. There is another bug report (with a patch too, and from David Simcha too) that helps with this problem: bug 2927. The idea is to add a new property to the GC to mark a chunk as not having interior pointers. If you mark a chunk of memory as not having interior pointers, the chances of a false pointer pointing to the beginning of your memory chunk becomes *very* low. It's a real shame that David's patches didn't get into druntime, as they are very helpful avoiding this kind of issues. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jun 22 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4358




std.file.read returns a void[], but it's allocated manually via gc.malloc(),
and I think the no-pointers thing was done right.

The correct solution to this problem would be to apply the patch in bug 3463
(precise heap scanning), and then probably investigate precise stack scanning.

I believe the patch in bug 2927 (no interior pointers attribute) would not roll
well with Andrei's safety ideology, so let's forget that.

Note that enhancement 2927 could be simulated by using a wrapper object, that
behaves like an array, but actually uses C malloc to allocate memory. When the
wrapper gets collected, its finalizer can free() the memory. Or using reference
counting, like (I believe) Andrei's TightArray (or what was it named) does.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 22 2010
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4358




PDT ---

 std.file.read returns a void[], but it's allocated manually via gc.malloc(),
 and I think the no-pointers thing was done right.
And what is the rationale for not returning ubyte[]?
 The correct solution to this problem would be to apply the patch in bug 3463
 (precise heap scanning), and then probably investigate precise stack scanning.
You can't have full precise scanning in D, because of unions and other low-level constructs, so the problem of false pointers will be still there even with that patch (of course the chances of having a false pointer will be much lower).
 I believe the patch in bug 2927 (no interior pointers attribute) would not roll
 well with Andrei's safety ideology, so let's forget that.
That's stupid, it's not unsafer than NO_SCAN or casting a pointer to int, or whatever construct you like. That's obviously a construct that not everybody will use. So let's not forget that (and putting words in somebody else's mouth is not a good argument either =).
 Note that enhancement 2927 could be simulated by using a wrapper object, that
 behaves like an array, but actually uses C malloc to allocate memory. When the
 wrapper gets collected, its finalizer can free() the memory. Or using reference
 counting, like (I believe) Andrei's TightArray (or what was it named) does.
That's true, is a little more convoluted than having NO_INTERIOR and is not as easy to use by the compiler itself, but could be another option. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jun 22 2010