digitalmars.D - TempAlloc: Request for Review For Inclusion in core.memory
- dsimcha (37/37) Apr 19 2011 My TempAlloc module's been ready for review for a while, but
- Andrej Mitrovic (18/18) Apr 19 2011 Nice work. I've tried it out with allocating some large float arrays,
- dsimcha (7/25) Apr 19 2011 Ironically, these benchmarks just show that the C heap is faster than th...
- Andrej Mitrovic (9/9) Apr 19 2011 Ok, perhaps you should put a note in the docs about falling back to
- dsimcha (8/11) Apr 19 2011 Right. This is roughly equivalent to allocating a new segment. I might...
- Denis Koroskin (11/41) Apr 19 2011 I think you are wrong about contiguous memory block. I believe virtual
- Daniel Gibson (3/50) Apr 19 2011 That doesn't help if the logical memory of your process is fragmented as
- Andrej Mitrovic (2/2) Apr 19 2011 Oh, I should mention I was using a C library to actually load the data
- Dmitry Olshansky (6/6) Apr 19 2011 slackCat should probably use the common type, not the first one passed?
My TempAlloc module's been ready for review for a while, but std.net.isemail and std.parallelism were ahead of it in the review queue. Now that the reviews for both are done (I doubt running a review concurrently with a vote is problematic), I want to put this next in line. I'd like to get it in druntime soon because Don wants to use it in BigInt and I also have some things I want to contribute to Phobos that TempAlloc would be useful for. TempAlloc is a segmented stack for allocating and freeing temporary buffers efficiently. It works by allocating memory in large chunks (currently 4 MB, but this number may be revised) at a time from the C heap, and parceling them out with the standard last in first out stack restriction. The proposal also contains a few small helper functions that I think would be useful, but these can be made private or removed if people don't like them. TempAlloc has the following advantages compared to allocation on the call stack: 1. Pointers to memory allocated on the TempAlloc stack are still valid when the function they were allocated from returns. Functions can be written to create and return data structures on the TempAlloc stack. 2. Since it is a segmented stack, large allocations can be performed with no danger of stack overflow errors. It has the following advantages compared to heap allocation: 1. Both allocation and deallocation are extremely fast. Most allocations consist of verifying enough space is available, incrementing a pointer and a performing a few cheap bookkeeping operations. Most deallocations consist decrementing a pointer and performing a few cheap bookkeeping operations. 2. The segmented stack is thread-local, so synchronization is only needed when a segment needs to be allocated or freed. 3. Fragmentation is not an issue when allocating memory on the TempAlloc stack, though it can be an issue when trying to allocate a new segment. Code: https://github.com/dsimcha/TempAlloc/blob/master/tempalloc.d Docs: http://cis.jhu.edu/~dsimcha/d/phobos/core_tempalloc.html
Apr 19 2011
Nice work. I've tried it out with allocating some large float arrays, quick two test cases were a 94MB and 282MB audio file. File 1 w/o TempAlloc: 410msecs File 1 w/ TempAlloc: 184msecs File 2 w/o TempAlloc 1321 msecs File 2 w/ TempAlloc 567 msecs Btw, how much *can* I allocate at once by using new (not specific to TempAlloc)? I've tried a 500MB file, but allocation fails even though I have plenty of free memory on my system. Is there a limit to using 'new'? Of course, ideally I'd use some kind of buffering mechanism where a portion of a file is loaded into memory, and when that's exhausted I'd load another chunk. But I'd like to know the limits for a single memory allocation on the garbage-collected heap. Is there some internal limit for D? I know I can't allocate more than 3GB in the entire app, but I'm looking at single allocations. Pardon me if this all sounds stupid, I'm a complete newb when it comes to memory allocation mechanics. :)
Apr 19 2011
== Quote from Andrej Mitrovic (andrej.mitrovich gmail.com)'s articleNice work. I've tried it out with allocating some large float arrays, quick two test cases were a 94MB and 282MB audio file. File 1 w/o TempAlloc: 410msecs File 1 w/ TempAlloc: 184msecs File 2 w/o TempAlloc 1321 msecs File 2 w/ TempAlloc 567 msecs Btw, how much *can* I allocate at once by using new (not specific to TempAlloc)? I've tried a 500MB file, but allocation fails even though I have plenty of free memory on my system. Is there a limit to using 'new'? Of course, ideally I'd use some kind of buffering mechanism where a portion of a file is loaded into memory, and when that's exhausted I'd load another chunk. But I'd like to know the limits for a single memory allocation on the garbage-collected heap. Is there some internal limit for D? I know I can't allocate more than 3GB in the entire app, but I'm looking at single allocations. Pardon me if this all sounds stupid, I'm a complete newb when it comes to memory allocation mechanics. :)Ironically, these benchmarks just show that the C heap is faster than the D GC heap. They are well over the 4 MB mark, where TempAlloc just falls back on the C heap. As far as how much you can allocate on the GC heap, it boils down to how much **contiguous** free space you have in your address space. Since 32-bit address space is only 4 GB (and only 2 GB for user space on Win32), I'm not surprised you can't allocate more than 500 MB.
Apr 19 2011
Ok, perhaps you should put a note in the docs about falling back to the C heap. Or maybe that's just common sense for people in the know-how? I've realized the allocation attempt was actually quite a bit more than 500Mb. The C lib returned the number of channels and frame count, which totalled to a frame count of 291_335_184. Multiply that by the float size (4), and you get: 1_165_340_736, or 1.65 GB. The GC heap can actually allocate even 200_000_000 elements on my system, so that's around 800MB if I'm right.
Apr 19 2011
== Quote from Andrej Mitrovic (andrej.mitrovich gmail.com)'s articleOk, perhaps you should put a note in the docs about falling back to the C heap. Or maybe that's just common sense for people in the know-how?Right. This is roughly equivalent to allocating a new segment. I might put more details into the docs about the heuristics used, but I'm not sure how much I should document them. They're somewhat complicated, may change, and are implementation details. On the other hand, the whole point of TempAlloc is a performance optimization, and knowing such details is useful for predicting performance. On the third hand, it's open source. If you really want to know **exactly** how it works instead of just how to use it, you can always read the code.
Apr 19 2011
On Tue, 19 Apr 2011 18:34:33 +0400, dsimcha <dsimcha yahoo.com> wrote:== Quote from Andrej Mitrovic (andrej.mitrovich gmail.com)'s articleI think you are wrong about contiguous memory block. I believe virtual address space is divided into blocks (which are, indeed, contiguous) but the address space itself is not. What I mean is that if you try to allocate 1GB of memory, an OS might succeed that even if the physical memory is "fragmented" by other processes. This isn't the case for ALL operating systems, though. I've believe some OSes don't even allocate memory physically until you actually access it (by reading or writing to it), and even then they may only allocate memory for requested pages only. I might be wrong though.Nice work. I've tried it out with allocating some large float arrays, quick two test cases were a 94MB and 282MB audio file. File 1 w/o TempAlloc: 410msecs File 1 w/ TempAlloc: 184msecs File 2 w/o TempAlloc 1321 msecs File 2 w/ TempAlloc 567 msecs Btw, how much *can* I allocate at once by using new (not specific to TempAlloc)? I've tried a 500MB file, but allocation fails even though I have plenty of free memory on my system. Is there a limit to using 'new'? Of course, ideally I'd use some kind of buffering mechanism where a portion of a file is loaded into memory, and when that's exhausted I'd load another chunk. But I'd like to know the limits for a single memory allocation on the garbage-collected heap. Is there some internal limit for D? I know I can't allocate more than 3GB in the entire app, but I'm looking at single allocations. Pardon me if this all sounds stupid, I'm a complete newb when it comes to memory allocation mechanics. :)Ironically, these benchmarks just show that the C heap is faster than the D GC heap. They are well over the 4 MB mark, where TempAlloc just falls back on the C heap. As far as how much you can allocate on the GC heap, it boils down to how much **contiguous** free space you have in your address space. Since 32-bit address space is only 4 GB (and only 2 GB for user space on Win32), I'm not surprised you can't allocate more than 500 MB.
Apr 19 2011
Am 19.04.2011 17:02, schrieb Denis Koroskin:On Tue, 19 Apr 2011 18:34:33 +0400, dsimcha <dsimcha yahoo.com> wrote:That doesn't help if the logical memory of your process is fragmented as well :)== Quote from Andrej Mitrovic (andrej.mitrovich gmail.com)'s articleI think you are wrong about contiguous memory block. I believe virtual address space is divided into blocks (which are, indeed, contiguous) but the address space itself is not. What I mean is that if you try to allocate 1GB of memory, an OS might succeed that even if the physical memory is "fragmented" by other processes.Nice work. I've tried it out with allocating some large float arrays, quick two test cases were a 94MB and 282MB audio file. File 1 w/o TempAlloc: 410msecs File 1 w/ TempAlloc: 184msecs File 2 w/o TempAlloc 1321 msecs File 2 w/ TempAlloc 567 msecs Btw, how much *can* I allocate at once by using new (not specific to TempAlloc)? I've tried a 500MB file, but allocation fails even though I have plenty of free memory on my system. Is there a limit to using 'new'? Of course, ideally I'd use some kind of buffering mechanism where a portion of a file is loaded into memory, and when that's exhausted I'd load another chunk. But I'd like to know the limits for a single memory allocation on the garbage-collected heap. Is there some internal limit for D? I know I can't allocate more than 3GB in the entire app, but I'm looking at single allocations. Pardon me if this all sounds stupid, I'm a complete newb when it comes to memory allocation mechanics. :)Ironically, these benchmarks just show that the C heap is faster than the D GC heap. They are well over the 4 MB mark, where TempAlloc just falls back on the C heap. As far as how much you can allocate on the GC heap, it boils down to how much **contiguous** free space you have in your address space. Since 32-bit address space is only 4 GB (and only 2 GB for user space on Win32), I'm not surprised you can't allocate more than 500 MB.This isn't the case for ALL operating systems, though. I've believe some OSes don't even allocate memory physically until you actually access it (by reading or writing to it), and even then they may only allocate memory for requested pages only. I might be wrong though.
Apr 19 2011
Oh, I should mention I was using a C library to actually load the data into the buffer, so I didn't use -J imports or anything like that.
Apr 19 2011
slackCat should probably use the common type, not the first one passed? Also it would be cool to have addRange/removeRange somehow automated for types that have indirections. Or it's too much state to carry around? -- Dmitry Olshansky
Apr 19 2011
== Quote from Dmitry Olshansky (dmitry.olsh gmail.com)'s articleslackCat should probably use the common type, not the first one passed?Good point. Will fix.Also it would be cool to have addRange/removeRange somehow automated for types that have indirections. Or it's too much state to carry around?Yeah, I don't want this because a major purpose of TempAlloc is to allow efficient parallelism, and addRange/removeRange involves locks. Besides, I think they use linear search in the current GC implementation rather than something more efficient. Once you're doing stuff like this, the advantages of TempAlloc go away rapidly. Besides, it's perfectly safe (at least now, while the GC isn't moving) to store a reference to GC memory in TempAlloc, as long as it's not the **only** reference. It's also perfectly safe to have indirections from TempAlloc data into other TempAlloc data, as long as the LIFO freeing order isn't violated.
Apr 19 2011
On 4/19/2011 12:08 PM, Dmitry Olshansky wrote:slackCat should probably use the common type, not the first one passed?Fixed. https://github.com/dsimcha/TempAlloc/commit/5e3fd0db9c905d361695d267e460a436c2b26666 https://github.com/dsimcha/TempAlloc/commit/5e74dd20d92c545cb4e3945c81175307c60b1253
Apr 19 2011