digitalmars.D - TempAlloc: Request for Review For Inclusion in core.memory

dsimcha (37/37) Apr 19 2011 My TempAlloc module's been ready for review for a while, but

Andrej Mitrovic (18/18) Apr 19 2011 Nice work. I've tried it out with allocating some large float arrays,

dsimcha (7/25) Apr 19 2011 Ironically, these benchmarks just show that the C heap is faster than th...

Andrej Mitrovic (9/9) Apr 19 2011 Ok, perhaps you should put a note in the docs about falling back to

dsimcha (8/11) Apr 19 2011 Right. This is roughly equivalent to allocating a new segment. I might...

Denis Koroskin (11/41) Apr 19 2011 I think you are wrong about contiguous memory block. I believe virtual

Daniel Gibson (3/50) Apr 19 2011 That doesn't help if the logical memory of your process is fragmented as

Andrej Mitrovic (2/2) Apr 19 2011 Oh, I should mention I was using a C library to actually load the data
Dmitry Olshansky (6/6) Apr 19 2011 slackCat should probably use the common type, not the first one passed?

dsimcha (10/14) Apr 19 2011 Yeah, I don't want this because a major purpose of TempAlloc is to allow...
dsimcha (4/5) Apr 19 2011 Fixed.

dsimcha <dsimcha yahoo.com> writes:

My TempAlloc module's been ready for review for a while, but 
std.net.isemail and std.parallelism were ahead of it in the review 
queue.  Now that the reviews for both are done (I doubt running a review 
concurrently with a vote is problematic), I want to put this next in 
line.  I'd like to get it in druntime soon because Don wants to use it 
in BigInt and I also have some things I want to contribute to Phobos 
that TempAlloc would be useful for.

TempAlloc is a segmented stack for allocating and freeing temporary 
buffers efficiently.  It works by allocating memory in large chunks 
(currently 4 MB, but this number may be revised) at a time from the C 
heap, and parceling them out with the standard last in first out stack 
restriction.  The proposal also contains a few small helper functions 
that I think would be useful, but these can be made private or removed 
if people don't like them.

TempAlloc has the following advantages compared to allocation on the
call stack:

1. Pointers to memory allocated on the TempAlloc stack are still
valid when the function they were allocated from returns.
Functions can be written to create and return data structures on the
TempAlloc stack.

2. Since it is a segmented stack, large allocations can be performed 
with no danger of stack overflow errors.

It has the following advantages compared to heap allocation:

1. Both allocation and deallocation are extremely fast. Most allocations
consist of verifying enough space is available, incrementing a pointer 
and a performing a few cheap bookkeeping operations. Most deallocations
consist decrementing a pointer and performing a few cheap bookkeeping
operations.

2. The segmented stack is thread-local, so synchronization is only 
needed when a segment needs to be allocated or freed.

3. Fragmentation is not an issue when allocating memory on the
TempAlloc stack, though it can be an issue when trying to allocate
a new segment.

Code:

https://github.com/dsimcha/TempAlloc/blob/master/tempalloc.d

Docs:

http://cis.jhu.edu/~dsimcha/d/phobos/core_tempalloc.html

Apr 19 2011

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

Nice work. I've tried it out with allocating some large float arrays,
quick two test cases were a 94MB and 282MB audio file.

File 1 w/o TempAlloc: 410msecs
File 1 w/ TempAlloc: 184msecs

File 2 w/o TempAlloc 1321 msecs
File 2 w/ TempAlloc 567 msecs

Btw, how much *can* I allocate at once by using new (not specific to
TempAlloc)? I've tried a 500MB file, but allocation fails even though
I have plenty of free memory on my system. Is there a limit to using
'new'?

Of course, ideally I'd use some kind of buffering mechanism where a
portion of a file is loaded into memory, and when that's exhausted I'd
load another chunk. But I'd like to know the limits for a single
memory allocation on the garbage-collected heap. Is there some
internal limit for D? I know I can't allocate more than 3GB in the
entire app, but I'm looking at single allocations.

Pardon me if this all sounds stupid, I'm a complete newb when it comes
to memory allocation mechanics. :)

Apr 19 2011

dsimcha <dsimcha yahoo.com> writes:

== Quote from Andrej Mitrovic (andrej.mitrovich gmail.com)'s article
 Nice work. I've tried it out with allocating some large float arrays,
 quick two test cases were a 94MB and 282MB audio file.
 File 1 w/o TempAlloc: 410msecs
 File 1 w/ TempAlloc: 184msecs
 File 2 w/o TempAlloc 1321 msecs
 File 2 w/ TempAlloc 567 msecs
 Btw, how much *can* I allocate at once by using new (not specific to
 TempAlloc)? I've tried a 500MB file, but allocation fails even though
 I have plenty of free memory on my system. Is there a limit to using
 'new'?
 Of course, ideally I'd use some kind of buffering mechanism where a
 portion of a file is loaded into memory, and when that's exhausted I'd
 load another chunk. But I'd like to know the limits for a single
 memory allocation on the garbage-collected heap. Is there some
 internal limit for D? I know I can't allocate more than 3GB in the
 entire app, but I'm looking at single allocations.
 Pardon me if this all sounds stupid, I'm a complete newb when it comes
 to memory allocation mechanics. :)

Ironically, these benchmarks just show that the C heap is faster than the D GC
heap.  They are well over the 4 MB mark, where TempAlloc just falls back on the
C
heap.  As far as how much you can allocate on the GC heap, it boils down to how
much **contiguous** free space you have in your address space.  Since 32-bit
address space is only 4 GB (and only 2 GB for user space on Win32), I'm not
surprised you can't allocate more than 500 MB.

Apr 19 2011

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

Ok, perhaps you should put a note in the docs about falling back to
the C heap. Or maybe that's just common sense for people in the
know-how?

I've realized the allocation attempt was actually quite a bit more
than 500Mb. The C lib returned the number of channels and frame count,
which totalled to a frame count of 291_335_184. Multiply that by the
float size (4), and you get: 1_165_340_736, or 1.65 GB.

The GC heap can actually allocate even 200_000_000 elements on my
system, so that's around 800MB if I'm right.

Apr 19 2011

dsimcha <dsimcha yahoo.com> writes:

== Quote from Andrej Mitrovic (andrej.mitrovich gmail.com)'s article
 Ok, perhaps you should put a note in the docs about falling back to
 the C heap. Or maybe that's just common sense for people in the
 know-how?

Right.  This is roughly equivalent to allocating a new segment.  I might put
more
details into the docs about the heuristics used, but I'm not sure how much I
should document them.  They're somewhat complicated, may change, and are
implementation details.  On the other hand, the whole point of TempAlloc is a
performance optimization, and knowing such details is useful for predicting
performance.  On the third hand, it's open source.  If you really want to know
**exactly** how it works instead of just how to use it, you can always read the
code.

Apr 19 2011

"Denis Koroskin" <2korden gmail.com> writes:

On Tue, 19 Apr 2011 18:34:33 +0400, dsimcha <dsimcha yahoo.com> wrote:

 == Quote from Andrej Mitrovic (andrej.mitrovich gmail.com)'s article
 Nice work. I've tried it out with allocating some large float arrays,
 quick two test cases were a 94MB and 282MB audio file.
 File 1 w/o TempAlloc: 410msecs
 File 1 w/ TempAlloc: 184msecs
 File 2 w/o TempAlloc 1321 msecs
 File 2 w/ TempAlloc 567 msecs
 Btw, how much *can* I allocate at once by using new (not specific to
 TempAlloc)? I've tried a 500MB file, but allocation fails even though
 I have plenty of free memory on my system. Is there a limit to using
 'new'?
 Of course, ideally I'd use some kind of buffering mechanism where a
 portion of a file is loaded into memory, and when that's exhausted I'd
 load another chunk. But I'd like to know the limits for a single
 memory allocation on the garbage-collected heap. Is there some
 internal limit for D? I know I can't allocate more than 3GB in the
 entire app, but I'm looking at single allocations.
 Pardon me if this all sounds stupid, I'm a complete newb when it comes
 to memory allocation mechanics. :)

 Ironically, these benchmarks just show that the C heap is faster than  
 the D GC
 heap.  They are well over the 4 MB mark, where TempAlloc just falls back  
 on the C
 heap.  As far as how much you can allocate on the GC heap, it boils down  
 to how
 much **contiguous** free space you have in your address space.  Since  
 32-bit
 address space is only 4 GB (and only 2 GB for user space on Win32), I'm  
 not
 surprised you can't allocate more than 500 MB.

I think you are wrong about contiguous memory block. I believe virtual  
address space is divided into blocks (which are, indeed, contiguous) but  
the address space itself is not. What I mean is that if you try to  
allocate 1GB of memory, an OS might succeed that even if the physical  
memory is "fragmented" by other processes.

This isn't the case for ALL operating systems, though.

I've believe some OSes don't even allocate memory physically until you  
actually access it (by reading or writing to it), and even then they may  
only allocate memory for requested pages only.

I might be wrong though.

Apr 19 2011

Daniel Gibson <metalcaedes gmail.com> writes:

Am 19.04.2011 17:02, schrieb Denis Koroskin:
 On Tue, 19 Apr 2011 18:34:33 +0400, dsimcha <dsimcha yahoo.com> wrote:
 
 == Quote from Andrej Mitrovic (andrej.mitrovich gmail.com)'s article
 Nice work. I've tried it out with allocating some large float arrays,
 quick two test cases were a 94MB and 282MB audio file.
 File 1 w/o TempAlloc: 410msecs
 File 1 w/ TempAlloc: 184msecs
 File 2 w/o TempAlloc 1321 msecs
 File 2 w/ TempAlloc 567 msecs
 Btw, how much *can* I allocate at once by using new (not specific to
 TempAlloc)? I've tried a 500MB file, but allocation fails even though
 I have plenty of free memory on my system. Is there a limit to using
 'new'?
 Of course, ideally I'd use some kind of buffering mechanism where a
 portion of a file is loaded into memory, and when that's exhausted I'd
 load another chunk. But I'd like to know the limits for a single
 memory allocation on the garbage-collected heap. Is there some
 internal limit for D? I know I can't allocate more than 3GB in the
 entire app, but I'm looking at single allocations.
 Pardon me if this all sounds stupid, I'm a complete newb when it comes
 to memory allocation mechanics. :)

 Ironically, these benchmarks just show that the C heap is faster than
 the D GC
 heap.  They are well over the 4 MB mark, where TempAlloc just falls
 back on the C
 heap.  As far as how much you can allocate on the GC heap, it boils
 down to how
 much **contiguous** free space you have in your address space.  Since
 32-bit
 address space is only 4 GB (and only 2 GB for user space on Win32),
 I'm not
 surprised you can't allocate more than 500 MB.

 
 I think you are wrong about contiguous memory block. I believe virtual
 address space is divided into blocks (which are, indeed, contiguous) but
 the address space itself is not. What I mean is that if you try to
 allocate 1GB of memory, an OS might succeed that even if the physical
 memory is "fragmented" by other processes.
 

That doesn't help if the logical memory of your process is fragmented as
well :)

 This isn't the case for ALL operating systems, though.
 
 I've believe some OSes don't even allocate memory physically until you
 actually access it (by reading or writing to it), and even then they may
 only allocate memory for requested pages only.
 
 I might be wrong though.

Apr 19 2011

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

Oh, I should mention I was using a C library to actually load the data
into the buffer, so I didn't use -J imports or anything like that.

Apr 19 2011

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

slackCat should probably use the common type, not the first one passed?
Also it would be cool to have addRange/removeRange somehow automated for 
types that have indirections.
Or it's too much state to carry around?

-- 
Dmitry Olshansky

Apr 19 2011

dsimcha <dsimcha yahoo.com> writes:

== Quote from Dmitry Olshansky (dmitry.olsh gmail.com)'s article
 slackCat should probably use the common type, not the first one passed?

Good point.  Will fix.

 Also it would be cool to have addRange/removeRange somehow automated for
 types that have indirections.
 Or it's too much state to carry around?

Yeah, I don't want this because a major purpose of TempAlloc is to allow
efficient
parallelism, and addRange/removeRange involves locks.  Besides, I think they use
linear search in the current GC implementation rather than something more
efficient.  Once you're doing stuff like this, the advantages of TempAlloc go
away
rapidly.  Besides, it's perfectly safe (at least now, while the GC isn't moving)
to store a reference to GC memory in TempAlloc, as long as it's not the **only**
reference.  It's also perfectly safe to have indirections from TempAlloc data
into
other TempAlloc data, as long as the LIFO freeing order isn't violated.

Apr 19 2011

dsimcha <dsimcha yahoo.com> writes:

On 4/19/2011 12:08 PM, Dmitry Olshansky wrote:
 slackCat should probably use the common type, not the first one passed?

Fixed.

https://github.com/dsimcha/TempAlloc/commit/5e3fd0db9c905d361695d267e460a436c2b26666

https://github.com/dsimcha/TempAlloc/commit/5e74dd20d92c545cb4e3945c81175307c60b1253

Apr 19 2011

D Programming

C/C++ Programming

Other

digitalmars.D - TempAlloc: Request for Review For Inclusion in core.memory