digitalmars.D - How to save RAM in D programs (on zero initialized buffers): Reloaded
- Marco Leise (61/61) Feb 07 2012 Hi, this is me again with some "size matters" topic. This time, it's not
- Nick Sabalausky (3/3) Feb 07 2012 Is void initialization not good enough?
- Marco Leise (9/12) Feb 07 2012 That gives me a) no buffer, who's pointer is b) not initialized to null.
- Michel Fortin (10/28) Feb 07 2012 What would be nice is a GC that would just track system-allocated
- Martin Nowak (10/31) Feb 07 2012 A similar idea popped up to solve non-deterministic shared library
- Jacob Carlborg (4/28) Feb 07 2012 What about GC.addRoot ?
- Michel Fortin (8/38) Feb 08 2012 This is perfect if you want the GC to scan a memory range for pointers.
- Martin Nowak (3/18) Feb 07 2012 You usually don't need initialized memory from the GC.
- Kapps (5/8) Feb 07 2012 This example would be uninitializedArray!(ubyte[])(1024 * 1024).
- Iain Buclaw (16/77) Feb 07 2012 What about these functions?
- Marco Leise (6/18) Feb 07 2012 Yeah I know these and used them, only to find out the program I am porti...
- F i L (3/3) Feb 07 2012 Can't you just write a custom allocator using calloc for
- Timon Gehr (3/6) Feb 07 2012 The solution with the best performance and least memory requirements
- Artur Skawina (3/4) Feb 07 2012 That won't help - the compiler will defeat the optimization by "initiali...
- Manfred Nowak (14/15) Feb 07 2012 The tests only cover a very small fraction of an unknown data
- Marco Leise (11/26) Feb 07 2012 I meant caveats with calloc vs. other allocators. I don't want to discus...
- Manfred Nowak (21/25) Feb 08 2012 [...]
- Marco Leise (25/50) Feb 08 2012 That sounds a bit vague. If I understand you correctly you would impleme...
- Manfred Nowak (4/5) Feb 08 2012 Andrei has written a paper on allocation:
- Marco Leise (18/23) Feb 09 2012 Oh ok, that's farther than I ever digged into memory management. My
- Kagamin (3/3) Feb 09 2012 I guess, calloc will reuse blocks too, so if you run the
- Marco Leise (11/14) Feb 09 2012 You don't understand how it works. calloc gives you exactly 0 KB of
- Jose Armando Garcia (7/69) Feb 07 2012 Special? What do you mean by special? Most OS use Virtual Memory so
- Marco Leise (9/14) Feb 07 2012 Yes, special. Like, say, you fork a process on Unix. Instead of copying ...
Hi, this is me again with some "size matters" topic. This time, it's not the executable size, no! Instead I want to discuss a runtime memory footprint and speed issue that affects everyone, and how to improve the situation dramatically.

In D we allocate memory through the GC, which initializes it according to the type's .init, giving us a safe default. In most cases this results in the memory block being zeroed out, as in the case of allocating ubyte[] buffers. Let's assume we have a program that allocates some buffers in advance that it may not use fully. This often happens when the input data is much smaller than the anticipated case. So our memory manager should handle this situation well:

  o  zero out a memory block
  o  we probably don't need all of it

So here is a small benchmark that allocates 512 * 1 MB, first using the typical method: new ubyte[1024 * 1024]. The output is:

        ** new ubyte[1024 * 1024]
           resource usage: +526840 KB
           user time: +0.098s | sys. time: +0.368s

As expected we have a physical memory usage increase of ~512 MB and spent a considerable amount of time in the system finding free memory blocks and in our program initializing the data to zero. Can we use the GC more directly? Let's try GC.calloc:

        ** GC.calloc(1024 * 1024)
           resource usage: +525104 KB
           user time: +0.089s | sys. time: +0.370s

Again, 512 MB and about the same time. Nothing gained, but my RAM is starting to fill up. By the way, how does a good old call to 'malloc' compare? That gives us a block of garbage-'initialized' data - a situation we left behind for good in D! So here we go with another test:

        ** malloc(1024 * 1024)
           resource usage: +2048 KB
           user time: +0.000s | sys. time: +0.002s

Oh nice! May I say... these 512 calls were for free? 2 MB and 0.002 seconds ain't worth talking about. The operating system didn't actually allocate the memory, it merely gave us a virtual memory range to use. Only when we write to the memory will physical memory be bound.
That's perfect for a generously sized buffer, right? Well... we still want it zeroed out, so let's initialize this data to zero with ptr[0 .. 1024 * 1024] = 0:

        ** malloc(1024 * 1024) + zero out
           resource usage: +526336 KB
           user time: +0.053s | sys. time: +0.366s

... and we are back at square one, except that the user time is notably lower. What we need is a facility that gives us lazily allocated, zeroed-out memory. And guess what, it's not too much to ask for. Here is 'calloc' to the rescue:

        ** calloc(1, 1024 * 1024)
           resource usage: +2048 KB
           user time: +0.001s | sys. time: +0.001s

How does it work? The operating system fakes the memory allocation and just gives us 131072 references to a special read-only memory page of zeroes. The semantics are copy-on-write: we start with a view of zeroed-out memory and get the real thing once we write into it. (Sorry if I'm telling some of you nothing new, but I just found this out today ;) )

The question I have is: should we go and improve druntime with that knowledge? I'm not aware of any caveats - are there any? Thanks for reading; the test program for Linux is in the attachment (I used GDC to compile).

-- Marco
Feb 07 2012
Is void initialization not good enough? IIRC it's something like: ubyte[] buf = void;
Feb 07 2012
On 07.02.2012 21:11, Nick Sabalausky <a a.a> wrote:Is void initialization not good enough? IIRC it's something like: ubyte[] buf = void;That gives me a) no buffer, whose pointer is b) not initialized to null. I want instead a defined pointer, to a valid array, that is initialized to zero. Anyway, I think the flaw in my proposal is the use of a GC. Since we don't get the memory directly from the operating system, but from a memory pool in the GC, it is generally 'recycled', already-used memory. It has to be zeroed out manually, unless there were a way to tell the OS to rebind some virtual memory addresses in our program to this magic 'zero page'.
Feb 07 2012
On 2012-02-07 20:24:40 +0000, "Marco Leise" <Marco.Leise gmx.de> said:Am 07.02.2012, 21:11 Uhr, schrieb Nick Sabalausky <a a.a>:What would be nice is a GC that would just track system-allocated memory blocks. What would be really great is if you could allocate a block with malloc/calloc (or something else) and later pass ownership to the GC, so that the GC calls free (or something else) when all references disappear. But I'm not sure how you can do that efficiently. -- Michel Fortin michel.fortin michelf.com http://michelf.com/Is void initialization not good enough? IIRC it's something like: ubyte[] buf = void;That gives me a) no buffer, who's pointer is b) not initialized to null. I want instead a defined pointer, to a valid array, that is initialized to zero. Anyway, I think the flaw in my proposal is the use of a GC. Since we don't get the memory directly from the operating system, but from a memory pool in the GC, it is generally 'recycled' and already used memory. It has to be zeroed out manually, unless there was a way to tell the OS to rebind some virtual memory addresses in our program to this magic 'zero page'.
Feb 07 2012
On Tue, 07 Feb 2012 21:37:03 +0100, Michel Fortin <michel.fortin michelf.com> wrote:On 2012-02-07 20:24:40 +0000, "Marco Leise" <Marco.Leise gmx.de> said:A similar idea popped up to solve non-deterministic shared library unloading and/or mapped memory freeing. I don't think though that putting additional pressure on the GC is the right approach to resource management. The interface would be simple for non-overlapping ranges. void GC.trackRange(void* p, size_t sz, void delegate() finalizer)On 07.02.2012 21:11, Nick Sabalausky <a a.a> wrote:What would be nice is a GC that would just track system-allocated memory blocks. What would be really great is if you could allocate a block with malloc/calloc (or something else) and later pass ownership to the GC, so that the GC calls free (or something else) when all references disappear. But I'm not sure how you can do that efficiently.Is void initialization not good enough? IIRC it's something like: ubyte[] buf = void;That gives me a) no buffer, whose pointer is b) not initialized to null. I want instead a defined pointer, to a valid array, that is initialized to zero. Anyway, I think the flaw in my proposal is the use of a GC. Since we don't get the memory directly from the operating system, but from a memory pool in the GC, it is generally 'recycled' and already used memory. It has to be zeroed out manually, unless there was a way to tell the OS to rebind some virtual memory addresses in our program to this magic 'zero page'.
Feb 07 2012
On 2012-02-07 21:37, Michel Fortin wrote:On 2012-02-07 20:24:40 +0000, "Marco Leise" <Marco.Leise gmx.de> said:What about GC.addRoot ? -- /Jacob CarlborgAm 07.02.2012, 21:11 Uhr, schrieb Nick Sabalausky <a a.a>:What would be nice is a GC that would just track system-allocated memory blocks. What would be really great is if you could allocate a block with malloc/calloc (or something else) and later pass ownership to the GC, so that the GC calls free (or something else) when all references disappear. But I'm not sure how you can do that efficiently.Is void initialization not good enough? IIRC it's something like: ubyte[] buf = void;That gives me a) no buffer, who's pointer is b) not initialized to null. I want instead a defined pointer, to a valid array, that is initialized to zero. Anyway, I think the flaw in my proposal is the use of a GC. Since we don't get the memory directly from the operating system, but from a memory pool in the GC, it is generally 'recycled' and already used memory. It has to be zeroed out manually, unless there was a way to tell the OS to rebind some virtual memory addresses in our program to this magic 'zero page'.
Feb 07 2012
On 2012-02-08 07:30:29 +0000, Jacob Carlborg <doob me.com> said:On 2012-02-07 21:37, Michel Fortin wrote:This is perfect if you want the GC to scan a memory range for pointers. But the GC can't track pointers pointing to the given memory range and notify you when that range is no longer referenced. -- Michel Fortin michel.fortin michelf.com http://michelf.com/On 2012-02-07 20:24:40 +0000, "Marco Leise" <Marco.Leise gmx.de> said:What about GC.addRoot ?Am 07.02.2012, 21:11 Uhr, schrieb Nick Sabalausky <a a.a>:What would be nice is a GC that would just track system-allocated memory blocks. What would be really great is if you could allocate a block with malloc/calloc (or something else) and later pass ownership to the GC, so that the GC calls free (or something else) when all references disappear. But I'm not sure how you can do that efficiently.Is void initialization not good enough? IIRC it's something like: ubyte[] buf = void;That gives me a) no buffer, who's pointer is b) not initialized to null. I want instead a defined pointer, to a valid array, that is initialized to zero. Anyway, I think the flaw in my proposal is the use of a GC. Since we don't get the memory directly from the operating system, but from a memory pool in the GC, it is generally 'recycled' and already used memory. It has to be zeroed out manually, unless there was a way to tell the OS to rebind some virtual memory addresses in our program to this magic 'zero page'.
Feb 08 2012
On Tue, 07 Feb 2012 21:24:40 +0100, Marco Leise <Marco.Leise gmx.de> wrote:Am 07.02.2012, 21:11 Uhr, schrieb Nick Sabalausky <a a.a>:You usually don't need initialized memory from the GC. The calloc behavior is still pretty interesting.Is void initialization not good enough? IIRC it's something like: ubyte[] buf = void;That gives me a) no buffer, who's pointer is b) not initialized to null. I want instead a defined pointer, to a valid array, that is initialized to zero. Anyway, I think the flaw in my proposal is the use of a GC. Since we don't get the memory directly from the operating system, but from a memory pool in the GC, it is generally 'recycled' and already used memory. It has to be zeroed out manually, unless there was a way to tell the OS to rebind some virtual memory addresses in our program to this magic 'zero page'.
Feb 07 2012
On 07/02/2012 2:11 PM, Nick Sabalausky wrote:Is void initialization not good enough? IIRC it's something like: ubyte[] buf = void;This example would be uninitializedArray!(ubyte[])(1024 * 1024). I would guess that it gives significantly better performance. There's also minimallyInitializedArray!(ubyte[])(1024 * 1024), which only initializes the bare minimum needed to be safe. They're both in std.array.
Feb 07 2012
On 7 February 2012 19:39, Marco Leise <Marco.Leise gmx.de> wrote:Hi, this is me again with some "size matters" topic. [...] The question I have is, should we go and improve druntime with that knowledge? I'm not aware of any caveats, are there any?What about these functions?

import std.array;

byte[][ALLOCS] b, c;

writeln("** uninitializedArray!(ubyte[])(1024*1024)");
foreach(i; 0 .. ALLOCS) b[i] = uninitializedArray!(ubyte[])(1024 * 1024);
prev = print_ressources(&prev);

writeln("** minimallyInitializedArray!(ubyte[])(1024*1024)");
foreach(i; 0 .. ALLOCS) c[i] = minimallyInitializedArray!(ubyte[])(1024 * 1024);
prev = print_ressources(&prev);

Regards
--
Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Feb 07 2012
On 07.02.2012 22:15, Iain Buclaw <ibuclaw ubuntu.com> wrote:o zero out a memory block <---------- !!!What about these functions? import std.array; byte[][ALLOCS] b, c; writeln("** uninitializedArray!(ubyte[])(1024*1024)"); foreach(i; 0 .. ALLOCS) b[i] = uninitializedArray!(ubyte[])(1024 * 1024); prev = print_ressources(&prev); writeln("** minimallyInitializedArray!(ubyte[])(1024*1024)"); foreach(i; 0 .. ALLOCS) c[i] = minimallyInitializedArray!(ubyte[])(1024 * 1024); prev = print_ressources(&prev); RegardsYeah, I know these and used them, only to find out the program I am porting relies on the data being zeroed out (it is a certain type of compression utility). It's a shame to see the C version get away with a low memory footprint _and_ zero-initialized buffers. I'll just copy the 'Array' class from the original then, which uses calloc and free.
Feb 07 2012
Can't you just write a custom allocator using calloc for performance critical structures (http://dlang.org/memory.html#newdelete), or do what Iain said.
Feb 07 2012
On 02/08/2012 12:01 AM, F i L wrote:Can't you just write a custom allocator using calloc for performance critical structures (http://dlang.org/memory.html#newdelete), or do what Iain said.The solution with the best performance and least memory requirements obviously must be the default.
Feb 07 2012
On 02/08/12 00:01, F i L wrote:Can't you just write a custom allocator using calloc for performance critical structures (http://dlang.org/memory.html#newdelete), or do what Iain said.That won't help - the compiler will defeat the optimization by "initializing" the area... artur
Feb 07 2012
Artur Skawina wrote:That won't help - the compiler will defeat the optimization by "initializing" the area...I see. Timon Gehr wrote:The solution with the best performance and least memory requirements obviously must be the default.No argument here. Only, if calloc is an all-around better allocation method than malloc, why is malloc even used?
Feb 07 2012
F i L wrote:Only, if calloc is an all-around better allocation method than malloc, why is malloc even used?Note-to-self: google before asking stupid questions...
Feb 07 2012
Marco Leise wrote:I'm not aware of any caveats, are there any?The tests only cover a very small fraction of an unknown data structure's life: the allocation phase. Of course one can want to make a bad design run faster, especially if one needs to allocate 0.5 TB of main memory and can allocate it in 1 second instead of 98 seconds. But neglecting 1) the time for building a usable data structure out of 0.5 TB of main memory, 2) the time for querying this data structure for some element, 3) the time for inserting some element into this data structure, 4) the time for deleting some element from this data structure and 5) the amortized times of the actions in 3) and 4) is at least not in accordance with engineering principles. -manfred
Feb 07 2012
On 08.02.2012 04:37, Manfred Nowak <svv1999 hotmail.com> wrote:Marco Leise wrote:I meant caveats with calloc vs. other allocators. I don't want to discuss the design of compression algorithms that allocate a fixed-size buffer in advance. I am porting one of those to D, and of the many data structures it allocates with calloc, some may not be fully filled when the program exits. Yet the user notices a start-up delay in the D version for the memory allocation and initialization, as well as an instant jump to the maximum memory use. The C++ version starts instantly and maps memory pages on demand. The algorithm relies on some of the buffers being zeroed out, so there is really no alternative to calloc.I'm not aware of any caveats, are there any?The tests only cover a very small fraction of an unknown data structure: the allocation phase. Of course one can want to make a bad design running faster. Especially if one need to allocate 0.5 TB main memory and can allocate this in 1 second instead of 98 seconds. But neglecting 1) the time for building a usable data structure of 0.5 TB main memory, 2) the time for querying this data structure for some element, 3) the time for inserting some element into this data structure, 4) the time for deleting some element from this data structure and 5) the amortized times of the actions in 3) and 4) is at least not in accordance with engineering principles. -manfred
Feb 07 2012
Marco Leise wrote:In D we allocate memory through the GC[...]Let's assume, we have a program that allocates some buffers in advance, that it may not use fully.[...]there is really no alternative to calloc.1) calloc implements a strategy for allocating memory. If this strategy is useful for parts of a program, then the GC should be informed by the code and be capable of adapting its behavior to the requested strategy. This seems necessary because there is not only the null pattern to initialize allocated memory. 2) According to your OP the request for allocating memory can be partitioned into three disjoint classes: a) memory that is guaranteed to be used, b) memory that is guaranteed not to be used and c) memory that might be used. For the case that a + b exceeds available memory, should the coder predeclare a strategy, i.e. should the GC signal out of memory on request of the memory or should the GC wait until that sum reaches some deadline? -manfred
Feb 08 2012
On 08.02.2012 14:30, Manfred Nowak <svv1999 hotmail.com> wrote:Marco Leise wrote:In D we allocate memory through the GC[...]Let's assume, we have a program that allocates some buffers in advance, that it may not use fully.[...]there is really no alternative to calloc.1) calloc implements a strategy for allocating memory. If this strategy is useful for parts of a program, then the GC should be informed by the code and be capable of adapting its behavior to the requested strategy.That sounds a bit vague. If I understand you correctly, you would implement this as a hint flag, like

    GC.allocHint = AllocationHint.lazyZero;
    auto arr = new ubyte[…];

This seems necessary because there is not only the null pattern to initialize allocated memory. 2) According to your OP the request for allocating memory can be partitioned into three disjoint classes: a) memory that is guaranteed to be used, b) memory that is guaranteed not to be used and c) memory that might be used. For the case that a + b exceeds available memory, should the coder predeclare a strategy, i.e. should the GC signal out of memory on request of the memory or should the GC wait until that sum reaches some deadline? -manfredHere I can't follow you. The request to allocate memory contains memory regions that are guaranteed not to be used? Why would I request them then? Also, what is your definition of available memory? Available RAM, RAM + swap, or available virtual memory going beyond what the operating system can commit, resulting in random killing of applications (http://linux-mm.org/OOM_Killer)?

I don't see where the GC enters the picture, since all this 'may use memory, but don't commit to it' is handled solely by the OS. I could only imagine a function in the GC that always allocates a new chunk of memory and does that through calloc - or the hammer method - all allocations use calloc and some of the manual memset(..., 0, ...) is removed.

-- Marco
Feb 08 2012
Marco Leise wrote:That sounds a bit vague.Andrei has written a paper on allocation: http://erdani.com/publications/cuj-2005-12.pdf -manfred
Feb 08 2012
On 09.02.2012 03:56, Manfred Nowak <svv1999 hotmail.com> wrote:Marco Leise wrote:That sounds a bit vague.Andrei has written a paper on allocation: http://erdani.com/publications/cuj-2005-12.pdf -manfredOh ok, that's farther than I ever dug into memory management. My current problem with the compression utility has been solved at a small scale with a manually memory-managed 'ZeroInitializedAndAlignedBuffer' struct that uses calloc - right in the spirit of the original source code. The startup time was reduced by ~0.7 seconds and is now relatively close to the original as well; at least there is no longer that perceived delay. Techniques like calloc make it possible to use the algorithm in other places like batch processing, compressed FUSE file systems on Linux or libraries, where new instances may be spawned in quick succession. All technical details aside, I would wish for some solution to "I spawn a new instance of some huge complex data structure, but I probably won't need all of it, so use calloc". I don't know what the situation with other D programs is, but I would be interested in some larger-scale experiments with calloc and app startup time. It may be worse in some cases, but what if it improved the general situation - without writing a memory management library? I don't know the GC internals enough to tell if calloc is already used somewhere (in place of malloc and memset).
Feb 09 2012
I guess, calloc will reuse blocks too, so if you run the compressing function twice, it will reuse the memory block used and freed previously and zero it out honestly.
Feb 09 2012
On 09.02.2012 11:55, Kagamin <spam here.lot> wrote:I guess, calloc will reuse blocks too, so if you run the compressing function twice, it will reuse the memory block used and freed previously and zero it out honestly.You don't understand how it works. calloc gives you exactly 0 KB of physical memory. There is nothing to zero out :) There is one page, a block of 4096 bytes somewhere in the kernel, that is all zeroes and read-only. If you allocate memory you get a bunch of references to it. Or in other words, a zillion views of the same 4096 bytes repeating over and over. Only once you write to it does what you describe happen: the zero page is 'copied' into some (probably previously freed) page. This is equivalent to zeroing out the target page. The main difference here is that the zeroing out happens with a 'lazy' keyword attached to it.
Feb 09 2012
On Tue, Feb 7, 2012 at 5:39 PM, Marco Leise <Marco.Leise gmx.de> wrote:Hi, this is me again with some "size matters" topic. [...] How does it work? The operating system fakes the memory allocation and just gives us 131072 references to a special read-only memory page of zeroes. The semantic is copy-on-write. So we start with a view on zeroed out memory and get the real thing once we write into it.Special? What do you mean by special? Most OSes use virtual memory, so sure, they can say "here is a page" and yet not have that page backed by physical memory. To my knowledge, they can only do this if the allocated memory points to an unmapped page. I doubt this is the case in non-trivial programs.(Sorry, if I tell some of you nothing new, but I just found this out today ;) ) The question I have is, should we go and improve druntime with that knowledge? I'm not aware of any caveats, are there any? Thanks for reading and the test program for Linux is in the attachment (I used GDC to compile). -- Marco
Feb 07 2012
On 08.02.2012 04:40, Jose Armando Garcia <jsancio gmail.com> wrote:Special? What do you mean by special? Most OS use Virtual Memory so sure they can say here is a page and yet not have that page backed by physical memory. To my knowledge, they can only do this if the allocated memory points to an unmapped page. I doubt this is the case in non-trivial programs.Yes, special. Like, say, you fork a process on Unix. Instead of copying all the used memory, the processes share the same pages in read-only, copy-on-write mode. As soon as one of the processes writes to a memory page, it is duplicated. The same is true for the 'zero page'. Just that there is one for the whole system and it can have a zillion references to it: http://en.wikipedia.org/wiki/Copy-on-write <- Look for calloc. So a page can even be mapped and read without increasing the memory footprint, as I understand it.
Feb 07 2012