
digitalmars.D - More PC Precision Stuff

dsimcha <dsimcha yahoo.com> writes:
I've gotten underway hacking the GC to add precise heap scanning, but I
thought of one really annoying corner case that really would make things an
order of magnitude more complicated if it were handled properly:  Structs and
classes that have large static arrays embedded.  For example:

class Foo {
    Foo next;
    void*[4 * 1024 * 1024] hugeArray;
}

The problem here is that scanning this precisely would require me to either
generate a megabyte bitmask that explicitly says "scan every element of
hugeArray" or to change my bitmask data structure from a flat array to
something nested and an order of magnitude more complex to generate at compile
time.

Since this is such a rare case in practice, I'm tempted to just say that any
object with size above some arbitrary limit, say 1 kb, just gets scanned
conservatively and be done with it.  For arrays, this would be a limit on the
size of the element, i.e. for a T[], it would be a limit on T.sizeof, *not*
T.sizeof * length.  I'd like to get the community's input on this:  Is this
enough of a corner case that I have permission to cop out of solving it
properly for the sake of simplicity?
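
To make the proposed cutoff concrete, here is a hypothetical sketch (the 1 kb limit, the helper name, and the compile-time test are illustrative only, not the actual GC code):

// Hypothetical sketch of the proposed fallback rule; not the actual GC patch.
enum PRECISE_SCAN_LIMIT = 1024;  // bytes: the arbitrary cutoff discussed above

bool wantsPreciseScan(T)()
{
    static if (is(T == class))
        // The limit applies to the instance size, not the reference size.
        return __traits(classInstanceSize, T) <= PRECISE_SCAN_LIMIT;
    else static if (is(T U : U[]))
        // For a T[], the limit applies to the element size, not to
        // T.sizeof * length.
        return wantsPreciseScan!(U)();
    else
        return T.sizeof <= PRECISE_SCAN_LIMIT;
}

// With the Foo class above, wantsPreciseScan!(Foo)() is false, so Foo
// instances would fall back to conservative scanning; small structs and
// arrays of small elements keep precise scanning.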
Oct 29 2009
bearophile <bearophileHUGS lycos.com> writes:
dsimcha:

 Since this is such a rare case in practice,<
I don't think this is such an uncommon case; I use something similar for my memory pools. But if handling it makes your code too complex, then it may be acceptable to ignore it anyway.

Two people have shown the need for D benchmarks that stress the GC, so I am translating the Olden benchmarks to D. You can find two of them already in D on my site (in several variants), and I'll add more (though some of those variants use memory pools that you will not scan if you adopt that simplification). Such benchmarks (and others I already have) can be used to test your changes to the GC.

Bye,
bearophile
Oct 29 2009
dsimcha <dsimcha yahoo.com> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
 dsimcha:
 Since this is such a rare case in practice,<
I don't think this is a so uncommon case, I use something similar for my memory
 pools.

Why not dynamic arrays? Wouldn't it make more sense to do:

class MemoryPool {
    // other stuff
    uint[] memory;

    this() {
        memory = new uint[someHugeNumber];
    }
}

This would have negligible additional overhead and would allow you to change the
size of the memory pool at runtime.

I personally find that I almost never use static arrays, either on the stack or
inside heap-allocated objects, because the fact that their size is fixed at
compile time is just too restrictive.  About my only use for them is to store
compile-time constants in the static data segment.
Oct 29 2009
Michel Fortin <michel.fortin michelf.com> writes:
On 2009-10-29 09:42:58 -0400, dsimcha <dsimcha yahoo.com> said:

 I've gotten underway hacking the GC to add precise heap scanning, but I
 thought of one really annoying corner case that really would make things an
 order of magnitude more complicated if it were handled properly:  Structs and
 classes that have large static arrays embedded.  For example:
 
 class Foo {
     Foo next;
     void*[4 * 1024 * 1024] hugeArray;
 }
 
 The problem here is that scanning this precisely would require me to either
 generate a megabyte bitmask that explicitly says "scan every element of
 hugeArray" or to change my bitmask data structure from a flat array to
 something nested and an order of magnitude more complex to generate at compile
 time.
You don't need something with nesting, but something a little more complex, yes. For instance, you could replace the bitmask with an array of integers, each telling you in turn how many words to skip and how many words have pointers. For instance:

struct Foo {
    int a;
    void* b1;
    void* b2;
    void* b3;
    int[4] c;
    void*[3000] d;
}

ushort[4] fooMemoryMap = [1, 3, 4, 3000];

That is: 1 word without pointers, 3 words with pointers, 4 words without pointers, 3000 words with pointers.

Now one of the remaining problems is how to handle weak pointers. The other problem is that this doesn't scale well if your static array contains a struct with pointers and non-pointers. For this you'd need a way to repeat a pattern (something like a rewind command in the above stream). Both can be solved by giving a special meaning to the most significant bit:

enum {
    // For even values (non-pointers)
    SKIP_FLAG   = 0 << ushort.sizeof*8-1,
    REPEAT_FLAG = 1 << ushort.sizeof*8-1,

    // For odd values (pointers)
    STRONG_PTR_FLAG = 0 << ushort.sizeof*8-1,
    WEAK_PTR_FLAG   = 1 << ushort.sizeof*8-1,
}

Giving you this memory map for Foo:

ushort[4] fooMemoryMap = [
    1    | SKIP_FLAG,        // one word of non-pointers
    3    | STRONG_PTR_FLAG,  // three words of pointers
    4    | SKIP_FLAG,        // four words of non-pointers
    3000 | STRONG_PTR_FLAG,  // 3000 words of pointers
];

Now you can encode even a big static array of Foos in the middle of a struct with a reasonably-sized memory map (14 bytes):

struct Bar {
    Foo[500] a;
    int[500] b;
}

ushort[7] barMemoryMap = [
    1    | SKIP_FLAG,        // one word of non-pointers
    3    | STRONG_PTR_FLAG,  // three words of pointers
    4    | SKIP_FLAG,        // four words of non-pointers
    3000 | STRONG_PTR_FLAG,  // 3000 words of pointers
    4    | REPEAT_FLAG,      // rewind 4 instructions
    500,                     // and repeat 500 times
    500  | SKIP_FLAG,        // 500 words of non-pointers
];

Here, Foo's memory map just gets inserted at the right place in Bar's memory map. No nesting or pointers or anything, just a repeat instruction. And it takes only 14 bytes.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
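
A minimal sketch of how a scanner might walk the simple, flag-free form of such a map (entries alternating "skip n words" / "n words of pointers"); the function name and the mark delegate are illustrative only, not actual GC code:

// "mark" stands in for whatever the real GC would do with each pointer word.
void scanWithMap(void** base, ushort[] map, void delegate(void*) mark)
{
    size_t word = 0;
    foreach (i, count; map)
    {
        if (i % 2 == 0)
        {
            word += count;              // even entry: run of non-pointer words
        }
        else
        {
            foreach (j; 0 .. cast(size_t) count)
                mark(base[word + j]);   // odd entry: run of pointer words
            word += count;
        }
    }
}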
Oct 29 2009
Lutger <lutger.blijdestijn gmail.com> writes:
dsimcha wrote:

 I've gotten underway hacking the GC to add precise heap scanning, but I
 thought of one really annoying corner case that really would make things an
 order of magnitude more complicated if it were handled properly:  Structs and
 classes that have large static arrays embedded.  For example:
 
 class Foo {
     Foo next;
     void*[4 * 1024 * 1024] hugeArray;
 }
 
 The problem here is that scanning this precisely would require me to either
 generate a megabyte bitmask that explicitly says "scan every element of
 hugeArray" or to change my bitmask data structure from a flat array to
 something nested and an order of magnitude more complex to generate at
 compile time.
 
 Since this is such a rare case in practice, I'm tempted to just say that any
 object with size above some arbitrary limit, say 1 kb, just gets scanned
 conservatively and be done with it.  For arrays, this would be a limit on
 the size of the element, i.e. for a T[], it would be a limit on T.sizeof,
 *not* T.sizeof * length.  I'd like to get the community's input on this:
 Is this enough of a corner case that I have permission to cop out of
 solving it properly for the sake of simplicity?
Could you or anyone else solve this problem at a later stage? If that would not be made more difficult then I would say cop out, at least for now.
Oct 29 2009
next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 29 Oct 2009 22:32:54 +0300, Lutger <lutger.blijdestijn gmail.com>  
wrote:

 dsimcha wrote:

 I've gotten underway hacking the GC to add precise heap scanning, but I
 thought of one really annoying corner case that really would make things an
 order of magnitude more complicated if it were handled properly:  Structs and
 classes that have large static arrays embedded.  For example:
 
 class Foo {
     Foo next;
     void*[4 * 1024 * 1024] hugeArray;
 }
 
 The problem here is that scanning this precisely would require me to either
 generate a megabyte bitmask that explicitly says "scan every element of
 hugeArray" or to change my bitmask data structure from a flat array to
 something nested and an order of magnitude more complex to generate at
 compile time.
 
 Since this is such a rare case in practice, I'm tempted to just say that any
 object with size above some arbitrary limit, say 1 kb, just gets scanned
 conservatively and be done with it.  For arrays, this would be a limit on
 the size of the element, i.e. for a T[], it would be a limit on T.sizeof,
 *not* T.sizeof * length.  I'd like to get the community's input on this:
 Is this enough of a corner case that I have permission to cop out of
 solving it properly for the sake of simplicity?
 Could you or anyone else solve this problem at a later stage? If that would not be made more difficult then I would say cop out, at least for now.
Don't worry, it won't have any impact on the existing code.
Oct 29 2009
dsimcha <dsimcha yahoo.com> writes:
== Quote from Lutger (lutger.blijdestijn gmail.com)'s article
 dsimcha wrote:
 I've gotten underway hacking the GC to add precise heap scanning, but I
 thought of one really annoying corner case that really would make things an
 order of magnitude more complicated if it were handled properly:  Structs and
 classes that have large static arrays embedded.  For example:
 
 class Foo {
     Foo next;
     void*[4 * 1024 * 1024] hugeArray;
 }
 
 The problem here is that scanning this precisely would require me to either
 generate a megabyte bitmask that explicitly says "scan every element of
 hugeArray" or to change my bitmask data structure from a flat array to
 something nested and an order of magnitude more complex to generate at
 compile time.
 
 Since this is such a rare case in practice, I'm tempted to just say that any
 object with size above some arbitrary limit, say 1 kb, just gets scanned
 conservatively and be done with it.  For arrays, this would be a limit on
 the size of the element, i.e. for a T[], it would be a limit on T.sizeof,
 *not* T.sizeof * length.  I'd like to get the community's input on this:
 Is this enough of a corner case that I have permission to cop out of
 solving it properly for the sake of simplicity?
 Could you or anyone else solve this problem at a later stage? If that would not be made more difficult then I would say cop out, at least for now.
I would be much more inclined to cop out on this, except that I would be making a design decision that might not be easy to change in the future w.r.t. how bitmasks are structured. If we eventually fixed this, it would likely be a breaking change to the GC interface, after D2 had gone gold.

Then again, GC.malloc is a pretty low-level interface, and most access would be through whatever replaces new, which would likely be in the standard lib and could be modified in lockstep. Therefore, if/when we were to get a more principled scheme, it might be transparent to most user code, at least at the source level.

One thing I definitely need to know from the insiders (Walter, Sean, etc.) is, how much do we care about having a stable, well thought out, formally specified interface to the GC routines, both at the source and binary level, at this stage in the game? If I did something hackish to get precise heap scanning mostly working, would the patch likely be rejected because it would create cruft that would have to be kept for backwards compatibility?
Oct 29 2009
bearophile <bearophileHUGS lycos.com> writes:
dsimcha:

 I personally find that I almost never use static arrays, either on the stack
 or inside heap-allocated objects because the fact that their size is fixed at
 compile time is just too restrictive.  About my only use for them is to store
 compile-time constants in the static data segment.
Fixed-size arrays can be quite useful, so you can learn to use them more often. Their maximum size is fixed, but you can use a smaller part of them, so they can be flexible enough. Stack allocations are also usually quite a bit faster.

For example, in my dlibs you can find an edit distance function. It needs a temporary internal buffer: if the input strings are small enough, it uses a fixed-size, stack-allocated array; otherwise it calls another variant of the same function that allocates the buffer from the heap. This keeps the function flexible, and (I have timed it) makes it quite a bit faster for the common case of small input strings.

Here you can see another example of mine: http://www.fantascienza.net/leonardo/js/wirun2.c
It's a Wireworld simulator; to run, it needs a matrix. Indexing into a statically allocated matrix can be faster because you know the sizes at compile time, so a good compiler can find an item in the matrix without a costly multiplication, using a cheaper sequence of a few adds and shifts. Here the matrix sizes are a power of two, so to find an item the compiler just uses a shift and an add, which is quite a bit faster. If you replace the static array in this wirun program (which can be translated to very similar C-like D code to be compiled with LDC) with a dynamic one, you will see the program slow down; I have timed it. With a dynamic matrix you have to pay more (there are ways to simulate the shift-based indexing with a dynamically allocated matrix too, and I have done that as well, but it's less intuitive and less commonly done).
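
A minimal sketch of that stack-buffer-with-heap-fallback pattern (this is not the actual dlibs code; the threshold and names are made up for illustration):

import std.algorithm : min;
import std.stdio;

// Hypothetical threshold: rows up to this many ints fit in the stack buffer.
enum STACK_LIMIT = 256;

int editDistance(string a, string b)
{
    int[STACK_LIMIT] stackBuf = void;             // fixed-size, stack-allocated
    if (b.length + 1 <= STACK_LIMIT)
        return editDistanceImpl(a, b, stackBuf[0 .. b.length + 1]);
    return editDistanceImpl(a, b, new int[b.length + 1]);   // heap fallback
}

// The algorithm only sees a slice, so it doesn't care where the buffer lives.
int editDistanceImpl(string a, string b, int[] row)
{
    foreach (j; 0 .. row.length)
        row[j] = cast(int) j;
    foreach (i; 1 .. a.length + 1)
    {
        int diag = row[0];                        // holds the previous row's row[j-1]
        row[0] = cast(int) i;
        foreach (j; 1 .. b.length + 1)
        {
            immutable up   = row[j];
            immutable cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            row[j] = min(up + 1, row[j - 1] + 1, diag + cost);
            diag = up;
        }
    }
    return row[b.length];
}

void main()
{
    writeln(editDistance("kitten", "sitting"));   // prints 3
}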
 Why not dynamic arrays?  Wouldn't it make more sense to do:
 
 class MemoryPool {
     // other stuff
     uint[] memory;
 
     this() {
         memory = new uint[someHugeNumber];
     }
 }
 
 This would have negligible additional overhead and would allow you to change
 the size of the memory pool at runtime.
This is a bare-bones implementation, reduced from the "extra" module of my dlibs:

struct MemoryPool(T, int MAX_BLOCK_BYTES = 1 << 14) {
    struct Block {
        T[(MAX_BLOCK_BYTES / T.sizeof)] items;
    }

    static Block*[] blocks;
    static T* nextFree, lastFree;

    static T* newItem() {
        if (nextFree >= lastFree) {
            blocks.length = blocks.length + 1;
            blocks[$ - 1] = new Block;
            nextFree = blocks[$ - 1].items.ptr;
            lastFree = nextFree + Block.items.length;
        }
        return nextFree++;
    }
}

(I have removed many static asserts, some asserts, the freeAll method, comments, ddocs, unittests, etc.)

Blocks are allocated from the GC heap. A less safe variant of this struct allocates memory from the C heap (it also optionally tests at compile time whether T contains reference types). You can see there's a single pool for each type; this can be a disadvantage if you want to keep more than one pool for the same type.

Bye,
bearophile
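
For illustration only (the Node type here is hypothetical, not from dlibs), using the pool is just a matter of calling newItem and getting back a pointer into the current block:

struct Node {
    int value;
    Node* next;
}

void main() {
    // Each call hands out the next free slot; a new block is allocated
    // only when the current one is exhausted.
    Node* a = MemoryPool!(Node).newItem();
    Node* b = MemoryPool!(Node).newItem();
    a.value = 1;
    a.next  = b;
    b.value = 2;
    b.next  = null;
}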
Oct 29 2009