
digitalmars.D - More PC Precision Stuff

dsimcha <dsimcha yahoo.com> writes:
I've gotten underway hacking the GC to add precise heap scanning, but I
thought of one really annoying corner case that really would make things an
order of magnitude more complicated if it were handled properly:  Structs and
classes that have large static arrays embedded.  For example:

class Foo {
    Foo next;
    void*[4 * 1024 * 1024] hugeArray;
}

The problem here is that scanning this precisely would require me to either
generate a megabyte bitmask that explicitly says "scan every element of
hugeArray" or to change my bitmask data structure from a flat array to
something nested and an order of magnitude more complex to generate at compile
time.

Since this is such a rare case in practice, I'm tempted to just say that any
object with size above some arbitrary limit, say 1 kb, just gets scanned
conservatively and be done with it.  For arrays, this would be a limit on the
size of the element, i.e. for a T[], it would be a limit on T.sizeof, *not*
T.sizeof * length.  I'd like to get the community's input on this:  Is this
enough of a corner case that I have permission to cop out of solving it
properly for the sake of simplicity?
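
To make the proposed cutoff concrete, here is a hypothetical sketch (the 1 kb limit, the helper name, and the compile-time test are illustrative only, not the actual GC code):

// Hypothetical sketch of the proposed fallback rule; not the actual GC patch.
enum PRECISE_SCAN_LIMIT = 1024;  // bytes: the arbitrary cutoff discussed above

bool wantsPreciseScan(T)()
{
    static if (is(T == class))
        // The limit applies to the instance size, not the reference size.
        return __traits(classInstanceSize, T) <= PRECISE_SCAN_LIMIT;
    else static if (is(T U : U[]))
        // For a T[], the limit applies to the element size, not to
        // T.sizeof * length.
        return wantsPreciseScan!(U)();
    else
        return T.sizeof <= PRECISE_SCAN_LIMIT;
}

// With the Foo class above, wantsPreciseScan!(Foo)() is false, so Foo
// instances would fall back to conservative scanning; small structs and
// arrays of small elements keep precise scanning.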
Oct 29 2009
bearophile <bearophileHUGS lycos.com> writes:
dsimcha:

 Since this is such a rare case in practice,<
I don't think this is such an uncommon case; I use something similar for my memory pools. But if handling it makes your code too complex, then it may be acceptable to ignore it anyway.

Two people have shown the need for D benchmarks that stress the GC, so I am translating the Olden benchmarks to D. You can find two of them already in D on my site (in several variants), and I'll add more (though some of those variants use memory pools that you will not scan if you adopt that simplification). Such benchmarks (and others I already have) can be used to test your changes to the GC.

Bye,
bearophile
Oct 29 2009
dsimcha <dsimcha yahoo.com> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
 dsimcha:
 Since this is such a rare case in practice,<
I don't think this is a so uncommon case, I use something similar for my memory
 pools.

Why not dynamic arrays? Wouldn't it make more sense to do:

class MemoryPool {
    // other stuff
    uint[] memory;

    this() {
        memory = new uint[someHugeNumber];
    }
}

This would have negligible additional overhead and would allow you to change the
size of the memory pool at runtime.

I personally find that I almost never use static arrays, either on the stack or
inside heap-allocated objects, because the fact that their size is fixed at
compile time is just too restrictive.  About my only use for them is to store
compile-time constants in the static data segment.
Oct 29 2009
Michel Fortin <michel.fortin michelf.com> writes:
On 2009-10-29 09:42:58 -0400, dsimcha <dsimcha yahoo.com> said:

 I've gotten underway hacking the GC to add precise heap scanning, but I
 thought of one really annoying corner case that really would make things an
 order of magnitude more complicated if it were handled properly:  Structs and
 classes that have large static arrays embedded.  For example:
 
 class Foo {
     Foo next;
     void*[4 * 1024 * 1024] hugeArray;
 }
 
 The problem here is that scanning this precisely would require me to either
 generate a megabyte bitmask that explicitly says "scan every element of
 hugeArray" or to change my bitmask data structure from a flat array to
 something nested and an order of magnitude more complex to generate at compile
 time.
You don't need something with nesting, but something a little more complex, yes. For instance, you could replace the bitmask with an array of integers, each telling you in turn how many words to skip and how many words have pointers. For instance:

struct Foo {
    int a;
    void* b1;
    void* b2;
    void* b3;
    int[4] c;
    void*[3000] d;
}

ushort[4] fooMemoryMap = [1, 3, 4, 3000];

That is: 1 word without pointers, 3 words with pointers, 4 words without pointers, 3000 words with pointers.

Now one of the remaining problems is how to handle weak pointers. The other problem is that this doesn't scale well if your static array contains a struct with pointers and non-pointers. For this you'd need a way to repeat a pattern (something like a rewind command in the above stream). Both can be solved by giving a special meaning to the most significant bit:

enum {
    // For even values (non-pointers)
    SKIP_FLAG   = 0 << ushort.sizeof*8-1,
    REPEAT_FLAG = 1 << ushort.sizeof*8-1,

    // For odd values (pointers)
    STRONG_PTR_FLAG = 0 << ushort.sizeof*8-1,
    WEAK_PTR_FLAG   = 1 << ushort.sizeof*8-1,
}

Giving you this memory map for Foo:

ushort[4] fooMemoryMap = [
    1    | SKIP_FLAG,        // one word of non-pointers
    3    | STRONG_PTR_FLAG,  // three words of pointers
    4    | SKIP_FLAG,        // four words of non-pointers
    3000 | STRONG_PTR_FLAG,  // 3000 words of pointers
];

Now you can encode even a big static array of Foos in the middle of a struct with a reasonably-sized memory map (14 bytes):

struct Bar {
    Foo[500] a;
    int[500] b;
}

ushort[7] barMemoryMap = [
    1    | SKIP_FLAG,        // one word of non-pointers
    3    | STRONG_PTR_FLAG,  // three words of pointers
    4    | SKIP_FLAG,        // four words of non-pointers
    3000 | STRONG_PTR_FLAG,  // 3000 words of pointers
    4    | REPEAT_FLAG,      // rewind 4 instructions
    500,                     // and repeat 500 times
    500  | SKIP_FLAG,        // 500 words of non-pointers
];

Here, Foo's memory map just gets inserted at the right place in Bar's memory map. No nesting or pointers or anything, just a repeat instruction. And it takes only 14 bytes.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
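
A minimal sketch of how a scanner might walk the simple, flag-free form of such a map (entries alternating "skip n words" / "n words of pointers"); the function name and the mark delegate are illustrative only, not actual GC code:

// "mark" stands in for whatever the real GC would do with each pointer word.
void scanWithMap(void** base, ushort[] map, void delegate(void*) mark)
{
    size_t word = 0;
    foreach (i, count; map)
    {
        if (i % 2 == 0)
        {
            word += count;              // even entry: run of non-pointer words
        }
        else
        {
            foreach (j; 0 .. cast(size_t) count)
                mark(base[word + j]);   // odd entry: run of pointer words
            word += count;
        }
    }
}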
Oct 29 2009
Lutger <lutger.blijdestijn gmail.com> writes:
dsimcha wrote:

 I've gotten underway hacking the GC to add precise heap scanning, but I
 thought of one really annoying corner case that really would make things an
 order of magnitude more complicated if it were handled properly:  Structs and
 classes that have large static arrays embedded.  For example:
 
 class Foo {
     Foo next;
     void*[4 * 1024 * 1024] hugeArray;
 }
 
 The problem here is that scanning this precisely would require me to either
 generate a megabyte bitmask that explicitly says "scan every element of
 hugeArray" or to change my bitmask data structure from a flat array to
 something nested and an order of magnitude more complex to generate at
 compile time.
 
 Since this is such a rare case in practice, I'm tempted to just say that any
 object with size above some arbitrary limit, say 1 kb, just gets scanned
 conservatively and be done with it.  For arrays, this would be a limit on
 the size of the element, i.e. for a T[], it would be a limit on T.sizeof,
 *not* T.sizeof * length.  I'd like to get the community's input on this:
 Is this enough of a corner case that I have permission to cop out of
 solving it properly for the sake of simplicity?
Could you or anyone else solve this problem at a later stage? If that would not be made more difficult then I would say cop out, at least for now.
Oct 29 2009
next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 29 Oct 2009 22:32:54 +0300, Lutger <lutger.blijdestijn gmail.com>  
wrote:

 dsimcha wrote:

 I've gotten underway hacking the GC to add precise heap scanning, but I
 thought of one really annoying corner case that really would make things an
 order of magnitude more complicated if it were handled properly:  Structs and
 classes that have large static arrays embedded.  For example:
 
 class Foo {
     Foo next;
     void*[4 * 1024 * 1024] hugeArray;
 }
 
 The problem here is that scanning this precisely would require me to either
 generate a megabyte bitmask that explicitly says "scan every element of
 hugeArray" or to change my bitmask data structure from a flat array to
 something nested and an order of magnitude more complex to generate at
 compile time.
 
 Since this is such a rare case in practice, I'm tempted to just say that any
 object with size above some arbitrary limit, say 1 kb, just gets scanned
 conservatively and be done with it.  For arrays, this would be a limit on
 the size of the element, i.e. for a T[], it would be a limit on T.sizeof,
 *not* T.sizeof * length.  I'd like to get the community's input on this:
 Is this enough of a corner case that I have permission to cop out of
 solving it properly for the sake of simplicity?
 Could you or anyone else solve this problem at a later stage? If that would not be made more difficult then I would say cop out, at least for now.
Don't worry, it won't have any impact on the existing code.
Oct 29 2009
dsimcha <dsimcha yahoo.com> writes:
== Quote from Lutger (lutger.blijdestijn gmail.com)'s article
 dsimcha wrote:
 I've gotten underway hacking the GC to add precise heap scanning, but I
 thought of one really annoying corner case that really would make things an
 order of magnitude more complicated if it were handled properly:  Structs and
 classes that have large static arrays embedded.  For example:
 
 class Foo {
     Foo next;
     void*[4 * 1024 * 1024] hugeArray;
 }
 
 The problem here is that scanning this precisely would require me to either
 generate a megabyte bitmask that explicitly says "scan every element of
 hugeArray" or to change my bitmask data structure from a flat array to
 something nested and an order of magnitude more complex to generate at
 compile time.
 
 Since this is such a rare case in practice, I'm tempted to just say that any
 object with size above some arbitrary limit, say 1 kb, just gets scanned
 conservatively and be done with it.  For arrays, this would be a limit on
 the size of the element, i.e. for a T[], it would be a limit on T.sizeof,
 *not* T.sizeof * length.  I'd like to get the community's input on this:
 Is this enough of a corner case that I have permission to cop out of
 solving it properly for the sake of simplicity?
 Could you or anyone else solve this problem at a later stage? If that would not be made more difficult then I would say cop out, at least for now.
I would be much more inclined to cop out on this, except that I would be making a design decision that might not be easy to change in the future w.r.t. how bitmasks are structured. If we eventually fixed this, it would likely be a breaking change to the GC interface, after D2 had gone gold.

Then again, GC.malloc is a pretty low-level interface, and most access would be through whatever replaces new, which would likely be in the standard lib and could be modified in lockstep. Therefore, if/when we were to get a more principled scheme, it might be transparent to most user code, at least at the source level.

One thing I definitely need to know from the insiders (Walter, Sean, etc.) is, how much do we care about having a stable, well thought out, formally specified interface to the GC routines, both at the source and binary level, at this stage in the game? If I did something hackish to get precise heap scanning mostly working, would the patch likely be rejected because it would create cruft that would have to be kept for backwards compatibility?
Oct 29 2009
bearophile <bearophileHUGS lycos.com> writes:
dsimcha:

 I personally find that I almost never use static arrays, either on the stack
 or inside heap-allocated objects because the fact that their size is fixed at
 compile time is just too restrictive.  About my only use for them is to store
 compile-time constants in the static data segment.
Fixed-size arrays can be quite useful, so you can learn to use them more often. Their maximum size is fixed, but you can use a smaller part of them, so they can be flexible enough. Stack allocations are also usually quite a bit faster.

For example, in my dlibs you can find an edit distance function. It needs a temporary internal buffer: if the input strings are small enough, it uses a fixed-size, stack-allocated array; otherwise it calls another variant of the same function that allocates the buffer from the heap. This keeps the function flexible, and (I have timed it) makes it quite a bit faster for the common case of small input strings.

Here you can see another example of mine: http://www.fantascienza.net/leonardo/js/wirun2.c
It's a Wireworld simulator; to run, it needs a matrix. Indexing into a statically allocated matrix can be faster because you know the sizes at compile time, so a good compiler can find an item in the matrix without a costly multiplication, using a cheaper sequence of a few adds and shifts. Here the matrix sizes are a power of two, so to find an item the compiler just uses a shift and an add, which is quite a bit faster. If you replace the static array in this wirun program (which can be translated to very similar C-like D code to be compiled with LDC) with a dynamic one, you will see the program slow down; I have timed it. With a dynamic matrix you have to pay more (there are ways to simulate the shift-based indexing with a dynamically allocated matrix too, and I have done that as well, but it's less intuitive and less commonly done).
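
A minimal sketch of that stack-buffer-with-heap-fallback pattern (this is not the actual dlibs code; the threshold and names are made up for illustration):

import std.algorithm : min;
import std.stdio;

// Hypothetical threshold: rows up to this many ints fit in the stack buffer.
enum STACK_LIMIT = 256;

int editDistance(string a, string b)
{
    int[STACK_LIMIT] stackBuf = void;             // fixed-size, stack-allocated
    if (b.length + 1 <= STACK_LIMIT)
        return editDistanceImpl(a, b, stackBuf[0 .. b.length + 1]);
    return editDistanceImpl(a, b, new int[b.length + 1]);   // heap fallback
}

// The algorithm only sees a slice, so it doesn't care where the buffer lives.
int editDistanceImpl(string a, string b, int[] row)
{
    foreach (j; 0 .. row.length)
        row[j] = cast(int) j;
    foreach (i; 1 .. a.length + 1)
    {
        int diag = row[0];                        // holds the previous row's row[j-1]
        row[0] = cast(int) i;
        foreach (j; 1 .. b.length + 1)
        {
            immutable up   = row[j];
            immutable cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            row[j] = min(up + 1, row[j - 1] + 1, diag + cost);
            diag = up;
        }
    }
    return row[b.length];
}

void main()
{
    writeln(editDistance("kitten", "sitting"));   // prints 3
}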
 Why not dynamic arrays?  Wouldn't it make more sense to do:
 
 class MemoryPool {
     // other stuff
     uint[] memory;
 
     this() {
         memory = new uint[someHugeNumber];
     }
 }
 
 This would have negligible additional overhead and would allow you to change
 the size of the memory pool at runtime.
This is a bare-bones implementation, reduced from the "extra" module of my dlibs:

struct MemoryPool(T, int MAX_BLOCK_BYTES = 1 << 14) {
    struct Block {
        T[(MAX_BLOCK_BYTES / T.sizeof)] items;
    }

    static Block*[] blocks;
    static T* nextFree, lastFree;

    static T* newItem() {
        if (nextFree >= lastFree) {
            blocks.length = blocks.length + 1;
            blocks[$ - 1] = new Block;
            nextFree = blocks[$ - 1].items.ptr;
            lastFree = nextFree + Block.items.length;
        }
        return nextFree++;
    }
}

(I have removed many static asserts, some asserts, the freeAll method, comments, ddocs, unittests, etc.)

Blocks are allocated from the GC heap. A less safe variant of this struct allocates memory from the C heap (it also optionally tests at compile time whether T contains reference types). You can see there's a single pool for each type; this can be a disadvantage if you want to keep more than one pool for the same type.

Bye,
bearophile
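
For illustration only (the Node type here is hypothetical, not from dlibs), using the pool is just a matter of calling newItem and getting back a pointer into the current block:

struct Node {
    int value;
    Node* next;
}

void main() {
    // Each call hands out the next free slot; a new block is allocated
    // only when the current one is exhausted.
    Node* a = MemoryPool!(Node).newItem();
    Node* b = MemoryPool!(Node).newItem();
    a.value = 1;
    a.next  = b;
    b.value = 2;
    b.next  = null;
}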
Oct 29 2009