
digitalmars.D - a GC question

reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
I was doing some experiments comparing the MinTL container performance 
against the STL performance, and MinTL was getting stomped because of the GC. 
It seemed to be allocating twice as much memory as the STL runs. The 
allocations I was making were for arrays with lengths a power of two. I 
looked at gc.d and noticed all the allocations were being made with +1 added 
to the lengths, which means my nice power of two was being incremented to 
the next larger bucket and hence using twice the space of the C++ code. Can 
we get rid of that +1? I recompiled phobos without the +1 and it all seemed 
to work ok. I think it will be common to use power-of-two sizes for objects, 
assuming those will fit nicely together, when in fact those fit the worst 
together.

-Ben 
May 19 2005
next sibling parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 I was doing some experiments with comparing the MinTL container  
 performance
 against the STL performance and MinTL was getting stomped because of the  
 GC.
 It seemed to be allocating twice as much memory as the STL runs. The
 allocations I was making was for arrays with lengths a power of two. I
 looked at gc.d and noticed all the allocations were being made with +1  
 added
 to the lengths, which means my nice power of two was being incremented to
 the next larger bucket and hence using twice the space of the C++ code.

Whew. Thanks for that information! I have to adjust that in my containers as well (by the way, would you mind testing them as well? *sweet-question*).

I think the +1 comes from the string handling... The GC puts a 0 after them to help you cope with the native C API...

Ciao
uwe
May 19 2005
next sibling parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 I was doing some experiments with comparing the MinTL container  
 performance
 against the STL performance and MinTL was getting stomped because of  
 the GC.
 It seemed to be allocating twice as much memory as the STL runs. The
 allocations I was making was for arrays with lengths a power of two. I
 looked at gc.d and noticed all the allocations were being made with +1  
 added
 to the lengths, which means my nice power of two was being incremented  
 to
 the next larger bucket and hence using twice the space of the C++ code.

Whew. Thanks for that information! I have to adjust that in my containers as well.

By the way, that improves vector's allocating performance by 15%. Quite a change. Thanks!

uwe
May 19 2005
parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Uwe Salomon" <post uwesalomon.de> wrote in message 
news:op.sq0zk4il6yjbe6 sandmann.maerchenwald.net...
 I was doing some experiments with comparing the MinTL container 
 performance
 against the STL performance and MinTL was getting stomped because of 
 the GC.
 It seemed to be allocating twice as much memory as the STL runs. The
 allocations I was making was for arrays with lengths a power of two. I
 looked at gc.d and noticed all the allocations were being made with +1 
 added
 to the lengths, which means my nice power of two was being incremented 
 to
 the next larger bucket and hence using twice the space of the C++ code.

Whew. Thanks for that information! I have to adjust that in my containers as well.


How would you adjust your code? I don't see what a container can do.
 By the way, that improves vector's allocating performance by 15%. Quite a 
 change.

For my power-of-two tests it was an order of magnitude. What kinds of tests are you using?
May 19 2005
next sibling parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 Whew. Thanks for that information! I have to adjust that in my
 containers as well.


How would you adjust your code? I don't see what a container can do.

The container always allocates more memory than needed. If you simply append() elements, then as soon as it runs out of space it allocates the next power of two but one (right grammar?? I mean the one after the next) bytes; later, the next power of two that is larger than 1.5 * the requested size. Thus it often allocated way too much, as you said.

Of course the container can't prevent you from adding exactly as many elements as make their size in bytes a power of two. This is especially a problem in reserve(). But if you simply don't care and just append() what you have, the problem will not occur.
 By the way, that improves vector's allocating performance by 15%. Quite  
 a change.

For my power-of-two tests it was an order of magnitude.

What do you mean by "order of magnitude"? The dictionary spits out several translations... ;)
 What kinds of tests are you using?

// This tests how fast writing items and
// dynamically allocating space as needed is.
for (int i = 0; i < anz; ++i)
  array.append(i);

// This tests only the writing speed.
array.reserve(anz);
for (int i = 0; i < anz; ++i)
  array.append(i);

I do these tests both for my arrays and the D arrays, and compare. Currently they are equally fast. Note that the first test has this equivalent in D:

array.length = 16;
for (int i = 0; i < anz; ++i)
{
  if (i >= array.length)
    array.length = array.length * 2;

  array[i] = i;
}

Ciao
uwe
May 19 2005
parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Uwe Salomon" <post uwesalomon.de> wrote in message 
news:op.sq03yu2e6yjbe6 sandmann.maerchenwald.net...
 Whew. Thanks for that information! I have to adjust that in my
 containers as well.


How would you adjust your code? I don't see what a container can do.

The container always allocates more memory than needed. If you simply append() elements, then as soon as it runs out of space it allocates the next power of two but one (right grammar?? I mean the one after the next) bytes; later, the next power of two that is larger than 1.5 * the requested size. Thus it often allocated way too much, as you said.

Of course the container can't prevent you from adding exactly as many elements as make their size in bytes a power of two. This is especially a problem in reserve(). But if you simply don't care and just append() what you have, the problem will not occur.
 By the way, that improves vector's allocating performance by 15%. Quite 
 a change.

For my power-of-two tests it was an order of magnitude.

What do you mean with "order of magnitude"? The dictionary spits out several translations... ;)

Order of magnitude means a power of 10 or 2, depending on context. My use above means a power of 2 (or more, actually). The first time I ran the test it started swapping and I had to kill it; only when I removed the +1 did it fit without swapping, and it ran in approximately the same space as the STL version.
 I do these tests both for my arrays and the D arrays, and compare. 
 Currently they are equally fast. Note that the first test has this 
 equivalent in D:

 array.length = 16;
 for (int i = 0; i < anz; ++i)
 {
   if (i >= array.length)
     array.length = array.length * 2;

   array[i] = i;
 }

This is the kind of resizing (powers of two) that you want to avoid, since it results in 1/2 wasted space even when you fill up the array entirely. For example, say i == 16. Then the line array.length = array.length*2; will actually result in 2*16+1 = 33 bytes being requested, which gets rounded up to 64 even though you will only ever use 32. Once i == 32 the resize will request 2*32+1 = 65 bytes, which doesn't fit in 64, so it will again reallocate and ask for 128. So you are always wasting 1/2 the space when you resize by powers of 2. It becomes, ironically, the most inefficient way to geometrically grow the array.
May 19 2005
parent "Uwe Salomon" <post uwesalomon.de> writes:
 It becomes, ironically, the most
 inefficient way to geometrically grow the array.

This really needs to be changed. You are perfectly right: rounding to powers of two is such a common approach that it should work as fast as expected.

Ciao
uwe
May 19 2005
prev sibling parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 I was doing some experiments with comparing the MinTL container
 performance
 against the STL performance and MinTL was getting stomped because of
 the GC.
 It seemed to be allocating twice as much memory as the STL runs. The
 allocations I was making was for arrays with lengths a power of two. I
 looked at gc.d and noticed all the allocations were being made with +1
 added
 to the lengths, which means my nice power of two was being incremented
 to
 the next larger bucket and hence using twice the space of the  
 C++ code.




What I forgot to ask: does the GC always add 1 to the length of the dynamic array? Even for large elements?

struct Verylarge { dchar[50] x; }

Verylarge[] dynArray;
dynArray.length = 10;

This would waste 200 bytes then... with no benefit, I think. The GC should only add 1 to the length for (d)(w)char; I cannot see of what use this is in other cases. Well, we have to ask Walter then.

ciao
uwe
May 19 2005
parent "Ben Hinkle" <bhinkle mathworks.com> writes:
"Uwe Salomon" <post uwesalomon.de> wrote in message 
news:op.sq06dwv76yjbe6 sandmann.maerchenwald.net...
 I was doing some experiments with comparing the MinTL container
 performance
 against the STL performance and MinTL was getting stomped because of
 the GC.
 It seemed to be allocating twice as much memory as the STL runs. The
 allocations I was making was for arrays with lengths a power of two. I
 looked at gc.d and noticed all the allocations were being made with +1
 added
 to the lengths, which means my nice power of two was being incremented
 to
 the next larger bucket and hence using twice the space of the  C++ 
 code.




What I forgot to ask: does the GC always add 1 to the length of the dynamic array? Even for large elements?

struct Verylarge { dchar[50] x; }

Verylarge[] dynArray;
dynArray.length = 10;

This would waste 200 bytes then... with no benefit, I think. The GC should only add 1 to the length for (d)(w)char; I cannot see of what use this is in other cases. Well, we have to ask Walter then.

ciao
uwe

It adds 1 byte to every request.
May 19 2005
prev sibling parent reply "Lionello Lunesu" <lio lunesu.removethis.com> writes:
 *sweet-question*). I think the +1 comes from the string handling... The GC 
 puts a 0 after them to help you cope with the native C api...

Oh my... I'd say: remove the possibility for an array to be castable to a pointer; remove this hack; add a property .ptr... It all seems too big a price to pay for compatibility with old C libraries.

New D libraries, Phobos in particular, should just use D arrays, with no pointer whatsoever.

L.
May 19 2005
parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 New D libraries, Phobos in particular, should just use D arrays, with no
 pointer whatsoever.

Nope. Sorry, but if you program some data structure and try to optimize it, it is really necessary to work with these pointers. Removing this possibility would simply make a lot of things impossible. And one of the goals of D is having the possibility to work at low level when necessary.

And to say something about the 0 after strings (I'm not sure if the +1 comes from that issue!)... my containers do the same thing, and my toUtfXXX converters too. It costs a byte (or a dchar for a UTF-32 string), and opens up the whole world of existing C APIs. Rewrite ICU & friends for D, and I will not need that any more and delete it from my containers...

Just a simple example for the necessity of low-level pointer access: a skip-list is used to build a map-like structure. It is basically a linked list, but some of the nodes not only link to their direct successor, but also to some node much further in the list. The basic node looks like this:

struct Node
{
  SomeType key;
  OtherType data;

  Node* prev;
  Node*[1] next;
}

This node has only one pointer, which points to the next node. If the node has level 1 (that means it has 1 extra pointer to a node which is about 7 nodes ahead), it is allocated like that:

Node* newNode = cast(Node*) new char[Node.sizeof + level * (Node*).sizeof];
// where level == 1.

Now you can access that extra pointer if you know for sure that the node has level 1 (the internal structure lets you know):

newNode.next.ptr[1] = xxx;

This simple and very fast approach relies on pointers, casts and all that low-level stuff. And the benefit is that you don't have to declare the Node like that:

struct Node
{
  SomeType key;
  OtherType data;

  Node* prev;
  Node*[11] next;
}

Wasting 40 bytes for every level-0 node (7 of 8 are level 0). Now imagine you do some real work, with about a million elements (if you have only 1000, you could simply run through all of them to find a peculiar one):

7/8 * 1000000 * 40 = 35 MByte.

See what I mean?

Ciao
uwe
May 20 2005
next sibling parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
 Just a simple example for the necessity of low-level pointer access: A 
 skip-list is used to build a map-like structure. It is basically a linked 
 list, but some of the nodes not only link to their direct successor, but 
 also to some node much further in the list. The basic node looks like 
 this:

 struct Node
 {
   SomeType key;
   OtherType data;

   Node* prev;
   Node*[1] next;
 }

 This node has only one pointer, which points to the next node. If the node 
 has level 1 (that means it has 1 extra pointer to a node which is about 7 
 nodes ahead), it is allocated like that:

 Node* newNode = cast(Node*) new char[Node.sizeof + level * 
 (Node*).sizeof];
 // where level == 1.

 Now you can access that extra pointer if you know for sure that the node 
 has level 1 (the internal structure lets you know):

 newNode.next.ptr[1] = xxx;

 This simple and very fast approach relies on pointers, casts and all that 
 low-level stuff. And the benefit is that you don't have to declare the 
 Node like that:

 struct Node
 {
   SomeType key;
   OtherType data;

   Node* prev;
   Node*[11] next;
 }

 Wasting 40 bytes for every level-0-node (7 of 8 are level 0). Now imagine 
 you do some real work, with about a million elements (if you have only 
 1000, you could simply run through all of them to find a peculiar one):

 7/8 * 1000000 * 40 = 35 MByte.

 See what i mean?

Eek. I know that's a common C trick to extend an array at the end of a struct, but does D recommend that practice? I assume it could wreak havoc on garbage collectors that are "type aware", since the pointers stored in the extended section have unknown types to the GC. For example, if the GC were changed to allocate pointer-free data from a special non-scanned heap, then the new char[..] would allocate from that heap, and then the cast(Node*) would wind up meaning that pointers are stored hidden inside what the GC thinks is a char[].
May 20 2005
parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 eek. I know that's a common C trick to extend an array at the end of a
 struct but does D recommend that practice? I assume it could wreak havoc  
 on
 garbage collectors that are "type aware" since the pointers stored in the
 extended section have unknown types to the GC. For example if the GC were
 changed to allocate pointer-free data from a special non-scanned heap  
 then
 the new char[..] would allocate from that heap and they the cast(Node*)
 would wind up meaning that pointers are stored hidden inside what the GC
 thinks is a char[].

What do you recommend, then? I thought D should be a programming language that lets you access the lower levels if you need to. Now this is all not true anymore, because we have a super-duper overclever garbage collector? Perhaps the D garbage collector needs to be robust, and simply cannot be too funky.

I mean, what would the benefit of all that cleanness in the programs be? Look at some code from an average open source programmer... your hair will stand on end. That is only logical, because high-quality code needs a lot of time. You won't find a lot of programs where the programmers took this time. So why optimize the GC to the bare bone, if its total execution time is under 5% in the average program? It is much better to use some advanced memory techniques (as outlined in the D docs somewhere) when it really matters, I think.

Ciao
uwe
May 20 2005
next sibling parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Uwe Salomon" <post uwesalomon.de> wrote in message 
news:op.sq2quseu6yjbe6 sandmann.maerchenwald.net...
 eek. I know that's a common C trick to extend an array at the end of a
 struct but does D recommend that practice? I assume it could wreak havoc 
 on
 garbage collectors that are "type aware" since the pointers stored in the
 extended section have unknown types to the GC. For example if the GC were
 changed to allocate pointer-free data from a special non-scanned heap 
 then
 the new char[..] would allocate from that heap and they the cast(Node*)
 would wind up meaning that pointers are stored hidden inside what the GC
 thinks is a char[].

What do you recommend, then? I thought D should be a programming language that lets you access the lower levels if you need to. Now this is all not true anymore, because we have a super-duper overclever garbage collector? Perhaps the D garbage collector needs to be robust, and simply cannot be too funky.

Low level memory management tricks should use malloc instead of the GC. If one wants to use the GC, one has to accept some rules. Your code would work fine with today's GC, so it's not like all GCs would have trouble with it; if you want, you could keep that code and just say it is only usable with certain GCs - but that probably isn't what you want.

What would I recommend... the first thing that comes to mind is to just live with the 'next' list as a separate allocation stored as a dynamic array:

struct Node
{
  SomeType key;
  OtherType data;

  Node* prev;
  Node*[] next;
}

Node* newNode = new Node;
newNode.next = new Node*[4]; // this node has 4 next pointers.

If profiling indicates that is unacceptable, then I'd look into more complex solutions. For example:

import std.stdio;

struct FullNode(int N)
{
  int key;
  int data;
  .FullNode!(1)* prev;
  int n = N;
  .FullNode!(1)*[N] next;
}

alias FullNode!(1) Node;

int main()
{
  Node* newNode = new Node; // 1 item
  writefln(newNode.n);
  newNode.next[0] = cast(Node*)new FullNode!(4); // 4 items
  writefln(newNode.next[0].n);
  return 0;
}

But there are probably plenty of options available.
 I mean, what would the benefit of all that cleanness in the programs be? 
 Look at some code from an average open source programmer... your hairs 
 will stand on end. That is only logical, because high quality code needs a 
 lot of time. You won't find a lot of programs where the programmers took 
 this time. So why optimize the GC to the bare bone, if its total execution 
 time is under 5% in the average program? It is much better to use some 
 advanced memory techniques (as outlined in the D docs somewhere) when it 
 really matters, i think.

I don't understand your point. Are you saying low level memory tricks are good or bad - or that the advanced memory techniques section in the doc needs changing? I can't keep up with your thought stream.
 Ciao
 uwe 

May 20 2005
next sibling parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 Node* newNode = new Node;
 newNode.next = new Node*[4]; // this node has 4 next pointers.

This almost doubles the cost of inserting an element, as there are 2 allocations instead of one. And most of the time I allocate an array with 1 or 2 elements...
 newNode.next[0] = cast(Node*)new FullNode!(4); // 4 items

The level is dynamically calculated at runtime.
 I mean, what would the benefit of all that cleanness in the programs be?
 Look at some code from an average open source programmer... your hairs
 will stand on end. That is only logical, because high quality code  
 needs a
 lot of time. You won't find a lot of programs where the programmers took
 this time. So why optimize the GC to the bare bone, if its total  
 execution
 time is under 5% in the average program? It is much better to use some
 advanced memory techniques (as outlined in the D docs somewhere) when it
 really matters, i think.

I don't understand your point. Are you saying low level memory tricks are good or bad - or that the advanced memory techniques section in the doc needs changing? I can't keep up with your thought stream.

I think low level memory tricks are very useful, and therefore good in situations where the benefit is high. You have to weigh simplicity (and therefore understandability and maintainability) against performance. But for generic containers I would always vote for the performance, as people simply use them and don't think about it much.

The advanced memory section in the docs is very good. What I wanted to say is that I think it is better to have a robust but slightly slower GC than a very complex GC that spoils all optimization efforts.

Ciao
uwe
May 20 2005
parent "Lionello Lunesu" <lio lunesu.removethis.com> writes:
 What i wanted to say is that i think it is better to have a robust, but
 slightly slower GC, than a very complex GC that spoils all optimization 
 efforts.

With this, I can only agree. L.
May 23 2005
prev sibling parent "Uwe Salomon" <post uwesalomon.de> writes:
 struct FullNode(int N)
 {
    int key;
    int data;
    .FullNode!(1)* prev;
    int n = N;
    .FullNode!(1)*[N] next;
 }
 alias FullNode!(1) Node;
 int main() {
     Node* newNode = new Node; // 1 item
     writefln(newNode.n);
     newNode.next[0] = cast(Node*)new FullNode!(4); // 4 items
     writefln(newNode.next[0].n);
     return 0;
 }
 But there are probably plenty of options available.

After some thinking I realized this with some extra overhead:

switch (level)
{
  case 1: neu = cast(Node*) new NodeType!(Key, T, 1); break;
  case 2: neu = cast(Node*) new NodeType!(Key, T, 2); break;
  case 3: neu = cast(Node*) new NodeType!(Key, T, 3); break;
  case 4: neu = cast(Node*) new NodeType!(Key, T, 4); break;
  case 5: neu = cast(Node*) new NodeType!(Key, T, 5); break;
  // And so on till 11...
}

Not too beautiful, but no more obscure char[] allocation. Thanks a lot for your hints, this is really a lot cleaner (and the extra overhead for traversing the switch is low, especially because the later case statements are seldom needed)!

Ciao
uwe
May 20 2005
prev sibling parent reply Kevin Bealer <Kevin_member pathlink.com> writes:
In article <op.sq2quseu6yjbe6 sandmann.maerchenwald.net>, Uwe Salomon says...
 eek. I know that's a common C trick to extend an array at the end of a
 struct but does D recommend that practice? I assume it could wreak havoc  
 on
 garbage collectors that are "type aware" since the pointers stored in the
 extended section have unknown types to the GC. For example if the GC were
 changed to allocate pointer-free data from a special non-scanned heap  
 then
 the new char[..] would allocate from that heap and they the cast(Node*)
 would wind up meaning that pointers are stored hidden inside what the GC
 thinks is a char[].

What do you recommend, then? I thought D should be a programming language that lets you access the lower levels if you need to. Now this is all not true anymore, because we have a super-duper overclever garbage collector? Perhaps the D garbage collector needs to be robust, and simply cannot be too funky.

I mean, what would the benefit of all that cleanness in the programs be? Look at some code from an average open source programmer... your hair will stand on end. That is only logical, because high-quality code needs a lot of time. You won't find a lot of programs where the programmers took this time. So why optimize the GC to the bare bone, if its total execution time is under 5% in the average program? It is much better to use some advanced memory techniques (as outlined in the D docs somewhere) when it really matters, I think.

Ciao
uwe

Maybe instead of /byte[n]/ or /char[n]/, use /(void*)[(n+3)/4]/. This way the GC will think of the data as pointers, but will know that it cannot safely modify void* objects because the type info is unknown. I would think a skip list could be implemented as a template taking the node level as a parameter.

There is also a data structure that has the form of a binary tree, but works just like a skip list; i.e. it makes the same decisions based on all the same comparisons in the same order. The key observation was that in any particular skip list, many of the pointers are never really used. For example, if a level 4 node is followed by a level 5 node, the pointers on the level 4 node are never followed. If you can legally move to the level 5 node, you would have done so when you were traversing at level 5. Once you descend to traversal at level 4, all higher level subsequent nodes have been eliminated. Thus, for any node X followed by a higher level node Y, X is essentially a leaf of Y.

This is not exactly how it works. Ultimately, every decision is a binary decision, so you can build a binary tree that represents that decision space. Each node in the binary tree contains a random integer that corresponds (roughly) to the "skip list node level". I worked out the details of how to do this once and wrote it up in C++, but I discovered that someone else had already invented what appeared to be the same data structure. I can never find the link when I want it though... regardless, I think my implementation was better than his -- his required node pivoting.

Kevin
May 20 2005
parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 Each node in the binary tree contains a random integer that corresponds
 (roughly) to the "skip list node level".

Hmm, I read something similar back then, about randomized binary trees... but that is some other issue, I think. Your approach sounds interesting. Does it only save some space, or does it also speed up the map?

Ciao
uwe
May 20 2005
parent Kevin Bealer <Kevin_member pathlink.com> writes:
In article <op.sq3exlzr6yjbe6 sandmann.maerchenwald.net>, Uwe Salomon says...
 Each node in the binary tree contains a random integer that corresponds
 (roughly) to the "skip list node level".

Hmm, I read something similar back then, about randomized binary trees... but that is some other issue, I think. Your approach sounds interesting. Does it only save some space, or does it also speed up the map?

Ciao
uwe

I would say it has the following advantages:

1. Easier memory management than a skip list; nodes are the same size.
2. I don't know if it's faster than a skip list, but I'm fairly sure it's faster than a balanced tree. You don't need to check for balance, and you don't need to balance the tree explicitly.
3. There is no worst-case input sequence. Most balanced trees do poorly with sequential input, for example.

Disadvantages:

1. Computing and storing the random number. Should be very fast, but not instant of course. This is probably an overhead relative to a perfect skip list, but almost all balanced binary trees need a little overhead.
2. No guarantee of balance -- random is random.
3. Less common and less studied.

I have a partial implementation of it somewhere. I described it in more detail in other locations, which I could also track down.

Other notes: It should have properties that are very similar to the skip list -- after all, it is supposed to have the same "decision tree" that a skip list has, and the design of the balancing technique is based on the corresponding skip list. Compared to a red-black tree (for example), it trades some determinism for speed and other properties.

A normal non-balanced binary tree has GC-friendly properties. If you have two pointers to one such binary tree and you add X nodes to each of them, the number of nodes that need to be rebuilt is approximately ln2(N) * X, where N is the size of the tree and X is the number of elements added. This is true because an ordinary binary tree adds elements in the downward direction, never "pivoting" parts of the tree. Each node is immutable. Now, in C++, trees are not handled this way, but in LISP they normally are (AFAIK), because GC is available.

The randomized balancing preserves this property -- it adds about ln2(N) nodes and always builds nodes in the downward direction. A C++ implementation would not be concerned with this property because sharing is not std. operating procedure, but in GC languages this can be a really useful thing. For example, a text editor can get the "undo" feature very cheaply using an adaptation of this binary tree sharing.

In practical terms, if you have 1000000 elements in a shared tree and add 3 elements, you end up generating and GC-ing 60 elements, but the old million are still shared, and only the modified tree has the new items.

I'm not sure if this practice is still in vogue, or if not, if that is just because GC has been less mainstream for a few decades.

Kevin
May 25 2005
prev sibling parent reply "Lionello Lunesu" <lio lunesu.removethis.com> writes:
Hey,

 Nope. Sorry, but if you program some data structure and try to optimize 
 it, it is really necessary to work with these pointers. Removing this 
 possibility would simply make a lot of things impossible. And one of the 
 goals of D is having the possibility to work at low level when necessary.

You make it sound as if I asked for the removal of pointers from the D spec! I said no such thing. The only thing I'm suggesting is that calling functions in old libraries that rely heavily on pointers (like the C run-time) should not be done implicitly. I should get a compiler error if I try passing a char[] to puts(), because that puts() doesn't know about char[]; it wants a char* and wants the string to be zero terminated! Fixed by adding a simple toStringz(x), or a ~ '\0'.
 And to say something about the 0 after strings (i'm not sure if the +1 
 comes from that issue!) ... my containers do the same thing, and my 
 toUtfXXX converters too. It costs a byte (or a dchar for an UTF-32 
 string), and opens up the whole world of existing C APIs. Rewrite ICU & 
 friends for D, and i will not need that any more and delete it from my 
 containers...

Yes, sure, they can add a zero to the array, but they shouldn't pretend it's not there. I mean, the array.length should grow by one because of that extra element. Not assuming the GC/memory manager added it and "just pass the pointer, it'll work". :-S
 Just a simple example for the necessity of low-level pointer access: A 
 skip-list is used to build a map-like structure. It is basically a linked 
 list, but some of the nodes not only link to their direct successor, but 
 also to some node much further in the list. The basic node looks like 
 this:

<snip>
 See what i mean?

Yes. First of all, you can use templates for that, using an integer template argument that defines how large the array should be.

Furthermore, I really have nothing against that kind of coding. It is not what I meant to 'attack' in my previous post. Sometimes it just feels like people want to be able to compile old C code with a D compiler without changing anything. Not only that, they _expect_ D to compile old C code, cursing if it doesn't. Of course D should be able to interface with the C run-time, but should arrays therefore be implicitly castable to pointers? Or worse, char[] to char*?

L.
May 20 2005
parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 I said no such thing. The only thing I'm suggesting is that calling
 functions in old libraries that rely heavily on pointers (like the C
 run-time) should not be done implicitely.

Hm. This sounds like a good idea to me. Yes. I misunderstood you then, sorry.
 Yes. First of all, you can use templates for that, using an integer  
 template argument that defines how large the array should be.

Regrettably not, because the level of the node is only known at compile time...

Ciao
uwe
May 20 2005
parent "Uwe Salomon" <post uwesalomon.de> writes:
 Yes. First of all, you can use templates for that, using an integer  
 template argument that defines how large the array should be.

Regrettably not, because the level of the node is only known at compile time...

Err, it is only known at runtime. Sorry for that. uwe
May 20 2005
prev sibling next sibling parent "Ben Hinkle" <ben.hinkle gmail.com> writes:
 Can
 we get rid of those +1? I recompiled phobos without the +1 and it all 
 seemed to work ok. I think it will be common to use power-of-two sizes for 
 objects assuming those will fit nicely together when in fact those fit the 
 worst together.

note the +1 seems to have appeared in dmd.122 so it is fairly recent.
May 19 2005
prev sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Ben Hinkle" <ben.hinkle gmail.com> wrote in message
news:d6i1ps$1osr$1 digitaldaemon.com...
 I was doing some experiments with comparing the MinTL container

 against the STL performance and MinTL was getting stomped because of the

 It seemed to be allocating twice as much memory as the STL runs. The
 allocations I was making was for arrays with lengths a power of two. I
 looked at gc.d and noticed all the allocations were being made with +1

 to the lengths, which means my nice power of two was being incremented to
 the next larger bucket and hence using twice the space of the C++ code.

 we get rid of those +1? I recompiled phobos without the +1 and it all

 to work ok. I think it will be common to use power-of-two sizes for

 assuming those will fit nicely together when in fact those fit the worst
 together.

The +1 was to support the following usage:

    T[] foo;
    T[] bar;
    ...
    foo = foo[length .. length];
    ...
    foo ~= bar;

Without the +1, if length is a power of 2, and the power of 2 is the bucket size, then foo[length..length] happens to cross the bucket boundary and point at the start of the *next* bucket. The array append code just sees that foo[] is pointing to the start of a bucket, and so happily inserts bar, overwriting whatever was already using that bucket. This would cause erratic, random crashes.

I suggest for a fix that your code not preallocate arrays; it's redundant with what the runtime is already doing.
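The hazard Walter describes can be seen directly: an empty slice taken at the very end of an array carries a pointer one past the last element, which (without padding) can be the first byte of the next allocation. A sketch in modern D syntax, for illustration only:

```d
void main()
{
    int[] foo = new int[4];      // suppose 4 ints exactly fill a GC bucket
    int[] tail = foo[$ .. $];    // zero-length slice at the very end

    assert(tail.length == 0);
    // The slice's pointer is one past the end of foo's data --
    // without the GC's +1 padding, that address could be the
    // start of the *next* bucket, which append would then stomp.
    assert(tail.ptr == foo.ptr + foo.length);
}
```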
May 20 2005
next sibling parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 I suggest for a fix that your code not preallocate arrays; it's redundant
 with what the runtime is already doing.

Is that for Linux, too? Because my code is much faster in doing that... ok, perhaps he's just much more aggressive. Ciao uwe
May 20 2005
parent "Walter" <newshound digitalmars.com> writes:
"Uwe Salomon" <post uwesalomon.de> wrote in message
news:op.sq3c0da86yjbe6 sandmann.maerchenwald.net...
 I suggest for a fix that your code not preallocate arrays; it's


 with what the runtime is already doing.

Is that for Linux, too? Because my code is much faster in doing that... ok, perhaps he's just much more aggressive.

The same code is used for both.
May 20 2005
prev sibling next sibling parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Walter" <newshound digitalmars.com> wrote in message 
news:d6lh2e$1keg$1 digitaldaemon.com...
 "Ben Hinkle" <ben.hinkle gmail.com> wrote in message
 news:d6i1ps$1osr$1 digitaldaemon.com...
 I was doing some experiments with comparing the MinTL container

 against the STL performance and MinTL was getting stomped because of the

 It seemed to be allocating twice as much memory as the STL runs. The
 allocations I was making was for arrays with lengths a power of two. I
 looked at gc.d and noticed all the allocations were being made with +1

 to the lengths, which means my nice power of two was being incremented to
 the next larger bucket and hence using twice the space of the C++ code.

 we get rid of those +1? I recompiled phobos without the +1 and it all

 to work ok. I think it will be common to use power-of-two sizes for

 assuming those will fit nicely together when in fact those fit the worst
 together.

The +1 was to support the following usage:

    T[] foo;
    T[] bar;
    ...
    foo = foo[length .. length];
    ...
    foo ~= bar;

Without the +1, if length is a power of 2, and the power of 2 is the bucket size, then foo[length..length] happens to cross the bucket boundary and point at the start of the *next* bucket. The array append code just sees that foo[] is pointing to the start of a bucket, and so happily inserts bar, overwriting whatever was already using that bucket. This would cause erratic, random crashes.

I suggest foo ~= bar for arrays be roughly equivalent to

    size_t oldlen = foo.length;
    foo.length = foo.length + bar.length;
    foo[oldlen .. oldlen+bar.length] = bar[];

That way when foo.length is 0 a new pointer is allocated (as you probably already know, _d_arraysetlength checks for 0 length). When foo.length is non-zero and the array can be resized in place, then it behaves the same as it does today. So basically, to make the smallest change to gc.d, I would change _d_arrayappend to check if length == 0, just like _d_arraysetlength does, and malloc a new block if that is the case.

In fact it would be very odd if ~= had different semantics than changing the length and copying. I can imagine subtle bugs creeping into code if they are different. Of course ~= can be more efficient than the two-step code, but it shouldn't end up with a different result.
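Ben's proposed equivalence could be written out as a free function (a sketch of the *semantics* he proposes, not the runtime's actual code):

```d
// Append-by-growing: grow the destination first (reallocating when
// the slice is empty, matching _d_arraysetlength's 0-length check),
// then copy -- so ~= can never stomp a bucket via a stale pointer.
T[] append(T)(T[] foo, const T[] bar)
{
    size_t oldlen = foo.length;
    foo.length = foo.length + bar.length;      // reallocates when oldlen == 0
    foo[oldlen .. oldlen + bar.length] = bar[]; // copy the appended elements
    return foo;
}

void main()
{
    int[] a;
    a = append(a, [1, 2, 3]);
    a = append(a, [4, 5]);
    assert(a == [1, 2, 3, 4, 5]);
}
```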
 I suggest for a fix that your code not preallocate arrays; it's redundant
 with what the runtime is already doing.

For my case I wasn't preallocating - I was allocating blocks for a deque. Each block has a fixed size that will never be sliced or resized. I would like to know how to efficiently allocate such a block of memory and initialize it to hold a parametrizing type T. I'm starting to suspect a general-purpose deque in D will be very hard to get competitive with C++.
May 20 2005
parent reply "Walter" <newshound digitalmars.com> writes:
"Ben Hinkle" <ben.hinkle gmail.com> wrote in message
news:d6lrve$1rpt$1 digitaldaemon.com...
 I suggest foo ~= bar for arrays be roughly equivalent to
   size_t oldlen = foo.length;
   foo.length = foo.length + bar.length;
   foo[oldlen .. oldlen+bar.length] = bar[];
 That way when foo.length is 0 a new pointer is allocated (as you probably
 already know _d_arraysetlength checks for 0 length). When foo.length is
 non-zero and the array can be resized in place then it behaves the same as
 what it does today. So basically to make the smallest change to gc.d I

 change _d_arrayappend to check if length == 0 just like _d_arraysetlength
 and malloc a new block if that is the case.
 In fact it would be very odd if ~= had different semantics than changing

 length and copying. I can imagine subtle bugs creeping into code if they

 different. Of course ~= can be more efficient than the two step code but

 shouldn't end up with a different result.

The idea was to support 'reserving' a length for an array by setting the .length property to the reserved size, then setting .length to 0. This means that while filling the array, reallocation wouldn't be happening and it'd be more efficient.
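The reserve idiom Walter describes looks like this (a sketch in modern D syntax, anachronistic for this 2005 thread; note that today's runtime adds stomping protection, so it would reallocate here unless assumeSafeAppend is used):

```d
void main()
{
    int[] buf;
    buf.length = 100;   // 'reserve': allocate room for 100 elements up front
    buf.length = 0;     // intent: keep the block, just reset the count

    foreach (i; 0 .. 100)
        buf ~= i;       // hope: no reallocation while filling

    assert(buf.length == 100);
    assert(buf[99] == 99);
}
```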
 For my case I wasn't preallocating - I was allocating blocks for a deque.
 Each block has a fixed size that will never be sliced or resized.

Why were the sizes picked to be powers of 2?
 I would
 like to know how to efficiently allocate such a block of memory and
 initialize it to hold a paramtrizing type T. I'm starting to suspect a
 general-purpose deque in D will very hard to get competitive with C++.

May 20 2005
next sibling parent "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Walter" <newshound digitalmars.com> wrote in message 
news:d6m0i1$1uoh$1 digitaldaemon.com...
 "Ben Hinkle" <ben.hinkle gmail.com> wrote in message
 news:d6lrve$1rpt$1 digitaldaemon.com...
 I suggest foo ~= bar for arrays be roughly equivalent to
   size_t oldlen = foo.length;
   foo.length = foo.length + bar.length;
   foo[oldlen .. oldlen+bar.length] = bar[];
 That way when foo.length is 0 a new pointer is allocated (as you probably
 already know _d_arraysetlength checks for 0 length). When foo.length is
 non-zero and the array can be resized in place then it behaves the same 
 as
 what it does today. So basically to make the smallest change to gc.d I

 change _d_arrayappend to check if length == 0 just like _d_arraysetlength
 and malloc a new block if that is the case.
 In fact it would be very odd if ~= had different semantics than changing

 length and copying. I can imagine subtle bugs creeping into code if they

 different. Of course ~= can be more efficient than the two step code but

 shouldn't end up with a different result.

The idea was to support the idea of 'reserving' a length for an array by setting the .length property to the reserved size, then setting .length to 0. This means that while filling the array, reallocation wouldn't be happening and it'd be more efficient.

Then why, when foo.length==0, does "foo.length = 1" reallocate but "foo ~= x" doesn't? Long ago I argued that setting length shouldn't reallocate when resizing from 0. Now that I know the append behavior and the impact it has on the GC, I'd rather have resizing from 0 reallocate, both for consistency with setlength and to save the GC performance.

Either that, or enforce the rule that the first index of a slice must be a valid index within array bounds. After all, the section on array slicing says "slicing an array means to specify a subarray of it", which to me implies the pointer refers to a valid location. I never knew one could slice off the end, but I suppose that could come up if you are searching for an item and it isn't there, and so you wind up slicing off the end.
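The search scenario Ben speculates about produces exactly such an off-the-end slice (a sketch using modern $-syntax, for illustration only):

```d
void main()
{
    int[] a = [1, 2, 3];
    size_t i = 0;
    while (i < a.length && a[i] != 99)
        i++;                    // 99 is absent, so i ends up == a.length

    int[] rest = a[i .. $];     // empty slice positioned one past the end
    assert(rest.length == 0);
    assert(rest.ptr == a.ptr + a.length);
}
```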
 For my case I wasn't preallocating - I was allocating blocks for a deque.
 Each block has a fixed size that will never be sliced or resized.

Why were the sizes picked to be powers of 2?

I'm open to suggestions. I'm trying to write a deque that competes with C++ implementations. Efficient memory usage is an important part of a deque, and powers of 2 (or I suppose powers of 2 times element.sizeof) are traditionally the most efficient packing, I believe.
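The power-of-2-times-element-size idea could be sketched as a small sizing helper (hypothetical names; assumes T.sizeof divides the byte target evenly):

```d
// Pick a per-block element count so each deque block occupies a
// power-of-two number of bytes (here a 512-byte target).
size_t blockElems(T)(size_t targetBytes = 512)
{
    size_t n = targetBytes / T.sizeof;
    return n ? n : 1;   // at least one element per block
}

void main()
{
    assert(blockElems!int() == 128);     // 128 * 4 bytes == 512
    assert(blockElems!double() == 64);   // 64 * 8 bytes == 512
}
```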
 I would
 like to know how to efficiently allocate such a block of memory and
 initialize it to hold a paramtrizing type T. I'm starting to suspect a
 general-purpose deque in D will very hard to get competitive with C++.


May 20 2005
prev sibling next sibling parent "Ben Hinkle" <ben.hinkle gmail.com> writes:
 The idea was to support the idea of 'reserving' a length for an array by
 setting the .length property to the reserved size, then setting .length to
 0. This means that while filling the array, reallocation wouldn't be
 happening and it'd be more efficient.

I should add that reserving using this technique is very inefficient (as I've posted about long ago) because of the 0 checks in setlength. More precisely, suppose foo.length is 0 and we want to reserve some space for it. Trace the memory allocations:

    foo.length = 100;    // allocates and sets the ptr and length
    foo.length = 0;      // nulls the ptr and sets length to 0
    // let's try again...
    foo = new Foo[100];
    foo = foo[0 .. 0];   // now ptr is set and length is 0
    foo.length = 1;      // throws away old ptr and allocates a new length-1 array

So basically reserving length using 0 does not work. You lose the ptr when you resize from 0 and you lose the ptr when you resize to 0. That's why I always resize to/from 1 - it keeps the ptr. It makes the code uglier. I wouldn't mind resizing to/from 0 preserving the ptr, though.
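The to/from-1 workaround Ben mentions would look roughly like this (illustrative only; in the 2005 runtime, resizing to or from 0 discarded the pointer, so parking the length at 1 preserved the allocation):

```d
void main()
{
    int[] foo;
    foo.length = 100;   // allocate the reserve
    foo.length = 1;     // shrink but keep the ptr (unlike length = 0 back then)
    foo.length = 64;    // grow again, ideally reusing the same block
    assert(foo.length == 64);
}
```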
May 20 2005
prev sibling parent reply "Lionello Lunesu" <lio lunesu.removethis.com> writes:
Hi.

 The idea was to support the idea of 'reserving' a length for an array by
 setting the .length property to the reserved size, then setting .length to
 0. This means that while filling the array, reallocation wouldn't be
 happening and it'd be more efficient.

That really shouldn't be supported, since it depends heavily on the current implementation of the array. For all you know (I mean, not _you_, since you wrote it :-S), setting length to zero will immediately free the memory. To reserve memory before filling an array, you simply keep track of a separate 'count', resizing if it tends to get higher than array.length. And if this is too much work, use a container class for god's sake.

Walter, I really love the way D is progressing. But I'm so scared that some strange things are left in the language, just because people have grown used to them, as they tend to do, with ALL side effects in a language. This 'length = 0' is just one example. My fear also applies to casting arrays to pointers implicitly, and printf("%.*s", x), etc.. I really should write them down.

It's just bad practice to let "length = 0" mean anything else than it says: empty array. Of course, it's just an optimization. Any implementation that immediately frees the array will reallocate it again when being filled, so it will always work.

What about adding a 'member' function to arrays to allocate new memory? This member function would not call any constructors, since it's just allocating memory, leaving the length 0. The current 'approach' of

    a = new someclass[x];
    a.length = 0;

calls the constructors for all x elements, when these (probably) get overwritten later on. Wasting cycles.

L.
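The separate-count approach Lionello suggests could be sketched as a small helper type (hypothetical, not part of Phobos):

```d
// Track a 'count' separately from the backing array's length and
// grow the backing store geometrically only when it would overflow.
struct Buffer(T)
{
    T[] data;
    size_t count;

    void put(T value)
    {
        if (count >= data.length)
            data.length = data.length ? data.length * 2 : 16; // grow 2x
        data[count++] = value;
    }

    T[] items() { return data[0 .. count]; }  // the filled portion
}

void main()
{
    Buffer!int b;
    foreach (i; 0 .. 100)
        b.put(i);
    assert(b.items.length == 100);
    assert(b.items[99] == 99);
}
```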
May 23 2005
parent "Regan Heath" <regan netwin.co.nz> writes:
On Mon, 23 May 2005 11:14:08 +0300, Lionello Lunesu  
<lio lunesu.removethis.com> wrote:
 The idea was to support the idea of 'reserving' a length for an array by
 setting the .length property to the reserved size, then setting .length  
 to
 0. This means that while filling the array, reallocation wouldn't be
 happening and it'd be more efficient.

 That really shouldn't be supported, since it depends heavily on the 
 current implementation of the array. For all you know (I mean, not _you_, 
 since you wrote it :-S), setting length to zero will immediately free the 
 memory. To reserve memory before filling an array, you simply keep track 
 of a separate 'count', resizing if it tends to get higher than 
 array.length. And if this is too much work, use a container class for 
 god's sake.

 Walter, I really love the way D is progressing. But I'm so scared that 
 some strange things are left in the language, just because people have 
 grown used to them, as they tend to do, with ALL side effects in a 
 language. This 'length = 0' is just one example. My fear also applies to 
 casting arrays to pointers implicitly, and printf("%.*s", x), etc.. I 
 really should write them down.

 It's just bad practice to let "length = 0" mean anything else than it 
 says: empty array. Of course, it's just an optimization. Any 
 implementation that immediately frees the array will reallocate it again 
 when being filled, so it will always work.

 What about adding a 'member' function to arrays to allocate new memory? 
 This member function would not call any constructors, since it's just 
 allocating memory, leaving the length 0. The current 'approach' of 
 a = new someclass[x]; a.length = 0; calls the constructors for all x 
 elements, when these (probably) get overwritten later on. Wasting cycles.

Like 'reserve' proposed several times over the last year or two... a.reserve(50); Regan
May 23 2005
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
In article <d6lh2e$1keg$1 digitaldaemon.com>, Walter says...
The +1 was to support the following usage:

    T[] foo;
    T[] bar;
    ...
    foo = foo[length .. length];
    ...
    foo ~= bar;

Okay, I must be missing something, because the above code looks to me like it should result in undefined behavior. Isn't the result of foo[length..length] a single nonexistent array element just beyond the end of the array? Assuming it is, why should the GC be modified to support such usage?
This would cause erratic, random crashes.

Perhaps this should just be done in debug builds, with the appropriate bounds checking in place. Then the bugs could be found and corrected without any impact on release performance. Sean
May 20 2005
parent Sean Kelly <sean f4.ca> writes:
In article <d6m2k5$1vv6$1 digitaldaemon.com>, Sean Kelly says...
Isn't the result of foo[length..length] a single nonexistent array element just
beyond the end of the array?

I meant a zero length array at an invalid location. Sean
May 20 2005