
digitalmars.D - a GC question

reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
I was doing some experiments comparing the MinTL container performance 
against the STL performance, and MinTL was getting stomped because of the GC. 
It seemed to be allocating twice as much memory as the STL runs. The 
allocations I was making were for arrays with lengths a power of two. I 
looked at gc.d and noticed all the allocations were being made with +1 added 
to the lengths, which means my nice power of two was being incremented to 
the next larger bucket and hence using twice the space of the C++ code. Can 
we get rid of that +1? I recompiled phobos without the +1 and it all seemed 
to work ok. I think it will be common to use power-of-two sizes for objects, 
assuming those will fit nicely together, when in fact those fit the worst 
together.

-Ben 
May 19 2005
next sibling parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 I was doing some experiments with comparing the MinTL container  
 performance
 against the STL performance and MinTL was getting stomped because of the  
 GC.
 It seemed to be allocating twice as much memory as the STL runs. The
 allocations I was making was for arrays with lengths a power of two. I
 looked at gc.d and noticed all the allocations were being made with +1  
 added
 to the lengths, which means my nice power of two was being incremented to
 the next larger bucket and hence using twice the space of the C++ code.

Whew. Thanks for that information! I have to adjust that in my containers as well (by the way, would you mind testing them as well? *sweet-question*).

I think the +1 comes from the string handling... The GC puts a 0 after them to help you cope with the native C API...

Ciao
uwe
May 19 2005
next sibling parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 I was doing some experiments with comparing the MinTL container  
 performance
 against the STL performance and MinTL was getting stomped because of  
 the GC.
 It seemed to be allocating twice as much memory as the STL runs. The
 allocations I was making was for arrays with lengths a power of two. I
 looked at gc.d and noticed all the allocations were being made with +1  
 added
 to the lengths, which means my nice power of two was being incremented  
 to
 the next larger bucket and hence using twice the space of the C++ code.

Whew. Thanks for that information! I have to adjust that in my containers as well.

By the way, that improves vector's allocating performance by 15%. Quite a change. Thanks!

uwe
May 19 2005
parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Uwe Salomon" <post uwesalomon.de> wrote in message 
news:op.sq0zk4il6yjbe6 sandmann.maerchenwald.net...
 I was doing some experiments with comparing the MinTL container 
 performance
 against the STL performance and MinTL was getting stomped because of 
 the GC.
 It seemed to be allocating twice as much memory as the STL runs. The
 allocations I was making was for arrays with lengths a power of two. I
 looked at gc.d and noticed all the allocations were being made with +1 
 added
 to the lengths, which means my nice power of two was being incremented 
 to
 the next larger bucket and hence using twice the space of the C++ code.

Whew. Thanks for that information! I have to adjust that in my containers as well.


How would you adjust your code? I don't see what a container can do.
 By the way, that improves vector's allocating performance by 15%. Quite a 
 change.

For my power-of-two tests it was an order of magnitude. What kinds of tests are you using?
May 19 2005
next sibling parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 Whew. Thanks for that information! I have to adjust that in my
 containers as well.


How would you adjust your code? I don't see what a container can do.

The container always allocates more memory than needed. If you simply append() elements, then as soon as it runs out of space it allocates the next power of two but one (right grammar?? I mean the one after the next) bytes; later, the next power of two that is larger than 1.5 * the requested size. Thus it often allocated way too much, as you said.

Of course the container can't prevent you from adding exactly as many elements as make their size in bytes a power of two. This is especially a problem in reserve(). But if you simply don't care and just append() what you have, the problem will not occur.
 By the way, that improves vector's allocating performance by 15%. Quite  
 a change.

For my power-of-two tests it was an order of magnitude.

What do you mean by "order of magnitude"? The dictionary spits out several translations... ;)
 What kinds of tests are you using?

// This tests how fast writing items and
// dynamically allocating space as needed is.
for (int i = 0; i < anz; ++i)
  array.append(i);

// This tests only the writing speed.
array.reserve(anz);
for (int i = 0; i < anz; ++i)
  array.append(i);

I do these tests both for my arrays and the D arrays, and compare. Currently they are equally fast. Note that the first test has this equivalent in D:

array.length = 16;
for (int i = 0; i < anz; ++i)
{
  if (i >= array.length)
    array.length = array.length * 2;

  array[i] = i;
}

Ciao
uwe
May 19 2005
parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Uwe Salomon" <post uwesalomon.de> wrote in message 
news:op.sq03yu2e6yjbe6 sandmann.maerchenwald.net...
 Whew. Thanks for that information! I have to adjust that in my
 containers as well.


How would you adjust your code? I don't see what a container can do.

The container always allocates more memory than needed. If you simply append() elements, then as soon as it runs out of space it allocates the next power of two but one (right grammar?? I mean the one after the next) bytes; later, the next power of two that is larger than 1.5 * the requested size. Thus it often allocated way too much, as you said.

Of course the container can't prevent you from adding exactly as many elements as make their size in bytes a power of two. This is especially a problem in reserve(). But if you simply don't care and just append() what you have, the problem will not occur.
 By the way, that improves vector's allocating performance by 15%. Quite 
 a change.

For my power-of-two tests it was an order of magnitude.

What do you mean with "order of magnitude"? The dictionary spits out several translations... ;)

Order of magnitude means a power of 10 or 2, depending on context. My use above means a power of 2 (or more, actually). The first time I ran the test it started swapping and I had to kill it; only when I removed the +1 did it fit without swapping, and it ran in approximately the same space as the STL version.
 I do these tests both for my arrays and the D arrays, and compare. 
 Currently they are equally fast. Note that the first test has this 
 equivalent in D:

 array.length = 16;
 for (int i = 0; i < anz; ++i)
 {
   if (i >= array.length)
     array.length = array.length * 2;

   array[i] = i;
 }

This is the kind of resizing (powers of two) that you want to avoid, since it results in 1/2 wasted space even when you fill up the array entirely. For example, say i == 16. Then the line array.length = array.length*2; will actually result in 2*16+1 = 33 bytes being requested, which gets rounded up to 64 even though you will only ever use 32. Once i == 32 the resize will request 2*32+1 = 65 bytes, which doesn't fit in 64, so it will again reallocate and ask for 128. So you are always wasting 1/2 the space when you resize by powers of 2. It becomes, ironically, the most inefficient way to geometrically grow the array.
May 19 2005
parent "Uwe Salomon" <post uwesalomon.de> writes:
 It becomes, ironically, the most
 inefficient way to geometrically grow the array.

This really needs to be changed. You are perfectly right: rounding to powers of two is such a common approach that it should work as fast as expected.

Ciao
uwe
May 19 2005
prev sibling parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 I was doing some experiments with comparing the MinTL container
 performance
 against the STL performance and MinTL was getting stomped because of
 the GC.
 It seemed to be allocating twice as much memory as the STL runs. The
 allocations I was making was for arrays with lengths a power of two. I
 looked at gc.d and noticed all the allocations were being made with +1
 added
 to the lengths, which means my nice power of two was being incremented
 to
 the next larger bucket and hence using twice the space of the  
 C++ code.




What I forgot to ask: does the GC always add 1 to the length of the dynamic array? Even for large elements?

struct Verylarge { dchar[50] x; }

Verylarge[] dynArray;
dynArray.length = 10;

This would waste 200 bytes then... with no benefit, I think. The GC should only add 1 to the length for (d)(w)char; I cannot see of what use this is in other cases. Well, we have to ask Walter then.

ciao
uwe
May 19 2005
parent "Ben Hinkle" <bhinkle mathworks.com> writes:
"Uwe Salomon" <post uwesalomon.de> wrote in message 
news:op.sq06dwv76yjbe6 sandmann.maerchenwald.net...
 I was doing some experiments with comparing the MinTL container
 performance
 against the STL performance and MinTL was getting stomped because of
 the GC.
 It seemed to be allocating twice as much memory as the STL runs. The
 allocations I was making was for arrays with lengths a power of two. I
 looked at gc.d and noticed all the allocations were being made with +1
 added
 to the lengths, which means my nice power of two was being incremented
 to
 the next larger bucket and hence using twice the space of the  C++ 
 code.




What I forgot to ask: does the GC always add 1 to the length of the dynamic array? Even for large elements?

struct Verylarge { dchar[50] x; }

Verylarge[] dynArray;
dynArray.length = 10;

This would waste 200 bytes then... with no benefit, I think. The GC should only add 1 to the length for (d)(w)char; I cannot see of what use this is in other cases. Well, we have to ask Walter then.

ciao
uwe

It adds 1 byte to every request.
May 19 2005
prev sibling parent reply "Lionello Lunesu" <lio lunesu.removethis.com> writes:
 *sweet-question*). I think the +1 comes from the string handling... The GC 
 puts a 0 after them to help you cope with the native C api...

Oh my... I'd say: remove the possibility for an array to be castable to a pointer; remove this hack; add a property .ptr... It all seems too big a price to pay for compatibility with old C libraries.

New D libraries, Phobos in particular, should just use D arrays, with no pointer whatsoever.

L.
May 19 2005
parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 New D libraries, Phobos in particular, should just use D arrays, with no
 pointer whatsoever.

Nope. Sorry, but if you program some data structure and try to optimize it, it is really necessary to work with these pointers. Removing this possibility would simply make a lot of things impossible. And one of the goals of D is having the possibility to work at low level when necessary.

And to say something about the 0 after strings (I'm not sure if the +1 comes from that issue!)... my containers do the same thing, and my toUtfXXX converters too. It costs a byte (or a dchar for a UTF-32 string), and opens up the whole world of existing C APIs. Rewrite ICU & friends for D, and I will not need that any more and delete it from my containers...

Just a simple example for the necessity of low-level pointer access: a skip-list is used to build a map-like structure. It is basically a linked list, but some of the nodes not only link to their direct successor, but also to some node much further in the list. The basic node looks like this:

struct Node
{
  SomeType key;
  OtherType data;

  Node* prev;
  Node*[1] next;
}

This node has only one pointer, which points to the next node. If the node has level 1 (that means it has 1 extra pointer to a node which is about 7 nodes ahead), it is allocated like that:

Node* newNode = cast(Node*) new char[Node.sizeof + level * (Node*).sizeof];
// where level == 1.

Now you can access that extra pointer if you know for sure that the node has level 1 (the internal structure lets you know):

newNode.next.ptr[1] = xxx;

This simple and very fast approach relies on pointers, casts and all that low-level stuff. And the benefit is that you don't have to declare the Node like that:

struct Node
{
  SomeType key;
  OtherType data;

  Node* prev;
  Node*[11] next;
}

Wasting 40 bytes for every level-0 node (7 of 8 are level 0). Now imagine you do some real work, with about a million elements (if you have only 1000, you could simply run through all of them to find a peculiar one):

7/8 * 1000000 * 40 = 35 MByte.

See what I mean?

Ciao
uwe
May 20 2005
next sibling parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
 Just a simple example for the necessity of low-level pointer access: A 
 skip-list is used to build a map-like structure. It is basically a linked 
 list, but some of the nodes not only link to their direct successor, but 
 also to some node much further in the list. The basic node looks like 
 this:

 struct Node
 {
   SomeType key;
   OtherType data;

   Node* prev;
   Node*[1] next;
 }

 This node has only one pointer, which points to the next node. If the node 
 has level 1 (that means it has 1 extra pointer to a node which is about 7 
 nodes ahead), it is allocated like that:

 Node* newNode = cast(Node*) new char[Node.sizeof + level * 
 (Node*).sizeof];
 // where level == 1.

 Now you can access that extra pointer if you know for sure that the node 
 has level 1 (the internal structure lets you know):

 newNode.next.ptr[1] = xxx;

 This simple and very fast approach relies on pointers, casts and all that 
 low-level stuff. And the benefit is that you don't have to declare the 
 Node like that:

 struct Node
 {
   SomeType key;
   OtherType data;

   Node* prev;
   Node*[11] next;
 }

 Wasting 40 bytes for every level-0-node (7 of 8 are level 0). Now imagine 
 you do some real work, with about a million elements (if you have only 
 1000, you could simply run through all of them to find a peculiar one):

 7/8 * 1000000 * 40 = 35 MByte.

 See what i mean?

Eek. I know that's a common C trick to extend an array at the end of a struct, but does D recommend that practice? I assume it could wreak havoc on garbage collectors that are "type aware", since the pointers stored in the extended section have unknown types to the GC. For example, if the GC were changed to allocate pointer-free data from a special non-scanned heap, then the new char[..] would allocate from that heap, and then the cast(Node*) would wind up meaning that pointers are stored hidden inside what the GC thinks is a char[].
May 20 2005
parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 eek. I know that's a common C trick to extend an array at the end of a
 struct but does D recommend that practice? I assume it could wreak havoc  
 on
 garbage collectors that are "type aware" since the pointers stored in the
 extended section have unknown types to the GC. For example if the GC were
 changed to allocate pointer-free data from a special non-scanned heap  
 then
 the new char[..] would allocate from that heap and they the cast(Node*)
 would wind up meaning that pointers are stored hidden inside what the GC
 thinks is a char[].

What do you recommend, then? I thought D should be a programming language that lets you access the lower levels if you need to. Now this is all not true anymore, because we have a super-duper overclever garbage collector? Perhaps the D garbage collector needs to be robust, and simply cannot be too funky.

I mean, what would the benefit of all that cleanness in the programs be? Look at some code from an average open source programmer... your hair will stand on end. That is only logical, because high-quality code needs a lot of time. You won't find a lot of programs where the programmers took this time. So why optimize the GC to the bare bone, if its total execution time is under 5% in the average program? It is much better to use some advanced memory techniques (as outlined in the D docs somewhere) when it really matters, I think.

Ciao
uwe
May 20 2005
next sibling parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Uwe Salomon" <post uwesalomon.de> wrote in message 
news:op.sq2quseu6yjbe6 sandmann.maerchenwald.net...
 eek. I know that's a common C trick to extend an array at the end of a
 struct but does D recommend that practice? I assume it could wreak havoc 
 on
 garbage collectors that are "type aware" since the pointers stored in the
 extended section have unknown types to the GC. For example if the GC were
 changed to allocate pointer-free data from a special non-scanned heap 
 then
 the new char[..] would allocate from that heap and they the cast(Node*)
 would wind up meaning that pointers are stored hidden inside what the GC
 thinks is a char[].

What do you recommend, then? I thought D should be a programming language that lets you access the lower levels if you need to. Now this is all not true anymore, because we have a super-duper overclever garbage collector? Perhaps the D garbage collector needs to be robust, and simply cannot be too funky.

Low level memory management tricks should use malloc instead of the GC. If one wants to use the GC, one has to accept some rules. Your code would work fine with today's GC, so it's not like all GCs would have trouble with it; if you want, you could keep that code and just say it is only usable with certain GCs - but that probably isn't what you want.

What would I recommend... the first thing that comes to mind is to just live with the 'next' list as a separate allocation stored as a dynamic array:

struct Node
{
  SomeType key;
  OtherType data;

  Node* prev;
  Node*[] next;
}

Node* newNode = new Node;
newNode.next = new Node*[4]; // this node has 4 next pointers.

If profiling indicates that is unacceptable, then I'd look into more complex solutions. For example:

import std.stdio;

struct FullNode(int N)
{
  int key;
  int data;
  .FullNode!(1)* prev;
  int n = N;
  .FullNode!(1)*[N] next;
}

alias FullNode!(1) Node;

int main()
{
  Node* newNode = new Node; // 1 item
  writefln(newNode.n);
  newNode.next[0] = cast(Node*)new FullNode!(4); // 4 items
  writefln(newNode.next[0].n);
  return 0;
}

But there are probably plenty of options available.
 I mean, what would the benefit of all that cleanness in the programs be? 
 Look at some code from an average open source programmer... your hairs 
 will stand on end. That is only logical, because high quality code needs a 
 lot of time. You won't find a lot of programs where the programmers took 
 this time. So why optimize the GC to the bare bone, if its total execution 
 time is under 5% in the average program? It is much better to use some 
 advanced memory techniques (as outlined in the D docs somewhere) when it 
 really matters, i think.

I don't understand your point. Are you saying low level memory tricks are good or bad - or that the advanced memory techniques section in the doc needs changing? I can't keep up with your thought stream.
 Ciao
 uwe 

May 20 2005
next sibling parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 Node* newNode = new Node;
 newNode.next = new Node*[4]; // this node has 4 next pointers.

This almost doubles the cost of inserting an element, as there are 2 allocations instead of one. And most of the time I allocate an array with 1 or 2 elements...
 newNode.next[0] = cast(Node*)new FullNode!(4); // 4 items

The level is dynamically calculated at runtime.
 I mean, what would the benefit of all that cleanness in the programs be?
 Look at some code from an average open source programmer... your hairs
 will stand on end. That is only logical, because high quality code  
 needs a
 lot of time. You won't find a lot of programs where the programmers took
 this time. So why optimize the GC to the bare bone, if its total  
 execution
 time is under 5% in the average program? It is much better to use some
 advanced memory techniques (as outlined in the D docs somewhere) when it
 really matters, i think.

I don't understand your point. Are you saying low level memory tricks are good or bad - or that the advanced memory techniques section in the doc needs changing? I can't keep up with your thought stream.

I think low level memory tricks are very useful, and therefore good in situations where the benefit is high. You have to weigh simplicity (and therefore understandability and maintainability) against performance. But for generic containers I would always vote for the performance, as people simply use them and don't think about it much.

The advanced memory section in the docs is very good. What I wanted to say is that I think it is better to have a robust but slightly slower GC than a very complex GC that spoils all optimization efforts.

Ciao
uwe
May 20 2005
parent "Lionello Lunesu" <lio lunesu.removethis.com> writes:
 What i wanted to say is that i think it is better to have a robust, but
 slightly slower GC, than a very complex GC that spoils all optimization 
 efforts.

With this, I can only agree. L.
May 23 2005
prev sibling parent "Uwe Salomon" <post uwesalomon.de> writes:
 struct FullNode(int N)
 {
    int key;
    int data;
    .FullNode!(1)* prev;
    int n = N;
    .FullNode!(1)*[N] next;
 }
 alias FullNode!(1) Node;
 int main() {
     Node* newNode = new Node; // 1 item
     writefln(newNode.n);
     newNode.next[0] = cast(Node*)new FullNode!(4); // 4 items
     writefln(newNode.next[0].n);
     return 0;
 }
 But there are probably plenty of options available.

After some thinking I realized this with some extra overhead:

switch (level)
{
  case 1: neu = cast(Node*) new NodeType!(Key, T, 1); break;
  case 2: neu = cast(Node*) new NodeType!(Key, T, 2); break;
  case 3: neu = cast(Node*) new NodeType!(Key, T, 3); break;
  case 4: neu = cast(Node*) new NodeType!(Key, T, 4); break;
  case 5: neu = cast(Node*) new NodeType!(Key, T, 5); break;
  // And so on till 11...
}

Not too beautiful, but no more obscure char[] allocation. Thanks a lot for your hints, this is really a lot cleaner (and the extra overhead for traversing the switch is low, especially because the later case statements are seldom needed)!

Ciao
uwe
May 20 2005
prev sibling parent reply Kevin Bealer <Kevin_member pathlink.com> writes:
In article <op.sq2quseu6yjbe6 sandmann.maerchenwald.net>, Uwe Salomon says...
 eek. I know that's a common C trick to extend an array at the end of a
 struct but does D recommend that practice? I assume it could wreak havoc  
 on
 garbage collectors that are "type aware" since the pointers stored in the
 extended section have unknown types to the GC. For example if the GC were
 changed to allocate pointer-free data from a special non-scanned heap  
 then
 the new char[..] would allocate from that heap and they the cast(Node*)
 would wind up meaning that pointers are stored hidden inside what the GC
 thinks is a char[].

What do you recommend, then? I thought D should be a programming language that lets you access the lower levels if you need to. Now this is all not true anymore, because we have a super-duper overclever garbage collector? Perhaps the D garbage collector needs to be robust, and simply cannot be too funky.

I mean, what would the benefit of all that cleanness in the programs be? Look at some code from an average open source programmer... your hair will stand on end. That is only logical, because high-quality code needs a lot of time. You won't find a lot of programs where the programmers took this time. So why optimize the GC to the bare bone, if its total execution time is under 5% in the average program? It is much better to use some advanced memory techniques (as outlined in the D docs somewhere) when it really matters, I think.

Ciao
uwe

Maybe instead of /byte[n]/ or /char[n]/, use /(void*)[(n+3)/4]/. This way the GC will think of the data as pointers, but will know that it cannot safely modify void* objects because the type info is unknown. I would think a skip list could be implemented as a template taking the node level as a parameter.

There is also a data structure that has the form of a binary tree, but works just like a skip list; i.e. it makes the same decisions based on all the same comparisons in the same order. The key observation was that in any particular skip list, many of the pointers are never really used. For example, if a level 4 node is followed by a level 5 node, the pointers on the level 4 node are never followed. If you can legally move to the level 5 node, you would have done so when you were traversing at level 5. Once you descend to traversal at level 4, all higher level subsequent nodes have been eliminated. Thus, for any node X followed by a higher level node Y, X is essentially a leaf of Y.

This is not exactly how it works. Ultimately, every decision is a binary decision, so you can build a binary tree that represents that decision space. Each node in the binary tree contains a random integer that corresponds (roughly) to the "skip list node level". I worked out the details of how to do this once and wrote it up in C++, but I discovered that someone else had already invented what appeared to be the same data structure. I can never find the link when I want it though... regardless, I think my implementation was better than his -- his required node pivoting.

Kevin
May 20 2005
parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 Each node in the binary tree contains a random integer that corresponds
 (roughly) to the "skip list node level".

Hmm, I read something similar back then, about randomized binary trees... but that is some other issue, I think. Your approach sounds interesting. Does it only save some space, or does it also speed up the map?

Ciao
uwe
May 20 2005
parent Kevin Bealer <Kevin_member pathlink.com> writes:
In article <op.sq3exlzr6yjbe6 sandmann.maerchenwald.net>, Uwe Salomon says...
 Each node in the binary tree contains a random integer that corresponds
 (roughly) to the "skip list node level".

Hmm, I read something similar back then, about randomized binary trees... but that is some other issue, I think. Your approach sounds interesting. Does it only save some space, or does it also speed up the map?

Ciao
uwe

I would say it has the following advantages:

1. Easier memory management than a skip list; nodes are the same size.
2. I don't know if it's faster than a skip list, but I'm fairly sure it's faster than a balanced tree. You don't need to check for balance, and you don't need to balance the tree explicitly.
3. There is no worst-case input sequence. Most balanced trees do poorly with sequential input, for example.

Disadvantages:

1. Computing and storing the random number. Should be very fast, but not instant of course. This is probably an overhead relative to a perfect skip list, but almost all balanced binary trees need a little overhead.
2. No guarantee of balance -- random is random.
3. Less common and less studied.

I have a partial implementation of it somewhere. I described it in more detail in other locations, which I could also track down.

Other notes: It should have properties that are very similar to the skip list -- after all, it is supposed to have the same "decision tree" that a skip list has, and the design of the balancing technique is based on the corresponding skip list. Compared to a red-black tree (for example), it trades some determinism for speed and other properties.

A normal non-balanced binary tree has GC-friendly properties. If you have two pointers to one such binary tree and you add X nodes to each of them, the number of nodes that need to be rebuilt is approximately ln2(N) * X, where N is the size of the tree and X is the number of elements added. This is true because an ordinary binary tree adds elements in the downward direction, never "pivoting" parts of the tree. Each node is immutable. Now, in C++, trees are not handled this way, but in LISP they normally are (AFAIK), because GC is available.

The randomized balancing preserves this property -- it adds about ln2(N) nodes and always builds nodes in the downward direction. A C++ implementation would not be concerned with this property because sharing is not std. operating procedure, but in GC languages this can be a really useful thing. For example, a text editor can get the "undo" feature very cheaply using an adaptation of this binary tree sharing.

In practical terms, if you have 1000000 elements in a shared tree and add 3 elements, you end up generating and GC-ing 60 elements, but the old million are still shared, and only the modified tree has the new items.

I'm not sure if this practice is still in vogue, or if not, if that is just because GC has been less mainstream for a few decades.

Kevin
May 25 2005
prev sibling parent reply "Lionello Lunesu" <lio lunesu.removethis.com> writes:
Hey,

 Nope. Sorry, but if you program some data structure and try to optimize 
 it, it is really necessary to work with these pointers. Removing this 
 possibility would simply make a lot of things impossible. And one of the 
 goals of D is having the possibility to work at low level when necessary.

You make it sound as if I asked for the removal of pointers from the D spec! I said no such thing. The only thing I'm suggesting is that calling functions in old libraries that rely heavily on pointers (like the C run-time) should not be done implicitly. I should get a compiler error if I try passing a char[] to puts(), because that puts() doesn't know about char[]; it wants a char* and wants the string to be zero terminated! Fixed by adding a simple toStringz(x), or a ~ '\0'.
 And to say something about the 0 after strings (i'm not sure if the +1 
 comes from that issue!) ... my containers do the same thing, and my 
 toUtfXXX converters too. It costs a byte (or a dchar for an UTF-32 
 string), and opens up the whole world of existing C APIs. Rewrite ICU & 
 friends for D, and i will not need that any more and delete it from my 
 containers...

Yes, sure, they can add a zero to the array, but they shouldn't pretend it's not there. I mean, the array.length should grow by one because of that extra element. Not assuming the GC/memory manager added it and "just pass the pointer, it'll work". :-S
 Just a simple example for the necessity of low-level pointer access: A 
 skip-list is used to build a map-like structure. It is basically a linked 
 list, but some of the nodes not only link to their direct successor, but 
 also to some node much further in the list. The basic node looks like 
 this:

<snip>
 See what i mean?

Yes. First of all, you can use templates for that, using an integer template argument that defines how large the array should be.

Furthermore, I really have nothing against that kind of coding. It is not what I meant to 'attack' in my previous post. Sometimes it just feels like people want to be able to compile old C code with a D compiler without changing anything. Not only that, they _expect_ D to compile old C code, cursing if it doesn't. Of course D should be able to interface with the C run-time, but should arrays therefore be implicitly castable to pointers? Or worse, char[] to char*?

L.
May 20 2005
parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 I said no such thing. The only thing I'm suggesting is that calling
 functions in old libraries that rely heavily on pointers (like the C
 run-time) should not be done implicitely.

Hm. This sounds like a good idea to me. Yes. I misunderstood you then, sorry.
 Yes. First of all, you can use templates for that, using an integer  
 template argument that defines how large the array should be.

Regrettably not, because the level of the node is only known at compile time...

Ciao
uwe
May 20 2005
parent "Uwe Salomon" <post uwesalomon.de> writes:
 Yes. First of all, you can use templates for that, using an integer  
 template argument that defines how large the array should be.

Regrettably not, because the level of the node is only known at compile time...

Err, it is only known at runtime. Sorry for that. uwe
May 20 2005
prev sibling next sibling parent "Ben Hinkle" <ben.hinkle gmail.com> writes:
 Can
 we get rid of those +1? I recompiled phobos without the +1 and it all 
 seemed to work ok. I think it will be common to use power-of-two sizes for 
 objects assuming those will fit nicely together when in fact those fit the 
 worst together.

note the +1 seems to have appeared in dmd.122 so it is fairly recent.
May 19 2005
prev sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Ben Hinkle" <ben.hinkle gmail.com> wrote in message
news:d6i1ps$1osr$1 digitaldaemon.com...
 I was doing some experiments with comparing the MinTL container

 against the STL performance and MinTL was getting stomped because of the

 It seemed to be allocating twice as much memory as the STL runs. The
 allocations I was making was for arrays with lengths a power of two. I
 looked at gc.d and noticed all the allocations were being made with +1

 to the lengths, which means my nice power of two was being incremented to
 the next larger bucket and hence using twice the space of the C++ code.

 we get rid of those +1? I recompiled phobos without the +1 and it all

 to work ok. I think it will be common to use power-of-two sizes for

 assuming those will fit nicely together when in fact those fit the worst
 together.

The +1 was to support the following usage:

    T[] foo;
    T[] bar;
    ...
    foo = foo[length .. length];
    ...
    foo ~= bar;

Without the +1, if length is a power of 2, and the power of 2 is the bucket size, then foo[length..length] happens to cross the bucket boundary and point at the start of the *next* bucket. The array append code just sees that foo[] is pointing to the start of a bucket, and so happily inserts bar, overwriting whatever was already using that bucket. This would cause erratic, random crashes.

I suggest for a fix that your code not preallocate arrays; it's redundant with what the runtime is already doing.
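The hazard Walter describes can be seen directly: an empty slice taken at the very end of an array carries a pointer one past the last element, which (without padding) can be the first byte of the next allocation. A sketch in modern D syntax, for illustration only:

```d
void main()
{
    int[] foo = new int[4];      // suppose 4 ints exactly fill a GC bucket
    int[] tail = foo[$ .. $];    // zero-length slice at the very end

    assert(tail.length == 0);
    // The slice's pointer is one past the end of foo's data --
    // without the GC's +1 padding, that address could be the
    // start of the *next* bucket, which append would then stomp.
    assert(tail.ptr == foo.ptr + foo.length);
}
```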
May 20 2005
next sibling parent reply "Uwe Salomon" <post uwesalomon.de> writes:
 I suggest for a fix that your code not preallocate arrays; it's redundant
 with what the runtime is already doing.

Is that for Linux, too? Because my code is much faster in doing that... ok, perhaps he's just much more aggressive. Ciao uwe
May 20 2005
parent "Walter" <newshound digitalmars.com> writes:
"Uwe Salomon" <post uwesalomon.de> wrote in message
news:op.sq3c0da86yjbe6 sandmann.maerchenwald.net...
 I suggest for a fix that your code not preallocate arrays; it's


 with what the runtime is already doing.

Is that for Linux, too? Because my code is much faster in doing that... ok, perhaps he's just much more aggressive.

The same code is used for both.
May 20 2005
prev sibling next sibling parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Walter" <newshound digitalmars.com> wrote in message 
news:d6lh2e$1keg$1 digitaldaemon.com...
 "Ben Hinkle" <ben.hinkle gmail.com> wrote in message
 news:d6i1ps$1osr$1 digitaldaemon.com...
 I was doing some experiments with comparing the MinTL container

 against the STL performance and MinTL was getting stomped because of the

 It seemed to be allocating twice as much memory as the STL runs. The
 allocations I was making was for arrays with lengths a power of two. I
 looked at gc.d and noticed all the allocations were being made with +1

 to the lengths, which means my nice power of two was being incremented to
 the next larger bucket and hence using twice the space of the C++ code.

 we get rid of those +1? I recompiled phobos without the +1 and it all

 to work ok. I think it will be common to use power-of-two sizes for

 assuming those will fit nicely together when in fact those fit the worst
 together.

The +1 was to support the following usage:

    T[] foo;
    T[] bar;
    ...
    foo = foo[length .. length];
    ...
    foo ~= bar;

Without the +1, if length is a power of 2, and the power of 2 is the bucket size, then foo[length..length] happens to cross the bucket boundary and point at the start of the *next* bucket. The array append code just sees that foo[] is pointing to the start of a bucket, and so happily inserts bar, overwriting whatever was already using that bucket. This would cause erratic, random crashes.

I suggest foo ~= bar for arrays be roughly equivalent to

    size_t oldlen = foo.length;
    foo.length = foo.length + bar.length;
    foo[oldlen .. oldlen+bar.length] = bar[];

That way when foo.length is 0 a new pointer is allocated (as you probably already know, _d_arraysetlength checks for 0 length). When foo.length is non-zero and the array can be resized in place, then it behaves the same as it does today. So basically, to make the smallest change to gc.d, I would change _d_arrayappend to check if length == 0, just like _d_arraysetlength does, and malloc a new block if that is the case.

In fact it would be very odd if ~= had different semantics than changing the length and copying. I can imagine subtle bugs creeping into code if they are different. Of course ~= can be more efficient than the two-step code, but it shouldn't end up with a different result.
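Ben's proposed equivalence could be written out as a free function (a sketch of the *semantics* he proposes, not the runtime's actual code):

```d
// Append-by-growing: grow the destination first (reallocating when
// the slice is empty, matching _d_arraysetlength's 0-length check),
// then copy -- so ~= can never stomp a bucket via a stale pointer.
T[] append(T)(T[] foo, const T[] bar)
{
    size_t oldlen = foo.length;
    foo.length = foo.length + bar.length;      // reallocates when oldlen == 0
    foo[oldlen .. oldlen + bar.length] = bar[]; // copy the appended elements
    return foo;
}

void main()
{
    int[] a;
    a = append(a, [1, 2, 3]);
    a = append(a, [4, 5]);
    assert(a == [1, 2, 3, 4, 5]);
}
```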
 I suggest for a fix that your code not preallocate arrays; it's redundant
 with what the runtime is already doing.

For my case I wasn't preallocating - I was allocating blocks for a deque. Each block has a fixed size that will never be sliced or resized. I would like to know how to efficiently allocate such a block of memory and initialize it to hold a parametrizing type T. I'm starting to suspect a general-purpose deque in D will be very hard to get competitive with C++.
May 20 2005
parent reply "Walter" <newshound digitalmars.com> writes:
"Ben Hinkle" <ben.hinkle gmail.com> wrote in message
news:d6lrve$1rpt$1 digitaldaemon.com...
 I suggest foo ~= bar for arrays be roughly equivalent to
   size_t oldlen = foo.length;
   foo.length = foo.length + bar.length;
   foo[oldlen .. oldlen+bar.length] = bar[];
 That way when foo.length is 0 a new pointer is allocated (as you probably
 already know _d_arraysetlength checks for 0 length). When foo.length is
 non-zero and the array can be resized in place then it behaves the same as
 what it does today. So basically to make the smallest change to gc.d I

 change _d_arrayappend to check if length == 0 just like _d_arraysetlength
 and malloc a new block if that is the case.
 In fact it would be very odd if ~= had different semantics than changing

 length and copying. I can imagine subtle bugs creeping into code if they

 different. Of course ~= can be more efficient than the two step code but

 shouldn't end up with a different result.

The idea was to support 'reserving' a length for an array by setting the .length property to the reserved size, then setting .length to 0. This means that while filling the array, reallocation wouldn't be happening and it'd be more efficient.
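The reserve idiom Walter describes looks like this (a sketch in modern D syntax, anachronistic for this 2005 thread; note that today's runtime adds stomping protection, so it would reallocate here unless assumeSafeAppend is used):

```d
void main()
{
    int[] buf;
    buf.length = 100;   // 'reserve': allocate room for 100 elements up front
    buf.length = 0;     // intent: keep the block, just reset the count

    foreach (i; 0 .. 100)
        buf ~= i;       // hope: no reallocation while filling

    assert(buf.length == 100);
    assert(buf[99] == 99);
}
```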
 For my case I wasn't preallocating - I was allocating blocks for a deque.
 Each block has a fixed size that will never be sliced or resized.

Why were the sizes picked to be powers of 2?
 I would
 like to know how to efficiently allocate such a block of memory and
 initialize it to hold a paramtrizing type T. I'm starting to suspect a
 general-purpose deque in D will very hard to get competitive with C++.

May 20 2005
next sibling parent "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Walter" <newshound digitalmars.com> wrote in message 
news:d6m0i1$1uoh$1 digitaldaemon.com...
 "Ben Hinkle" <ben.hinkle gmail.com> wrote in message
 news:d6lrve$1rpt$1 digitaldaemon.com...
 I suggest foo ~= bar for arrays be roughly equivalent to
   size_t oldlen = foo.length;
   foo.length = foo.length + bar.length;
   foo[oldlen .. oldlen+bar.length] = bar[];
 That way when foo.length is 0 a new pointer is allocated (as you probably
 already know _d_arraysetlength checks for 0 length). When foo.length is
 non-zero and the array can be resized in place then it behaves the same 
 as
 what it does today. So basically to make the smallest change to gc.d I

 change _d_arrayappend to check if length == 0 just like _d_arraysetlength
 and malloc a new block if that is the case.
 In fact it would be very odd if ~= had different semantics than changing

 length and copying. I can imagine subtle bugs creeping into code if they

 different. Of course ~= can be more efficient than the two step code but

 shouldn't end up with a different result.

The idea was to support the idea of 'reserving' a length for an array by setting the .length property to the reserved size, then setting .length to 0. This means that while filling the array, reallocation wouldn't be happening and it'd be more efficient.

Then why, when foo.length==0, does "foo.length = 1" reallocate but "foo ~= x" doesn't? Long ago I argued that setting length shouldn't reallocate when resizing from 0. Now that I know the append behavior and the impact it has on the GC, I'd rather have resizing from 0 reallocate, both for consistency with setlength and to save the GC performance.

Either that, or enforce the rule that the first index of a slice must be a valid index within array bounds. After all, the section on array slicing says "slicing an array means to specify a subarray of it", which to me implies the pointer refers to a valid location. I never knew one could slice off the end, but I suppose that could come up if you are searching for an item and it isn't there, and so you wind up slicing off the end.
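The search scenario Ben speculates about produces exactly such an off-the-end slice (a sketch using modern $-syntax, for illustration only):

```d
void main()
{
    int[] a = [1, 2, 3];
    size_t i = 0;
    while (i < a.length && a[i] != 99)
        i++;                    // 99 is absent, so i ends up == a.length

    int[] rest = a[i .. $];     // empty slice positioned one past the end
    assert(rest.length == 0);
    assert(rest.ptr == a.ptr + a.length);
}
```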
 For my case I wasn't preallocating - I was allocating blocks for a deque.
 Each block has a fixed size that will never be sliced or resized.

Why were the sizes picked to be powers of 2?

I'm open to suggestions. I'm trying to write a deque that competes with C++ implementations. Efficient memory usage is an important part of a deque, and powers of 2 (or I suppose powers of 2 times element.sizeof) are traditionally the most efficient packing, I believe.
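The power-of-2-times-element-size idea could be sketched as a small sizing helper (hypothetical names; assumes T.sizeof divides the byte target evenly):

```d
// Pick a per-block element count so each deque block occupies a
// power-of-two number of bytes (here a 512-byte target).
size_t blockElems(T)(size_t targetBytes = 512)
{
    size_t n = targetBytes / T.sizeof;
    return n ? n : 1;   // at least one element per block
}

void main()
{
    assert(blockElems!int() == 128);     // 128 * 4 bytes == 512
    assert(blockElems!double() == 64);   // 64 * 8 bytes == 512
}
```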
 I would
 like to know how to efficiently allocate such a block of memory and
 initialize it to hold a paramtrizing type T. I'm starting to suspect a
 general-purpose deque in D will very hard to get competitive with C++.


May 20 2005
prev sibling next sibling parent "Ben Hinkle" <ben.hinkle gmail.com> writes:
 The idea was to support the idea of 'reserving' a length for an array by
 setting the .length property to the reserved size, then setting .length to
 0. This means that while filling the array, reallocation wouldn't be
 happening and it'd be more efficient.

I should add that reserving using this technique is very inefficient (as I've posted about long ago) because of the 0 checks in setlength. More precisely, suppose foo.length is 0 and we want to reserve some space for it. Trace the memory allocations:

    foo.length = 100;    // allocates and sets the ptr and length
    foo.length = 0;      // nulls the ptr and sets length to 0
    // let's try again...
    foo = new Foo[100];
    foo = foo[0 .. 0];   // now ptr is set and length is 0
    foo.length = 1;      // throws away old ptr and allocates a new length-1 array

So basically reserving length using 0 does not work. You lose the ptr when you resize from 0 and you lose the ptr when you resize to 0. That's why I always resize to/from 1 - it keeps the ptr. It makes the code uglier. I wouldn't mind resizing to/from 0 preserving the ptr, though.
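The to/from-1 workaround Ben mentions would look roughly like this (illustrative only; in the 2005 runtime, resizing to or from 0 discarded the pointer, so parking the length at 1 preserved the allocation):

```d
void main()
{
    int[] foo;
    foo.length = 100;   // allocate the reserve
    foo.length = 1;     // shrink but keep the ptr (unlike length = 0 back then)
    foo.length = 64;    // grow again, ideally reusing the same block
    assert(foo.length == 64);
}
```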
May 20 2005
prev sibling parent reply "Lionello Lunesu" <lio lunesu.removethis.com> writes:
Hi.

 The idea was to support the idea of 'reserving' a length for an array by
 setting the .length property to the reserved size, then setting .length to
 0. This means that while filling the array, reallocation wouldn't be
 happening and it'd be more efficient.

That really shouldn't be supported, since it depends heavily on the current implementation of the array. For all you know (I mean, not _you_, since you wrote it :-S), setting length to zero will immediately free the memory. To reserve memory before filling an array, you simply keep track of a separate 'count', resizing if it tends to get higher than array.length. And if this is too much work, use a container class for god's sake.

Walter, I really love the way D is progressing. But I'm so scared that some strange things are left in the language, just because people have grown used to them, as they tend to do, with ALL side effects in a language. This 'length = 0' is just one example. My fear also applies to casting arrays to pointers implicitly, and printf("%.*s", x), etc.. I really should write them down.

It's just bad practice to let "length = 0" mean anything else than it says: empty array. Of course, it's just an optimization. Any implementation that immediately frees the array will reallocate it again when being filled, so it will always work.

What about adding a 'member' function to arrays to allocate new memory? This member function would not call any constructors, since it's just allocating memory, leaving the length 0. The current 'approach' of

    a = new someclass[x];
    a.length = 0;

calls the constructors for all x elements, when these (probably) get overwritten later on. Wasting cycles.

L.
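The separate-count approach Lionello suggests could be sketched as a small helper type (hypothetical, not part of Phobos):

```d
// Track a 'count' separately from the backing array's length and
// grow the backing store geometrically only when it would overflow.
struct Buffer(T)
{
    T[] data;
    size_t count;

    void put(T value)
    {
        if (count >= data.length)
            data.length = data.length ? data.length * 2 : 16; // grow 2x
        data[count++] = value;
    }

    T[] items() { return data[0 .. count]; }  // the filled portion
}

void main()
{
    Buffer!int b;
    foreach (i; 0 .. 100)
        b.put(i);
    assert(b.items.length == 100);
    assert(b.items[99] == 99);
}
```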
May 23 2005
parent "Regan Heath" <regan netwin.co.nz> writes:
On Mon, 23 May 2005 11:14:08 +0300, Lionello Lunesu  
<lio lunesu.removethis.com> wrote:
 The idea was to support the idea of 'reserving' a length for an array by
 setting the .length property to the reserved size, then setting .length  
 to
 0. This means that while filling the array, reallocation wouldn't be
 happening and it'd be more efficient.

 That really shouldn't be supported, since it depends heavily on the 
 current implementation of the array. For all you know (I mean, not _you_, 
 since you wrote it :-S), setting length to zero will immediately free the 
 memory. To reserve memory before filling an array, you simply keep track 
 of a separate 'count', resizing if it tends to get higher than 
 array.length. And if this is too much work, use a container class for 
 god's sake.

 Walter, I really love the way D is progressing. But I'm so scared that 
 some strange things are left in the language, just because people have 
 grown used to them, as they tend to do, with ALL side effects in a 
 language. This 'length = 0' is just one example. My fear also applies to 
 casting arrays to pointers implicitly, and printf("%.*s", x), etc.. I 
 really should write them down.

 It's just bad practice to let "length = 0" mean anything else than it 
 says: empty array. Of course, it's just an optimization. Any 
 implementation that immediately frees the array will reallocate it again 
 when being filled, so it will always work.

 What about adding a 'member' function to arrays to allocate new memory? 
 This member function would not call any constructors, since it's just 
 allocating memory, leaving the length 0. The current 'approach' of 
 a = new someclass[x]; a.length = 0; calls the constructors for all x 
 elements, when these (probably) get overwritten later on. Wasting cycles.

Like 'reserve' proposed several times over the last year or two... a.reserve(50); Regan
May 23 2005
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
In article <d6lh2e$1keg$1 digitaldaemon.com>, Walter says...
The +1 was to support the following usage:

    T[] foo;
    T[] bar;
    ...
    foo = foo[length .. length];
    ...
    foo ~= bar;

Okay, I must be missing something, because the above code looks to me like it should result in undefined behavior. Isn't the result of foo[length..length] a single nonexistent array element just beyond the end of the array? Assuming it is, why should the GC be modified to support such usage?
This would cause erratic, random crashes.

Perhaps this should just be done in debug builds, with the appropriate bounds checking in place. Then the bugs could be found and corrected without any impact on release performance. Sean
May 20 2005
parent Sean Kelly <sean f4.ca> writes:
In article <d6m2k5$1vv6$1 digitaldaemon.com>, Sean Kelly says...
Isn't the result of foo[length..length] a single nonexistent array element just
beyond the end of the array?

I meant a zero length array at an invalid location. Sean
May 20 2005