
digitalmars.D - Feature suggestion: in-place append to array

reply KF <kfleszar gmail.com> writes:
I hope this is the right place to post it.
In my work, I often need to add elements at the end of dynamic arrays and
remove them from the end. These incremental changes would most
conveniently be performed by a~=e for addition of e at the end of a, and say
a=a[0..$-1]. Unfortunately, ~= always creates a copy and
is thus too time consuming. What I would like to suggest is either a new
operator, ~~=, an in-place append that does not create a copy if
possible, or, alternatively, new properties/methods:
- a.push(element or array) - add elements at the end of the array, without
copying if possible
- a.pop(x) - remove x elements from the end of the array
- a.capacity - get or set the actual length allocated for the array
Currently, I am using appender(a).put from the std.array module, but it is
very obscure and makes the code hard to understand.
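(For reference, the Appender workaround mentioned above looks roughly like
this - a sketch using std.array.appender on a current compiler; the element
counts are arbitrary:)

```d
import std.array : appender;

void main()
{
    auto app = appender!(int[])();
    app.reserve(1000);        // pre-allocate room for 1000 elements
    foreach (i; 0 .. 1000)
        app.put(i);           // append without reallocating each time
    int[] a = app.data;       // extract the built array
    assert(a.length == 1000);
    assert(a[0] == 0 && a[999] == 999);
}
```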
Cheers,
KF
Mar 04 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 04 Mar 2010 04:02:46 -0500, KF <kfleszar gmail.com> wrote:

 I hope this is the right place to post it.
 In my work, I often need to add elements at the end of dynamic arrays  
 and remove them from the end. These incremental changes would most
 conveniently be performed by a~=e for addition of e at the end of a, and  
 say a=a[0..$-1]. Unfortunately, ~= always creates a copy and
 is thus too time consuming.
No, it does not create a copy. a = a ~ e always creates a copy; a ~= e
appends in place if possible. This still will be slower than appender, but
in the next release of DMD it will be much faster than the current release.

With the next release of DMD, a = a[0..$-1] will shrink the array, but the
next append to a will reallocate. There will be a function to get around
this, but use it only if you know that the data removed from the end isn't
referenced anywhere else.

-Steve
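The distinction can be seen directly - a minimal sketch (concatenation is
guaranteed to copy; appending may extend in place depending on the runtime's
block capacity, so only the copy case is asserted here):

```d
void main()
{
    int[] b = [1, 2, 3];
    auto q = b.ptr;
    b = b ~ 4;          // concatenation: always allocates a new array and copies
    assert(b.ptr != q); // guaranteed to point at fresh memory

    int[] a = [1, 2, 3];
    a ~= 4;             // append: extends in place when capacity allows
    assert(a == [1, 2, 3, 4]);
}
```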
Mar 04 2010
parent reply grauzone <none example.net> writes:
Steven Schveighoffer wrote:
 On Thu, 04 Mar 2010 04:02:46 -0500, KF <kfleszar gmail.com> wrote:
 
 I hope this is the right place to post it.
 In my work, I often need to add elements at the end of dynamic arrays 
 and remove them from the end. These incremental changes would most
 conveniently be performed by a~=e for addition of e at the end of a, 
 and say a=a[0..$-1]. Unfortunately, ~= always creates a copy and
 is thus too time consuming.
No, it does not create a copy. a = a ~ e always creates a copy; a ~= e
appends in place if possible. This still will be slower than appender, but
in the next release of DMD it will be much faster than the current release.

With the next release of DMD, a = a[0..$-1] will shrink the array, but the
next append to a will reallocate. There will be a function to get around
this, but use it only if you know that the data removed from the end isn't
referenced anywhere else.
Some sort of "resetAndReuse" function to clear an array while enabling
reuse of the old memory would be nice:

int[] a = data;
a = null;
a ~= 1; // reallocates (of course)
a.length = 0;
a ~= 1; // will reallocate (for safety); used not to reallocate
resetAndReuse(a);
assert(a.length == 0);
a ~= 1; // doesn't reallocate

This can be implemented by setting both the slice and the internal runtime
length fields to 0.

Additionally, another function is necessary to replace the old
preallocation trick:

// preallocate 1000 elements, but don't change the actual slice length
auto len = a.length;
a.length = len + 1000;
a.length = len;

As I understood it, this won't work anymore after the change. This can be
implemented by enlarging the array's memory block without touching any
length fields.

I'm sure the function you had in mind does one of those things or both.
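(Historical footnote: the function that eventually shipped in druntime for
exactly this reset-and-reuse pattern is assumeSafeAppend, defined in
object.d - a sketch, assuming a current D compiler:)

```d
void main()
{
    int[] a = [1, 2, 3, 4];
    a.length = 0;         // shrink the slice; the memory block is retained
    a.assumeSafeAppend(); // promise: no other slice references the old tail
    a ~= 1;               // reuses the existing block instead of reallocating
    assert(a == [1]);
}
```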
 -Steve
Mar 04 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 04 Mar 2010 11:43:27 -0500, grauzone <none example.net> wrote:

 Some sort of "resetAndReuse" function to clear an array, but enabling to  
 reuse the old memory would be nice:

 int[] a = data;
 a = null;
 a ~= 1; //reallocates (of course)
 a.length = 0;
 a ~= 1; //will reallocate (for safety), used not to reallocate
 resetAndReuse(a);
 assert(a.length == 0);
 a ~= 1; //doesn't reallocate

 This can be implemented by setting both the slice and the internal  
 runtime length fields to  0.

 Additionally, another function is necessary to replace the old  
 preallocation trick:

 //preallocate 1000 elements, but don't change actual slice length
 auto len = a.length;
 a.length = len + 1000;
 a.length = len;

 As I understood it, this won't work anymore after the change. This can  
 be implemented by enlarging the array's memory block without touching  
 any length fields.

 I'm sure the function you had in mind does one of those things or both.
proposed usage (as checked in a couple days ago):

int[] a;
a.setCapacity(10000); // pre-allocate at least 10000 elements
foreach(i; 0..10000)
    a ~= i; // no reallocation
a.length = 100;
a.shrinkToFit(); // resize "allocated" length to 100 elements
a ~= 5; // no reallocation

I will probably add a function to do the preallocation of a new array
without having to write two statements.

Also, one cool thing about this that was not available before is that
increasing the capacity will not initialize the new data as long as the
"no pointers" flag is set. For pointer-containing elements, the contents
must be initialized to zero to prevent false positives during collection.

Note that setCapacity is supposed to be a property (i.e. a.capacity =
10000), but the compiler doesn't support this; I added a bug (feature)
request for this (3857). There is also a readable capacity property to get
the number of elements the array could grow to without reallocation, a
feature that may be useful for tuning.

Also, the name "shrinkToFit" may not be final :) I wanted to call it
minimize, but there were objections, and shrinkToFit is easy to
search/replace later.

-Steve
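(As it turned out, these primitives landed in druntime under different
names: reserve, the read-only capacity property, and assumeSafeAppend.
A sketch of the same workflow with the shipped names, on a current
compiler:)

```d
void main()
{
    int[] a;
    a.reserve(10_000);          // ensure capacity for at least 10_000 elements
    foreach (i; 0 .. 10_000)
        a ~= i;                 // appends reuse the reserved block
    assert(a.capacity >= 10_000);

    a.length = 100;             // shrink the slice
    a.assumeSafeAppend();       // allow the next append to reuse the block
    a ~= 5;
    assert(a.length == 101);
}
```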
Mar 04 2010
next sibling parent reply Clemens <eriatarka84 gmail.com> writes:
Steven Schveighoffer Wrote:

 int[] a;
 a.setCapacity(10000); // pre-allocate at least 10000 elements.
I would prefer the name reserve(). It has precedent in the STL, and the
method doesn't actually always set the capacity, only if the current
capacity is less than the argument. Even then it may conceivably allocate
more than was asked for. In other words, the name setCapacity suggests this
will always succeed:

a.setCapacity(n);
assert(a.capacity == n);

...which is not the case as far as I can tell.
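(The weaker "at least" postcondition can be stated directly - a sketch
using the reserve that later shipped in druntime, which returns the
capacity actually obtained:)

```d
void main()
{
    int[] a;
    auto granted = a.reserve(1000); // may round up to a larger block size
    assert(granted >= 1000);        // "at least", not "exactly"
    assert(a.capacity >= 1000);     // an exact-equality assert could fail
}
```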
Mar 04 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 04 Mar 2010 12:29:25 -0500, Clemens <eriatarka84 gmail.com> wrote:

 Steven Schveighoffer Wrote:

 int[] a;
 a.setCapacity(10000); // pre-allocate at least 10000 elements.
I would prefer the name reserve(). It has precedent in the STL, and the
method doesn't actually always set the capacity, only if the current
capacity is less than the argument. Even then it may conceivably allocate
more than was asked for. In other words, the name setCapacity suggests this
will always succeed:

a.setCapacity(n);
assert(a.capacity == n);

...which is not the case as far as I can tell.
You are correct, setCapacity ensures that *at least* the given number of
elements will be available for appending.

I planned on making the function a property (but a bug would not allow
that); the original intended usage was:

a.capacity = 10000;

Reserve doesn't work in this context. Can you come up with a name that
does?

I'll bring up reserve (as a function) as an alternative on the phobos
mailing list, and see what people say. I kind of liked the setter/getter
idea, but you make a good point.

-Steve
Mar 04 2010
parent reply Mike S <mikes notarealaddresslololololol.com> writes:
Steven Schveighoffer wrote:
 
 You are correct, setCapacity ensures that *at least* the given number of 
 elements will be available for appending.
 
 I planned on making the function a property (but a bug would not allow 
 that), the original intended usage was:
 
 a.capacity = 10000;
 
 Reserve doesn't work in this context.  Can you come up with a name that 
 does?
 
 I'll bring up reserve (as a function) as an alternative on the phobos 
 mailing list, and see what people say.  I kind of liked the 
 setter/getter idea, but you make a good point.
 
 -Steve
Sorry if resurrecting this thread is against netiquette, but it caught my
eye, and this is my first newsgroup post in years. ;)

Anyway, is there any compelling reason why setCapacity or modifying
a.capacity should allocate a nondeterministic amount of storage? Depending
on the application, programmers might require strict control over memory
allocation patterns and strict accounting for allocated memory. Game
programmers, especially console game programmers, tend to strongly prefer
deterministic allocation patterns, and nondeterminism is one of the
[several] common complaints about the C++ STL
(http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html is a
good resource on these kinds of issues).

In the case of D (which I'm considering learning), this is especially
important for dynamic arrays, partly because they're so useful by
themselves, and partly because they may form the backbone of custom
containers. Whereas it's easy to add "smart nondeterministic" behavior to
a deterministic setCapacity function by providing a wrapper, ordinary
language users can't do the opposite. Because of this, and because dynamic
arrays are so central to the D language, a nondeterministic setCapacity
function may deter game programmers, especially console programmers, from
adopting D.

Assuming you see this post, what are your thoughts here?
Mar 31 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Mike S:
 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html
It's a nice read. I don't see them switching to D soon. If you whisper to
them that D is based on a GC, they will run away screaming :-)

Bye,
bearophile
Mar 31 2010
parent reply Mike S <mikes notarealaddresslololololol.com> writes:
bearophile wrote:
 Mike S:
 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html
It's a nice read. I don't see them switching to D soon. If you whisper to
them that D is based on a GC, they will run away screaming :-)

Bye,
bearophile
Hah...well, there's a reason I'm still just looking into D rather than
diving in headfirst! :p

Actually though, I do believe the needs of game programmers should be
taken seriously while considering D's evolution: Right now, D hasn't
completely found its niche, but it seems to position itself as a sane
successor to C++ for systems-level programming. As it stands, I believe
there are only two major kinds of programmers who still use C++, and those
are game programmers and masochists. ;) Odds are, D won't be replacing C
anytime soon for operating system kernels and such. It's too low-level for
scripting tasks, and most website designers and non-real-time applications
programmers use higher-level languages and are unlikely to go back. I
think D will eventually be used for writing other heavy-duty non-OS
frameworks and software systems, but if it's really going to become the
successor to C++, it's going to have to absorb most of C++'s user
base...and that includes game programmers.

You're right that the garbage collector is a major issue - probably the
biggest one inherent to the language design - but I haven't determined
it's a dealbreaker, at least not yet. After all, D also allows manual
memory management, and freeing memory early apparently helps speed things
up anyway
(http://stackoverflow.com/questions/472133/turning-off-the-d-garbage-collector).
That part helps ensure control over how much memory is used/available, and
the only other issue with the garbage collector is reconciling its running
time with the soft real-time constraint that games have to satisfy. I can
think of a few tactics which should help here:

1.) In addition to permitting better reasoning about memory allocations,
freeing most memory manually should reduce the load on the garbage
collector and reduce its runtime, right?

2.) On the simulation side of the game engine, I believe a constant
timestep promotes a more robust design, and that means some frames
(relatively idle ones) will have plenty of CPU time left over. If you can
explicitly call the garbage collector to make it run during those times
instead of at nondeterministic times (can you?), you can maintain a smooth
framerate without any GC-induced spikes.

3.) Can user code execute simultaneously with the GC in other threads (on
other cores), or does the GC halt the entire program for safety reasons?
Assuming simultaneous threaded execution is permitted, it would also
dramatically reduce the GC's impact on multi-core systems.

Assuming these strategies work, the garbage collector by itself shouldn't
be a showstopper. In the case of dynamic arrays, resizing capacity
deterministically is one of those small things that would be really
helpful to anal game programmers, and it probably wouldn't hurt anyone
else, either. Plus, it's easier to implement than "smart nondeterministic"
resizing anyway. :)
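(Regarding point 2: explicit collection is indeed possible via core.memory.
A sketch of scheduling collections into idle frames - the frame loop and
the "idle" condition are made up for illustration:)

```d
import core.memory : GC;

void main()
{
    GC.disable();           // suppress automatic collections during gameplay
    int collections = 0;
    foreach (frame; 0 .. 3)
    {
        // ... simulate / render this frame ...
        if (frame == 2)     // hypothetical idle frame with CPU time to spare
        {
            GC.collect();   // run a collection at a time we choose
            ++collections;
        }
    }
    GC.enable();            // restore automatic collection
    assert(collections == 1);
}
```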
Mar 31 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Mike S:

the needs of game programmers should be taken seriously while considering D's
evolution:<
The short D1 history shows that designers of small games are willing to
use D. Some game designers seem almost desperate to find a usable language
simpler than C++. So I agree with you that D2 can be designed keeping an
eye on game designers too. But those are very demanding people; it's not
easy to satisfy them even with a mature language + compiler + std lib +
dev tools. And currently nothing in D2 is mature. For them, maybe not even
the most mature thing you can find in the D world, the back-end of ldc
(llvm), is mature enough :-)
Right now, D hasn't completely found its niche, but it seems to position itself
as a sane successor to C++ for systems-level programming.<
I am not able to tell the future. Some parts of D design are already
old-style:
- Some early design decisions make it hard to inline D virtual functions
(so if you write D code in Java style, you see a significant slowdown
compared to similar Java code running with HotSpot). So far no one seems
to care about this; we'll see if I am right to see a problem here.
- Some of D's unsafe characteristics are being worked on to improve their
safety, but there's a lot of road to travel still; for example,
null-safety and integer-overflow-safety are far away still. People are
still trying to explain to Walter why null-safety has some importance.
- D2 defaults to mutables. This can be acceptable, I don't know.
- Currently D2 is not designed from the start to work with an IDE (but I
think this problem can be fixed with not too much work).
- The built-in unit testing and documentation are not fit for professional
usage (but the documentation is easy to extend because it is just
comments, so it's a smaller problem).
- etc.

A system language is something that you can use to write very small
binaries, that can be used to write a kernel like Linux, device drivers
for a smaller computer+CPU, etc. Such things are hard to do in D2; I don't
see Linus using D2 to write his kernel, he even thinks C++ is unfit. So I
see D2 more like a "low-level application language", on a level located
somewhere between C and
As it stands, I believe there are only two major kinds of programmers who still
use C++, and those are game programmers and masochists. ;)<
There's also an army of legacy programmers who have to update and debug
tons of C++ code. Part of human society works thanks to a mountain of C++
code. Online programming competitions are usually won by C++ code. People
will find ways to use C++ for many more years; it will probably outlast us
all.
It's too low-level for scripting tasks,<
I have asked several times for Python-style array/lazy comprehensions in D
:-) They help. I think their introduction could reduce the length of D2
programs by 10-30%.
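(D never grew comprehension syntax, but the usual approximation is
std.algorithm plus std.array - a sketch of the Python equivalent of
[x*x for x in range(10) if x % 2 == 0]:)

```d
import std.algorithm : filter, map;
import std.array : array;
import std.range : iota;

void main()
{
    // eager: materialize the result, like a Python list comprehension
    auto squares = iota(10).filter!(x => x % 2 == 0)
                           .map!(x => x * x)
                           .array;
    assert(squares == [0, 4, 16, 36, 64]);

    // lazy: nothing is computed or allocated until consumed,
    // like a Python generator expression
    auto lazySquares = iota(10).map!(x => x * x);
}
```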
I think D will eventually be used for writing other heavy-duty non-OS
frameworks and software systems,<
 From what I've seen so far, I think D2 will appeal to some numerics folks
too, so it can eat a bit of the Fortran pie too. Some improvements can
make D2 more appealing to them; Don is working on this too. (Some ideas
from the Chapel language can help here, but I think no one here has taken
a serious look at it so far.)
You're right that the garbage collector is a major issue - probably the biggest
one inherent to the language design - but I haven't determined it's a
dealbreaker, at least not yet.<
The situation with the D GC is interesting. First of all, the D GC is not
refined; Java VM GCs are way more advanced. So D will need a much more
modern GC.

Another problem is that the current D GC is quite imprecise; this causes
leaks when you use it in real programs that have to run for more than a
few minutes. Part of this problem can be solved using a better GC that's
more precise (this can slow it down a bit, but avoids a good amount of
memory leaks). The other problem is intrinsic to the language, which makes
it hard or impossible to invent a fully precise GC for D.

And D makes it hard to use a modern generational moving GC. You can't just
adopt a JavaVM GC with D. Even the Mono GC (which knows the notion of
pinned/unpinned memory) can be unfit (because it's designed for mostly
unpinned memory). This is partially caused by D being a low-level language
with pointers, and partially caused by the D2 type system being unable to
tell apart:
1) hand-managed pointers, to GC memory or C heap memory;
2) GC-managed pointers to pinned memory;
3) GC-managed pointers to unpinned memory.

I think Walter thinks that telling them apart in the language makes D too
complex, and he can be right. But the current situation makes it hard to
design a very efficient GC for D. So I don't think high-performance game
designers will use the D GC for the next few years; they will manage most
or all of the memory manually. I am ignorant, but I think D designers have
to work a little harder in finding ways to allocate unpinned objects. This
(with a refined GC able to move unpinned memory, that keeps a stack-like
Eden plus two or three generations of objects) can help a lot for programs
written in Java-style.

But computer science history has shown that if enough people work on a
problem they can often find some partial solution. At the beginning Java
was very slow, so there's a bit of hope still. Of course, enough GC
experts will work on the D GC only if D has some success.
In the case of dynamic arrays, resizing capacity deterministically is one of
those small things that would be really helpful to anal game programmers, and
it probably wouldn't hurt anyone else, either.  Plus, it's easier to implement
than "smart nondeterministic" resizing anyway. :)<
Don't nail your mind only on that problem; that's only one of many
problems. You may think that dynamic arrays are simple, but from what I've
seen there's nothing simple in D dynamic arrays. People have found a way
to improve them a little only now, after years of discussions about them,
and I am not fully sure yet that the recent changes are worth it; I mean,
I am not sure yet that the current D2 arrays are better than the older
ones + an appender helper struct. There is no good benchmarking suite yet
to test whether they are an improvement.

Bye,
bearophile
Mar 31 2010
parent reply Mike S <mikes notarealaddresslololololol.com> writes:
bearophile wrote:
 The short D1 history shows that designers of small games are willing to use D.
Some game designers seem almost desperate to find an usable language simpler
than C++. So I agree with you that D2 can be designed keeping an eye at game
designers too. But that's very demanding people, it's not easy to satisfy them
even with a mature language + compiler + std lib + dev tools. And currently
nothing in D2 is mature. For them maybe not even the most mature thing you can
find in D world, the back-end of ldc (llvm), is mature enough :-)
 
Yeah, you're right about the demanding tool and maturity requirements that game studios have, but assuming people continue working on D and other people adopt it and enhance the tools, those things will flesh out over time. I'm young enough that I look forward to seeing it overtake C++ in the game world someday.
 
 
 I am not able to tell the future. Some parts of D design are already old-style:
 - Some early design decisions make hard to inline D virtual functions (so if
you write D code in Java style, you see a significant slow down compared to
similar Java code running with HotSpot). So far no one seems to care of this,
we'll see if I am right to see a problem here;
Well, writing code Java-style is certainly no problem for game devs, considering they already minimize virtual function usage, at least in lower code layers. ;)
 <snip>
 
 A system language is something that you can use to write very small binaries,
that can be used to write a kernel like Linux, device drivers for a smaller
computer+CPU, etc. Such things are hard to do in D2, I don't see Linus using D2
to write his kernel, he even thinks C++ is unfit. So I see D2 more like a
"low-level application language", on a level located somewhere between C and

This is true, but I do recall seeing an executable size comparison somewhere, and the D version of a program (hello world?) beat out the C++ version by about a factor of two. The C version killed both, but still, perhaps D might not be eternally unfit even if C++ is. ;) Then again, maybe the C++ program was just including a superfluous amount of library code, and maybe D programs are generally larger than their C++ equivalents. Plus, if it was hello world, it obviously wasn't using a lot of higher-level features. Either way though, even if D does become fit for kernel/driver code someday, it'll still be a long time before someone actually starts from scratch to write a new kernel using it anyway.
 
 As it stands, I believe there are only two major kinds of programmers who
still use C++, and those are game programmers and masochists. ;)<
There's also an army of legacy programmers who have to update and debug
tons of C++ code. Part of human society works thanks to a mountain of C++
code. Online programming competitions are usually won by C++ code. People
will find ways to use C++ for many more years; it will probably outlast us
all.
You're right, and I actually realized I misspoke here a little bit ago while I was eating. The legacy code just might keep people using C++ until the sun dies. Still, I maintain that game developers and masochists probably comprise a large portion of programmers starting completely new projects in C++. :D
 
 It's too low-level for scripting tasks,<
I have asked several times for Python-style array/lazy comprehensions in D
:-) They help. I think their introduction could reduce the length of D2
programs by 10-30%.
How difficult do you think that would be for the compiler devs to
implement in the semantic sense? Assuming it can be done without major
hardship or compromising the design of the language, that would be really
cool. Syntactically speaking, Python list comprehensions make the source
so much more compact, expressive, and clean that a statically compiled
language using them would really stand out. If they're implemented
correctly, I can't see any reason why the syntactic sugar would be any
slower than spelling everything out explicitly, either. The syntax would
have to be a bit different to feel at home in D, but the idea itself
probably isn't too foreign.

I also noticed a discussion about Python tuples from October 2009, I
think, and native tuples in D would also be useful...more useful than in
Python, in fact. After all, Python lists can contain mixed types (unlike
arrays in D), so they make tuples largely redundant except for their
different conventional meanings (and except for the ability to use named
tuples). In comparison, built-in tuples in D with similarly elegant syntax
would fill in a much larger gap. I suppose they'd work something like
implicitly generated struct types, which could be hastily constructed as
lvalues or rvalues, returned from functions, packed/unpacked and passed to
functions, etc.

Honestly, I think there's a lot to be learned from the expressiveness of
scripting languages (especially Python, given its elegant syntax without
all of the Perl/PHP noise) applied to statically compiled languages
without speed hits or design compromises.

 From what I've seen so far I think D2 will appeal to some numerics folks
 too, so it can eat a bit of Fortran pie too. Some improvements can make
 D2 more appealing to them, Don is working on this too. (Some ideas from
 Chapel language can help here, but I think no one here has taken a
 serious look at it so far).
It interested me when you mentioned numerics above, because game engine design is also moving more in the direction of dataflow-oriented programming, where the object hierarchy is more streamlined and there's more of a focus on transforming one type of data to another. This helps with both cache efficiency and exposing data-level parallelism. With all of the vector and matrix math involved in those transformation steps (physics engines, etc.), the same improvements that appeal to FORTRAN/numerics programmers might also find some use cases here. Of course, I could be talking out of my ass, since I don't quite know what D 2.0 improvements you're referring to, but I imagine they might apply.
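(Returning to the tuples point above: Phobos does offer library tuples via
std.typecons - no special syntax, but semantically close to the implicitly
generated struct types described. A sketch, with a hypothetical divmod
function for illustration:)

```d
import std.typecons : tuple, Tuple;

// a function returning multiple values via a named tuple
Tuple!(int, "quotient", int, "remainder") divmod(int a, int b)
{
    return typeof(return)(a / b, a % b);
}

void main()
{
    auto r = divmod(17, 5);
    assert(r.quotient == 3 && r.remainder == 2);

    // mixed element types, like a Python tuple
    auto t = tuple(1, "two", 3.0);
    assert(t[0] == 1 && t[1] == "two");
}
```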
 The situation with the D GC is interesting.
 First of all D GC is not refined, Java VM GCs are way more advanced. So D GC
will need a much more modern GC.
 
 Another problem is that the current D GC is quite imprecise, this causes leaks
when you use it in real programs that have to run for more than few minutes.
Part of this problem can be solved using a better GC that's more precise (this
can slow it down a bit, but avoids a good amount of memory leaks).
 
 The other problem is intrinsic of the language, that makes it hard or
impossible to invent a fully precise GC for D.
 
 And D makes it hard to use a modern generational moving GC with D. You can't
just adopt a JavaVM GC with D. Even the Mono GC (that knows the notion of
pinned/unpinned memory) can be unfit (because it's designed for mostly unpinned
memory).. This is partially caused by D being a low level language with
pointers, and it's partially caused by D2 type system unable to tell apart:
 1) hand-managed pointers, to GC memory or C heap memory;
 2) GC-managed pointers to pinned memory;
 3) GC-managed pointers to unpinned memory.
 I think Walter think that telling them apart in the language makes D too much
complex, and he can be right. But the current situation makes it hard to design
a very efficient GC for D. So I don't think high-performance game designers
will use D GC for the next few years, they will manage most or all the memory
manually. I am ignorant, but I think D designers have to work a little harder
in finding ways to allocate unpinned objects. This (with a refined GC able to
move unpinned memory, that keeps a stack-like Eden plus two or three
generations of objects) can help a lot for programs written in Java-style.
Yeah, that's really a shame about the current state of the garbage collector. Any memory leaks at all are the death knell of console games, and they're also the death of pretty much any long-running application. Are the memory leaks eternal and irrevocable, or are we just talking about memory that takes a long time for the garbage collector to figure out it should free? That said, my interest in D is less about its state today and more about its state tomorrow. I'm not planning on dying anytime soon, so I imagine I'll be coding for a long time...and I hope it's not just going to be C++, C++, and more C++ for the rest of my life!
 
 But computer science history has shown that if enough people work on a problem
they can often find some partial solution. At the beginning Java was very slow.
So there's a bit of hope still. Of course enough GC experts will work on the D
GC only if D will have some success.
That's pretty much the way I look at it too. Assuming people don't just abandon D, it's only a matter of time before the genius programmers of the world fix the rough spots.
 
 Don't nail your mind only on that problem, that's only one of many problems.
 You can think that dynamic arrays are simple, but from what I've seen there's
nothing simple in D dynamic arrays, people have found a way to improve them a
little only now, after years of discussions about them, and I am not fully sure
yet the recent changes are worth it, I mean I am not sure yet that the current
D2 arrays are better than the older ones + an appender helper struct. There is
no good benchmarking suite yet to test if they are an improvement.
You're right: It's just that when I saw this thread, I figured I could bring up this one problem of many which happens to be an easy fix. :)
 Bye,
 bearophile
Mar 31 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Mike S:

Well, writing code Java-style is certainly no problem for game devs,<
Right. Here I was talking about D uses more in general, sorry - like young
programmers coming out of university.
it'll still be a long time before someone actually starts from scratch to write
a new kernel using it anyway.<
People will try to use D2 for this purpose too, for example for teaching
purposes.
How difficult do you think that would be for the compiler devs to implement in
the semantic sense?  Assuming it can be done without major hardship or
compromising the design of the language, that would be really cool.<
They are easy to implement. Even the lazy ones. See ShedSkin "compiler".
I also noticed a discussion about Python tuples from October 2009 I think, and
native tuples in D would also be useful...<
We can talk about them again for D3. At the moment D2 needs less new features and better implementation/debugging of the already present features.
I think there's a lot to be learned from the expressiveness of scripting
languages<
That was one of the original goals of D.
Are the memory leaks eternal and irrevocable, or are we just talking about
memory that takes a long time for the garbage collector to figure out it should
free?<
I am mostly talking about false pointers: values that the GC thinks are
pointers while they are not. They can keep blocks of memory alive.
Assuming people don't just abandon D, it's only a matter of time before the
genius programmers of the world fix the rough spots.<
But there are limits to what smart people can invent/solve. So the
language designers have to work to allow them to find solutions.

Bye,
bearophile
Apr 01 2010
parent reply Mike S <mikes notarealaddresslololololol.com> writes:
bearophile wrote:
 How difficult do you think that would be for the compiler devs to implement in
the semantic sense?  Assuming it can be done without major hardship or
compromising the design of the language, that would be really cool.<
They are easy to implement. Even the lazy ones. See ShedSkin "compiler".
I figured the eager ones wouldn't be a problem, but I wondered whether the lazy ones might be a pain. Guess not, so cool. :)
 We can talk about them again for D3. At the moment D2 needs less new features
and better implementation/debugging of the already present features.
That's very true...I'm looking forward to Andrei's book, but I can't
imagine how he's finishing it on schedule, considering how quickly both
the language itself and the compiler are evolving. If the language
specification and reference compiler are both as incomplete, volatile, and
partially implemented as they are now, a June release might do some real
damage to D's reputation.

As far as D3 goes, though: Obviously nothing about it has really been
discussed at length, but is the general idea that it would be another
backwards-incompatible overhaul, or is the plan to make D2 the target for
backwards compatibility from here on out?
Apr 01 2010
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Mike S wrote:
 bearophile wrote:
 How difficult do you think that would be for the compiler devs to 
 implement in the semantic sense?  Assuming it can be done without 
 major hardship or compromising the design of the language, that would 
 be really cool.<
They are easy to implement. Even the lazy ones. See ShedSkin "compiler".
I figured the eager ones wouldn't be a problem, but I wondered whether the lazy ones might be a pain. Guess not, so cool. :)
 We can talk about them again for D3. At the moment D2 needs less new 
 features and better implementation/debugging of the already present 
 features.
That's very true...I'm looking forward to Andrei's book, but I can't imagine how he's finishing it on schedule, considering how quickly both the language itself and the compiler are evolving.
The book is finished and is on schedule. It's been out of my hands for a while - currently in the final copyedit stage. (Walter, last chance to remove octal literals.) I'll publish a schedule on my website soon.

The entire genesis of TDPL, and its growth in conjunction with the language itself, was a very interesting process (in addition to being a near-death experience) that I hope I'll find time to write an article about soon. Bottom line, the thing looks like a million 1920 bucks. I fail to see many ways in which the people involved could have made it better. That being said, it's entirely unpredictable how the book's going to fare. I'll be very curious.
 If the language 
 specification and reference compiler are both as incomplete, volatile, 
 and partially implemented as they are now, a June release might do some 
 real damage to D's reputation.
Not at all. Consider the state of C++, Perl, Java, or Ruby definitions and implementations at the time when their first "hello, world" book was printed. If anything, TDPL is slightly later, not earlier, than average. Even K&R's classic does not cover the entire language (even as it was defined back in the time) and has things like prototype-less calls, which C has since dropped like a bad habit.
 As far as D3 goes though:  Obviously nothing about it has really been 
 discussed at length, but is the general idea that it would be another 
 backwards-incompatible overhaul, or is the plan to make D2 the target 
 for backwards compatibility from here on out?
The incompatibility with D1 was a one-off thing. D2 is the flagship of D going forward. If you ask me, I predict that the schism of D2 from D1 will go down as Walter's smartest gambit ever. Andrei
Apr 01 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 The book is finished and is on schedule. It's been out of my hands for a 
 while - currently in the final copyedit stage. (Walter, last chance to 
 remove octal literals.) I'll publish a schedule on my website soon.
Walter doesn't want to change octals, so I think it's a waste of time to keep talking about that. There are several other small problems in D2 that deserve a look. They are small but important things. Please talk about those. I list most of them here in approximate order of decreasing importance:

Syntax & semantics for array assigns
http://d.puremagic.com/issues/show_bug.cgi?id=3971

[module system] Tidying up the imports
http://d.puremagic.com/issues/show_bug.cgi?id=3819

Signed lengths (and other built-in values)
http://d.puremagic.com/issues/show_bug.cgi?id=3843

[missing error] Array literal length doesn't match / Array literal assign to array of different length
http://d.puremagic.com/issues/show_bug.cgi?id=3849
http://d.puremagic.com/issues/show_bug.cgi?id=3948

opCast(bool) in classes is bug-prone
http://d.puremagic.com/issues/show_bug.cgi?id=3926

Require opEquals/opCmp in a class that defines toHash
http://d.puremagic.com/issues/show_bug.cgi?id=3844

automatic joining of adjacent strings is bad
http://d.puremagic.com/issues/show_bug.cgi?id=3827

pure/nothrow functions/delegates are a subtype of the nonpure/throw ones
http://d.puremagic.com/issues/show_bug.cgi?id=3833

const arguments/instance attributes in conditions/invariants
http://d.puremagic.com/issues/show_bug.cgi?id=3856

bool opEquals() for structs instead of int opEquals()
http://d.puremagic.com/issues/show_bug.cgi?id=3967

byte ==> sbyte
http://d.puremagic.com/issues/show_bug.cgi?id=3936
http://d.puremagic.com/issues/show_bug.cgi?id=3850

A bug-prone situation with AAs
http://d.puremagic.com/issues/show_bug.cgi?id=3825

Arguments and attributes with the same name
http://d.puremagic.com/issues/show_bug.cgi?id=3878

More useful and more clean 'is'
http://d.puremagic.com/issues/show_bug.cgi?id=3981

Those things are small breaking changes, so it's much better to think about them sooner. If you want I can explain better each one of them. Bye, bearophile
Apr 01 2010
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Mike S:

 I figured the eager ones wouldn't be a problem, but I wondered whether 
 the lazy ones might be a pain.  Guess not, so cool. :)
Lazy ones are just a struct/class instance that contains an opApply method (or that follows the Range protocol methods). Very easy.
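A minimal sketch of that point (a hypothetical example, not from ShedSkin): a lazy sequence in D is just a struct following the range protocol (empty/front/popFront), which is why such "generators" are cheap to implement:

```d
// Hypothetical lazy sequence of squares, expressed through the
// range protocol so it works directly with foreach.
struct Squares
{
    int i;      // current value
    int limit;  // inclusive upper bound for i

    @property bool empty() const { return i > limit; }
    @property int front() const { return i * i; }
    void popFront() { ++i; }
}

unittest
{
    int[] result;
    foreach (sq; Squares(1, 4)) // lazily yields 1, 4, 9, 16
        result ~= sq;
    assert(result == [1, 4, 9, 16]);
}
```

Nothing is computed until foreach pulls the next element, which is all the laziness amounts to.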
 As far as D3 goes though:  Obviously nothing about it has really been 
 discussed at length, but is the general idea that it would be another 
 backwards-incompatible overhaul, or is the plan to make D2 the target 
 for backwards compatibility from here on out?
From what I've seen, D3 will probably be backwards compatible with D2. (But maybe D3 will feel free to fix a few backwards-incompatible warts/errors found in the meantime, I don't know, like un-flattening tuples). Bye, bearophile
Apr 01 2010
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 31 Mar 2010 17:57:07 -0400, Mike S  
<mikes notarealaddresslololololol.com> wrote:

 Steven Schveighoffer wrote:
  You are correct, setCapacity ensures that *at least* the given number  
 of elements will be available for appending.
  I planned on making the function a property (but a bug would not allow  
 that), the original intended usage was:
  a.capacity = 10000;
  Reserve doesn't work in this context.  Can you come up with a name  
 that does?
  I'll bring up reserve (as a function) as an alternative on the phobos  
 mailing list, and see what people say.  I kind of liked the  
 setter/getter idea, but you make a good point.
  -Steve
Sorry if resurrecting this thread is against netiquette, but it caught my eye, and this is my first newsgroup post in years. ;) Anyway, is there any compelling reason why setCapacity or modifying a.capacity should allocate a nondeterministic amount of storage?
What do you mean by nondeterministic? It's very deterministic, just not always easy to determine ;) However, given enough context, it's really easy to determine.
 Depending on the application, programmers might require strict control  
 over memory allocation patterns and strict accounting for allocated  
 memory.  Game programmers, especially console game programmers, tend to  
 strongly prefer deterministic allocation patterns, and nondeterminism is  
 one of the [several] common complaints about the C++ STL  
 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html is a  
 good resource on these kind of issues).  In the case of D (which I'm  
 considering learning), this is especially important for dynamic arrays,  
 partly because they're so useful by themselves, and partly because they  
 may form the backbone of custom containers.
The amount of memory given is determined by the GC, and ultimately by the OS. The currently supported OSes allocate in Page-sized chunks, so when you allocate any memory from the OS, you are allocating a page (4k). Most likely, you may not need a whole page for the data you are allocating, so the GC gives you more finely sized chunks by breaking up a page into smaller pieces. This strategy works well in some cases, and can be wasteful in others. The goal is to strike a balance that is "good enough" for everyday programming, but can be specialized when you need it.

If you want to control memory allocation yourself, you can always do that by allocating page-sized chunks and doing the memory management on those chunks yourself. I do something very similar in dcollections to speed up allocation/destruction.
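A minimal sketch of that idea (this is not dcollections' actual code): grab a page-sized chunk from the GC up front, then hand out fixed-size nodes from it yourself, bypassing per-node GC allocation:

```d
import core.memory : GC;

// Hypothetical pool that sub-allocates nodes from page-sized chunks.
struct NodePool(T)
{
    private T* chunk;       // current chunk, still GC-owned
    private size_t used;    // nodes handed out from the current chunk
    enum pageSize = 4096;
    enum perChunk = pageSize / T.sizeof;

    T* allocate()
    {
        // Grab a fresh page when the current one is exhausted.
        if (chunk is null || used == perChunk)
        {
            chunk = cast(T*) GC.malloc(pageSize);
            used = 0;
        }
        return &chunk[used++];
    }
}

unittest
{
    NodePool!long pool;
    auto a = pool.allocate();
    auto b = pool.allocate();
    *a = 1;
    *b = 2;
    assert(b == a + 1); // consecutive nodes come from the same chunk
}
```

Because the chunks come from the GC, the pool still gets scanned and collected normally; only the per-node allocation cost is avoided.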
 Whereas it's easy to add "smart nondeterministic" behavior to a  
 deterministic setCapacity function by providing a wrapper, ordinary  
 language users can't do the opposite.  Because of this, and because  
 dynamic arrays are so central to the D language, a nondeterministic  
 setCapacity function may deter game programmers, especially console  
 programmers, from adopting D.

 Assuming you see this post, what are your thoughts here?
I think D has deterministic allocation, and better ability than C++ to make custom types that look and act like builtins. Therefore, you can make an array type that suits your needs and is almost exactly the same syntax as a builtin array (except for some things reserved for builtins, like literals). Such a thing is certainly possible, even with using the GC for your allocation. BTW, I made the change to the runtime renaming the function previously known as setCapacity to reserve. It won't be a property, even if that bug is fixed. -Steve
Mar 31 2010
parent reply Mike S <mikes notarealaddresslololololol.com> writes:
Steven Schveighoffer wrote:
  > What do you mean by nondeterministic?  It's very deterministic, just 
not
 always easy to determine ;)  However, given enough context, it's really 
 easy to determine.
When I say deterministic, I'm referring to determinism from the user's point of view, where the allocation behavior is affected solely by the parameter (the size request, e.g. 10000 objects) and not by some kind of internal state, hidden context, or arcane black magic. :p

 The amount of memory given is determined by the GC, and ultimately by
 the OS.  The currently supported OSes allocate in Page-sized chunks, so 
 when you allocate any memory from the OS, you are allocating a page 
 (4k).  Most likely, you may not need a whole page for the data you are 
 allocating, so the GC gives you more finely sized chunks by breaking up 
 a page into smaller pieces.  This strategy works well in some cases, and 
 can be wasteful in others.  The goal is to strike a balance that is 
 "good enough" for everyday programming, but can be specialized when you 
 need it.
That's understandable, and it makes sense that the actual memory being allocated would correspond to some chunk size. It's really just opaque black box behavior that poses a problem; if users are given well-defined guidelines and chunk sizes, that would work just fine. For instance, a spec like, "reserve a multiple of 512 bytes and that's exactly what you will be given," would allow users to minimize wastefulness and know precisely how much memory they're allocating.
 
 If you want to control memory allocation yourself, you can always do 
 that by allocating page-sized chunks and doing the memory management on 
 those chunks yourself.  I do something very similar in dcollections to 
 speed up allocation/destruction.
 
 <snip>
 
 I think D has deterministic allocation, and better ability than C++ to 
 make custom types that look and act like builtins.  Therefore, you can 
 make an array type that suits your needs and is almost exactly the same 
 syntax as a builtin array (except for some things reserved for builtins, 
 like literals).  Such a thing is certainly possible, even with using the 
 GC for your allocation.
That parallels what game devs do in C++: They tend to use custom allocators a lot, and they're likely to follow the same basic strategy in D too, if/when it becomes a suitable replacement. I'm still just browsing though, and I'm not all that familiar with D. If you can't actually use the built-in dynamic arrays for this purpose, how difficult would it be to reimplement a contiguously stored dynamic container using custom allocation? I suppose you'd have to build it from the ground up using a void pointer to a custom allocated block of memory, right? Do user-defined types in D have any/many performance disadvantages compared to built-ins?
 
 BTW, I made the change to the runtime renaming the function previously 
 known as setCapacity to reserve.  It won't be a property, even if that 
 bug is fixed.
 
 -Steve
That's a bit of a downer, since a capacity property would have nice symmetry with the length property. I suppose there were good reasons though. Considering the name change, does that mean reserve can only reserve new space, i.e. it can't free any that's already been allocated? (That makes me wonder: Out of curiosity, how does the garbage collector know how much space is allocated to a dynamic array or especially to a void pointer? I suppose it's registered somewhere?)
Mar 31 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 01 Apr 2010 01:41:02 -0400, Mike S  
<mikes notarealaddresslololololol.com> wrote:

 Steven Schveighoffer wrote:
   > What do you mean by nondeterministic?  It's very deterministic, just  
 not
 always easy to determine ;)  However, given enough context, it's really  
 easy to determine.
When I say deterministic, I'm referring to determinism from the user's point of view, where the allocation behavior is affected solely by the parameter (the size request, e.g. 10000 objects) and not by some kind of internal state, hidden context, or arcane black magic. :p
It's abstracted to the GC, but the current GC is well defined. If you request to allocate blocks with length of a power of 2 under a page, you will get exactly that length, all the way down to 16 bytes. If you request to allocate a page or greater, you get a contiguous block of memory that is a multiple of a page. With that definition, is the allocator deterministic enough for your needs?
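The block sizes can be observed directly through core.memory; a small sketch (the exact bin a request lands in depends on the GC version, so the asserts below only check the guarantees, not exact sizes):

```d
import core.memory : GC;

void main()
{
    // Sub-page requests are rounded up to a size-class bin (a power of 2
    // on the GC of that era, with a 16-byte minimum), so a 100-byte
    // request occupies a somewhat larger block.
    auto p = GC.malloc(100);
    assert(GC.sizeOf(p) >= 100);

    // Requests of a page (4096 bytes) or more are served from whole
    // pages, so the usable block is at least what was asked for.
    auto q = GC.malloc(5000);
    assert(GC.sizeOf(q) >= 5000); // in practice rounded up to whole pages
}
```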
   > The amount of memory given is determined by the GC, and ultimately by
 the OS.  The currently supported OSes allocate in Page-sized chunks, so  
 when you allocate any memory from the OS, you are allocating a page  
 (4k).  Most likely, you may not need a whole page for the data you are  
 allocating, so the GC gives you more finely sized chunks by breaking up  
 a page into smaller pieces.  This strategy works well in some cases,  
 and can be wasteful in others.  The goal is to strike a balance that is  
 "good enough" for everyday programming, but can be specialized when you  
 need it.
That's understandable, and it makes sense that the actual memory being allocated would correspond to some chunk size. It's really just opaque black box behavior that poses a problem; if users are given well-defined guidelines and chunk sizes, that would work just fine. For instance, a spec like, "reserve a multiple of 512 bytes and that's exactly what you will be given," would allow users to minimize wastefulness and know precisely how much memory they're allocating.
I think in the interest of allowing innovative freedom, such requirements should be left up to the GC implementor, not the spec or runtime. Anyone who wants to closely control memory usage should just understand how the GC they are using works.
  If you want to control memory allocation yourself, you can always do  
 that by allocating page-sized chunks and doing the memory management on  
 those chunks yourself.  I do something very similar in dcollections to  
 speed up allocation/destruction.
  <snip>
  I think D has deterministic allocation, and better ability than C++ to  
 make custom types that look and act like builtins.  Therefore, you can  
 make an array type that suits your needs and is almost exactly the same  
 syntax as a builtin array (except for some things reserved for  
 builtins, like literals).  Such a thing is certainly possible, even  
 with using the GC for your allocation.
That parallels what game devs do in C++: They tend to use custom allocators a lot, and they're likely to follow the same basic strategy in D too, if/when it becomes a suitable replacement. I'm still just browsing though, and I'm not all that familiar with D. If you can't actually use the built-in dynamic arrays for this purpose, how difficult would it be to reimplement a contiguously stored dynamic container using custom allocation? I suppose you'd have to build it from the ground up using a void pointer to a custom allocated block of memory, right? Do user-defined types in D have any/many performance disadvantages compared to built-ins?
No, you would most likely use templates, not void pointers. D's template system is far advanced past C++, and I used it to implement my custom allocators. It works great. User-defined types are as high performance as builtins as long as the compiler inlines properly.
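A hypothetical sketch of the kind of templated, array-like user type meant here (not Steve's dcollections code): a contiguously stored stack that owns its storage, yet reads much like a built-in array at the call site:

```d
// Templated fixed-capacity stack; no void pointers, no GC per element.
struct FixedStack(T, size_t N)
{
    private T[N] store;   // storage owned by the struct itself
    private size_t len;

    // Overload ~= so appending looks like a built-in array append.
    void opOpAssign(string op : "~")(T value)
    {
        assert(len < N, "capacity exhausted");
        store[len++] = value;
    }

    T pop() { return store[--len]; }

    @property size_t length() const { return len; }

    // Slicing gives a view of the live elements, as with built-ins.
    inout(T)[] opSlice() inout { return store[0 .. len]; }
}

unittest
{
    FixedStack!(int, 4) s;
    s ~= 1;
    s ~= 2;
    assert(s.length == 2);
    assert(s[] == [1, 2]);
    assert(s.pop() == 2);
}
```

Operator overloading (opOpAssign, opSlice) is what lets a user type keep the built-in syntax; only literals stay reserved for real arrays.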
  BTW, I made the change to the runtime renaming the function previously  
 known as setCapacity to reserve.  It won't be a property, even if that  
 bug is fixed.
  -Steve
That's a bit of a downer, since a capacity property would have nice symmetry with the length property. I suppose there were good reasons though. Considering the name change, does that mean reserve can only reserve new space, i.e. it can't free any that's already been allocated?
Capacity still exists as a read-only property. I did like the symmetry, but the point was well taken that the act of setting the capacity was not exact. It does mean that reserving space can only grow, not shrink. In fact, the capacity property calls the same runtime function as reserve, just passing 0 as the amount requested, to get the currently reserved space. You can't use capacity to free space because that could result in dangling pointers. Freeing space is done through the delete keyword. We do not want to make it easy to accidentally free space.
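Under the names that eventually shipped in druntime (reserve as a function, capacity as a read-only property), the behavior described above reads roughly like this; a sketch, not official documentation:

```d
void main()
{
    int[] a;
    a.reserve(10_000);          // grows the allocation; never shrinks it
    auto cap = a.capacity;      // read-only property
    assert(cap >= 10_000);      // at least the requested room

    foreach (i; 0 .. 10_000)
        a ~= i;                 // stays within the reserved block

    assert(a.capacity == cap);  // no reallocation happened
}
```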
   (That makes me wonder:  Out of curiosity, how does the garbage  
 collector know how much space is allocated to a dynamic array or  
 especially to a void pointer?  I suppose it's registered somewhere?)
The GC can figure out what page an interior pointer belongs to, and therefore how much memory that block uses. There is a GC function to get the block info of an interior pointer, which returns a struct that contains the pointer to the block, the length of the block, and its flags (whether it contains pointers or not). This function is what the array append feature uses to determine how much capacity can be used. I believe this lookup is logarithmic in complexity. -Steve
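The lookup described here is exposed as GC.query in core.memory, returning a BlkInfo with the base pointer, block size, and attribute flags; a small sketch, assuming that API:

```d
import core.memory : GC;

void main()
{
    auto a = new int[10];

    // query accepts an interior pointer and finds the enclosing block.
    auto info = GC.query(cast(void*) &a[5]);

    assert(info.base !is null);             // start of the whole block
    assert(info.size >= 10 * int.sizeof);   // block size, not slice size
    // info.attr holds BlkAttr flags, e.g. NO_SCAN for pointer-free data.
}
```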
Apr 01 2010
parent reply Mike S <mikes notarealaddresslololololol.com> writes:
Steven Schveighoffer wrote:
 Its abstracted to the GC, but the current GC is well defined.  If you 
 request to allocate blocks with length of a power of 2 under a page, you 
 will get exactly that length, all the way down to 16 bytes.  If you 
 request to allocate a page or greater, you get a contiguous block of 
 memory that is a multiple of a page.
 
 With that definition, is the allocator deterministic enough for your needs?
With respect to the current GC, yup. :) Sticking to powers of 2 (or integral numbers of pages) >= 16 bytes is an easy enough rule. Of course, I'm just starting out, so experienced game programmers might disagree, but it sounds perfectly reasonable to me. I suppose the abstraction makes it a QOI (quality of implementation) issue though, so depending on the compiler and GC used in the future it could become a question again. As it stands, I'm less interested in its precise current state and more interested in keeping an eye on its direction and how the spec/compiler/tools will mature over the next few years.

 I think in the interest of allowing innovative freedom, such
 requirements should be left up to the GC implementor, not the spec or 
 runtime.  Anyone who wants to closely control memory usage should just 
 understand how the GC they are using works.
It's definitely a tradeoff...leaving things unspecified opens the door to innovation and better implementations, but it simultaneously raises issues of inconsistent behavior across multiple platforms where the same compiler/GC might not be able to be used. To give an extreme example, look where unspecified behavior regarding static initialization order got C++! ;) I guess I'll just have to see where things go over time, but the predictable allocation with the current GC is a good sign at least.
 No, you would most likely use templates, not void pointers.  D's 
 template system is far advanced past C++, and I used it to implement my 
 custom allocators.  It works great.
 
 User-defined types are as high performance as builtins as long as the 
 compiler inlines properly.
D's template system is pretty intriguing. I don't really know anything about it, but I've read that it's less intimidating and tricky than C++'s, and that can only be a good thing for people wanting to harness its power.
 Capacity still exists as a read-only property.  I did like the symmetry, 
 but the point was well taken that the act of setting the capacity was 
 not exact.  It does mean that reserving space can only grow, not 
 shrink.  In fact, the capacity property calls the same runtime function 
 as reserve, just passing 0 as the amount requested to get the currently 
 reserved space.
 
 You can't use capacity to free space because that could result in 
 dangling pointers.  Freeing space is done through the delete keyword.  
 We do not want to make it easy to accidentally free space.
It's a shame the asymmetry was necessary, but I agree it makes sense.
 The GC can figure out what page an interior pointer belongs to, and 
 therefore how much memory that block uses.  There is a GC function to 
 get the block info of an interior pointer, which returns a struct that 
 contains the pointer to the block, the length of the block, and its 
 flags (whether it contains pointers or not).  This function is what the 
 array append feature uses to determine how much capacity can be used.  I 
 believe this lookup is logarithmic in complexity.
 
 -Steve
Thanks!
Apr 01 2010
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 01 Apr 2010 11:01:38 -0400, Mike S  
<mikes notarealaddresslololololol.com> wrote:

 I suppose the abstraction makes it a QOI though, so depending on the  
 compiler and GC used in the future it could become a question again.  As  
 it stands, I'm less interested in its precise current state and more  
 interested in keeping an eye on its direction and how the  
 spec/compiler/tools will mature over the next few years.
As an aside, because the GC is implemented all in runtime, you can swap out the GC for something else that suits your needs. Therefore, the issue of having to deal with different GCs on different platforms can be circumvented by inventing a GC that works the way you want on all platforms :) Also, I don't see the major attributes of the GC changing anytime soon. -Steve
Apr 01 2010
prev sibling parent reply grauzone <none example.net> writes:
Steven Schveighoffer wrote:
 On Thu, 04 Mar 2010 11:43:27 -0500, grauzone <none example.net> wrote:
 
 Some sort of "resetAndReuse" function to clear an array, but enabling 
 to reuse the old memory would be nice:

 int[] a = data;
 a = null;
 a ~= 1; //reallocates (of course)
 a.length = 0;
 a ~= 1; //will reallocate (for safety), used not to reallocate
 resetAndReuse(a);
 assert(a.length == 0);
 a ~= 1; //doesn't reallocate

 This can be implemented by setting both the slice and the internal 
 runtime length fields to  0.

 Additionally, another function is necessary to replace the old 
 preallocation trick:

 //preallocate 1000 elements, but don't change actual slice length
 auto len = a.length;
 a.length = len + 1000;
 a.length = len;

 As I understood it, this won't work anymore after the change. This can 
 be implemented by enlarging the array's memory block without touching 
 any length fields.

 I'm sure the function you had in mind does one of those things or both.
proposed usage (as checked in a couple days ago):

int[] a;
a.setCapacity(10000); // pre-allocate at least 10000 elements.
foreach(i; 0..10000)
   a ~= i; // no reallocation
a.length = 100;
a.shrinkToFit(); // resize "allocated" length to 100 elements
a ~= 5; // no reallocation.
What shrinkToFit() does is not really clear. Does it reallocate the memory block of the array such, that no space is wasted? Or does it provide (almost) the same functionality as my resetAndReuse(), and make the superfluous trailing memory available for appending without reallocation? I think a resetAndReuse is really needed. I have found it can prevent "GC thrashing" in many cases. E.g. when caching frequently re-evaluated data in form of arrays (free previous array, then allocate array that isn't larger than the previous array).
Mar 04 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 04 Mar 2010 12:43:32 -0500, grauzone <none example.net> wrote:

 Steven Schveighoffer wrote:
 On Thu, 04 Mar 2010 11:43:27 -0500, grauzone <none example.net> wrote:

 Some sort of "resetAndReuse" function to clear an array, but enabling  
 to reuse the old memory would be nice:

 int[] a = data;
 a = null;
 a ~= 1; //reallocates (of course)
 a.length = 0;
 a ~= 1; //will reallocate (for safety), used not to reallocate
 resetAndReuse(a);
 assert(a.length == 0);
 a ~= 1; //doesn't reallocate

 This can be implemented by setting both the slice and the internal  
 runtime length fields to  0.

 Additionally, another function is necessary to replace the old  
 preallocation trick:

 //preallocate 1000 elements, but don't change actual slice length
 auto len = a.length;
 a.length = len + 1000;
 a.length = len;

 As I understood it, this won't work anymore after the change. This can  
 be implemented by enlarging the array's memory block without touching  
 any length fields.

 I'm sure the function you had in mind does one of those things or both.
proposed usage (as checked in a couple days ago):

int[] a;
a.setCapacity(10000); // pre-allocate at least 10000 elements.
foreach(i; 0..10000)
   a ~= i; // no reallocation
a.length = 100;
a.shrinkToFit(); // resize "allocated" length to 100 elements
a ~= 5; // no reallocation.
What shrinkToFit() does is not really clear. Does it reallocate the memory block of the array such, that no space is wasted? Or does it provide (almost) the same functionality as my resetAndReuse(), and make the superfluous trailing memory available for appending without reallocation?
Sorry, should have added:

assert(a.length == 101);

Basically, shrinkToFit shrinks the "allocated" space to the length of the array. To put it another way, you could write your resetAndReuse function as follows:

void resetAndReuse(T)(ref T[] arr)
{
   arr.length = 0;
   arr.shrinkToFit();
}

I want to avoid assuming that shrinking the length to 0 is the only usable idiom.
 I think a resetAndReuse is really needed. I have found it can prevent  
 "GC thrashing" in many cases. E.g. when caching frequently re-evaluated  
 data in form of arrays (free previous array, then allocate array that  
 isn't larger than the previous array).
Yes, it is useful in such cases. The only questionable part about it is that it allows for stomping in cases like:

auto str = "hello".idup;
auto str2 = str[3..4];
str2.shrinkToFit();
str2 ~= "a";
assert(str == "hella");

So it probably should be marked as unsafe. Again, the name shrinkToFit isn't my favorite, ideas welcome. -Steve
Mar 04 2010
parent reply grauzone <none example.net> writes:
Steven Schveighoffer wrote:
 On Thu, 04 Mar 2010 12:43:32 -0500, grauzone <none example.net> wrote:
 
 Steven Schveighoffer wrote:
 On Thu, 04 Mar 2010 11:43:27 -0500, grauzone <none example.net> wrote:

 Some sort of "resetAndReuse" function to clear an array, but 
 enabling to reuse the old memory would be nice:

 int[] a = data;
 a = null;
 a ~= 1; //reallocates (of course)
 a.length = 0;
 a ~= 1; //will reallocate (for safety), used not to reallocate
 resetAndReuse(a);
 assert(a.length == 0);
 a ~= 1; //doesn't reallocate

 This can be implemented by setting both the slice and the internal 
 runtime length fields to  0.

 Additionally, another function is necessary to replace the old 
 preallocation trick:

 //preallocate 1000 elements, but don't change actual slice length
 auto len = a.length;
 a.length = len + 1000;
 a.length = len;

 As I understood it, this won't work anymore after the change. This 
 can be implemented by enlarging the array's memory block without 
 touching any length fields.

 I'm sure the function you had in mind does one of those things or both.
proposed usage (as checked in a couple days ago):

int[] a;
a.setCapacity(10000); // pre-allocate at least 10000 elements.
foreach(i; 0..10000)
   a ~= i; // no reallocation
a.length = 100;
a.shrinkToFit(); // resize "allocated" length to 100 elements
a ~= 5; // no reallocation.
What shrinkToFit() does is not really clear. Does it reallocate the memory block of the array such, that no space is wasted? Or does it provide (almost) the same functionality as my resetAndReuse(), and make the superfluous trailing memory available for appending without reallocation?
Sorry, should have added:

assert(a.length == 101);

Basically, shrinkToFit shrinks the "allocated" space to the length of the array. To put it another way, you could write your resetAndReuse function as follows:

void resetAndReuse(T)(ref T[] arr)
{
   arr.length = 0;
   arr.shrinkToFit();
}

I want to avoid assuming that shrinking the length to 0 is the only usable idiom.
Ah, great.
 I think a resetAndReuse is really needed. I have found it can prevent 
 "GC thrashing" in many cases. E.g. when caching frequently 
 re-evaluated data in form of arrays (free previous array, then 
 allocate array that isn't larger than the previous array).
Yes, it is useful in such cases. The only questionable part about it is that it allows for stomping in cases like:

auto str = "hello".idup;
auto str2 = str[3..4];
str2.shrinkToFit();
str2 ~= "a";
assert(str == "hella");

So it probably should be marked as unsafe.
Doesn't it conform to Andrei's ideas about memory safety? It can stomp over other data, but it can't be used to subvert the type system. Also, it's not the default behavior: if you don't use this function, stomping can never happen. But it must be disabled for arrays of immutable types (e.g. strings).
 Again, the name shrinkToFit isn't my favorite, ideas welcome.
stompOnAppend()? uniqueSlice()? trashRemainder()? I don't know either, all sound a bit silly. PS: I'm glad to hear that your patch is supposed to be included in the next release. Still waiting for dsimcha's one.
 -Steve
Mar 04 2010
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 04 Mar 2010 13:38:30 -0500, grauzone <none example.net> wrote:

 Steven Schveighoffer wrote:

  So it probably should be marked as unsafe.
Doesn't it conform to Andrei's ideas about memory safety? It can stomp over other data, but it can't be used to subvert the type system.
I think the idea is that anything that *could* result in undefined behavior is marked as unsafe. It just means it's not available from safe functions. It still can be called from safe functions via a trusted wrapper. I'd call it undefined behavior, because the memory is effectively released. Future GCs may make assumptions about that unallocated space that would cause problems. I feel like safeD has a slight lean towards sacrificing performance for the sake of memory safety. This is not a bad thing, and in most cases inconsequential.
 Also, it's not the default behavior: if you don't use this function,  
 stomping can never happen.
What's nice about capturing it into a function is you *can* classify it as unsafe, versus the current method which is harder to detect. I'd categorize it like casting.
 But it must be disabled for arrays of immutable types (i.e. strings).
This is tricky, you could have partially immutable types. The function would have to examine the type of all the contained items and make sure there were no immutable bytes in the element. Plus, it's not truly unsafe to use on immutable types if the given array is the only reference to that data. You could potentially implement a stack using builtin array appending and shrinkToFit, even on immutable elements (although I'd advise against it :)
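The shrinkToFit discussed in this thread eventually shipped under the name assumeSafeAppend; assuming that name, the stack idiom mentioned above might look like this (a sketch, safe only while the slice is the sole reference to the tail of its block):

```d
// Stack built on built-in array appending plus assumeSafeAppend.
struct Stack(T)
{
    private T[] data;

    void push(T value) { data ~= value; }

    T pop()
    {
        auto top = data[$ - 1];
        data = data[0 .. $ - 1];
        data.assumeSafeAppend(); // reuse the freed tail on the next push
        return top;
    }

    @property size_t length() const { return data.length; }
}

unittest
{
    Stack!int s;
    s.push(1);
    s.push(2);
    assert(s.pop() == 2);
    s.push(3);          // reuses the slot vacated by pop
    assert(s.pop() == 3);
    assert(s.pop() == 1);
    assert(s.length == 0);
}
```

This is exactly the stomping pattern Steve warns about: if anything else still references data's tail, push overwrites it, so the call is best treated as unsafe/trusted code.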
 Again, the name shrinkToFit isn't my favorite, ideas welcome.
stompOnAppend()? uniqueSlice()? trashRemainder()? I don't know either, all sound a bit silly.
I think I like shrinkToFit better than all those :) I still like minimize the best, but the objection was that it has mathematical connotations. -Steve
Mar 04 2010