
digitalmars.D - Feature suggestion: in-place append to array

reply KF <kfleszar gmail.com> writes:
I hope this is the right place to post it.
In my work, I often need to add elements at the end of dynamic arrays and
remove them from the end. These incremental changes would most
conveniently be performed by a~=e for addition of e at the end of a, and say
a=a[0..$-1]. Unfortunately, ~= always creates a copy and
is thus too time consuming. What I would like to suggest is either a new
operator, ~~=, an in-place append that does not create a copy if
possible, or, alternatively, new properties/methods:
- a.push(element or array) - add elements at the end of the array, without
copying if possible
- a.pop(x) - remove x elements from the end of the array
- a.capacity - get or set the actual length allocated for the array
Currently, I am using appender(a).put from the std.array module, but it is
very obscure and makes the code hard to understand.
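(For reference, the Appender workaround mentioned above looks roughly like
this - a sketch using std.array.appender on a current compiler; the element
counts are arbitrary:)

```d
import std.array : appender;

void main()
{
    auto app = appender!(int[])();
    app.reserve(1000);        // pre-allocate room for 1000 elements
    foreach (i; 0 .. 1000)
        app.put(i);           // append without reallocating each time
    int[] a = app.data;       // extract the built array
    assert(a.length == 1000);
    assert(a[0] == 0 && a[999] == 999);
}
```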
Cheers,
KF
Mar 04 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 04 Mar 2010 04:02:46 -0500, KF <kfleszar gmail.com> wrote:

 I hope this is the right place to post it.
 In my work, I often need to add elements at the end of dynamic arrays  
 and remove them from the end. These incremental changes would most
 conveniently be performed by a~=e for addition of e at the end of a, and  
 say a=a[0..$-1]. Unfortunately, ~= always creates a copy and
 is thus too time consuming.
No, it does not create a copy. a = a ~ e always creates a copy; a ~= e
appends in place if possible. This still will be slower than appender, but
in the next release of DMD it will be much faster than the current release.

With the next release of DMD, a = a[0..$-1] will shrink the array, but the
next append to a will reallocate. There will be a function to get around
this, but use it only if you know that the data removed from the end isn't
referenced anywhere else.

-Steve
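The distinction can be seen directly - a minimal sketch (concatenation is
guaranteed to copy; appending may extend in place depending on the runtime's
block capacity, so only the copy case is asserted here):

```d
void main()
{
    int[] b = [1, 2, 3];
    auto q = b.ptr;
    b = b ~ 4;          // concatenation: always allocates a new array and copies
    assert(b.ptr != q); // guaranteed to point at fresh memory

    int[] a = [1, 2, 3];
    a ~= 4;             // append: extends in place when capacity allows
    assert(a == [1, 2, 3, 4]);
}
```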
Mar 04 2010
parent reply grauzone <none example.net> writes:
Steven Schveighoffer wrote:
 On Thu, 04 Mar 2010 04:02:46 -0500, KF <kfleszar gmail.com> wrote:
 
 I hope this is the right place to post it.
 In my work, I often need to add elements at the end of dynamic arrays 
 and remove them from the end. These incremental changes would most
 conveniently be performed by a~=e for addition of e at the end of a, 
 and say a=a[0..$-1]. Unfortunately, ~= always creates a copy and
 is thus too time consuming.
No, it does not create a copy. a = a ~ e always creates a copy; a ~= e
appends in place if possible. This still will be slower than appender, but
in the next release of DMD it will be much faster than the current release.

With the next release of DMD, a = a[0..$-1] will shrink the array, but the
next append to a will reallocate. There will be a function to get around
this, but use it only if you know that the data removed from the end isn't
referenced anywhere else.
Some sort of "resetAndReuse" function to clear an array while enabling
reuse of the old memory would be nice:

int[] a = data;
a = null;
a ~= 1; // reallocates (of course)
a.length = 0;
a ~= 1; // will reallocate (for safety); used not to reallocate
resetAndReuse(a);
assert(a.length == 0);
a ~= 1; // doesn't reallocate

This can be implemented by setting both the slice and the internal runtime
length fields to 0.

Additionally, another function is necessary to replace the old
preallocation trick:

// preallocate 1000 elements, but don't change the actual slice length
auto len = a.length;
a.length = len + 1000;
a.length = len;

As I understood it, this won't work anymore after the change. This can be
implemented by enlarging the array's memory block without touching any
length fields.

I'm sure the function you had in mind does one of those things or both.
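(Historical footnote: the function that eventually shipped in druntime for
exactly this reset-and-reuse pattern is assumeSafeAppend, defined in
object.d - a sketch, assuming a current D compiler:)

```d
void main()
{
    int[] a = [1, 2, 3, 4];
    a.length = 0;         // shrink the slice; the memory block is retained
    a.assumeSafeAppend(); // promise: no other slice references the old tail
    a ~= 1;               // reuses the existing block instead of reallocating
    assert(a == [1]);
}
```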
 -Steve
Mar 04 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 04 Mar 2010 11:43:27 -0500, grauzone <none example.net> wrote:

 Some sort of "resetAndReuse" function to clear an array, but enabling to  
 reuse the old memory would be nice:

 int[] a = data;
 a = null;
 a ~= 1; //reallocates (of course)
 a.length = 0;
 a ~= 1; //will reallocate (for safety), used not to reallocate
 resetAndReuse(a);
 assert(a.length == 0);
 a ~= 1; //doesn't reallocate

 This can be implemented by setting both the slice and the internal  
 runtime length fields to  0.

 Additionally, another function is necessary to replace the old  
 preallocation trick:

 //preallocate 1000 elements, but don't change actual slice length
 auto len = a.length;
 a.length = len + 1000;
 a.length = len;

 As I understood it, this won't work anymore after the change. This can  
 be implemented by enlarging the array's memory block without touching  
 any length fields.

 I'm sure the function you had in mind does one of those things or both.
proposed usage (as checked in a couple days ago):

int[] a;
a.setCapacity(10000); // pre-allocate at least 10000 elements
foreach(i; 0..10000)
    a ~= i; // no reallocation
a.length = 100;
a.shrinkToFit(); // resize "allocated" length to 100 elements
a ~= 5; // no reallocation

I will probably add a function to do the preallocation of a new array
without having to write two statements.

Also, one cool thing about this that was not available before is that
increasing the capacity will not initialize the new data as long as the
"no pointers" flag is set. For pointer-containing elements, the contents
must be initialized to zero to prevent false positives during collection.

Note that setCapacity is supposed to be a property (i.e. a.capacity =
10000), but the compiler doesn't support this; I added a bug (feature)
request for this (3857). There is also a readable capacity property to get
the number of elements the array could grow to without reallocation, a
feature that may be useful for tuning.

Also, the name "shrinkToFit" may not be final :) I wanted to call it
minimize, but there were objections, and shrinkToFit is easy to
search/replace later.

-Steve
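(As it turned out, these primitives landed in druntime under different
names: reserve, the read-only capacity property, and assumeSafeAppend.
A sketch of the same workflow with the shipped names, on a current
compiler:)

```d
void main()
{
    int[] a;
    a.reserve(10_000);          // ensure capacity for at least 10_000 elements
    foreach (i; 0 .. 10_000)
        a ~= i;                 // appends reuse the reserved block
    assert(a.capacity >= 10_000);

    a.length = 100;             // shrink the slice
    a.assumeSafeAppend();       // allow the next append to reuse the block
    a ~= 5;
    assert(a.length == 101);
}
```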
Mar 04 2010
next sibling parent reply Clemens <eriatarka84 gmail.com> writes:
Steven Schveighoffer Wrote:

 int[] a;
 a.setCapacity(10000); // pre-allocate at least 10000 elements.
I would prefer the name reserve(). It has precedent in the STL, and the
method doesn't actually always set the capacity, only if the current
capacity is less than the argument. Even then it may conceivably allocate
more than was asked for. In other words, the name setCapacity suggests this
will always succeed:

a.setCapacity(n);
assert(a.capacity == n);

...which is not the case as far as I can tell.
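(The weaker "at least" postcondition can be stated directly - a sketch
using the reserve that later shipped in druntime, which returns the
capacity actually obtained:)

```d
void main()
{
    int[] a;
    auto granted = a.reserve(1000); // may round up to a larger block size
    assert(granted >= 1000);        // "at least", not "exactly"
    assert(a.capacity >= 1000);     // an exact-equality assert could fail
}
```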
Mar 04 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 04 Mar 2010 12:29:25 -0500, Clemens <eriatarka84 gmail.com> wrote:

 Steven Schveighoffer Wrote:

 int[] a;
 a.setCapacity(10000); // pre-allocate at least 10000 elements.
I would prefer the name reserve(). It has precedent in the STL, and the
method doesn't actually always set the capacity, only if the current
capacity is less than the argument. Even then it may conceivably allocate
more than was asked for. In other words, the name setCapacity suggests this
will always succeed:

a.setCapacity(n);
assert(a.capacity == n);

...which is not the case as far as I can tell.
You are correct, setCapacity ensures that *at least* the given number of
elements will be available for appending.

I planned on making the function a property (but a bug would not allow
that); the original intended usage was:

a.capacity = 10000;

Reserve doesn't work in this context. Can you come up with a name that
does?

I'll bring up reserve (as a function) as an alternative on the phobos
mailing list, and see what people say. I kind of liked the setter/getter
idea, but you make a good point.

-Steve
Mar 04 2010
parent reply Mike S <mikes notarealaddresslololololol.com> writes:
Steven Schveighoffer wrote:
 
 You are correct, setCapacity ensures that *at least* the given number of 
 elements will be available for appending.
 
 I planned on making the function a property (but a bug would not allow 
 that), the original intended usage was:
 
 a.capacity = 10000;
 
 Reserve doesn't work in this context.  Can you come up with a name that 
 does?
 
 I'll bring up reserve (as a function) as an alternative on the phobos 
 mailing list, and see what people say.  I kind of liked the 
 setter/getter idea, but you make a good point.
 
 -Steve
Sorry if resurrecting this thread is against netiquette, but it caught my
eye, and this is my first newsgroup post in years. ;)

Anyway, is there any compelling reason why setCapacity or modifying
a.capacity should allocate a nondeterministic amount of storage? Depending
on the application, programmers might require strict control over memory
allocation patterns and strict accounting for allocated memory. Game
programmers, especially console game programmers, tend to strongly prefer
deterministic allocation patterns, and nondeterminism is one of the
[several] common complaints about the C++ STL
(http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html is a
good resource on these kinds of issues).

In the case of D (which I'm considering learning), this is especially
important for dynamic arrays, partly because they're so useful by
themselves, and partly because they may form the backbone of custom
containers. Whereas it's easy to add "smart nondeterministic" behavior to
a deterministic setCapacity function by providing a wrapper, ordinary
language users can't do the opposite. Because of this, and because dynamic
arrays are so central to the D language, a nondeterministic setCapacity
function may deter game programmers, especially console programmers, from
adopting D.

Assuming you see this post, what are your thoughts here?
Mar 31 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Mike S:
 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html
It's a nice read. I don't see them switching to D soon. If you whisper to
them that D is based on a GC, they will run away screaming :-)

Bye,
bearophile
Mar 31 2010
parent reply Mike S <mikes notarealaddresslololololol.com> writes:
bearophile wrote:
 Mike S:
 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html
It's a nice read. I don't see them switching to D soon. If you whisper to
them that D is based on a GC, they will run away screaming :-)

Bye,
bearophile
Hah...well, there's a reason I'm still just looking into D rather than
diving in headfirst! :p

Actually though, I do believe the needs of game programmers should be
taken seriously while considering D's evolution: Right now, D hasn't
completely found its niche, but it seems to position itself as a sane
successor to C++ for systems-level programming. As it stands, I believe
there are only two major kinds of programmers who still use C++, and those
are game programmers and masochists. ;) Odds are, D won't be replacing C
anytime soon for operating system kernels and such. It's too low-level for
scripting tasks, and most website designers and non-real-time applications
programmers use higher-level languages and are unlikely to go back. I
think D will eventually be used for writing other heavy-duty non-OS
frameworks and software systems, but if it's really going to become the
successor to C++, it's going to have to absorb most of C++'s user
base...and that includes game programmers.

You're right that the garbage collector is a major issue - probably the
biggest one inherent to the language design - but I haven't determined
it's a dealbreaker, at least not yet. After all, D also allows manual
memory management, and freeing memory early apparently helps speed things
up anyway
(http://stackoverflow.com/questions/472133/turning-off-the-d-garbage-collector).
That part helps ensure control over how much memory is used/available, and
the only other issue with the garbage collector is reconciling its running
time with the soft real-time constraint that games have to satisfy. I can
think of a few tactics which should help here:

1.) In addition to permitting better reasoning about memory allocations,
freeing most memory manually should reduce the load on the garbage
collector and reduce its runtime, right?

2.) On the simulation side of the game engine, I believe a constant
timestep promotes a more robust design, and that means some frames
(relatively idle ones) will have plenty of CPU time left over. If you can
explicitly call the garbage collector to make it run during those times
instead of at nondeterministic times (can you?), you can maintain a smooth
framerate without any GC-induced spikes.

3.) Can user code execute simultaneously with the GC in other threads (on
other cores), or does the GC halt the entire program for safety reasons?
Assuming simultaneous threaded execution is permitted, it would also
dramatically reduce the GC's impact on multi-core systems.

Assuming these strategies work, the garbage collector by itself shouldn't
be a showstopper. In the case of dynamic arrays, resizing capacity
deterministically is one of those small things that would be really
helpful to anal game programmers, and it probably wouldn't hurt anyone
else, either. Plus, it's easier to implement than "smart nondeterministic"
resizing anyway. :)
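(Regarding point 2: explicit collection is indeed possible via core.memory.
A sketch of scheduling collections into idle frames - the frame loop and
the "idle" condition are made up for illustration:)

```d
import core.memory : GC;

void main()
{
    GC.disable();           // suppress automatic collections during gameplay
    int collections = 0;
    foreach (frame; 0 .. 3)
    {
        // ... simulate / render this frame ...
        if (frame == 2)     // hypothetical idle frame with CPU time to spare
        {
            GC.collect();   // run a collection at a time we choose
            ++collections;
        }
    }
    GC.enable();            // restore automatic collection
    assert(collections == 1);
}
```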
Mar 31 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Mike S:

the needs of game programmers should be taken seriously while considering D's
evolution:<
The short D1 history shows that designers of small games are willing to
use D. Some game designers seem almost desperate to find a usable language
simpler than C++. So I agree with you that D2 can be designed keeping an
eye on game designers too. But those are very demanding people; it's not
easy to satisfy them even with a mature language + compiler + std lib +
dev tools. And currently nothing in D2 is mature. For them, maybe not even
the most mature thing you can find in the D world, the back-end of ldc
(llvm), is mature enough :-)
Right now, D hasn't completely found its niche, but it seems to position itself
as a sane successor to C++ for systems-level programming.<
I am not able to tell the future. Some parts of D design are already
old-style:
- Some early design decisions make it hard to inline D virtual functions
(so if you write D code in Java style, you see a significant slowdown
compared to similar Java code running with HotSpot). So far no one seems
to care about this; we'll see if I am right to see a problem here.
- Some of D's unsafe characteristics are being worked on to improve their
safety, but there's a lot of road to travel still; for example,
null-safety and integer-overflow-safety are far away still. People are
still trying to explain to Walter why null-safety has some importance.
- D2 defaults to mutables. This can be acceptable, I don't know.
- Currently D2 is not designed from the start to work with an IDE (but I
think this problem can be fixed with not too much work).
- The built-in unit testing and documentation are not fit for professional
usage (but the documentation is easy to extend because it is just
comments, so it's a smaller problem).
- etc.

A system language is something that you can use to write very small
binaries, that can be used to write a kernel like Linux, device drivers
for a smaller computer+CPU, etc. Such things are hard to do in D2; I don't
see Linus using D2 to write his kernel, he even thinks C++ is unfit. So I
see D2 more like a "low-level application language", on a level located
somewhere between C and
As it stands, I believe there are only two major kinds of programmers who still
use C++, and those are game programmers and masochists. ;)<
There's also an army of legacy programmers who have to update and debug
tons of C++ code. Part of human society works thanks to a mountain of C++
code. Online programming competitions are usually won by C++ code. People
will find ways to use C++ for many more years; it will probably outlast us
all.
It's too low-level for scripting tasks,<
I have asked several times for Python-style array/lazy comprehensions in D
:-) They help. I think their introduction could reduce the length of D2
programs by 10-30%.
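(D never grew comprehension syntax, but the usual approximation is
std.algorithm plus std.array - a sketch of the Python equivalent of
[x*x for x in range(10) if x % 2 == 0]:)

```d
import std.algorithm : filter, map;
import std.array : array;
import std.range : iota;

void main()
{
    // eager: materialize the result, like a Python list comprehension
    auto squares = iota(10).filter!(x => x % 2 == 0)
                           .map!(x => x * x)
                           .array;
    assert(squares == [0, 4, 16, 36, 64]);

    // lazy: nothing is computed or allocated until consumed,
    // like a Python generator expression
    auto lazySquares = iota(10).map!(x => x * x);
}
```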
I think D will eventually be used for writing other heavy-duty non-OS
frameworks and software systems,<
 From what I've seen so far, I think D2 will appeal to some numerics folks
too, so it can eat a bit of the Fortran pie too. Some improvements can
make D2 more appealing to them; Don is working on this too. (Some ideas
from the Chapel language can help here, but I think no one here has taken
a serious look at it so far.)
You're right that the garbage collector is a major issue - probably the biggest
one inherent to the language design - but I haven't determined it's a
dealbreaker, at least not yet.<
The situation with the D GC is interesting. First of all, the D GC is not
refined; Java VM GCs are way more advanced. So D will need a much more
modern GC.

Another problem is that the current D GC is quite imprecise; this causes
leaks when you use it in real programs that have to run for more than a
few minutes. Part of this problem can be solved using a better GC that's
more precise (this can slow it down a bit, but avoids a good amount of
memory leaks). The other problem is intrinsic to the language, which makes
it hard or impossible to invent a fully precise GC for D.

And D makes it hard to use a modern generational moving GC. You can't just
adopt a JavaVM GC with D. Even the Mono GC (which knows the notion of
pinned/unpinned memory) can be unfit (because it's designed for mostly
unpinned memory). This is partially caused by D being a low-level language
with pointers, and partially caused by the D2 type system being unable to
tell apart:
1) hand-managed pointers, to GC memory or C heap memory;
2) GC-managed pointers to pinned memory;
3) GC-managed pointers to unpinned memory.

I think Walter thinks that telling them apart in the language makes D too
complex, and he can be right. But the current situation makes it hard to
design a very efficient GC for D. So I don't think high-performance game
designers will use the D GC for the next few years; they will manage most
or all of the memory manually. I am ignorant, but I think D designers have
to work a little harder in finding ways to allocate unpinned objects. This
(with a refined GC able to move unpinned memory, that keeps a stack-like
Eden plus two or three generations of objects) can help a lot for programs
written in Java-style.

But computer science history has shown that if enough people work on a
problem they can often find some partial solution. At the beginning Java
was very slow, so there's a bit of hope still. Of course, enough GC
experts will work on the D GC only if D has some success.
In the case of dynamic arrays, resizing capacity deterministically is one of
those small things that would be really helpful to anal game programmers, and
it probably wouldn't hurt anyone else, either.  Plus, it's easier to implement
than "smart nondeterministic" resizing anyway. :)<
Don't nail your mind only on that problem; that's only one of many
problems. You may think that dynamic arrays are simple, but from what I've
seen there's nothing simple in D dynamic arrays. People have found a way
to improve them a little only now, after years of discussions about them,
and I am not fully sure yet that the recent changes are worth it; I mean,
I am not sure yet that the current D2 arrays are better than the older
ones + an appender helper struct. There is no good benchmarking suite yet
to test whether they are an improvement.

Bye,
bearophile
Mar 31 2010
parent reply Mike S <mikes notarealaddresslololololol.com> writes:
bearophile wrote:
 The short D1 history shows that designers of small games are willing to use D.
Some game designers seem almost desperate to find an usable language simpler
than C++. So I agree with you that D2 can be designed keeping an eye at game
designers too. But that's very demanding people, it's not easy to satisfy them
even with a mature language + compiler + std lib + dev tools. And currently
nothing in D2 is mature. For them maybe not even the most mature thing you can
find in D world, the back-end of ldc (llvm), is mature enough :-)
 
Yeah, you're right about the demanding tool and maturity requirements that game studios have, but assuming people continue working on D and other people adopt it and enhance the tools, those things will flesh out over time. I'm young enough that I look forward to seeing it overtake C++ in the game world someday.
 
 
 I am not able to tell the future. Some parts of D design are already old-style:
 - Some early design decisions make hard to inline D virtual functions (so if
you write D code in Java style, you see a significant slow down compared to
similar Java code running with HotSpot). So far no one seems to care of this,
we'll see if I am right to see a problem here;
Well, writing code Java-style is certainly no problem for game devs, considering they already minimize virtual function usage, at least in lower code layers. ;)
 <snip>
 
 A system language is something that you can use to write very small binaries,
that can be used to write a kernel like Linux, device drivers for a smaller
computer+CPU, etc. Such things are hard to do in D2, I don't see Linus using D2
to write his kernel, he even thinks C++ is unfit. So I see D2 more like a
"low-level application language", on a level located somewhere between C and

This is true, but I do recall seeing an executable size comparison somewhere, and the D version of a program (hello world?) beat out the C++ version by about a factor of two. The C version killed both, but still, perhaps D might not be eternally unfit even if C++ is. ;) Then again, maybe the C++ program was just including a superfluous amount of library code, and maybe D programs are generally larger than their C++ equivalents. Plus, if it was hello world, it obviously wasn't using a lot of higher-level features. Either way though, even if D does become fit for kernel/driver code someday, it'll still be a long time before someone actually starts from scratch to write a new kernel using it anyway.
 
 As it stands, I believe there are only two major kinds of programmers who
still use C++, and those are game programmers and masochists. ;)<
There's also an army of legacy programmers who have to update and debug
tons of C++ code. Part of human society works thanks to a mountain of C++
code. Online programming competitions are usually won by C++ code. People
will find ways to use C++ for many more years; it will probably outlast us
all.
You're right, and I actually realized I misspoke here a little bit ago while I was eating. The legacy code just might keep people using C++ until the sun dies. Still, I maintain that game developers and masochists probably comprise a large portion of programmers starting completely new projects in C++. :D
 
 It's too low-level for scripting tasks,<
I have asked several times for Python-style array/lazy comprehensions in D
:-) They help. I think their introduction could reduce the length of D2
programs by 10-30%.
How difficult do you think that would be for the compiler devs to
implement in the semantic sense? Assuming it can be done without major
hardship or compromising the design of the language, that would be really
cool. Syntactically speaking, Python list comprehensions make the source
so much more compact, expressive, and clean that a statically compiled
language using them would really stand out. If they're implemented
correctly, I can't see any reason why the syntactic sugar would be any
slower than spelling everything out explicitly, either. The syntax would
have to be a bit different to feel at home in D, but the idea itself
probably isn't too foreign.

I also noticed a discussion about Python tuples from October 2009, I
think, and native tuples in D would also be useful...more useful than in
Python, in fact. After all, Python lists can contain mixed types (unlike
arrays in D), so they make tuples largely redundant except for their
different conventional meanings (and except for the ability to use named
tuples). In comparison, built-in tuples in D with similarly elegant syntax
would fill in a much larger gap. I suppose they'd work something like
implicitly generated struct types, which could be hastily constructed as
lvalues or rvalues, returned from functions, packed/unpacked and passed to
functions, etc.

Honestly, I think there's a lot to be learned from the expressiveness of
scripting languages (especially Python, given its elegant syntax without
all of the Perl/PHP noise) applied to statically compiled languages
without speed hits or design compromises.

 From what I've seen so far I think D2 will appeal to some numerics folks
 too, so it can eat a bit of Fortran pie too. Some improvements can make
 D2 more appealing to them, Don is working on this too. (Some ideas from
 Chapel language can help here, but I think no one here has taken a
 serious look at it so far).
It interested me when you mentioned numerics above, because game engine design is also moving more in the direction of dataflow-oriented programming, where the object hierarchy is more streamlined and there's more of a focus on transforming one type of data to another. This helps with both cache efficiency and exposing data-level parallelism. With all of the vector and matrix math involved in those transformation steps (physics engines, etc.), the same improvements that appeal to FORTRAN/numerics programmers might also find some use cases here. Of course, I could be talking out of my ass, since I don't quite know what D 2.0 improvements you're referring to, but I imagine they might apply.
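(Returning to the tuples point above: Phobos does offer library tuples via
std.typecons - no special syntax, but semantically close to the implicitly
generated struct types described. A sketch, with a hypothetical divmod
function for illustration:)

```d
import std.typecons : tuple, Tuple;

// a function returning multiple values via a named tuple
Tuple!(int, "quotient", int, "remainder") divmod(int a, int b)
{
    return typeof(return)(a / b, a % b);
}

void main()
{
    auto r = divmod(17, 5);
    assert(r.quotient == 3 && r.remainder == 2);

    // mixed element types, like a Python tuple
    auto t = tuple(1, "two", 3.0);
    assert(t[0] == 1 && t[1] == "two");
}
```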
 The situation with the D GC is interesting.
 First of all D GC is not refined, Java VM GCs are way more advanced. So D GC
will need a much more modern GC.
 
 Another problem is that the current D GC is quite imprecise, this causes leaks
when you use it in real programs that have to run for more than few minutes.
Part of this problem can be solved using a better GC that's more precise (this
can slow it down a bit, but avoids a good amount of memory leaks).
 
 The other problem is intrinsic of the language, that makes it hard or
impossible to invent a fully precise GC for D.
 
 And D makes it hard to use a modern generational moving GC with D. You can't
just adopt a JavaVM GC with D. Even the Mono GC (that knows the notion of
pinned/unpinned memory) can be unfit (because it's designed for mostly unpinned
memory).. This is partially caused by D being a low level language with
pointers, and it's partially caused by D2 type system unable to tell apart:
 1) hand-managed pointers, to GC memory or C heap memory;
 2) GC-managed pointers to pinned memory;
 3) GC-managed pointers to unpinned memory.
 I think Walter think that telling them apart in the language makes D too much
complex, and he can be right. But the current situation makes it hard to design
a very efficient GC for D. So I don't think high-performance game designers
will use D GC for the next few years, they will manage most or all the memory
manually. I am ignorant, but I think D designers have to work a little harder
in finding ways to allocate unpinned objects. This (with a refined GC able to
move unpinned memory, that keeps a stack-like Eden plus two or three
generations of objects) can help a lot for programs written in Java-style.
Yeah, that's really a shame about the current state of the garbage collector. Any memory leaks at all are the death knell of console games, and they're also the death of pretty much any long-running application. Are the memory leaks eternal and irrevocable, or are we just talking about memory that takes a long time for the garbage collector to figure out it should free? That said, my interest in D is less about its state today and more about its state tomorrow. I'm not planning on dying anytime soon, so I imagine I'll be coding for a long time...and I hope it's not just going to be C++, C++, and more C++ for the rest of my life!
 
 But computer science history has shown that if enough people work on a problem
they can often find some partial solution. At the beginning Java was very slow.
So there's a bit of hope still. Of course enough GC experts will work on the D
GC only if D will have some success.
That's pretty much the way I look at it too. Assuming people don't just abandon D, it's only a matter of time before the genius programmers of the world fix the rough spots.
 
 Don't nail your mind only on that problem, that's only one of many problems.
 You can think that dynamic arrays are simple, but from what I've seen there's
nothing simple in D dynamic arrays, people have found a way to improve them a
little only now, after years of discussions about them, and I am not fully sure
yet the recent changes are worth it, I mean I am not sure yet that the current
D2 arrays are better than the older ones + an appender helper struct. There is
no good benchmarking suite yet to test if they are an improvement.
You're right: It's just that when I saw this thread, I figured I could bring up this one problem of many which happens to be an easy fix. :)
 Bye,
 bearophile
Mar 31 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Mike S:

Well, writing code Java-style is certainly no problem for game devs,<
Right. Here I was talking about D uses more in general, sorry - like young
programmers coming out of university.
it'll still be a long time before someone actually starts from scratch to write
a new kernel using it anyway.<
People will try to use D2 for this purpose too, for example for teaching
purposes.
How difficult do you think that would be for the compiler devs to implement in
the semantic sense?  Assuming it can be done without major hardship or
compromising the design of the language, that would be really cool.<
They are easy to implement. Even the lazy ones. See ShedSkin "compiler".
I also noticed a discussion about Python tuples from October 2009 I think, and
native tuples in D would also be useful...<
We can talk about them again for D3. At the moment D2 needs less new features and better implementation/debugging of the already present features.
I think there's a lot to be learned from the expressiveness of scripting
languages<
That was one of the original goals of D.
Are the memory leaks eternal and irrevocable, or are we just talking about
memory that takes a long time for the garbage collector to figure out it should
free?<
I am mostly talking about false pointers: values that the GC thinks are
pointers while they are not. They can keep blocks of memory alive.
Assuming people don't just abandon D, it's only a matter of time before the
genius programmers of the world fix the rough spots.<
But there are limits to what smart people can invent/solve. So the
language designers have to work to allow them to find solutions.

Bye,
bearophile
Apr 01 2010
parent reply Mike S <mikes notarealaddresslololololol.com> writes:
bearophile wrote:
 How difficult do you think that would be for the compiler devs to implement in
the semantic sense?  Assuming it can be done without major hardship or
compromising the design of the language, that would be really cool.<
They are easy to implement. Even the lazy ones. See ShedSkin "compiler".
I figured the eager ones wouldn't be a problem, but I wondered whether the lazy ones might be a pain. Guess not, so cool. :)
 We can talk about them again for D3. At the moment D2 needs less new features
and better implementation/debugging of the already present features.
That's very true...I'm looking forward to Andrei's book, but I can't
imagine how he's finishing it on schedule, considering how quickly both
the language itself and the compiler are evolving. If the language
specification and reference compiler are both as incomplete, volatile, and
partially implemented as they are now, a June release might do some real
damage to D's reputation.

As far as D3 goes, though: Obviously nothing about it has really been
discussed at length, but is the general idea that it would be another
backwards-incompatible overhaul, or is the plan to make D2 the target for
backwards compatibility from here on out?
Apr 01 2010
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Mike S wrote:
 bearophile wrote:
 How difficult do you think that would be for the compiler devs to 
 implement in the semantic sense?  Assuming it can be done without 
 major hardship or compromising the design of the language, that would 
 be really cool.<
They are easy to implement. Even the lazy ones. See ShedSkin "compiler".
I figured the eager ones wouldn't be a problem, but I wondered whether the lazy ones might be a pain. Guess not, so cool. :)
 We can talk about them again for D3. At the moment D2 needs less new 
 features and better implementation/debugging of the already present 
 features.
That's very true...I'm looking forward to Andrei's book, but I can't imagine how he's finishing it on schedule, considering how quickly both the language itself and the compiler are evolving.
The book is finished and is on schedule. It's been out of my hands for a while - currently in the final copyedit stage. (Walter, last chance to remove octal literals.) I'll publish a schedule on my website soon.

The entire genesis of TDPL, and its growth in conjunction with the language itself, was a very interesting process (in addition to being a near-death experience) that I hope I'll find time to write an article about soon. Bottom line, the thing looks like a million 1920 bucks. I fail to see many ways in which the people involved could have made it better. That being said, it's entirely unpredictable how the book's going to fare. I'll be very curious.
 If the language 
 specification and reference compiler are both as incomplete, volatile, 
 and partially implemented as they are now, a June release might do some 
 real damage to D's reputation.
Not at all. Consider the state of C++, Perl, Java, or Ruby definitions and implementations at the time when their first "hello, world" book was printed. If anything, TDPL is slightly later, not earlier, than average. Even K&R's classic does not cover the entire language (even as it was defined back in the time) and has things like prototype-less calls, which C has since dropped like a bad habit.
 As far as D3 goes though:  Obviously nothing about it has really been 
 discussed at length, but is the general idea that it would be another 
 backwards-incompatible overhaul, or is the plan to make D2 the target 
 for backwards compatibility from here on out?
The incompatibility with D1 was a one-off thing. D2 is the flagship of D going forward. If you ask me, I predict that the schism of D2 from D1 will go down as Walter's smartest gambit ever. Andrei
Apr 01 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 The book is finished and is on schedule. It's been out of my hands for a 
 while - currently in the final copyedit stage. (Walter, last chance to 
 remove octal literals.) I'll publish a schedule on my website soon.
Walter doesn't want to change octals, so I think it's a waste of time to keep talking about that. There are several other small problems in D2 that deserve a look. They are small but important things. Please talk about those. I list most of them here in approximate order of decreasing importance:

Syntax & semantics for array assigns
http://d.puremagic.com/issues/show_bug.cgi?id=3971

[module system] Tidying up the imports
http://d.puremagic.com/issues/show_bug.cgi?id=3819

Signed lengths (and other built-in values)
http://d.puremagic.com/issues/show_bug.cgi?id=3843

[missing error] Array literal length doesn't match / Array literal assign to array of different length
http://d.puremagic.com/issues/show_bug.cgi?id=3849
http://d.puremagic.com/issues/show_bug.cgi?id=3948

opCast(bool) in classes is bug-prone
http://d.puremagic.com/issues/show_bug.cgi?id=3926

Require opEquals/opCmp in a class that defines toHash
http://d.puremagic.com/issues/show_bug.cgi?id=3844

automatic joining of adjacent strings is bad
http://d.puremagic.com/issues/show_bug.cgi?id=3827

pure/nothrow functions/delegates are a subtype of the nonpure/throw ones
http://d.puremagic.com/issues/show_bug.cgi?id=3833

const arguments/instance attributes in conditions/invariants
http://d.puremagic.com/issues/show_bug.cgi?id=3856

bool opEquals() for structs instead of int opEquals()
http://d.puremagic.com/issues/show_bug.cgi?id=3967

byte ==> sbyte
http://d.puremagic.com/issues/show_bug.cgi?id=3936
http://d.puremagic.com/issues/show_bug.cgi?id=3850

A bug-prone situation with AAs
http://d.puremagic.com/issues/show_bug.cgi?id=3825

Arguments and attributes with the same name
http://d.puremagic.com/issues/show_bug.cgi?id=3878

More useful and more clean 'is'
http://d.puremagic.com/issues/show_bug.cgi?id=3981

Those things are small breaking changes, so it's much better to think about them sooner. If you want I can explain better each one of them. Bye, bearophile
Apr 01 2010
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Mike S:

 I figured the eager ones wouldn't be a problem, but I wondered whether 
 the lazy ones might be a pain.  Guess not, so cool. :)
Lazy ones are just a struct/class instance that contains an opApply method (or that follows the Range protocol methods). Very easy.
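A minimal sketch of that point (a hypothetical example, not from ShedSkin): a lazy sequence in D is just a struct following the range protocol (empty/front/popFront), which is why such "generators" are cheap to implement:

```d
// Hypothetical lazy sequence of squares, expressed through the
// range protocol so it works directly with foreach.
struct Squares
{
    int i;      // current value
    int limit;  // inclusive upper bound for i

    @property bool empty() const { return i > limit; }
    @property int front() const { return i * i; }
    void popFront() { ++i; }
}

unittest
{
    int[] result;
    foreach (sq; Squares(1, 4)) // lazily yields 1, 4, 9, 16
        result ~= sq;
    assert(result == [1, 4, 9, 16]);
}
```

Nothing is computed until foreach pulls the next element, which is all the laziness amounts to.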
 As far as D3 goes though:  Obviously nothing about it has really been 
 discussed at length, but is the general idea that it would be another 
 backwards-incompatible overhaul, or is the plan to make D2 the target 
 for backwards compatibility from here on out?
From what I've seen, D3 will probably be backwards compatible with D2. (But maybe D3 will feel free to fix a few backwards-incompatible warts/errors found in the meantime, I don't know, like un-flattening tuples). Bye, bearophile
Apr 01 2010
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 31 Mar 2010 17:57:07 -0400, Mike S  
<mikes notarealaddresslololololol.com> wrote:

 Steven Schveighoffer wrote:
  You are correct, setCapacity ensures that *at least* the given number  
 of elements will be available for appending.
  I planned on making the function a property (but a bug would not allow  
 that), the original intended usage was:
  a.capacity = 10000;
  Reserve doesn't work in this context.  Can you come up with a name  
 that does?
  I'll bring up reserve (as a function) as an alternative on the phobos  
 mailing list, and see what people say.  I kind of liked the  
 setter/getter idea, but you make a good point.
  -Steve
Sorry if resurrecting this thread is against netiquette, but it caught my eye, and this is my first newsgroup post in years. ;) Anyway, is there any compelling reason why setCapacity or modifying a.capacity should allocate a nondeterministic amount of storage?
What do you mean by nondeterministic? It's very deterministic, just not always easy to determine ;) However, given enough context, it's really easy to determine.
 Depending on the application, programmers might require strict control  
 over memory allocation patterns and strict accounting for allocated  
 memory.  Game programmers, especially console game programmers, tend to  
 strongly prefer deterministic allocation patterns, and nondeterminism is  
 one of the [several] common complaints about the C++ STL  
 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html is a  
 good resource on these kind of issues).  In the case of D (which I'm  
 considering learning), this is especially important for dynamic arrays,  
 partly because they're so useful by themselves, and partly because they  
 may form the backbone of custom containers.
The amount of memory given is determined by the GC, and ultimately by the OS. The currently supported OSes allocate in Page-sized chunks, so when you allocate any memory from the OS, you are allocating a page (4k). Most likely, you may not need a whole page for the data you are allocating, so the GC gives you more finely sized chunks by breaking up a page into smaller pieces. This strategy works well in some cases, and can be wasteful in others. The goal is to strike a balance that is "good enough" for everyday programming, but can be specialized when you need it.

If you want to control memory allocation yourself, you can always do that by allocating page-sized chunks and doing the memory management on those chunks yourself. I do something very similar in dcollections to speed up allocation/destruction.
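A minimal sketch of that idea (this is not dcollections' actual code): grab a page-sized chunk from the GC up front, then hand out fixed-size nodes from it yourself, bypassing per-node GC allocation:

```d
import core.memory : GC;

// Hypothetical pool that sub-allocates nodes from page-sized chunks.
struct NodePool(T)
{
    private T* chunk;       // current chunk, still GC-owned
    private size_t used;    // nodes handed out from the current chunk
    enum pageSize = 4096;
    enum perChunk = pageSize / T.sizeof;

    T* allocate()
    {
        // Grab a fresh page when the current one is exhausted.
        if (chunk is null || used == perChunk)
        {
            chunk = cast(T*) GC.malloc(pageSize);
            used = 0;
        }
        return &chunk[used++];
    }
}

unittest
{
    NodePool!long pool;
    auto a = pool.allocate();
    auto b = pool.allocate();
    *a = 1;
    *b = 2;
    assert(b == a + 1); // consecutive nodes come from the same chunk
}
```

Because the chunks come from the GC, the pool still gets scanned and collected normally; only the per-node allocation cost is avoided.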
 Whereas it's easy to add "smart nondeterministic" behavior to a  
 deterministic setCapacity function by providing a wrapper, ordinary  
 language users can't do the opposite.  Because of this, and because  
 dynamic arrays are so central to the D language, a nondeterministic  
 setCapacity function may deter game programmers, especially console  
 programmers, from adopting D.

 Assuming you see this post, what are your thoughts here?
I think D has deterministic allocation, and better ability than C++ to make custom types that look and act like builtins. Therefore, you can make an array type that suits your needs and is almost exactly the same syntax as a builtin array (except for some things reserved for builtins, like literals). Such a thing is certainly possible, even with using the GC for your allocation. BTW, I made the change to the runtime renaming the function previously known as setCapacity to reserve. It won't be a property, even if that bug is fixed. -Steve
Mar 31 2010
parent reply Mike S <mikes notarealaddresslololololol.com> writes:
Steven Schveighoffer wrote:
  > What do you mean by nondeterministic?  It's very deterministic, just 
not
 always easy to determine ;)  However, given enough context, it's really 
 easy to determine.
When I say deterministic, I'm referring to determinism from the user's point of view, where the allocation behavior is affected solely by the parameter (the size request, e.g. 10000 objects) and not by some kind of internal state, hidden context, or arcane black magic. :p

 The amount of memory given is determined by the GC, and ultimately by
 the OS.  The currently supported OSes allocate in Page-sized chunks, so 
 when you allocate any memory from the OS, you are allocating a page 
 (4k).  Most likely, you may not need a whole page for the data you are 
 allocating, so the GC gives you more finely sized chunks by breaking up 
 a page into smaller pieces.  This strategy works well in some cases, and 
 can be wasteful in others.  The goal is to strike a balance that is 
 "good enough" for everyday programming, but can be specialized when you 
 need it.
That's understandable, and it makes sense that the actual memory being allocated would correspond to some chunk size. It's really just opaque black box behavior that poses a problem; if users are given well-defined guidelines and chunk sizes, that would work just fine. For instance, a spec like, "reserve a multiple of 512 bytes and that's exactly what you will be given," would allow users to minimize wastefulness and know precisely how much memory they're allocating.
 
 If you want to control memory allocation yourself, you can always do 
 that by allocating page-sized chunks and doing the memory management on 
 those chunks yourself.  I do something very similar in dcollections to 
 speed up allocation/destruction.
 
 <snip>
 
 I think D has deterministic allocation, and better ability than C++ to 
 make custom types that look and act like builtins.  Therefore, you can 
 make an array type that suits your needs and is almost exactly the same 
 syntax as a builtin array (except for some things reserved for builtins, 
 like literals).  Such a thing is certainly possible, even with using the 
 GC for your allocation.
That parallels what game devs do in C++: They tend to use custom allocators a lot, and they're likely to follow the same basic strategy in D too, if/when it becomes a suitable replacement. I'm still just browsing though, and I'm not all that familiar with D. If you can't actually use the built-in dynamic arrays for this purpose, how difficult would it be to reimplement a contiguously stored dynamic container using custom allocation? I suppose you'd have to build it from the ground up using a void pointer to a custom allocated block of memory, right? Do user-defined types in D have any/many performance disadvantages compared to built-ins?
 
 BTW, I made the change to the runtime renaming the function previously 
 known as setCapacity to reserve.  It won't be a property, even if that 
 bug is fixed.
 
 -Steve
That's a bit of a downer, since a capacity property would have nice symmetry with the length property. I suppose there were good reasons though. Considering the name change, does that mean reserve can only reserve new space, i.e. it can't free any that's already been allocated? (That makes me wonder: Out of curiosity, how does the garbage collector know how much space is allocated to a dynamic array or especially to a void pointer? I suppose it's registered somewhere?)
Mar 31 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 01 Apr 2010 01:41:02 -0400, Mike S  
<mikes notarealaddresslololololol.com> wrote:

 Steven Schveighoffer wrote:
   > What do you mean by nondeterministic?  It's very deterministic, just  
 not
 always easy to determine ;)  However, given enough context, it's really  
 easy to determine.
When I say deterministic, I'm referring to determinism from the user's point of view, where the allocation behavior is affected solely by the parameter (the size request, e.g. 10000 objects) and not by some kind of internal state, hidden context, or arcane black magic. :p
It's abstracted to the GC, but the current GC is well defined. If you request to allocate blocks with length of a power of 2 under a page, you will get exactly that length, all the way down to 16 bytes. If you request to allocate a page or greater, you get a contiguous block of memory that is a multiple of a page. With that definition, is the allocator deterministic enough for your needs?
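The block sizes can be observed directly through core.memory; a small sketch (the exact bin a request lands in depends on the GC version, so the asserts below only check the guarantees, not exact sizes):

```d
import core.memory : GC;

void main()
{
    // Sub-page requests are rounded up to a size-class bin (a power of 2
    // on the GC of that era, with a 16-byte minimum), so a 100-byte
    // request occupies a somewhat larger block.
    auto p = GC.malloc(100);
    assert(GC.sizeOf(p) >= 100);

    // Requests of a page (4096 bytes) or more are served from whole
    // pages, so the usable block is at least what was asked for.
    auto q = GC.malloc(5000);
    assert(GC.sizeOf(q) >= 5000); // in practice rounded up to whole pages
}
```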
   > The amount of memory given is determined by the GC, and ultimately by
 the OS.  The currently supported OSes allocate in Page-sized chunks, so  
 when you allocate any memory from the OS, you are allocating a page  
 (4k).  Most likely, you may not need a whole page for the data you are  
 allocating, so the GC gives you more finely sized chunks by breaking up  
 a page into smaller pieces.  This strategy works well in some cases,  
 and can be wasteful in others.  The goal is to strike a balance that is  
 "good enough" for everyday programming, but can be specialized when you  
 need it.
That's understandable, and it makes sense that the actual memory being allocated would correspond to some chunk size. It's really just opaque black box behavior that poses a problem; if users are given well-defined guidelines and chunk sizes, that would work just fine. For instance, a spec like, "reserve a multiple of 512 bytes and that's exactly what you will be given," would allow users to minimize wastefulness and know precisely how much memory they're allocating.
I think in the interest of allowing innovative freedom, such requirements should be left up to the GC implementor, not the spec or runtime. Anyone who wants to closely control memory usage should just understand how the GC they are using works.
  If you want to control memory allocation yourself, you can always do  
 that by allocating page-sized chunks and doing the memory management on  
 those chunks yourself.  I do something very similar in dcollections to  
 speed up allocation/destruction.
  <snip>
  I think D has deterministic allocation, and better ability than C++ to  
 make custom types that look and act like builtins.  Therefore, you can  
 make an array type that suits your needs and is almost exactly the same  
 syntax as a builtin array (except for some things reserved for  
 builtins, like literals).  Such a thing is certainly possible, even  
 with using the GC for your allocation.
That parallels what game devs do in C++: They tend to use custom allocators a lot, and they're likely to follow the same basic strategy in D too, if/when it becomes a suitable replacement. I'm still just browsing though, and I'm not all that familiar with D. If you can't actually use the built-in dynamic arrays for this purpose, how difficult would it be to reimplement a contiguously stored dynamic container using custom allocation? I suppose you'd have to build it from the ground up using a void pointer to a custom allocated block of memory, right? Do user-defined types in D have any/many performance disadvantages compared to built-ins?
No, you would most likely use templates, not void pointers. D's template system is far advanced past C++, and I used it to implement my custom allocators. It works great. User-defined types are as high performance as builtins as long as the compiler inlines properly.
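A hypothetical sketch of the kind of templated, array-like user type meant here (not Steve's dcollections code): a contiguously stored stack that owns its storage, yet reads much like a built-in array at the call site:

```d
// Templated fixed-capacity stack; no void pointers, no GC per element.
struct FixedStack(T, size_t N)
{
    private T[N] store;   // storage owned by the struct itself
    private size_t len;

    // Overload ~= so appending looks like a built-in array append.
    void opOpAssign(string op : "~")(T value)
    {
        assert(len < N, "capacity exhausted");
        store[len++] = value;
    }

    T pop() { return store[--len]; }

    @property size_t length() const { return len; }

    // Slicing gives a view of the live elements, as with built-ins.
    inout(T)[] opSlice() inout { return store[0 .. len]; }
}

unittest
{
    FixedStack!(int, 4) s;
    s ~= 1;
    s ~= 2;
    assert(s.length == 2);
    assert(s[] == [1, 2]);
    assert(s.pop() == 2);
}
```

Operator overloading (opOpAssign, opSlice) is what lets a user type keep the built-in syntax; only literals stay reserved for real arrays.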
  BTW, I made the change to the runtime renaming the function previously  
 known as setCapacity to reserve.  It won't be a property, even if that  
 bug is fixed.
  -Steve
That's a bit of a downer, since a capacity property would have nice symmetry with the length property. I suppose there were good reasons though. Considering the name change, does that mean reserve can only reserve new space, i.e. it can't free any that's already been allocated?
Capacity still exists as a read-only property. I did like the symmetry, but the point was well taken that the act of setting the capacity was not exact. It does mean that reserving space can only grow, not shrink. In fact, the capacity property calls the same runtime function as reserve, just passing 0 as the amount requested, to get the currently reserved space. You can't use capacity to free space because that could result in dangling pointers. Freeing space is done through the delete keyword. We do not want to make it easy to accidentally free space.
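Under the names that eventually shipped in druntime (reserve as a function, capacity as a read-only property), the behavior described above reads roughly like this; a sketch, not official documentation:

```d
void main()
{
    int[] a;
    a.reserve(10_000);          // grows the allocation; never shrinks it
    auto cap = a.capacity;      // read-only property
    assert(cap >= 10_000);      // at least the requested room

    foreach (i; 0 .. 10_000)
        a ~= i;                 // stays within the reserved block

    assert(a.capacity == cap);  // no reallocation happened
}
```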
   (That makes me wonder:  Out of curiosity, how does the garbage  
 collector know how much space is allocated to a dynamic array or  
 especially to a void pointer?  I suppose it's registered somewhere?)
The GC can figure out what page an interior pointer belongs to, and therefore how much memory that block uses. There is a GC function to get the block info of an interior pointer, which returns a struct that contains the pointer to the block, the length of the block, and its flags (whether it contains pointers or not). This function is what the array append feature uses to determine how much capacity can be used. I believe this lookup is logarithmic in complexity. -Steve
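The lookup described here is exposed as GC.query in core.memory, returning a BlkInfo with the base pointer, block size, and attribute flags; a small sketch, assuming that API:

```d
import core.memory : GC;

void main()
{
    auto a = new int[10];

    // query accepts an interior pointer and finds the enclosing block.
    auto info = GC.query(cast(void*) &a[5]);

    assert(info.base !is null);             // start of the whole block
    assert(info.size >= 10 * int.sizeof);   // block size, not slice size
    // info.attr holds BlkAttr flags, e.g. NO_SCAN for pointer-free data.
}
```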
Apr 01 2010
parent reply Mike S <mikes notarealaddresslololololol.com> writes:
Steven Schveighoffer wrote:
 Its abstracted to the GC, but the current GC is well defined.  If you 
 request to allocate blocks with length of a power of 2 under a page, you 
 will get exactly that length, all the way down to 16 bytes.  If you 
 request to allocate a page or greater, you get a contiguous block of 
 memory that is a multiple of a page.
 
 With that definition, is the allocator deterministic enough for your needs?
With respect to the current GC, yup. :) Sticking to powers of 2 (or integral numbers of pages) >= 16 bytes is an easy enough rule. Of course, I'm just starting out, so experienced game programmers might disagree, but it sounds perfectly reasonable to me. I suppose the abstraction makes it a QOI (quality of implementation) issue though, so depending on the compiler and GC used in the future it could become a question again. As it stands, I'm less interested in its precise current state and more interested in keeping an eye on its direction and how the spec/compiler/tools will mature over the next few years.

 I think in the interest of allowing innovative freedom, such
 requirements should be left up to the GC implementor, not the spec or 
 runtime.  Anyone who wants to closely control memory usage should just 
 understand how the GC they are using works.
It's definitely a tradeoff...leaving things unspecified opens the door to innovation and better implementations, but it simultaneously raises issues of inconsistent behavior across multiple platforms where the same compiler/GC might not be able to be used. To give an extreme example, look where unspecified behavior regarding static initialization order got C++! ;) I guess I'll just have to see where things go over time, but the predictable allocation with the current GC is a good sign at least.
 No, you would most likely use templates, not void pointers.  D's 
 template system is far advanced past C++, and I used it to implement my 
 custom allocators.  It works great.
 
 User-defined types are as high performance as builtins as long as the 
 compiler inlines properly.
D's template system is pretty intriguing. I don't really know anything about it, but I've read that it's less intimidating and tricky than C++'s, and that can only be a good thing for people wanting to harness its power.
 Capacity still exists as a read-only property.  I did like the symmetry, 
 but the point was well taken that the act of setting the capacity was 
 not exact.  It does mean that reserving space can only grow, not 
 shrink.  In fact, the capacity property calls the same runtime function 
 as reserve, just passing 0 as the amount requested to get the currently 
 reserved space.
 
 You can't use capacity to free space because that could result in 
 dangling pointers.  Freeing space is done through the delete keyword.  
 We do not want to make it easy to accidentally free space.
It's a shame the asymmetry was necessary, but I agree it makes sense.
 The GC can figure out what page an interior pointer belongs to, and 
 therefore how much memory that block uses.  There is a GC function to 
 get the block info of an interior pointer, which returns a struct that 
 contains the pointer to the block, the length of the block, and its 
 flags (whether it contains pointers or not).  This function is what the 
 array append feature uses to determine how much capacity can be used.  I 
 believe this lookup is logarithmic in complexity.
 
 -Steve
Thanks!
Apr 01 2010
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 01 Apr 2010 11:01:38 -0400, Mike S  
<mikes notarealaddresslololololol.com> wrote:

 I suppose the abstraction makes it a QOI though, so depending on the  
 compiler and GC used in the future it could become a question again.  As  
 it stands, I'm less interested in its precise current state and more  
 interested in keeping an eye on its direction and how the  
 spec/compiler/tools will mature over the next few years.
As an aside, because the GC is implemented all in runtime, you can swap out the GC for something else that suits your needs. Therefore, the issue of having to deal with different GCs on different platforms can be circumvented by inventing a GC that works the way you want on all platforms :) Also, I don't see the major attributes of the GC changing anytime soon. -Steve
Apr 01 2010
prev sibling parent reply grauzone <none example.net> writes:
Steven Schveighoffer wrote:
 On Thu, 04 Mar 2010 11:43:27 -0500, grauzone <none example.net> wrote:
 
 Some sort of "resetAndReuse" function to clear an array, but enabling 
 to reuse the old memory would be nice:

 int[] a = data;
 a = null;
 a ~= 1; //reallocates (of course)
 a.length = 0;
 a ~= 1; //will reallocate (for safety), used not to reallocate
 resetAndReuse(a);
 assert(a.length == 0);
 a ~= 1; //doesn't reallocate

 This can be implemented by setting both the slice and the internal 
 runtime length fields to  0.

 Additionally, another function is necessary to replace the old 
 preallocation trick:

 //preallocate 1000 elements, but don't change actual slice length
 auto len = a.length;
 a.length = len + 1000;
 a.length = len;

 As I understood it, this won't work anymore after the change. This can 
 be implemented by enlarging the array's memory block without touching 
 any length fields.

 I'm sure the function you had in mind does one of those things or both.
proposed usage (as checked in a couple days ago):

int[] a;
a.setCapacity(10000); // pre-allocate at least 10000 elements.
foreach(i; 0..10000)
   a ~= i; // no reallocation
a.length = 100;
a.shrinkToFit(); // resize "allocated" length to 100 elements
a ~= 5; // no reallocation.
What shrinkToFit() does is not really clear. Does it reallocate the memory block of the array such, that no space is wasted? Or does it provide (almost) the same functionality as my resetAndReuse(), and make the superfluous trailing memory available for appending without reallocation? I think a resetAndReuse is really needed. I have found it can prevent "GC thrashing" in many cases. E.g. when caching frequently re-evaluated data in form of arrays (free previous array, then allocate array that isn't larger than the previous array).
Mar 04 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 04 Mar 2010 12:43:32 -0500, grauzone <none example.net> wrote:

 Steven Schveighoffer wrote:
 On Thu, 04 Mar 2010 11:43:27 -0500, grauzone <none example.net> wrote:

 Some sort of "resetAndReuse" function to clear an array, but enabling  
 to reuse the old memory would be nice:

 int[] a = data;
 a = null;
 a ~= 1; //reallocates (of course)
 a.length = 0;
 a ~= 1; //will reallocate (for safety), used not to reallocate
 resetAndReuse(a);
 assert(a.length == 0);
 a ~= 1; //doesn't reallocate

 This can be implemented by setting both the slice and the internal  
 runtime length fields to  0.

 Additionally, another function is necessary to replace the old  
 preallocation trick:

 //preallocate 1000 elements, but don't change actual slice length
 auto len = a.length;
 a.length = len + 1000;
 a.length = len;

 As I understood it, this won't work anymore after the change. This can  
 be implemented by enlarging the array's memory block without touching  
 any length fields.

 I'm sure the function you had in mind does one of those things or both.
proposed usage (as checked in a couple days ago):

int[] a;
a.setCapacity(10000); // pre-allocate at least 10000 elements.
foreach(i; 0..10000)
   a ~= i; // no reallocation
a.length = 100;
a.shrinkToFit(); // resize "allocated" length to 100 elements
a ~= 5; // no reallocation.
What shrinkToFit() does is not really clear. Does it reallocate the memory block of the array such, that no space is wasted? Or does it provide (almost) the same functionality as my resetAndReuse(), and make the superfluous trailing memory available for appending without reallocation?
Sorry, should have added:

assert(a.length == 101);

Basically, shrinkToFit shrinks the "allocated" space to the length of the array. To put it another way, you could write your resetAndReuse function as follows:

void resetAndReuse(T)(ref T[] arr)
{
   arr.length = 0;
   arr.shrinkToFit();
}

I want to avoid assuming that shrinking the length to 0 is the only usable idiom.
 I think a resetAndReuse is really needed. I have found it can prevent  
 "GC thrashing" in many cases. E.g. when caching frequently re-evaluated  
 data in form of arrays (free previous array, then allocate array that  
 isn't larger than the previous array).
Yes, it is useful in such cases. The only questionable part about it is that it allows for stomping in cases like:

auto str = "hello".idup;
auto str2 = str[3..4];
str2.shrinkToFit();
str2 ~= "a";
assert(str == "hella");

So it probably should be marked as unsafe. Again, the name shrinkToFit isn't my favorite, ideas welcome. -Steve
Mar 04 2010
parent reply grauzone <none example.net> writes:
Steven Schveighoffer wrote:
 On Thu, 04 Mar 2010 12:43:32 -0500, grauzone <none example.net> wrote:
 
 Steven Schveighoffer wrote:
 On Thu, 04 Mar 2010 11:43:27 -0500, grauzone <none example.net> wrote:

 Some sort of "resetAndReuse" function to clear an array, but 
 enabling to reuse the old memory would be nice:

 int[] a = data;
 a = null;
 a ~= 1; //reallocates (of course)
 a.length = 0;
 a ~= 1; //will reallocate (for safety), used not to reallocate
 resetAndReuse(a);
 assert(a.length == 0);
 a ~= 1; //doesn't reallocate

 This can be implemented by setting both the slice and the internal 
 runtime length fields to  0.

 Additionally, another function is necessary to replace the old 
 preallocation trick:

 //preallocate 1000 elements, but don't change actual slice length
 auto len = a.length;
 a.length = len + 1000;
 a.length = len;

 As I understood it, this won't work anymore after the change. This 
 can be implemented by enlarging the array's memory block without 
 touching any length fields.

 I'm sure the function you had in mind does one of those things or both.
proposed usage (as checked in a couple days ago):

int[] a;
a.setCapacity(10000); // pre-allocate at least 10000 elements.
foreach(i; 0..10000)
   a ~= i; // no reallocation
a.length = 100;
a.shrinkToFit(); // resize "allocated" length to 100 elements
a ~= 5; // no reallocation.
What shrinkToFit() does is not really clear. Does it reallocate the memory block of the array such, that no space is wasted? Or does it provide (almost) the same functionality as my resetAndReuse(), and make the superfluous trailing memory available for appending without reallocation?
Sorry, should have added:

assert(a.length == 101);

Basically, shrinkToFit shrinks the "allocated" space to the length of the array. To put it another way, you could write your resetAndReuse function as follows:

void resetAndReuse(T)(ref T[] arr)
{
   arr.length = 0;
   arr.shrinkToFit();
}

I want to avoid assuming that shrinking the length to 0 is the only usable idiom.
Ah, great.
 I think a resetAndReuse is really needed. I have found it can prevent 
 "GC thrashing" in many cases. E.g. when caching frequently 
 re-evaluated data in form of arrays (free previous array, then 
 allocate array that isn't larger than the previous array).
Yes, it is useful in such cases. The only questionable part about it is that it allows for stomping in cases like:

auto str = "hello".idup;
auto str2 = str[3..4];
str2.shrinkToFit();
str2 ~= "a";
assert(str == "hella");

So it probably should be marked as unsafe.
Doesn't it conform to Andrei's ideas about memory safety? It can stomp over other data, but it can't be used to subvert the type system. Also, it's not the default behavior: if you don't use this function, stomping can never happen. But it must be disabled for arrays of immutable types (e.g. strings).
 Again, the name shrinkToFit isn't my favorite, ideas welcome.
stompOnAppend()? uniqueSlice()? trashRemainder()? I don't know either, all sound a bit silly. PS: I'm glad to hear that your patch is supposed to be included in the next release. Still waiting for dsimcha's one.
 -Steve
Mar 04 2010
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 04 Mar 2010 13:38:30 -0500, grauzone <none example.net> wrote:

 Steven Schveighoffer wrote:

  So it probably should be marked as unsafe.
Doesn't it conform to Andrei's ideas about memory safety? It can stomp over other data, but it can't be used to subvert the type system.
I think the idea is that anything that *could* result in undefined behavior is marked as unsafe. It just means it's not available from safe functions. It still can be called from safe functions via a trusted wrapper. I'd call it undefined behavior, because the memory is effectively released. Future GCs may make assumptions about that unallocated space that would cause problems. I feel like safeD has a slight lean towards sacrificing performance for the sake of memory safety. This is not a bad thing, and in most cases inconsequential.
 Also, it's not the default behavior: if you don't use this function,  
 stomping can never happen.
What's nice about capturing it into a function is you *can* classify it as unsafe, versus the current method which is harder to detect. I'd categorize it like casting.
 But it must be disabled for arrays of immutable types (i.e. strings).
This is tricky, you could have partially immutable types. The function would have to examine the type of all the contained items and make sure there were no immutable bytes in the element. Plus, it's not truly unsafe to use on immutable types if the given array is the only reference to that data. You could potentially implement a stack using builtin array appending and shrinkToFit, even on immutable elements (although I'd advise against it :)
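The shrinkToFit discussed in this thread eventually shipped under the name assumeSafeAppend; assuming that name, the stack idiom mentioned above might look like this (a sketch, safe only while the slice is the sole reference to the tail of its block):

```d
// Stack built on built-in array appending plus assumeSafeAppend.
struct Stack(T)
{
    private T[] data;

    void push(T value) { data ~= value; }

    T pop()
    {
        auto top = data[$ - 1];
        data = data[0 .. $ - 1];
        data.assumeSafeAppend(); // reuse the freed tail on the next push
        return top;
    }

    @property size_t length() const { return data.length; }
}

unittest
{
    Stack!int s;
    s.push(1);
    s.push(2);
    assert(s.pop() == 2);
    s.push(3);          // reuses the slot vacated by pop
    assert(s.pop() == 3);
    assert(s.pop() == 1);
    assert(s.length == 0);
}
```

This is exactly the stomping pattern Steve warns about: if anything else still references data's tail, push overwrites it, so the call is best treated as unsafe/trusted code.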
 Again, the name shrinkToFit isn't my favorite, ideas welcome.
stompOnAppend()? uniqueSlice()? trashRemainder()? I don't know either, all sound a bit silly.
I think I like shrinkToFit better than all those :) I still like minimize the best, but the objection was that it has mathematical connotations. -Steve
Mar 04 2010