digitalmars.D - Linus with some good observations on garbage collection

Walter Bright (1/1) Apr 22 2011 http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html

Andrej Mitrovic (1/1) Apr 22 2011 Just a reminder: that post is 9 years old.
Alvaro (16/17) Apr 22 2011 I've always been surprised when discussions usually just bring garbage

Steven Schveighoffer (8/15) Apr 22 2011 Because you then have to update potentially two reference counts every
Michael Stover (18/35) Apr 22 2011 This sort of reference count with cyclic dependency detector is how a lo...

=?UTF-8?B?IkrDqXLDtG1lIE0uIEJlcmdlciI=?= (17/26) Apr 23 2011 y

Brad Roberts (4/35) Apr 22 2011 Also add to it that in many cases you're dealing with a threaded environ...

Timon Gehr (62/69) Apr 22 2011 expensive than non-atomic. More so

bearophile (5/10) Apr 22 2011 D used to have scoped class instances, scoped classes, and delete, their...

Iain Buclaw (15/25) Apr 22 2011 to variable length arrays.

bearophile (16/19) Apr 22 2011 I have an enhancement request in Bugzilla on VLA, with longer discussion...

bearophile (3/4) Apr 22 2011 Never mind, that semantics can't use that syntax, otherwise you have hid...
Iain Buclaw (22/26) Apr 23 2011 deallocate at the end of the function or to deallocate at the end of the...

Andrei Alexandrescu (11/20) Apr 23 2011 Well in fairness they can be a fair amount more elaborate. The trouble

dsimcha (10/36) Apr 23 2011 Right. This is exactly the kind of thing TempAlloc was meant to solve.

bearophile (4/5) Apr 23 2011 I think TempAlloc is meant for larger allocations. D-VLAs are meant to a...

dsimcha (5/10) Apr 23 2011 Why would TempAlloc not work for this? It's slightly slower than VLAs,

bearophile (5/9) Apr 23 2011 I have to compare the performance of both for my use cases (fast allocat...

Andrei Alexandrescu (4/6) Apr 23 2011 They are converted to pointers to functions. Not something I'd recommend...

Bruno Medeiros (11/17) Apr 29 2011 And that's since the C days btw. The function call is just an operation,...

Andrei Alexandrescu (3/20) Apr 29 2011 It's an indirect call instead of a direct call.

Bruno Medeiros (10/33) May 03 2011 Well, yes, that's kinda of what I was thinking already (although I

=?UTF-8?B?IkrDqXLDtG1lIE0uIEJlcmdlciI=?= (17/53) May 03 2011 ion

Bruno Medeiros (5/50) May 04 2011 Ah right, that makes sense, forgot about that one (I see it's the same
Don (3/44) May 04 2011 That was true ten years ago, but not any more. Modern CPUs can do branch...

dsimcha (7/54) May 04 2011 Yeh, I tested this a while back and can dig up/recreate the benchmark if...

bearophile (10/21) Apr 23 2011 See this too:
bearophile (4/11) Apr 23 2011 As I have explained in my old enhancement request about VLAs and in a re...

Ulrik Mikaelsson (26/42) Apr 22 2011 not need

Timon Gehr (8/21) Apr 22 2011 That is very true. GC is almost always fast enough or even faster. And i...

Andrei Alexandrescu (4/6) Apr 23 2011 The functionality is not going away.

Timon Gehr (7/13) Apr 23 2011 Yes I realize that. It is a matter of syntax and aesthetics. Handling al...

Andrei Alexandrescu (10/26) Apr 23 2011 Allocation and deallocation are not symmetric and as such handling them

Timon Gehr (15/26) Apr 23 2011 'char' is completely orthogonal to memory allocation and it still gets h...

Piotr Szturmaj (3/32) Apr 23 2011 Why not allow "delete" only in classes with custom allocators? AFAIK

Piotr Szturmaj (3/5) Apr 24 2011 Nevermind. Andrei informed that class allocators are being deprecated,

Andrei Alexandrescu (13/41) Apr 23 2011 The point is that some naively believe "new" and "delete" are some

Timon Gehr (21/33) Apr 24 2011 Virtually no good application uses them? The _DMD compiler implementatio...

Andrei Alexandrescu (33/69) Apr 24 2011 A class-specific freelist-based allocator in fact stands in the way of a...

Andrej Mitrovic (2/2) Apr 24 2011 Since this is definite we should remove them from the docs as well.

Andrei Alexandrescu (4/6) Apr 24 2011 Please do. BTW I just fixed the core-related documentation bug on

Andrej Mitrovic (2/8) Apr 24 2011 Fantastic, thank you.

bearophile (5/6) Apr 24 2011 Just a note: you have not wasted your time discussing this, because I am...

Francisco Almeida (21/27) Apr 24 2011 the design purpose, design constraints and design compromises of every f...

Timon Gehr (74/81) Apr 24 2011 Ok, lets stop. Custom allocators and delete should probably be removed f...

Iain Buclaw (6/40) Apr 24 2011 Is it correct that the current D GC implementation puts delete'd allocat...

Andrei Alexandrescu (6/46) Apr 24 2011 I'd appreciate if you provided more detail on that. Far as I can tell

Robert Jacques (5/24) Apr 24 2011 To the best of my knowledge, for 'small' allocations, the current GC use...

Sean Kelly (4/6) Apr 25 2011 uses free-lists of power-of-2 sized blocks. (I think small is defined as...

Iain Buclaw (14/62) Apr 25 2011 I was thinking of where one implements an alternative runtime that may n...

Andrei Alexandrescu (4/17) Apr 25 2011 I understand. The right response to this is defining a good custom

Kim D (6/35) Apr 25 2011 Sounds like a good way forward. However, certain operations are currentl...

Andrei Alexandrescu (7/44) Apr 25 2011 An entity relying on GC will by definition have more freedom and more

Andrei Alexandrescu (56/95) Apr 24 2011 I agree. There are two other things that I don't agree with:

Sean Kelly (10/16) Apr 25 2011 semantics independent of Widget, such that modular and generic code can ...

Walter Bright (3/5) Apr 23 2011 I believe a GC is required if one wishes to prove memory safety, which i...

Walter Bright <newshound2 digitalmars.com> writes:

http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html

Apr 22 2011

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

Just a reminder: that post is 9 years old.

Apr 22 2011

Alvaro <alvaro.segura gmail.com> writes:

El 22/04/2011 19:36, Walter Bright escribi�:
 http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html

I've always been surprised when discussions usually just bring garbage 
collection as the only alternative to explicit manual memory management. 
I imagined it as a garbage truck that has its own schedule and may let a 
lot of trash pile up before passing by. I always naively thought, why 
not just free immediately when an object gets no references?

Not an expert, so there may be reasons I don't see, but now that Linus 
says somethnig along the lines, I'll ask. Why not? Isn't it much easier 
to do refcount++ and refcount--, and if refcount==0 immediately 
"free()"? Memory will be available to other needs faster, no need for an 
additional thread, or a lot of memory consumed before the advanced 
garbage truck decides to come in, or slight pauses when collecting trash 
(maybe only in old implementations), and the implementation is much 
simpler...

OK, I knew about that "cyclic references" problem. But Linus doesn't 
seem to see a big problem and solutions can be found with care...

Apr 22 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 22 Apr 2011 14:32:06 -0400, Alvaro <alvaro.segura gmail.com> wrote:

 El 22/04/2011 19:36, Walter Bright escribió:
 http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html

 I've always been surprised when discussions usually just bring garbage  
 collection as the only alternative to explicit manual memory management.  
 I imagined it as a garbage truck that has its own schedule and may let a  
 lot of trash pile up before passing by. I always naively thought, why  
 not just free immediately when an object gets no references?

Because you then have to update potentially two reference counts every  
time you assign a pointer.  GC's save you from doing that.

I know way way less than Torvalds, but my naive brain says GC's still win  
because often times, slightly noticeable drops in performance are worth  
having code that doesn't corrupt memory.  This may not be true for kernel  
development, but then again, we aren't all developing kernels ;)

-Steve

Apr 22 2011

Michael Stover <michael.r.stover gmail.com> writes:

This sort of reference count with cyclic dependency detector is how a lot o=
f
scripting languages do it, or did it in the past.  The problem was that laz=
y
generational GCs are believed to have better throughput in general.

I'd like to say "were proved" rather than "are believed", but I don't
actually know where to go for such evidence.  However, I do believe many
scripting languages, such as python, eventually ditched the reference
counting technique for generational, and Java has very fast GC, so I am
inclined to believe those real-life solutions than Linus.

Mike

On Fri, Apr 22, 2011 at 2:32 PM, Alvaro <alvaro.segura gmail.com> wrote:

 El 22/04/2011 19:36, Walter Bright escribi=F3:

  http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html

 I've always been surprised when discussions usually just bring garbage
 collection as the only alternative to explicit manual memory management. =

I
 imagined it as a garbage truck that has its own schedule and may let a lo=

t
 of trash pile up before passing by. I always naively thought, why not jus=

t
 free immediately when an object gets no references?

 Not an expert, so there may be reasons I don't see, but now that Linus sa=

ys
 somethnig along the lines, I'll ask. Why not? Isn't it much easier to do
 refcount++ and refcount--, and if refcount=3D=3D0 immediately "free()"? M=

emory
 will be available to other needs faster, no need for an additional thread=

,
 or a lot of memory consumed before the advanced garbage truck decides to
 come in, or slight pauses when collecting trash (maybe only in old
 implementations), and the implementation is much simpler...

 OK, I knew about that "cyclic references" problem. But Linus doesn't seem
 to see a big problem and solutions can be found with care...

Apr 22 2011

=?UTF-8?B?IkrDqXLDtG1lIE0uIEJlcmdlciI=?= <jeberger free.fr> writes:

Michael Stover wrote:
 This sort of reference count with cyclic dependency detector is how a
 lot of scripting languages do it, or did it in the past.  The problem
 was that lazy generational GCs are believed to have better throughput i=

n
 general.

 I'd like to say "were proved" rather than "are believed", but I don't
 actually know where to go for such evidence.  However, I do believe man=

y
 scripting languages, such as python, eventually ditched the reference
 counting technique for generational, and Java has very fast GC, so I am=

 inclined to believe those real-life solutions than Linus.

        This is a single benchmark on a single obscure language, so
take it with a grain of salt, but look at the "Nimrod" and "Nimrod
Boehm" lines here: https://bitbucket.org/jmb/shootout/wiki/Home

        By default, Nimrod uses a reference counter but there is a
compiler switch to use the Boehm GC instead. Of particular interest
is the benchmark "Binary trees", since it is specially designed to
exercise memory management. On that benchmark, the Boehm version is
about twice as fast as the refcounting version, while on other
benchmarks, performance is comparable.

		Jerome
--=20
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Apr 23 2011

Brad Roberts <braddr puremagic.com> writes:

Also add to it that in many cases you're dealing with a threaded environment,
so those refcounts have to be locked
(either via mutexes, or more commonly just atomic) operations which are far
more expensive than non-atomic.  More so
when there's actual contention for the refcounted resource.

On 4/22/2011 11:53 AM, Michael Stover wrote:
 This sort of reference count with cyclic dependency detector is how a lot of
scripting languages do it, or did it in the
 past.  The problem was that lazy generational GCs are believed to have better
throughput in general.
 
 I'd like to say "were proved" rather than "are believed", but I don't actually
know where to go for such evidence.
  However, I do believe many scripting languages, such as python, eventually
ditched the reference counting technique for
 generational, and Java has very fast GC, so I am inclined to believe those
real-life solutions than Linus.
 
 Mike
 
 On Fri, Apr 22, 2011 at 2:32 PM, Alvaro <alvaro.segura gmail.com
<mailto:alvaro.segura gmail.com>> wrote:
 
     El 22/04/2011 19:36, Walter Bright escribi�:
 
         http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html
 
 
     I've always been surprised when discussions usually just bring garbage
collection as the only alternative to
     explicit manual memory management. I imagined it as a garbage truck that
has its own schedule and may let a lot of
     trash pile up before passing by. I always naively thought, why not just
free immediately when an object gets no
     references?
 
     Not an expert, so there may be reasons I don't see, but now that Linus
says somethnig along the lines, I'll ask. Why
     not? Isn't it much easier to do refcount++ and refcount--, and if
refcount==0 immediately "free()"? Memory will be
     available to other needs faster, no need for an additional thread, or a
lot of memory consumed before the advanced
     garbage truck decides to come in, or slight pauses when collecting trash
(maybe only in old implementations), and
     the implementation is much simpler...
 
     OK, I knew about that "cyclic references" problem. But Linus doesn't seem
to see a big problem and solutions can be
     found with care...

Apr 22 2011

Timon Gehr <timon.gehr gmx.ch> writes:

Brad Roberts wrote:
 Also add to it that in many cases you're dealing with a threaded environment, >

so those refcounts have to be locked
 (either via mutexes, or more commonly just atomic) operations which are far
more

expensive than non-atomic.  More so
 when there's actual contention for the refcounted resource.

That is only a problem if the reference count of that resource changes at a very
high frequency. The described problem also implies that the threads would not
need
any form of synchronization for the data (otherwise the reference count
certainly
would not be a bottleneck.)

I cannot, at the moment, think of a real-world example where this would not
imply
bad design. Can you help me out?

Michael Stover wrote:
 I'd like to say "were proved" rather than "are believed", but I don't actually

know where to go for such evidence.
  However, I do believe many scripting languages, such as python, eventually

ditched the reference counting technique for
 generational, and Java has very fast GC, so I am inclined to believe those

real-life solutions than Linus.
 Mike

Well, the GC may be fast when compared to other GCs, but it has to be designed
to
run on general data whose reference structure can be arbitrary. Often, the
objects/references have a somewhat specialized structure that a smart programmer
can exploit, especially if the references point to private data. But that only
matters if performance is of prime interest, and the gains may be not very big.

But, as pointed out by Linus, the prime performance problem is _not_ the GC, but
the mindset that comes with it. Most programmers that "grew up" in a managed
environment tend to use very many "new" keywords all over their code, instead of

this.)
When they then try to write a C++ program, they do the same. The resulting
memory
bugs are then blamed on the lack of a GC (you can have GC in C/C++, but most of
the people I have talked to do not know this.) They then happily change back to
Java, that has a "very fast GC".

The important thing to note here is that the work required to deallocate all
these
many memory locations does not magically disappear, but it is delegated to the
GC,
which will mostly do it faster and more reliable than a programmer which has to
do
it manually. But the problem does not lie in the deallocations, it's in the
allocations.

Consider this analogy:

Scenario 1: Many people like candy. Those are wrapped in colorful little pieces
of
paper. Every day, every person buys one piece of candy in the candy shop (new
Candy()!) and on the way back home they throw away the wrapping paper somewhere
on
the street (reassign reference). Those are garbage. In the evening, some creepy
guy comes to search the whole street for those small pieces of paper. Would you
call that guy a garbage collector? He collects garbage, but in the real world
garbage collectors work more like this:

Scenario 2: Still, many people like candy. Every person buys a bag of candy in
the
candy shop once a year (new Candy[365]). When all the candy is eaten, they put
all
the garbage in one bag and put it to their front door (reassign reference to
whole
array). A very handsome guy collects all those bags. He is very much more
efficient than the guy in example 1. (Arguably, memory usage is bigger in that
particular example, but in computer programs, the allocating process can reuse
the
memory. The analogy breaks down here.)

Note that I am not saying that garbage collection is bad. If the references
form a
very complicated structure, or if a reference to the containing object is not
necessarily kept (iE. array slicing), it can be very useful as well as faster
than
manual memory management. Also, custom allocators can reduce the
lots-of-small-allocations problem, but they have some overhead too. Advanced GCs
may do this automatically, I don't know.

The reason Java is garbage collected is _not_ performance, but primarily
reliability. In big programming projects, it can be hard to know where a
reference
to an object may be kept, if there are circular references etc, as programs keep
expanding without the programmers understanding the whole project in detail. GC
also removes bugs related to memory management. I think true and reliable OO is
not possible without a GC. The downside is, that many Java/... programmers don't
concern themselves much with the internals of memory allocations, and are _very_
proud of it. This is also the reason I think it is a bad idea to deprecate D's
'delete'.


-Timon

Apr 22 2011

bearophile <bearophileHUGS lycos.com> writes:

Timon Gehr:

 But, as pointed out by Linus, the prime performance problem is _not_ the GC,
but
 the mindset that comes with it. Most programmers that "grew up" in a managed
 environment tend to use very many "new" keywords all over their code, instead
of

this.)

In C99 (and Ada) you avoid the allocation of some dynamic arrays with new
thanks to variable length arrays.


 This is also the reason I think it is a bad idea to deprecate D's 'delete'.

D used to have scoped class instances, scoped classes, and delete, their
replacements are not good enough yet. In CommonLisp you have hints for the GC,
they are safe and they help you help speedup the work of the GC. Such hints
probably need to be integrated with the type system, so they may need to be
built-ins as scope/delete were. I am not seeing enough discussion about this.

Bye,
bearophile

Apr 22 2011

Iain Buclaw <ibuclaw ubuntu.com> writes:

== Quote from bearophile (bearophileHUGS lycos.com)'s article
 Timon Gehr:
 But, as pointed out by Linus, the prime performance problem is _not_ the GC,
but
 the mindset that comes with it. Most programmers that "grew up" in a managed
 environment tend to use very many "new" keywords all over their code, instead
of

this.)

 In C99 (and Ada) you avoid the allocation of some dynamic arrays with new
thanks

to variable length arrays.

Variable length arrays are just sugary syntax for a call to alloca.

 This is also the reason I think it is a bad idea to deprecate D's 'delete'.

 D used to have scoped class instances, scoped classes, and delete, their

replacements are not good enough yet. In CommonLisp you have hints for the GC,
they are safe and they help you help speedup the work of the GC. Such hints
probably need to be integrated with the type system, so they may need to be
built-ins as scope/delete were. I am not seeing enough discussion about this.
 Bye,
 bearophile


I've always felt that Vala's system is better thought out, which is incidentally
based on a reference counting system. This makes destructors in Vala
deterministic
and can be used to implement an RAII pattern for resource management.
To get around the common pitfalls of reference counting systems, they introduce
two keywords which alter the relationship between the allocated object and the
GC,
'weak' and 'unowned'.

Rather than bore you with the gritty details here, see link:
http://live.gnome.org/Vala/ReferenceHandling

Apr 22 2011

bearophile <bearophileHUGS lycos.com> writes:

Iain Buclaw:

 Variable length arrays are just sugary syntax for a call to alloca.

I have an enhancement request in Bugzilla on VLA, with longer discussions. Just
two comments:
- It seems alloca() can be implemented with two different semantics: to
deallocate at the end of the function or to deallocate at the end of the scope.
Usually alloca() deallocates at the end of the function, but that semantic
confusion is dangerous. VLA deallocate at the end of the scope, just like any
other static array.
- To use alloca you need to use pointers, maybe even slices, it's not DRY, etc.
So syntax sugar helps.

In the meantime I've changed my mind a little. Now D I prefer something better
than C99 VLAs. I'd like D-VLAs with the same syntax as C99 VLAs but with a
safer semantics, closer to this one (but the "alloca" used here must deallocate
at the end of the scope):

enum size_t MAX_VLA_SIZE = 1024;
static assert (is(typeof(size) == size_t));
T* ptr = null;
if ((size * T.sizeof) < MAX_VLA_SIZE)
    ptr = cast(T*)alloca(size * T.sizeof);
T[] array = (ptr == null) ? new T[size] : ptr[0 .. size];
array[] = T.init;

This has some advantages: when alloca returns null, or when the array is large,
it uses the GC. This allows to both avoid some stack overflows and reduce the
risk of the stack memory from becoming too much cold.


 Rather than bore you with the gritty details here, see link:
 http://live.gnome.org/Vala/ReferenceHandling

This is interesting.

Bye,
bearophile

Apr 22 2011

bearophile <bearophileHUGS lycos.com> writes:

 Now D I prefer something better than C99 VLAs. I'd like D-VLAs with the same
syntax as C99 VLAs but with a safer semantics,

Never mind, that semantics can't use that syntax, otherwise you have hidden
heap allocations... The syntax has to change (because the D-VLA semantics seems
OK to me).

Bye,
bearophile

Apr 22 2011

Iain Buclaw <ibuclaw ubuntu.com> writes:

== Quote from bearophile (bearophileHUGS lycos.com)'s article
 Iain Buclaw:
 Variable length arrays are just sugary syntax for a call to alloca.

 I have an enhancement request in Bugzilla on VLA, with longer discussions. Just

two comments:
 - It seems alloca() can be implemented with two different semantics: to

deallocate at the end of the function or to deallocate at the end of the scope.
Usually alloca() deallocates at the end of the function, but that semantic
confusion is dangerous. VLA deallocate at the end of the scope, just like any
other static array.

Yep, my understanding of the way C99 goes about it (I might be wrong here, so
don't hurt me :), the stack is saved for the lifetime of the alloca'd array, and
restored once it goes out of scope.

ie:

int sz = 42;
{
  void * saved_stack = __stack_save (); // start of scope
  int array[0 : D.range];
  uint D.range;  // automatic variable
  void * D.ptr;  // automatic variable

  D.range = sz + -1;
  D.ptr = __builtin_alloca (sz * 4);
  array = (int[0 : D.range] *) D.ptr;

  // later...

  __stack_restore (saved_stack);        // end of scope
}

Apr 23 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/22/11 5:21 PM, Iain Buclaw wrote:
 == Quote from bearophile (bearophileHUGS lycos.com)'s article
 Timon Gehr:
 But, as pointed out by Linus, the prime performance problem is _not_ the GC,
but
 the mindset that comes with it. Most programmers that "grew up" in a managed
 environment tend to use very many "new" keywords all over their code, instead
of

this.)

 In C99 (and Ada) you avoid the allocation of some dynamic arrays with new
thanks

 to variable length arrays.

 Variable length arrays are just sugary syntax for a call to alloca.

Well in fairness they can be a fair amount more elaborate. The trouble 
with alloca is that it's impossible to compose with. So in general you 
need to write something like:

bool onStack = smallEnough(length * sizeof(T));
auto a = (cast(T*) (onStack ? alloca : malloc)(length * sizeof(T)));
scope(exit) if (!onStack) free(a.ptr);
initialize(enforce(a));
scope(exit) clear(a);

This block is difficult to factor out because of alloca.


Andrei

Apr 23 2011

dsimcha <dsimcha yahoo.com> writes:

On 4/23/2011 9:48 AM, Andrei Alexandrescu wrote:
 On 4/22/11 5:21 PM, Iain Buclaw wrote:
 == Quote from bearophile (bearophileHUGS lycos.com)'s article
 Timon Gehr:
 But, as pointed out by Linus, the prime performance problem is _not_
 the GC, but
 the mindset that comes with it. Most programmers that "grew up" in a
 managed
 environment tend to use very many "new" keywords all over their
 code, instead of

 you to do this.)

 In C99 (and Ada) you avoid the allocation of some dynamic arrays with
 new thanks

 to variable length arrays.

 Variable length arrays are just sugary syntax for a call to alloca.

 Well in fairness they can be a fair amount more elaborate. The trouble
 with alloca is that it's impossible to compose with. So in general you
 need to write something like:

 bool onStack = smallEnough(length * sizeof(T));
 auto a = (cast(T*) (onStack ? alloca : malloc)(length * sizeof(T)));
 scope(exit) if (!onStack) free(a.ptr);
 initialize(enforce(a));
 scope(exit) clear(a);

 This block is difficult to factor out because of alloca.


 Andrei

Right.  This is exactly the kind of thing TempAlloc was meant to solve. 
  (Actually, to give credit, the initial rough idea for TempAlloc was 
proposed by Andrei, though I fleshed out the details and implemented 
it.)  With TempAlloc, you have a segmented stack (i.e. one that 
allocates a new segment instead of crashing the program if it's full) 
that is independent of the call stack (so you can return 
TempAlloc-allocated memory from functions).

BTW, since when does the ternary operator work with functions, as 
opposed to variables?

Apr 23 2011

bearophile <bearophileHUGS lycos.com> writes:

dsimcha:

 Right.  This is exactly the kind of thing TempAlloc was meant to solve. 

I think TempAlloc is meant for larger allocations. D-VLAs are meant to allocate
little arrays (like 1 KB or less) on the normal stack very quickly.

Bye,
bearophile

Apr 23 2011

dsimcha <dsimcha yahoo.com> writes:

On 4/23/2011 10:24 AM, bearophile wrote:
 dsimcha:

 Right.  This is exactly the kind of thing TempAlloc was meant to solve.

 I think TempAlloc is meant for larger allocations. D-VLAs are meant to
allocate little arrays (like 1 KB or less) on the normal stack very quickly.

 Bye,
 bearophile

Why would TempAlloc not work for this?  It's slightly slower than VLAs, 
but the difference is negligible since it's still almost never a 
threading bottleneck and you still have to do something with the array 
you allocated.

Apr 23 2011

bearophile <bearophileHUGS lycos.com> writes:

dsimcha:

 It's slightly slower than VLAs, 
 but the difference is negligible since it's still almost never a 
 threading bottleneck and you still have to do something with the array 
 you allocated.

I have to compare the performance of both for my use cases (fast allocations of
very small arrays, even a just a small number of chars).
I think an advantage of stack allocations is that most times the top of the
current stack is already in cache L1, so it's warm memory.

Bye,
bearophile

Apr 23 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/23/11 8:57 AM, dsimcha wrote:
 BTW, since when does the ternary operator work with functions, as
 opposed to variables?

They are converted to pointers to functions. Not something I'd recommend 
because it makes the call slower.

Andrei

Apr 23 2011

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 23/04/2011 15:45, Andrei Alexandrescu wrote:
 On 4/23/11 8:57 AM, dsimcha wrote:
 BTW, since when does the ternary operator work with functions, as
 opposed to variables?


And that's since the C days btw. The function call is just an operation, 
the callee an operand, so any expression can be there.

 They are converted to pointers to functions. Not something I'd recommend
 because it makes the call slower.

 Andrei

Hum, I didn't know that. Why does it make the call slower (other than 
the actual ternary operation), is it because it has to load the function 
address from a register/variable, instead of being a constant value? 
And/or not being able to inline the call?
I mean, the first aspect seems like a very minor impact in performance, 
almost negligible.


-- 
Bruno Medeiros - Software Engineer

Apr 29 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/29/11 10:40 AM, Bruno Medeiros wrote:
 On 23/04/2011 15:45, Andrei Alexandrescu wrote:
 On 4/23/11 8:57 AM, dsimcha wrote:
 BTW, since when does the ternary operator work with functions, as
 opposed to variables?


 And that's since the C days btw. The function call is just an operation,
 the callee an operand, so any expression can be there.

 They are converted to pointers to functions. Not something I'd recommend
 because it makes the call slower.

 Andrei

 Hum, I didn't know that. Why does it make the call slower (other than
 the actual ternary operation), is it because it has to load the function
 address from a register/variable, instead of being a constant value?
 And/or not being able to inline the call?
 I mean, the first aspect seems like a very minor impact in performance,
 almost negligible.

It's an indirect call instead of a direct call.

Andrei

Apr 29 2011

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 29/04/2011 17:30, Andrei Alexandrescu wrote:
 On 4/29/11 10:40 AM, Bruno Medeiros wrote:
 On 23/04/2011 15:45, Andrei Alexandrescu wrote:
 On 4/23/11 8:57 AM, dsimcha wrote:
 BTW, since when does the ternary operator work with functions, as
 opposed to variables?


 And that's since the C days btw. The function call is just an operation,
 the callee an operand, so any expression can be there.

 They are converted to pointers to functions. Not something I'd recommend
 because it makes the call slower.

 Andrei

 Hum, I didn't know that. Why does it make the call slower (other than
 the actual ternary operation), is it because it has to load the function
 address from a register/variable, instead of being a constant value?
 And/or not being able to inline the call?
 I mean, the first aspect seems like a very minor impact in performance,
 almost negligible.

 It's an indirect call instead of a direct call.

 Andrei

Well, yes, that's kinda of what I was thinking already (although I 
referred to more low-level, assembler terms).
But my question remains, why is an indirect function call slower than a 
direct one, and is that difference significant? (excluding of course the 
possibilities of inlining and other static analysis that direct 
invocations offer) There is no extra overhead other than loading the 
function address from a register/variable, right?


-- 
Bruno Medeiros - Software Engineer

May 03 2011

=?UTF-8?B?IkrDqXLDtG1lIE0uIEJlcmdlciI=?= <jeberger free.fr> writes:

Bruno Medeiros wrote:
 On 29/04/2011 17:30, Andrei Alexandrescu wrote:
 On 4/29/11 10:40 AM, Bruno Medeiros wrote:
 On 23/04/2011 15:45, Andrei Alexandrescu wrote:
 On 4/23/11 8:57 AM, dsimcha wrote:
 BTW, since when does the ternary operator work with functions, as
 opposed to variables?


 And that's since the C days btw. The function call is just an operati=



on,
 the callee an operand, so any expression can be there.

 They are converted to pointers to functions. Not something I'd
 recommend
 because it makes the call slower.

 Andrei

 Hum, I didn't know that. Why does it make the call slower (other than=



 the actual ternary operation), is it because it has to load the funct=



ion
 address from a register/variable, instead of being a constant value?
 And/or not being able to inline the call?
 I mean, the first aspect seems like a very minor impact in performanc=



e,
 almost negligible.

 It's an indirect call instead of a direct call.

 Andrei

=20
 Well, yes, that's kinda of what I was thinking already (although I
 referred to more low-level, assembler terms).
 But my question remains, why is an indirect function call slower than a=

 direct one, and is that difference significant? (excluding of course th=

e
 possibilities of inlining and other static analysis that direct
 invocations offer) There is no extra overhead other than loading the
 function address from a register/variable, right?
=20

	AIUI, with a direct call the CPU can anticipate the call (because
it knows the target address) and start filling the pipeline
immediately. OTOH, with an indirect call the CPU cannot know the
target until it reads the register, which means that the pipeline
will empty itself before the call is executed. With the pipeline
length of some modern CPUs (Pentium 4 being amongst the worst), this
can result in very large slowdowns.

		Jerome
--=20
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

May 03 2011

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 03/05/2011 22:41, "Jérôme M. Berger" wrote:
 Bruno Medeiros wrote:
 On 29/04/2011 17:30, Andrei Alexandrescu wrote:
 On 4/29/11 10:40 AM, Bruno Medeiros wrote:
 On 23/04/2011 15:45, Andrei Alexandrescu wrote:
 On 4/23/11 8:57 AM, dsimcha wrote:
 BTW, since when does the ternary operator work with functions, as
 opposed to variables?


 And that's since the C days btw. The function call is just an operation,
 the callee an operand, so any expression can be there.

 They are converted to pointers to functions. Not something I'd
 recommend
 because it makes the call slower.

 Andrei

 Hum, I didn't know that. Why does it make the call slower (other than
 the actual ternary operation), is it because it has to load the function
 address from a register/variable, instead of being a constant value?
 And/or not being able to inline the call?
 I mean, the first aspect seems like a very minor impact in performance,
 almost negligible.

 It's an indirect call instead of a direct call.

 Andrei

 Well, yes, that's kinda of what I was thinking already (although I
 referred to more low-level, assembler terms).
 But my question remains, why is an indirect function call slower than a
 direct one, and is that difference significant? (excluding of course the
 possibilities of inlining and other static analysis that direct
 invocations offer) There is no extra overhead other than loading the
 function address from a register/variable, right?

 	AIUI, with a direct call the CPU can anticipate the call (because
 it knows the target address) and start filling the pipeline
 immediately. OTOH, with an indirect call the CPU cannot know the
 target until it reads the register, which means that the pipeline
 will empty itself before the call is executed. With the pipeline
 length of some modern CPUs (Pentium 4 being amongst the worst), this
 can result in very large slowdowns.

 		Jerome

Ah right, that makes sense, forgot about that one (I see it's the same 
issue with branch prediction).

-- 
Bruno Medeiros - Software Engineer

May 04 2011

Don <nospam nospam.com> writes:

Jérôme M. Berger wrote:
 Bruno Medeiros wrote:
 On 29/04/2011 17:30, Andrei Alexandrescu wrote:
 On 4/29/11 10:40 AM, Bruno Medeiros wrote:
 On 23/04/2011 15:45, Andrei Alexandrescu wrote:
 On 4/23/11 8:57 AM, dsimcha wrote:
 BTW, since when does the ternary operator work with functions, as
 opposed to variables?


 And that's since the C days btw. The function call is just an operation,
 the callee an operand, so any expression can be there.

 They are converted to pointers to functions. Not something I'd
 recommend
 because it makes the call slower.

 Andrei

 Hum, I didn't know that. Why does it make the call slower (other than
 the actual ternary operation), is it because it has to load the function
 address from a register/variable, instead of being a constant value?
 And/or not being able to inline the call?
 I mean, the first aspect seems like a very minor impact in performance,
 almost negligible.

 It's an indirect call instead of a direct call.

 Andrei

 Well, yes, that's kinda of what I was thinking already (although I
 referred to more low-level, assembler terms).
 But my question remains, why is an indirect function call slower than a
 direct one, and is that difference significant? (excluding of course the
 possibilities of inlining and other static analysis that direct
 invocations offer) There is no extra overhead other than loading the
 function address from a register/variable, right?

 	AIUI, with a direct call the CPU can anticipate the call (because
 it knows the target address) and start filling the pipeline
 immediately. OTOH, with an indirect call the CPU cannot know the
 target until it reads the register, which means that the pipeline
 will empty itself before the call is executed. With the pipeline
 length of some modern CPUs (Pentium 4 being amongst the worst), this
 can result in very large slowdowns.
 
 		Jerome

That was true ten years ago, but not any more. Modern CPUs can do branch 
prediction of indirect jumps.

May 04 2011

dsimcha <dsimcha yahoo.com> writes:

On 5/4/2011 7:38 PM, Don wrote:
 Jérôme M. Berger wrote:
 Bruno Medeiros wrote:
 On 29/04/2011 17:30, Andrei Alexandrescu wrote:
 On 4/29/11 10:40 AM, Bruno Medeiros wrote:
 On 23/04/2011 15:45, Andrei Alexandrescu wrote:
 On 4/23/11 8:57 AM, dsimcha wrote:
 BTW, since when does the ternary operator work with functions, as
 opposed to variables?


 And that's since the C days btw. The function call is just an
 operation,
 the callee an operand, so any expression can be there.

 They are converted to pointers to functions. Not something I'd
 recommend
 because it makes the call slower.

 Andrei

 Hum, I didn't know that. Why does it make the call slower (other than
 the actual ternary operation), is it because it has to load the
 function
 address from a register/variable, instead of being a constant value?
 And/or not being able to inline the call?
 I mean, the first aspect seems like a very minor impact in
 performance,
 almost negligible.

 It's an indirect call instead of a direct call.

 Andrei

 Well, yes, that's kinda of what I was thinking already (although I
 referred to more low-level, assembler terms).
 But my question remains, why is an indirect function call slower than a
 direct one, and is that difference significant? (excluding of course the
 possibilities of inlining and other static analysis that direct
 invocations offer) There is no extra overhead other than loading the
 function address from a register/variable, right?

 AIUI, with a direct call the CPU can anticipate the call (because
 it knows the target address) and start filling the pipeline
 immediately. OTOH, with an indirect call the CPU cannot know the
 target until it reads the register, which means that the pipeline
 will empty itself before the call is executed. With the pipeline
 length of some modern CPUs (Pentium 4 being amongst the worst), this
 can result in very large slowdowns.

 Jerome

 That was true ten years ago, but not any more. Modern CPUs can do branch
 prediction of indirect jumps.

Yeh, I tested this a while back and can dig up/recreate the benchmark if 
need be.  A virtual function call that's always resolving to the same 
address (so the branch is highly predictable) is not measurably slower 
than a _non-inlined_ direct function call.  However, if the branch is 
not predictable, it's substantially slower and virtual functions usually 
can't be inlined.

May 04 2011

bearophile <bearophileHUGS lycos.com> writes:

Andrei:

 Well in fairness they can be a fair amount more elaborate. The trouble 
 with alloca is that it's impossible to compose with. So in general you 
 need to write something like:
 
 bool onStack = smallEnough(length * sizeof(T));
 auto a = (cast(T*) (onStack ? alloca : malloc)(length * sizeof(T)));
 scope(exit) if (!onStack) free(a.ptr);
 initialize(enforce(a));
 scope(exit) clear(a);
 
 This block is difficult to factor out because of alloca.

See this too:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=135208

Keep in mind that alloca() returns null more often than malloc, and when
alloca() return null, it's still meaningful to try to use malloc(). So the
semantics you have written is not good enough yet (as mine. A merge of the two
seems better).

My idea of D-style VLAs is something built-in that encapsulates that semantics
(that's more complex than C99 VLA semantics).

I am suggesting a built-in because there is some more semantics to keep into
account for a good implementation:
- Probably you want the arrays created on the stack to have an immutable length
(otherwise you have to switch to a normal fully GC-heap dynamic array, but I
don't like this).
- You have to return such arrays from functions, and to give them to function
arguments. As fixed-sized arrays they probably need to be managed by value
(unless you use "ref"), so they are different from dynamic arrays, and closer
to fixed-sized ones.

Bye,
bearophile

Apr 23 2011

bearophile <bearophileHUGS lycos.com> writes:

Andrei:

 bool onStack = smallEnough(length * sizeof(T));
 auto a = (cast(T*) (onStack ? alloca : malloc)(length * sizeof(T)));
 scope(exit) if (!onStack) free(a.ptr);
 initialize(enforce(a));
 scope(exit) clear(a);
 
 This block is difficult to factor out because of alloca.

As I have explained in my old enhancement request about VLAs and in a recent
post, this code is dangerous, because alloca() in some implementations
deallocates at the end of the function, while in other cases it deallocates at
the end of the scope. A well implemented D-VLA needs be sure to free the stack
space at the end of the scope.

Bye,
bearophile

Apr 23 2011

Ulrik Mikaelsson <ulrik.mikaelsson gmail.com> writes:

2011/4/22 Timon Gehr <timon.gehr gmx.ch>:
 That is only a problem if the reference count of that resource changes at=

 a very
 high frequency. The described problem also implies that the threads would=

 not need
 any form of synchronization for the data (otherwise the reference count c=

ertainly
 would not be a bottleneck.)

....
 Michael Stover wrote:
 I'd like to say "were proved" rather than "are believed", but I don't ac=


tually
 know where to go for such evidence.
 =C2=A0However, I do believe many scripting languages, such as python, ev=


entually
 ditched the reference counting technique for
 generational, and Java has very fast GC, so I am inclined to believe tho=


se
 real-life solutions than Linus.

 Well, the GC may be fast when compared to other GCs, but it has to be des=

igned to
 run on general data whose reference structure can be arbitrary. Often, th=

e
 objects/references have a somewhat specialized structure that a smart pro=

grammer
 can exploit, especially if the references point to private data. But that=

 only
 matters if performance is of prime interest, and the gains may be not ver=

y big.

All in all, I think the best approach is a pragmatic one, where
different types of resources can be handled according to different
schemes.

I.E. default to GC-manage everything. After profiling, determining
what resources are mostly used, and where, optimize allocation for
those resources, preferably to scoped allocation, or if not possible,
reference-counted.

Premature optimization is a root of much evil, for instance, the
malloc-paranoid might very well resort to abuse of struct:s, leading
either to lots of manual pointers, or excessive memory copying.

Incidentally, this was the main thing that attracted me to D. Be
lazy/productive where performance doesn't matter much, and focus
optimization on where it does.

Apr 22 2011

Timon Gehr <timon.gehr gmx.ch> writes:

Ulrik Mikaelsson wrote:
 All in all, I think the best approach is a pragmatic one, where
 different types of resources can be handled according to different
 schemes.

 I.E. default to GC-manage everything. After profiling, determining
 what resources are mostly used, and where, optimize allocation for
 those resources, preferably to scoped allocation, or if not possible,
 reference-counted.

 Premature optimization is a root of much evil, for instance, the
 malloc-paranoid might very well resort to abuse of struct:s, leading
 either to lots of manual pointers, or excessive memory copying.

 Incidentally, this was the main thing that attracted me to D. Be
 lazy/productive where performance doesn't matter much, and focus
 optimization on where it does.

That is very true. GC is almost always fast enough or even faster. And it is
clearly most convenient. And yes, identify bottlenecks first, optimize later.
But
I also think programs that have some concern about efficient memory allocation
(with GC or without GC) tend to be better designed in general. This actually
increases productivity. Plus, it reduces the need for complicated optimizations
later on. This in order increases maintainability.

-Timon

Apr 22 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/22/11 4:04 PM, Timon Gehr wrote:
[snip]
 This is also the reason I think it is a bad idea to deprecate D's
 'delete'.

The functionality is not going away.

Andrei

Apr 23 2011

Timon Gehr <timon.gehr gmx.ch> writes:

 On 4/22/11 4:04 PM, Timon Gehr wrote:
 [snip]
 This is also the reason I think it is a bad idea to deprecate D's
 'delete'.

 The functionality is not going away.

 Andrei

Yes I realize that. It is a matter of syntax and aesthetics. Handling allocation
and deallocation non-uniformly renders the language ugly (or are you considering
to remove the 'new' keyword too?). And it discourages people to get informed
about
custom allocators etc. I mean, how will custom allocators look like? The cost of
having the delete keyword in the language is low. Before removing that, why not
consider the 'body' keyword instead?

Timon

Apr 23 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/23/11 11:33 AM, Timon Gehr wrote:
 On 4/22/11 4:04 PM, Timon Gehr wrote:
 [snip]
 This is also the reason I think it is a bad idea to deprecate D's
 'delete'.

 The functionality is not going away.

 Andrei

 Yes I realize that. It is a matter of syntax and aesthetics. Handling
allocation
 and deallocation non-uniformly renders the language ugly (or are you
considering
 to remove the 'new' keyword too?).

Allocation and deallocation are not symmetric and as such handling them 
in a uniform way would be a mistake that perpetuates a poor 
understanding of the underlying realities of memory allocation and 
object creation. I suggest you reconsider.

 And it discourages people to get informed about
 custom allocators etc.

I don't see the relationship here.

 I mean, how will custom allocators look like? The cost of
 having the delete keyword in the language is low. Before removing that, why not
 consider the 'body' keyword instead?

The cost of keeping "delete" in the language is making a rarely-used, 
dangerous facility with loosely-defined semantics straight inside the 
language. It reflects poor language design for many reasons.


Andrei

Apr 23 2011

Timon Gehr <timon.gehr gmx.ch> writes:

Andrei Alexandrescu wrote:
 Allocation and deallocation are not symmetric and as such handling them
 in a uniform way would be a mistake that perpetuates a poor
 understanding of the underlying realities of memory allocation and
 object creation. I suggest you reconsider.

'char' is completely orthogonal to memory allocation and it still gets handled
uniformly to it syntactically. So I do not really get the point here.

 And it discourages people to get informed about
 custom allocators etc.

I don't see the relationship here.

I cannot imagine that the syntax for them will be particularly nice after the
removal of 'delete'. Who likes features with unnecessarily clumsy syntax?

The cost of keeping "delete" in the language is making a rarely-used,
dangerous facility with loosely-defined semantics straight inside the
language. It reflects poor language design for many reasons.

Andrei

I think we are talking about different things. I do _not_ want to use the
keyword
to somehow try to deallocate GC allocated memory. (who would actually want to do
this anyways?) I want it for custom allocators as in
http://www.digitalmars.com/d/2.0/memory.html. In that case, the semantics would
be
completely specified by me. Removing the keyword means ruining that use case.
You
can also find a nice, clean and strict semantic specification for delete when
called on GC memory that works on all platforms and with all GC implementations:
"do nothing, GC memory is managed automatically." As it is done now, I agree it
is
not particularly nice.

Timon

Apr 23 2011

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

Timon Gehr wrote:
 Andrei Alexandrescu wrote:
 Allocation and deallocation are not symmetric and as such handling them
 in a uniform way would be a mistake that perpetuates a poor
 understanding of the underlying realities of memory allocation and
 object creation. I suggest you reconsider.

 'char' is completely orthogonal to memory allocation and it still gets handled
 uniformly to it syntactically. So I do not really get the point here.

 And it discourages people to get informed about
 custom allocators etc.

 I don't see the relationship here.

 I cannot imagine that the syntax for them will be particularly nice after the
 removal of 'delete'. Who likes features with unnecessarily clumsy syntax?

 The cost of keeping "delete" in the language is making a rarely-used,
 dangerous facility with loosely-defined semantics straight inside the
 language. It reflects poor language design for many reasons.

 Andrei

 I think we are talking about different things. I do _not_ want to use the
keyword
 to somehow try to deallocate GC allocated memory. (who would actually want to
do
 this anyways?) I want it for custom allocators as in
 http://www.digitalmars.com/d/2.0/memory.html. In that case, the semantics
would be
 completely specified by me. Removing the keyword means ruining that use case.
You
 can also find a nice, clean and strict semantic specification for delete when
 called on GC memory that works on all platforms and with all GC
implementations:
 "do nothing, GC memory is managed automatically." As it is done now, I agree
it is
 not particularly nice.

 Timon

Why not allow "delete" only in classes with custom allocators? AFAIK 
they could be determined at compile time. Please let me know if I'm wrong.

Apr 23 2011

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

Piotr Szturmaj wrote:
 Why not allow "delete" only in classes with custom allocators? AFAIK
 they could be determined at compile time. Please let me know if I'm wrong.

Nevermind. Andrei informed that class allocators are being deprecated, 
so my question is no longer relevant.

Apr 24 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/23/11 1:05 PM, Timon Gehr wrote:
 Andrei Alexandrescu wrote:
 Allocation and deallocation are not symmetric and as such handling them
 in a uniform way would be a mistake that perpetuates a poor
 understanding of the underlying realities of memory allocation and
 object creation. I suggest you reconsider.

 'char' is completely orthogonal to memory allocation and it still gets handled
 uniformly to it syntactically. So I do not really get the point here.

The point is that some naively believe "new" and "delete" are some 
simple do and undo actions and as such they somehow deserve the same 
level of support. In reality they have dramatically different issues and 
are profoundly asymmetric.

 And it discourages people to get informed about
 custom allocators etc.

 I don't see the relationship here.

 I cannot imagine that the syntax for them will be particularly nice after the
 removal of 'delete'. Who likes features with unnecessarily clumsy syntax?

 The cost of keeping "delete" in the language is making a rarely-used,
 dangerous facility with loosely-defined semantics straight inside the
 language. It reflects poor language design for many reasons.

 Andrei

 I think we are talking about different things. I do _not_ want to use the
keyword
 to somehow try to deallocate GC allocated memory. (who would actually want to
do
 this anyways?) I want it for custom allocators as in
 http://www.digitalmars.com/d/2.0/memory.html. In that case, the semantics
would be
 completely specified by me. Removing the keyword means ruining that use case.
You
 can also find a nice, clean and strict semantic specification for delete when
 called on GC memory that works on all platforms and with all GC
implementations:
 "do nothing, GC memory is managed automatically." As it is done now, I agree
it is
 not particularly nice.

 Timon

The class-specific allocators are an exceedingly bad design carried over 
from C++. They will be deprecated from D as well.

Even C++, where the features you are talking about have existed for many 
years, is shunning their use. Good applications make very sparingly use 
of "delete" and virtually none uses class-specific allocators. 
Experience with C++ is the best proof that the delete keyword and 
class-specific allocators have no place in the D programming language.


Andrei

Apr 23 2011

Timon Gehr <timon.gehr gmx.ch> writes:

Andrei Alexandrescu wrote:
 The point is that some naively believe "new" and "delete" are some
 simple do and undo actions and as such they somehow deserve the same
 level of support. In reality they have dramatically different issues and
 are profoundly asymmetric.


 The class-specific allocators are an exceedingly bad design carried over
 from C++. They will be deprecated from D as well.

Does that imply that the feature is going away after all?

 Even C++, where the features you are talking about have existed for many
 years, is shunning their use. Good applications make very sparingly use
 of "delete" and virtually none uses class-specific allocators.
 Experience with C++ is the best proof that the delete keyword and
 class-specific allocators have no place in the D programming language.


 Andrei

Virtually no good application uses them? The _DMD compiler implementation_ uses
them. Classes 'Token' and 'Scope' have class-specific optimized allocators. I
don't think Walter would have done that if it did not give quite good benefits.
D
is a systems programming language after all. It is not Java or some scripting
language. I don't think D should/can get rid of class-specific allocators if
there
is no good alternative for sane uses. Even if they are not used often and should
not be used often.

C++ is not garbage collected by default. Implementing a GC root class for
example
is another legitimate use case for class-specific allocators. In D, things are
somewhat reverse, class-specific allocators would be used for manual memory
management. D should support that.

So if class-specific allocators go away, what is the alternative? Sure, it is
always possible to allocate using custom functions (for D structs, at least.)
But
then, if allocations of one struct/class turn out to be a bottleneck somewhere
late in the development, the whole code will have to be changed instead of just
the addition of some simple custom class-specific allocator. (This is even worse
when developing a library). Also, just testing the impact of a custom allocator
will be a pain for the same reason. Why is that better?


Timon

Apr 24 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 04/24/2011 01:21 PM, Timon Gehr wrote:
 Andrei Alexandrescu wrote:
 The point is that some naively believe "new" and "delete" are some
 simple do and undo actions and as such they somehow deserve the same
 level of support. In reality they have dramatically different issues and
 are profoundly asymmetric.


 The class-specific allocators are an exceedingly bad design carried over
 from C++. They will be deprecated from D as well.

 Does that imply that the feature is going away after all?

 Even C++, where the features you are talking about have existed for many
 years, is shunning their use. Good applications make very sparingly use
 of "delete" and virtually none uses class-specific allocators.
 Experience with C++ is the best proof that the delete keyword and
 class-specific allocators have no place in the D programming language.


 Andrei

 Virtually no good application uses them? The _DMD compiler implementation_ uses
 them. Classes 'Token' and 'Scope' have class-specific optimized allocators. I
 don't think Walter would have done that if it did not give quite good benefits.

A class-specific freelist-based allocator in fact stands in the way of a 
good general-purpose allocator. At Facebook we did extensive 
measurements on a variety of applications and workloads and the best 
approach was to just disable all STL class-specific allocators (by 
defining GLIBCXX_FORCE_NEW) and let jemalloc take care of all 
allocations, to great effect.

Today's good general-purpose allocators are as good or usually 
considerably better than custom allocators. The one category of custom 
allocators that is still significantly better are regions, but they have 
quite specific and restricted semantics.

 D
 is a systems programming language after all. It is not Java or some scripting
 language. I don't think D should/can get rid of class-specific allocators if
there
 is no good alternative for sane uses. Even if they are not used often and
should
 not be used often.

 C++ is not garbage collected by default. Implementing a GC root class for
example
 is another legitimate use case for class-specific allocators. In D, things are
 somewhat reverse, class-specific allocators would be used for manual memory
 management. D should support that.

 So if class-specific allocators go away, what is the alternative? Sure, it is
 always possible to allocate using custom functions (for D structs, at least.)
But
 then, if allocations of one struct/class turn out to be a bottleneck somewhere
 late in the development, the whole code will have to be changed instead of just
 the addition of some simple custom class-specific allocator. (This is even
worse
 when developing a library). Also, just testing the impact of a custom allocator
 will be a pain for the same reason. Why is that better?


 Timon

It is much better in very many ways, in particular in D. For one simple 
argument, consider:

auto obj = new Widget;

Simple modularity considerations would require this operator to have 
semantics independent of Widget, such that modular and generic code can 
use it properly. Custom class allocators ruin all that, because they 
would require the use of delete paired with new, whereas the general 
operator requires no action.

Memory allocation and object construction are difficult topics, 
particularly in high-performance settings. A seasoned engineer 
understands that and does not naively believe that a smattering of 
well-intended but very badly executed language support will make that 
all easier. (Contrast that with e.g. reference counting semantics using 
RAII, which relies on a much better part of the language and does offer 
reasonable guarantees.)

I am sorry, you simply have no case - each and every argument you put 
forth has no strength or is just wrong. We could spend time on debating 
each, but I suspect that would do little toward convincing you of what 
is after all a simple fact, but one with many subtle facets. It might 
not a good use of our time to further engage in a diatribe on this. The 
delete keyword will go, as will class-specific new and delete.


Andrei

Apr 24 2011

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

Since this is definite we should remove them from the docs as well.
Want me to file it to bugzilla?

Apr 24 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 04/24/2011 03:35 PM, Andrej Mitrovic wrote:
 Since this is definite we should remove them from the docs as well.
 Want me to file it to bugzilla?

Please do. BTW I just fixed the core-related documentation bug on 
d-programming-language.org.

Andrei

Apr 24 2011

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 4/24/11, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 On 04/24/2011 03:35 PM, Andrej Mitrovic wrote:
 Since this is definite we should remove them from the docs as well.
 Want me to file it to bugzilla?

 Please do. BTW I just fixed the core-related documentation bug on
 d-programming-language.org.

 Andrei

Fantastic, thank you.

Apr 24 2011

bearophile <bearophileHUGS lycos.com> writes:

Andrei:

 It might not a good use of our time to further engage in a diatribe on this.

Just a note: you have not wasted your time discussing this, because I am not
expert on such matters, so I learn something from informed discussions :-)

If a book that explains C language is just 200 pages long, a book that explains
the design purpose, design constraints and design compromises of every feature
of C language requires a much longer book (800 pages?), and it's probably also
much more interesting :-)

Bye,
bearophile

Apr 24 2011

Francisco Almeida <francisco.m.almeida gmail.com> writes:

== Quote from bearophile (bearophileHUGS lycos.com)'s article
 Andrei:
 It might not a good use of our time to further engage in a diatribe on this.

 Just a note: you have not wasted your time discussing this, because I am not

expert on such matters, so I learn something from informed discussions :-)
 If a book that explains C language is just 200 pages long, a book that explains

the design purpose, design constraints and design compromises of every feature
of
C language requires a much longer book (800 pages?), and it's probably also much
more interesting :-)
 Bye,
 bearophile

As one of those who were the most reluctant about abandoning delete, I have come
to realize that it is better to make D classes strictly garbage collected, and
drop delete.

It should be noted that D supports RAII, as long as one uses struct (thus
avoiding
dynamic polymorphism, which shouldn't make a big difference once you think about
it), and this gives D an important semantic separation lacking in C++ (or even
those languages on the other side of the fence such as Java, for that matter).

Making a habit of declaring struct instead of class when I intend to create
"objects" on the stack instead of the heap, while resorting to garbage collected
classes when it makes sense to do so, has made my life simpler.

I do believe, however, that there is a recurrent communication error going on
here.

I would kindly ask the language designers to please update the documentation and
write an article explaining the advantages of the approach (without making the
mistake of turning it into a blind praise of garbage collection, that is).
Newcomers need to know how to obtain garbage collected or RAII data structures
from D, whenever they want to.

Apr 24 2011

Timon Gehr <timon.gehr gmx.ch> writes:

Andrei Alexandrescu wrote:
 I am sorry, you simply have no case - each and every argument you put
 forth has no strength or is just wrong. We could spend time on debating
 each, but I suspect that would do little toward convincing you of what
 is after all a simple fact, but one with many subtle facets. It might
 not a good use of our time to further engage in a diatribe on this. The
 delete keyword will go, as will class-specific new and delete.


 Andrei

Ok, lets stop. Custom allocators and delete should probably be removed from D:

Some of my facts:
* freelist is faster if most allocation and deallocation concentrates on only
one
class. (see attachment) I think this is also the reason it increases DMDs speed.
Token and Scope are quite obvious bottlenecks..
* such bottlenecks will most of the time only be detected when development is
almost finished. (Almost never for library code, so I don't get why STL uses
custom allocators at all.)
* premature optimization is evil (see STL custom allocators)
* late optimization: it is easy if there are custom allocators. In C++ it is
even
trivial. Nice design. Still no clue why STL is abusing it.
* that form of custom allocators does not play nicely with Ds Garbage collector
though, because it changes the semantics of 'new', ergo builtin custom
allocators
should probably indeed be removed from D.
* this makes this form of optimization painful in D without custom allocators.
This is what I was instinctively scared about. Fact is: it is also painful with
them, because you don't get the 'delete's for free as in C++. I was somewhat
inconsiderate on this.
* it is possible that hinting the GC is only about twice as bad (again, see
attachment).


And some notes:
* the fact that you think all my points were worthless or wrong implies that you
either did not read all of them or have not considered them yourself when making
your decision. I think that is scary.
* There are many ways in which I am always right. The most dominant being that
any
sufficiently smart guy with a PhD in rocket science would totally agree with all
my statements. No pun/offense intended.
* this discussion would have been a waste of your time, had it not been public.


Thanks for discussing. It was helpful.


Timon

PS: The current GC dies without hints on my benchmark, so it is indeed a nice
idea
to keep hints:
version=GC:       1m45s
version=GC_hints: ~6.8s
version=freelist: ~3.7s
version=c_heap:   ~4.5s
begin 644 benchmark.d



M<VEO;CU'0SH ("` ("` ,6TT-7,*("H =F5R<VEO;CU'0U]H:6YT<SH ?C8N
M.',*("H +R]U<VEN9R!C=7-T;VT 86QL;V-A=&]R<SH*("H =F5R<VEO;CUF
M<F5E;&ES=#H ?C,N-W,*("H =F5R<VEO;CUC7VAE87`Z("` ?C0N-7,*("HO
M" H*:6UP;W)T('-T9"YS=&1I;SL*:6UP;W)T('-T9"YC+G-T9&QI8CL*:6UP
M;W)T('-T9"YC+G-T<FEN9SL*:6UP;W)T('-T9"YA;&=O<FET:&TZ<W=A<#L*

M=F5R<VEO;CUF<F5E;&ES=#L*+R]V97)S:6]N/6-?:&5A<#L*" IV97)S:6]N
M*&9R965L:7-T*2!V97)S:6]N/6-A;&QD96P["G9E<G-I;VXH1T-?:&EN=',I
M('9E<G-I;VX]8V%L;&1E;#L*=F5R<VEO;BAC7VAE87`I('9E<G-I;VX]8V%L
M;&1E;#L
M=&EC(&9O;R!F<F5E;&ES=#L*"69O;R!N97AT.PH)=F5R<VEO;BAF<F5E;&ES
M="E[" D);F5W*'-I>F5?="!S*7L*"0D):68H9G)E96QI<W0 (6ES(&YU;&PI
M>PH)"0D)=F]I9"`J<CUC87-T*'9O:60J*69R965L:7-T.PH)"0D)9G)E96QI
M<W0]9G)E96QI<W0N;F5X=#L*"0D)"7)E='5R;B!R.PH)"0E]" D)"7)E='5R
M;B!C87-T*'9O:60J*6YE=R!C:&%R6W-=.R\O8V]U;&0 =7-E(&UA;&QO8RAS
M*3L 969F96-T<R!A<F4 <VEM:6QA< H)"7T*"0ED96QE=&4H=F]I9"`J<"E[
M" D)"6EF*'`I>PH)"0D)9F]O(&]L9&9L/69R965L:7-T.PH)"0D)9G)E96QI
M<W0]8V%S="AF;V\I<#L*"0D)"69R965L:7-T+FYE>'0];VQD9FP[" D)"7T*
M"0E]" E]" EV97)S:6]N*&-?:&5A<"E[" D);F5W*'-I>F5?="!S*7MR971U
M<FX ;6%L;&]C*',I.WT*"0ED96QE=&4H=F]I9"`J<"E[:68H<"D 9G)E92AP


M:6YG+"`Q.B!E>'!E8W1E9"!S:')I;FMI;F<*"69O<F5A8V H=#LP

M<CTP.PH)"7UE;'-E(&EF*'1O<#T
M<G-I;VXH8V%L;&1E;"D 9&5L971E(&%;=&]P73L*"0D)9&ER/3$[" D)?65L
M<V4 :68H9&ER7B$H<F%N9" I)3,I*7LO+VEF*"%D:7(F)B$H<F%N9" I)3,I
M?'QD:7(F)G)A;F0H*24S*7L*"0D)=&]P+2T[" D)"79E<G-I;VXH8V%L;&1E
M;"D 9&5L971E(&%;=&]P73L*"0E]96QS92!A6W1O<"LK73UN97< 9F]O.PH)
M"6EF*"%R86YD*"DE,3`P*2!F;W(H:6YT(&D],#MI
985LP72QA6W)A;F0H*25T;W!=*3L*"7T*?0``
`
end

Apr 24 2011

Iain Buclaw <ibuclaw ubuntu.com> writes:

== Quote from Timon Gehr (timon.gehr gmx.ch)'s article
 Andrei Alexandrescu wrote:
 I am sorry, you simply have no case - each and every argument you put
 forth has no strength or is just wrong. We could spend time on debating
 each, but I suspect that would do little toward convincing you of what
 is after all a simple fact, but one with many subtle facets. It might
 not a good use of our time to further engage in a diatribe on this. The
 delete keyword will go, as will class-specific new and delete.


 Andrei


Is it correct that the current D GC implementation puts delete'd allocations
into
a freelist (or bucket) to be reused when 'new' is invoked again?

 Ok, lets stop. Custom allocators and delete should probably be removed from D:
 Some of my facts:
 * freelist is faster if most allocation and deallocation concentrates on only
one
 class. (see attachment) I think this is also the reason it increases DMDs
speed.
 Token and Scope are quite obvious bottlenecks..
 * such bottlenecks will most of the time only be detected when development is
 almost finished. (Almost never for library code, so I don't get why STL uses
 custom allocators at all.)
 * premature optimization is evil (see STL custom allocators)
 * late optimization: it is easy if there are custom allocators. In C++ it is
even
 trivial. Nice design. Still no clue why STL is abusing it.
 * that form of custom allocators does not play nicely with Ds Garbage collector
 though, because it changes the semantics of 'new', ergo builtin custom
allocators
 should probably indeed be removed from D.
 * this makes this form of optimization painful in D without custom allocators.
 This is what I was instinctively scared about. Fact is: it is also painful with
 them, because you don't get the 'delete's for free as in C++. I was somewhat
 inconsiderate on this.
 * it is possible that hinting the GC is only about twice as bad (again, see
 attachment).
 And some notes:
 * the fact that you think all my points were worthless or wrong implies that
you
 either did not read all of them or have not considered them yourself when
making
 your decision. I think that is scary.

I find it nuts that someone considers blaming/shunning language features just to
mask bad runtime semantics they refuse to fix - but maybe it's just my ignorance
on the subject that professes that view.

Apr 24 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 04/24/2011 08:08 PM, Iain Buclaw wrote:
 == Quote from Timon Gehr (timon.gehr gmx.ch)'s article
 Andrei Alexandrescu wrote:
 I am sorry, you simply have no case - each and every argument you put
 forth has no strength or is just wrong. We could spend time on debating
 each, but I suspect that would do little toward convincing you of what
 is after all a simple fact, but one with many subtle facets. It might
 not a good use of our time to further engage in a diatribe on this. The
 delete keyword will go, as will class-specific new and delete.


 Andrei


 Is it correct that the current D GC implementation puts delete'd allocations
into
 a freelist (or bucket) to be reused when 'new' is invoked again?

I don't know. If it does, an additional level of freelists only hurts.

 Ok, lets stop. Custom allocators and delete should probably be removed from D:
 Some of my facts:
 * freelist is faster if most allocation and deallocation concentrates on only
one
 class. (see attachment) I think this is also the reason it increases DMDs
speed.
 Token and Scope are quite obvious bottlenecks..
 * such bottlenecks will most of the time only be detected when development is
 almost finished. (Almost never for library code, so I don't get why STL uses
 custom allocators at all.)
 * premature optimization is evil (see STL custom allocators)
 * late optimization: it is easy if there are custom allocators. In C++ it is
even
 trivial. Nice design. Still no clue why STL is abusing it.
 * that form of custom allocators does not play nicely with Ds Garbage collector
 though, because it changes the semantics of 'new', ergo builtin custom
allocators
 should probably indeed be removed from D.
 * this makes this form of optimization painful in D without custom allocators.
 This is what I was instinctively scared about. Fact is: it is also painful with
 them, because you don't get the 'delete's for free as in C++. I was somewhat
 inconsiderate on this.
 * it is possible that hinting the GC is only about twice as bad (again, see
 attachment).
 And some notes:
 * the fact that you think all my points were worthless or wrong implies that
you
 either did not read all of them or have not considered them yourself when
making
 your decision. I think that is scary.

 I find it nuts that someone considers blaming/shunning language features just
to
 mask bad runtime semantics they refuse to fix - but maybe it's just my
ignorance
 on the subject that professes that view.

I'd appreciate if you provided more detail on that. Far as I can tell 
the problem is with the feature under discussion, not some alleged 
unwillingness to do work.


Andrei

Apr 24 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Sun, 24 Apr 2011 21:13:43 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 04/24/2011 08:08 PM, Iain Buclaw wrote:
 == Quote from Timon Gehr (timon.gehr gmx.ch)'s article
 Andrei Alexandrescu wrote:
 I am sorry, you simply have no case - each and every argument you put
 forth has no strength or is just wrong. We could spend time on  
 debating
 each, but I suspect that would do little toward convincing you of what
 is after all a simple fact, but one with many subtle facets. It might
 not a good use of our time to further engage in a diatribe on this.  
 The
 delete keyword will go, as will class-specific new and delete.


 Andrei


 Is it correct that the current D GC implementation puts delete'd  
 allocations into
 a freelist (or bucket) to be reused when 'new' is invoked again?

 I don't know. If it does, an additional level of freelists only hurts.

To the best of my knowledge, for 'small' allocations, the current GC uses  
free-lists of power-of-2 sized blocks. (I think small is defined as  
anything under 4k)

Apr 24 2011

Sean Kelly <sean invisibleduck.org> writes:

On Apr 24, 2011, at 7:25 PM, Robert Jacques wrote:
=20
 To the best of my knowledge, for 'small' allocations, the current GC =

uses free-lists of power-of-2 sized blocks. (I think small is defined as =
anything under 4k)

That's correct.=

Apr 25 2011

Iain Buclaw <ibuclaw ubuntu.com> writes:

== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 On 04/24/2011 08:08 PM, Iain Buclaw wrote:
 == Quote from Timon Gehr (timon.gehr gmx.ch)'s article
 Andrei Alexandrescu wrote:
 I am sorry, you simply have no case - each and every argument you put
 forth has no strength or is just wrong. We could spend time on debating
 each, but I suspect that would do little toward convincing you of what
 is after all a simple fact, but one with many subtle facets. It might
 not a good use of our time to further engage in a diatribe on this. The
 delete keyword will go, as will class-specific new and delete.


 Andrei


 Is it correct that the current D GC implementation puts delete'd allocations
into
 a freelist (or bucket) to be reused when 'new' is invoked again?

 I don't know. If it does, an additional level of freelists only hurts.
 Ok, lets stop. Custom allocators and delete should probably be removed from D:
 Some of my facts:
 * freelist is faster if most allocation and deallocation concentrates on only
one
 class. (see attachment) I think this is also the reason it increases DMDs
speed.
 Token and Scope are quite obvious bottlenecks..
 * such bottlenecks will most of the time only be detected when development is
 almost finished. (Almost never for library code, so I don't get why STL uses
 custom allocators at all.)
 * premature optimization is evil (see STL custom allocators)
 * late optimization: it is easy if there are custom allocators. In C++ it is
even
 trivial. Nice design. Still no clue why STL is abusing it.
 * that form of custom allocators does not play nicely with Ds Garbage collector
 though, because it changes the semantics of 'new', ergo builtin custom
allocators
 should probably indeed be removed from D.
 * this makes this form of optimization painful in D without custom allocators.
 This is what I was instinctively scared about. Fact is: it is also painful with
 them, because you don't get the 'delete's for free as in C++. I was somewhat
 inconsiderate on this.
 * it is possible that hinting the GC is only about twice as bad (again, see
 attachment).
 And some notes:
 * the fact that you think all my points were worthless or wrong implies that
you
 either did not read all of them or have not considered them yourself when
making
 your decision. I think that is scary.

 I find it nuts that someone considers blaming/shunning language features just
to
 mask bad runtime semantics they refuse to fix - but maybe it's just my
ignorance
 on the subject that professes that view.

 I'd appreciate if you provided more detail on that. Far as I can tell
 the problem is with the feature under discussion, not some alleged
 unwillingness to do work.
 Andrei

I was thinking of where one implements an alternative runtime that may not
necessarily interface with a GC, or has different semantics on the use of the
keyword (the compiler, after all, doesn't know/care what exactly 'delete' does,
it
just does it's job at emitting the correct action we expect from it). Though it
was late night and I perhaps didn't quite put it right anyway.

Having literally slept on it though, a runtime with a good GC is one that
doesn't
offer a choice to 'delete' memory - so I am on your side.  Though reflecting
from
my experience with the D2 GC (thinking from a compiler implementers POV), there
isn't much I can really praise about it.

Custom (de)allocators I can also see a very good point for removal too, however
much a stab in the back for those who can use them in the - perhaps
'one-and-only'
- case for a small amount of speed gain.

Iain

Apr 25 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 04/25/2011 10:06 AM, Iain Buclaw wrote:
 I was thinking of where one implements an alternative runtime that may not
 necessarily interface with a GC, or has different semantics on the use of the
 keyword (the compiler, after all, doesn't know/care what exactly 'delete'
does, it
 just does it's job at emitting the correct action we expect from it). Though it
 was late night and I perhaps didn't quite put it right anyway.

 Having literally slept on it though, a runtime with a good GC is one that
doesn't
 offer a choice to 'delete' memory - so I am on your side.  Though reflecting
from
 my experience with the D2 GC (thinking from a compiler implementers POV), there
 isn't much I can really praise about it.

 Custom (de)allocators I can also see a very good point for removal too, however
 much a stab in the back for those who can use them in the - perhaps
'one-and-only'
 - case for a small amount of speed gain.

 Iain

I understand. The right response to this is defining a good custom 
allocator API and providing good implementations of it.

Andrei

Apr 25 2011

"Kim D" <eonwe linux.nu> writes:

 On 04/25/2011 10:06 AM, Iain Buclaw wrote:
 I was thinking of where one implements an alternative runtime that may 
 not
 necessarily interface with a GC, or has different semantics on the use of 
 the
 keyword (the compiler, after all, doesn't know/care what exactly 'delete' 
 does, it
 just does it's job at emitting the correct action we expect from it). 
 Though it
 was late night and I perhaps didn't quite put it right anyway.

 Having literally slept on it though, a runtime with a good GC is one that 
 doesn't
 offer a choice to 'delete' memory - so I am on your side.  Though 
 reflecting from
 my experience with the D2 GC (thinking from a compiler implementers POV), 
 there
 isn't much I can really praise about it.

 Custom (de)allocators I can also see a very good point for removal too, 
 however
 much a stab in the back for those who can use them in the - perhaps 
 'one-and-only'
 - case for a small amount of speed gain.

 Iain

 I understand. The right response to this is defining a good custom 
 allocator API and providing good implementations of it.

 Andrei

Sounds like a good way forward. However, certain operations are currently 
not supported unless the GC is enabled, for instance 'Associative Arrays of 
Strings'... do you imagine the future allocator API beeing flexible enough 
to facilitate this use-case in a GC-less environment? Or at least allowing 
more of the functionality which is currently unsupported without GC?

Kim

Apr 25 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/25/11 4:20 PM, Kim D wrote:
 On 04/25/2011 10:06 AM, Iain Buclaw wrote:
 I was thinking of where one implements an alternative runtime that may
 not
 necessarily interface with a GC, or has different semantics on the use of
 the
 keyword (the compiler, after all, doesn't know/care what exactly 'delete'
 does, it
 just does it's job at emitting the correct action we expect from it).
 Though it
 was late night and I perhaps didn't quite put it right anyway.

 Having literally slept on it though, a runtime with a good GC is one that
 doesn't
 offer a choice to 'delete' memory - so I am on your side.  Though
 reflecting from
 my experience with the D2 GC (thinking from a compiler implementers POV),
 there
 isn't much I can really praise about it.

 Custom (de)allocators I can also see a very good point for removal too,
 however
 much a stab in the back for those who can use them in the - perhaps
 'one-and-only'
 - case for a small amount of speed gain.

 Iain

 I understand. The right response to this is defining a good custom
 allocator API and providing good implementations of it.

 Andrei

 Sounds like a good way forward. However, certain operations are currently
 not supported unless the GC is enabled, for instance 'Associative Arrays of
 Strings'... do you imagine the future allocator API beeing flexible enough
 to facilitate this use-case in a GC-less environment? Or at least allowing
 more of the functionality which is currently unsupported without GC?

 Kim

An entity relying on GC will by definition have more freedom and more 
safety than one relying on an alternative allocator. It is possible to 
make that transparent to some extent (e.g. define a type to be either 
refcounted or not depending on the allocator used). Total transparency 
would be difficult to achieve.

Andrei

Apr 25 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 04/24/2011 07:10 PM, Timon Gehr wrote:
 Andrei Alexandrescu wrote:
 I am sorry, you simply have no case - each and every argument you put
 forth has no strength or is just wrong. We could spend time on debating
 each, but I suspect that would do little toward convincing you of what
 is after all a simple fact, but one with many subtle facets. It might
 not a good use of our time to further engage in a diatribe on this. The
 delete keyword will go, as will class-specific new and delete.


 Andrei

 Ok, lets stop. Custom allocators and delete should probably be removed from D:

Thank you for agreeing.

 Some of my facts:
 * freelist is faster if most allocation and deallocation concentrates on only
one
 class. (see attachment) I think this is also the reason it increases DMDs
speed.
 Token and Scope are quite obvious bottlenecks..

I agree. There are two other things that I don't agree with:

1. This particular point does nothing to help the original argument, 
which was to keep the delete keyword (and in particular class 
allocators) in the D programming language.

2. Freelists are not all smiles - have their big problems too. When 
deciding to go with freelist the implementer must also decide on a 
strategy for releasing that memory back to the backend allocator on some 
schedule. Choosing a "good" schedule is already a full-blown problem 
that is best left to the the specialists implementing the backend 
allocator. My colleagues discovered with horror that unless the 
environment variable GLIBCXX_FORCE_NEW was defined, the STL node-based 
containers would _never_ give back memory once allocated.

Memory allocation is difficult, and for any allocator some workloads 
exist that makes it worse than all others. Nevertheless, recent 
developments in general-purpose memory allocators (e.g. the 
Low-Fragmentation Heap on Windows and a slew of allocators on Unix 
including ptmalloc, tcmalloc, jemalloc, and others) have shown that it's 
best to just use a good general-purpose allocator instead of fiddling 
with custom allocation.

 * such bottlenecks will most of the time only be detected when development is
 almost finished. (Almost never for library code, so I don't get why STL uses
 custom allocators at all.)
 * premature optimization is evil (see STL custom allocators)
 * late optimization: it is easy if there are custom allocators. In C++ it is
even
 trivial. Nice design. Still no clue why STL is abusing it.
 * that form of custom allocators does not play nicely with Ds Garbage collector
 though, because it changes the semantics of 'new', ergo builtin custom
allocators
 should probably indeed be removed from D.
 * this makes this form of optimization painful in D without custom allocators.
 This is what I was instinctively scared about. Fact is: it is also painful with
 them, because you don't get the 'delete's for free as in C++. I was somewhat
 inconsiderate on this.
 * it is possible that hinting the GC is only about twice as bad (again, see
 attachment).

OK.

Allow me to add a speculation here. I believe that one mistake that both 
class-level new/delete and STL allocators have done was to associate 
allocators with types. That just seems to force the proverbial square 
peg through the round hole. General-purpose allocators just focus on 
size classes and do awfully well. In addition, they do that in an 
integrated, non-modular way (as work on HeapLayers has shown, 
modularizing allocators is difficult and not very intuitive).

 And some notes:
 * the fact that you think all my points were worthless or wrong implies that
you
 either did not read all of them or have not considered them yourself when
making
 your decision. I think that is scary.

Well I think I owe you an apology. It is very difficult to use limited 
time resources to convince someone of rather subtle matters, 
particularly if the discussion starts not with a question (e.g. "In 
light of what I said, I wonder whether removing delete is the right way 
to go."), but with an assertion. At that point I need to make a sort of 
a cold estimate - do I have a reasonable chance of convincing someone of 
something within a reasonable time frame? It's difficult to make and 
convey that decision without seeming impolite, although please believe 
me being curt pains me as much as anyone.

To make matters more difficult, I'm not an expert in memory allocation 
and garbage collection. Probably I'm just at the point I can recognise 
good work when I see it. Yet, after a book chapter on memory allocation, 
a couple of articles written on memory allocation, a course I'm teaching 
on memory allocation, and a good amount of work on STL's interaction 
with jemalloc while at Facebook - I must admit I've become a bit jaded 
and as such less sanguine about arguing my points.

 * There are many ways in which I am always right. The most dominant being that
any
 sufficiently smart guy with a PhD in rocket science would totally agree with
all
 my statements. No pun/offense intended.

Not sure how you mean that. I mean, nobody here is discussing your 
competence on any personal trait of yours, and in particular I'm not 
keen of drawing any sweeping conclusion from any one discussion. You may 
as well know better than everybody about allocation, but so far you 
didn't seem to have found the right words to convey that. The simple 
fact is that here all anybody knows about you is what you write. And 
what you wrote failed to produce (now by your own admission as far as I 
can understand) a viable case for keeping the delete operator and 
class-level allocators in the D programming language.

 * this discussion would have been a waste of your time, had it not been public.

 Thanks for discussing. It was helpful.

I hope that's not in jest :o).


Andrei

Apr 24 2011

Sean Kelly <sean invisibleduck.org> writes:

On Apr 24, 2011, at 1:17 PM, Andrei Alexandrescu wrote:
=20
 It is much better in very many ways, in particular in D. For one =

simple argument, consider:
=20
 auto obj =3D new Widget;
=20
 Simple modularity considerations would require this operator to have =

semantics independent of Widget, such that modular and generic code can =
use it properly. Custom class allocators ruin all that, because they =
would require the use of delete paired with new, whereas the general =
operator requires no action.

Agreed, so long as there's some form of support for placement new.  But =
that could easily be an explicit __ctor() call into the allocated space, =
which works in D but I don't believe it works in C++ (thus the explicit =
placement new syntax).=

Apr 25 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 4/22/2011 2:04 PM, Timon Gehr wrote:
 The reason Java is garbage collected is _not_ performance, but primarily
 reliability.

I believe a GC is required if one wishes to prove memory safety, which is 
definitely a design goal of Java.

Apr 23 2011

D Programming

C/C++ Programming

Other

digitalmars.D - Linus with some good observations on garbage collection