www.digitalmars.com         C & C++   DMDScript  

D - Garbage Collection is bad... other comments

reply rafael baptista <rafael_member pathlink.com> writes:
I just saw the D spec today... and it is quite wonderful. There are so many ways
that C and C++ could be better and D addresses them. But I think D is a little
too ambitious to its detriment. It has a good side: it fixes syntactic and
semantic problems with C/C++. It eliminates all kinds of misfeatures and cruft
in C++.

And D has a bad side: it attempts to implement all kinds of features that are
best left to the programmer - not the language spec: built in string comparison,
dynamic arrays, associative arrays, but the worst in my mind is the inclusion of
garbage collection.

As is pointed out in the D overview garbage collection makes D unusable for real
time programming - this makes it unusable for Kernel programming, device
drivers, computer games, any real time graphics, industrial control etc. etc. It
limits the language in a way that is really not necessary. Why not just add in
language features that make it easier to integrate GC without making it a part
of the language.

For my part I wouldn't use GC on any project in any case. My experience has been
that GC is inferior to manually managing memory in every case anyway. Why?

Garbage collection is slower: There is an argument in the Garbage collection
section of the D documents that GC is somehow faster than manually managing
memory. Essentially the argument boils down to that totally bungling manual
memory management is slower than GC. Ok that is true. If I program with tons of
redundant destructors, and have massive memory leaks, and spend lots of time
manually allocating and deallocating small buffers, and constantly twiddling
thousands of ref counts then managing memory manually is a disaster. Most
programs manage memory properly and the overhead of memory management is not a
problem.

The argument that GC does not suffer from memory leaks is also invalid. In GC'd
system, if I leave around references to memory that I never intend to use again
I am leaking memory just as surely as if I did not call delete on an allocated
buffer in c++.

Finally there is the argument that manually managing memory is somehow difficult
and error prone. In C++ managing memory is simple - you create a few templatized
container classes at the start of a project. They will allocate and free *all*
of the memory in your program. You verify that they work correctly and then
write the rest of your program using these classes. If you are calling "new" all
over your C++ code you are asking for trouble. The advantage is that you have
full control over how memory is managed in your program. I can plan exactly how
I will deal with the different types of memory, where my data structures are
placed in memory. I can code explicity to make sure that certain data structures
will be in cache memory at the same time.

I had a similar reaction of having the compiler automatically build me hash
tables, string classes and dynamic arrays. I normally code these things myself
with particular attention to how they are going to be used. The unit test stuff
is also redundant. I can chose to implement that stuff whether the language
supports it or not. I normally make unit test programs that generate output that
I can regress against earlier builds - but I wouldn't want to saddle arbitrary
programs with having to run my often quite extensive and time consuming unit
tests at startup. Imaginary numbers and bit fields should similarly be left up
to the programmer.

I suppose I can just chose not to use these features if I don't want to - but it
strikes me as unnecessary cruft - and I subscribe to the idea beautifully
expressed in the D overview that removing cruft makes a language better.

The GC is worse though - because its in there operating - arbitrarily stalling
my program whether I use it or not.

I urge the implementor of D to think about it this way: you have lots of ideas.
Many of them are really good - but it is likely that despite your best efforts
not all of them are. It would make your project much more likely to succeed if
you decoupled the ideas as much as possible such that the good ones can succeed
without being weighed down by the bad. So wherever possible I think you should
move functionality from the language spec to a standard library. This way the
library can continue to improve, while the language proper could stabilize
early.

This is probably old ground - I assume lots of people have probably already
commented on the GC. I looked for a thread on GC and could not find one. 

I for one would find D a lot more attractive if it did not have GC.
Jan 12 2003
next sibling parent reply Clay <Clay_member pathlink.com> writes:
In article <avsfoq$6nc$1 digitaldaemon.com>, rafael baptista says...
I for one would find D a lot more attractive if it did not have GC.

I disagree. For business applications, it is nice not to have to spend time thinking about memory management, b/c it isn't worth the gains at runtime. For simple senarios, STL and GC are probably the same complexity for the programmer. When the variable goes out of scope within a method, it can be cleaned up. But for more complicated code, it helps that we do not think about memory management and more about the software. For example, when a object is created in class A, given to class B and C, and either B or C will clean it up (it is not know ahead of time), a GC is much easier to write for. It comes down to, "who is D for?". But, I wonder if it is possible to create a language that allows for both GC and STL memory management. disclaimer: I am familiar with STL, but I have never used it.
Jan 12 2003
parent reply Evan McClanahan <evan dontSPAMaltarinteractive.com> writes:
Clay wrote:
 In article <avsfoq$6nc$1 digitaldaemon.com>, rafael baptista says...
 
I for one would find D a lot more attractive if it did not have GC.

I disagree. For business applications, it is nice not to have to spend time thinking about memory management, b/c it isn't worth the gains at runtime. For simple senarios, STL and GC are probably the same complexity for the programmer. When the variable goes out of scope within a method, it can be cleaned up. But for more complicated code, it helps that we do not think about memory management and more about the software. For example, when a object is created in class A, given to class B and C, and either B or C will clean it up (it is not know ahead of time), a GC is much easier to write for. It comes down to, "who is D for?". But, I wonder if it is possible to create a language that allows for both GC and STL memory management.

I know Mr. Baptisa by reputation, so I think that I can anticipate some of his concerns (if he's the same person I'm thinking of). As he's a games programmer, he's very concered with more or less realtime performance, and performance with small bits of memory to allocate. I've thought of many of the same issues, being in the same field, and I think that it's more a matter of problem domain than anything else. I was trying to think of a way to do GC on a console, and I've basicaly come to the conclusion that it's more trouble than it would be worth. While GC is great for people with less pressing time and space concerns, like yourself, seemingly, it lacks a lot of fine grained control, and regaining that control would almost require writing a special case CG for every game, which would certainly take as much at as all of the memory management and debugging on a project. I think that his argument is similar to something that I've been saying, which is that memory management should be more cleanly modularized, so that one can pick and choose among strategies, without much language level interference. The unittest comments are less well founded, since unittest go away in release builds. I imagine that it's easy enough to do testing with the systems that are in there without many problems. Kinda a non-issue for me. I agree that some of the builtins should be moved into the library, but I don't really have the time right now to go very deeply into it. Evan
Jan 13 2003
next sibling parent reply rafael b aptista <rafael_member pathlink.com> writes:
In article <avu432$224e$1 digitaldaemon.com>, Evan McClanahan says...
I know Mr. Baptisa by reputation, so I think that I can anticipate some 
of his concerns (if he's the same person I'm thinking of).  As he's a 
games programmer, he's very concered with more or less realtime 
performance, and performance with small bits of memory to allocate. 

Yeah, that describes me pretty well. Wow. I had no idea I had a reputation! Evan, you and I are in perfect agreement on GC, I think - it may be good for some other projects but no for what we do.
The unittest comments are less well founded, since unittest go away in 
release builds. 

Yeah, I don't agree with you about that. I think unit tests are a good idea. Some people would implement them as a series of quick sanity checks on their libraries that would run at load time in debug builds. A proper unit test in my mind is a console app that does an exhaustive check on all of your code and tests all of your assumptions and prints out a long log of all the things it tried. Every time you make changes to a lib, you run the unit test and then diff the log against a previous run of the log and make sure that the results are the same - or that they only differ in expected ways. You then check into source code control the updated diff log. I have known companies that have implemented source code control in such a way that you cannot check your files back in until the system has regressed all the unit tests. Running unit tests at startup time has many disadvantages: 1. Even in a debug build you have limits on how long you are willing to wait through a unit test. With an offline unit test you can make it do everything you want. 2. Unit tests are better as text oriented apps, and most modern applications are graphical. You would have to make the unit test spew to a log, and then go check the log. Similarly it is a pain to try to run a gui app in a script. So if you want to make scripts that regress your libraries you have to make dummy console apps that only run the unit test in a text mode. 3. Unit tests running at application startup time are not running when they should run. You want a unit test to run every time you change the code that it tests - not necessarily every time you start up your main application. Making a unit test function for a class be a class member function has another disadvantage: You are running the unit test after the class constructor, and the unit test is oriented toward testing only that one instance. To unit test a class I prefer to make a non-member function that exercises the whole API in one function - including all the constructors. Experienced programmers can disagree about the best way to do unit tests - and that is exactly why unit tests should not be built in to the language. The way that D supports unit tests is not the way most programmers who implement unit test chose to implement them. So you are codifying into the language something which is not established as the best practice anyway.
Jan 13 2003
next sibling parent Evan McClanahan <evan dontSPAMaltarinteractive.com> writes:
rafael b aptista wrote:
 In article <avu432$224e$1 digitaldaemon.com>, Evan McClanahan says...
 
I know Mr. Baptisa by reputation, so I think that I can anticipate some 
of his concerns (if he's the same person I'm thinking of).  As he's a 
games programmer, he's very concered with more or less realtime 
performance, and performance with small bits of memory to allocate. 

Yeah, that describes me pretty well. Wow. I had no idea I had a reputation! Evan, you and I are in perfect agreement on GC, I think - it may be good for some other projects but no for what we do.

I applied to your company a long time ago (assuming that you're who I think that you are and not someone with the same name), which is how I know who you are (or might be). Game programming is a smallish world anyway. Some interesting stuff on your site. I assumed that your perspective would be coming from GBA developement, and went from there. A GC would be disastrous in that environment, and I think that it would be bad for a lot of system level stuff as well. But then, I think that my perfect language would be almost nothing in the system core, a robust syntax extender, and a biggish library of standard extensions, so that might explain the slant in my perspective.
The unittest comments are less well founded, since unittest go away in 
release builds. 

Yeah, I don't agree with you about that. I think unit tests are a good idea. Some people would implement them as a series of quick sanity checks on their libraries that would run at load time in debug builds. A proper unit test in my mind is a console app that does an exhaustive check on all of your code and tests all of your assumptions and prints out a long log of all the things it tried. Every time you make changes to a lib, you run the unit test and then diff the log against a previous run of the log and make sure that the results are the same - or that they only differ in expected ways. You then check into source code control the updated diff log. I have known companies that have implemented source code control in such a way that you cannot check your files back in until the system has regressed all the unit tests. Running unit tests at startup time has many disadvantages: 1. Even in a debug build you have limits on how long you are willing to wait through a unit test. With an offline unit test you can make it do everything you want. 2. Unit tests are better as text oriented apps, and most modern applications are graphical. You would have to make the unit test spew to a log, and then go check the log. Similarly it is a pain to try to run a gui app in a script. So if you want to make scripts that regress your libraries you have to make dummy console apps that only run the unit test in a text mode. 3. Unit tests running at application startup time are not running when they should run. You want a unit test to run every time you change the code that it tests - not necessarily every time you start up your main application. Making a unit test function for a class be a class member function has another disadvantage: You are running the unit test after the class constructor, and the unit test is oriented toward testing only that one instance. To unit test a class I prefer to make a non-member function that exercises the whole API in one function - including all the constructors. Experienced programmers can disagree about the best way to do unit tests - and that is exactly why unit tests should not be built in to the language. The way that D supports unit tests is not the way most programmers who implement unit test chose to implement them. So you are codifying into the language something which is not established as the best practice anyway.

Ok, I can see your point. I don't think, though that they're insanely harmful, as you can still write traditional unittests of the kind that you describe, and if people decide that the unittests don't work as a normal feature, then they'll go away eventually. I've used the other DBC features more, so you're likely right. Without a proper computer or internet connection at home, I haven't had a chance to program in D as much as I would like to, and the programs that I have had a chance to write were untested and undocumented hacks, so what do i know about testing? :) Evan
Jan 13 2003
prev sibling next sibling parent reply "Sean L. Palmer" <seanpalmer directvinternet.com> writes:
"rafael b aptista" <rafael_member pathlink.com> wrote in message
news:avukdf$2f0g$1 digitaldaemon.com...
 In article <avu432$224e$1 digitaldaemon.com>, Evan McClanahan says...
I know Mr. Baptisa by reputation, so I think that I can anticipate some
of his concerns (if he's the same person I'm thinking of).  As he's a
games programmer, he's very concered with more or less realtime
performance, and performance with small bits of memory to allocate.

Yeah, that describes me pretty well. Wow. I had no idea I had a

 Evan, you and I are in perfect agreement on GC, I think - it may be good

 some other projects but no for what we do.

On a console game, it's always possible to just allocate the entire memory from the system and manage it yourself. That seems to be what everybody does in C++. In D, you can disable GC, and even if you don't, it doesn't run unless you run out of memory. The problem is that D doesn't have good RAII semantics, so implementing your own memory management is a painful process resembling C++ explicit new and delete. That method is bug ridden. In fact I am not sure D allows you to run ctors on arbitrary memory, so it wouldn't be possible to write your own 'new' function template.
The unittest comments are less well founded, since unittest go away in
release builds.

Yeah, I don't agree with you about that. I think unit tests are a good

 Some people would implement them as a series of quick sanity checks on

 libraries that would run at load time in debug builds. A proper unit test

 mind is a console app that does an exhaustive check on all of your code

 tests all of your assumptions and prints out a long log of all the things

 tried. Every time you make changes to a lib, you run the unit test and

 the log against a previous run of the log and make sure that the results

 same - or that they only differ in expected ways. You then check into

 code control the updated diff log. I have known companies that have

 source code control in such a way that you cannot check your files back in

 the system has regressed all the unit tests.

 Running unit tests at startup time has many disadvantages:

 1. Even in a debug build you have limits on how long you are willing to

 through a unit test. With an offline unit test you can make it do

 want.

 2. Unit tests are better as text oriented apps, and most modern

 graphical. You would have to make the unit test spew to a log, and then go

 the log. Similarly it is a pain to try to run a gui app in a script. So if

 want to make scripts that regress your libraries you have to make dummy

 apps that only run the unit test in a text mode.

 3. Unit tests running at application startup time are not running when

 should run. You want a unit test to run every time you change the code

 tests - not necessarily every time you start up your main application.

 Making a unit test function for a class be a class member function has

 disadvantage: You are running the unit test after the class constructor,

 unit test is oriented toward testing only that one instance. To unit test

 class I prefer to make a non-member function that exercises the whole API

 function - including all the constructors.

Unit tests can be global module-scope functions too.
 Experienced programmers can disagree about the best way to do unit tests -

 that is exactly why unit tests should not be built in to the language.

Seems like it's more of a #ifdef UNITTESTS RunUnitTest(); #endif UNITTESTS
 The way that D supports unit tests is not the way most programmers who

 unit test chose to implement them. So you are codifying into the language
 something which is not established as the best practice anyway.

I kinda like integrating the code for the unit tests into the module it tests. But I would like more control over when they are run. Sean
Jan 13 2003
parent "Mike Wynn" <mike.wynn l8night.co.uk> writes:
 On a console game, it's always possible to just allocate the entire memory
 from the system and manage it yourself.  That seems to be what everybody
 does in C++.  In D, you can disable GC, and even if you don't, it doesn't
 run unless you run out of memory.

and you can therefore create all your required objects up front, and manage then manually; (as Java games / embedded programmers do)
 The problem is that D doesn't have good RAII semantics, so implementing

 own memory management is a painful process resembling C++ explicit new and
 delete.  That method is bug ridden.
 In fact I am not sure D allows you to run ctors on arbitrary memory, so it
 wouldn't be possible to write your own 'new' function template.

agree, and D with placement constructors would be good, the effect on the GC (if not disabled) might be undesirable and would allow some nasty 'unions' to be created (construct two objects in the same space). not only would you need to inform the GC that a memory range was under your control, but any object in there would still need to be walked as they could contain refs to GC managed objects. (one of the stumbleing blocks with realtime Java)
Jan 13 2003
prev sibling parent "Walter" <walter digitalmars.com> writes:
Some great comments! My replies are inline below.

"rafael b aptista" <rafael_member pathlink.com> wrote in message
news:avukdf$2f0g$1 digitaldaemon.com...
 Running unit tests at startup time has many disadvantages:
 1. Even in a debug build you have limits on how long you are willing to

 through a unit test. With an offline unit test you can make it do

 want.

A unit test is not a complete replacement for a comprehensive test suite. It just tests the functionality of individual parts of the project. Unit tests can also be turned on or off on a module by module basis - once one section is debugged, you can turn off unit tests for it until it gets changed. My own experience with D unit testing has been that it has saved me from many, many embarassingly stupid bugs. I don't mind waiting a couple seconds on startup for a debug build. If it takes a long time to run the unit tests, it's probably worth looking at the unit tests and pulling the time consuming bits out into an offline test. That's really what I do with the D compiler system itself. The basics are done with unit tests, the comprehensive time consuming tests are done with a separate test suite. But the unit tests assure me a basic level of functionality every time without needing to run the long test suite.
 2. Unit tests are better as text oriented apps, and most modern

 graphical. You would have to make the unit test spew to a log, and then go

 the log. Similarly it is a pain to try to run a gui app in a script. So if

 want to make scripts that regress your libraries you have to make dummy

 apps that only run the unit test in a text mode.

Unit tests in D are designed to throw an exception when they fail. Catching exceptions and putting up a message box for them isn't a big problem. When I've worked on gui apps, I did use log files extensively. Instead of printf'ing to the console, I just printf'd to a log file.
 3. Unit tests running at application startup time are not running when

 should run. You want a unit test to run every time you change the code

 tests - not necessarily every time you start up your main application.

Doing it when the program starts is a reliable way of doing it. For example, if you have module A and module B, A depends on B, and the code for B changes, it's nice to get the unit test for A run too. Having some external mechanism decide which unit tests to run when particular modules change is difficult to get right, and is certainly wrong if it decides solely on the basis of source file changes.
 Making a unit test function for a class be a class member function has

 disadvantage: You are running the unit test after the class constructor,

 unit test is oriented toward testing only that one instance. To unit test

 class I prefer to make a non-member function that exercises the whole API

 function - including all the constructors.

You can do that in D with a module level unit test.
 Experienced programmers can disagree about the best way to do unit tests -

 that is exactly why unit tests should not be built in to the language.

I suggest you could say that about every language feature - just look at this forum! In any case, in my experience, very very few programmers do unit tests. Why? Because it's a pain to do without language support. In D, unit tests are quick and easy to add, they are right next to the code being tested, and so are more likely to get written and be up to date. Running them at startup means they are more likely to be run.
 The way that D supports unit tests is not the way most programmers who

 unit test chose to implement them. So you are codifying into the language
 something which is not established as the best practice anyway.

D does not prevent doing unit tests in a non-builtin manner. But (as far as I know) D is the very first language to build in the concept of unit tests, in a quick, convenient, and reliable manner. I suggest that this will encourage a much greater percentage of programmers taking the time to even write any unit tests, which will be all to the better. I won't argue that D is the best way possible to do unit tests, experience using the language over time may easilly suggest improvements. But it's a great start, and unit tests are a badly needed language feature.
Jan 23 2003
prev sibling parent Ilya Minkov <midiclub tiscali.de> writes:
Hello.

There exists a GC algorithm, which can run without stopping the 
execution at all, in parallel, and is thus okay for realtime systems.

Three Colour Marking GC

Each object has a marking field, which can hold one of three colors:
  - white: objects to which no references were found. All objects start 
out white.
  - grey: object is known to be reachable (referenced), but wasn't 
scanned for references yet;
  - black: completely inspected. Shall not be revisited by a scanner, 
unless it changes color.

The only runtime modification is that whenever pointers are being 
modifed, their containers have to be re-colored from black to grey. 
After there are no more grey objects, white ones can be collected.

That's how it should work in principle. You can imagine some sufficently 
efficient implementatations.

This algorithm is also in a modified form used in OCaml - that's how 
come that games are written in it. After starting out in mid-seventies 
it has born a number of related algorithms.

Hm. This leads again to the question of a flexible code generation, so 
that it's possible to plug in one or another GC system, or possibly 
another memory managers. Or structural optimisations. A complex feature 
not only by itself, but also for interface design. But on the other hand 
it could take some burden off compiler writer and pass it to library 
writers.

-i.
Jan 28 2003
prev sibling next sibling parent Russell Lewis <spamhole-2001-07-16 deming-os.org> writes:
rafael baptista wrote:
 I just saw the D spec today... and it is quite wonderful. There are so many
ways
 that C and C++ could be better and D addresses them. But I think D is a little
 too ambitious to its detriment. It has a good side: it fixes syntactic and
 semantic problems with C/C++. It eliminates all kinds of misfeatures and cruft
 in C++.
 
 And D has a bad side: it attempts to implement all kinds of features that are
 best left to the programmer - not the language spec: built in string
comparison,
 dynamic arrays, associative arrays, but the worst in my mind is the inclusion
of
 garbage collection.

gc.disable(); In Walter's compiler (although other compilers will differ in the future, I suppose) the GC only runs when the memory allocation routines have run out of buffers that they've gotten from the OS. That is, the underlying routines for 'new' grab somewhat large blocks of virtual memory from the OS, and piece them up into individual allocations. When the allocator has used all of its memory, it runs the garbage collector BEFORE it asks the OS for more memory. It only talks to the OS if it cannot find any place in its current space to make the allocation. If you disable the garbage collector, then it just doesn't do GC when the memory allocator runs out of memory. Just use 'new' and 'delete' as you would have in C++. You can also just disable the gc for a small time: gc.disable(); DoTimeCriticalThing(); gc.enable();
Jan 13 2003
prev sibling next sibling parent reply Ilya Minkov <midiclub 8ung.at> writes:
Please, believe me garbage collection (GC) is the future of game design. 
"Realtime programming" doesn't really include games and other things 
called "realtime", since they have to cope with disc access and such, 
graphic card, other devices, which are  NOT REALTIME! You send some 
command and wait for it to be completed, and this usually lasts much, 
much, much, much longer than a GC cycle.

You don't even have to believe me. Read articles of TAD in HUGI #18, i 
guess. (www.hugi.de) He proposes highly efficient structures for games, 
which are highly complex meshes of (concave) objects. Now, as I read 
this through i was fascinated. But as soon as i thought of 
implementation, i got sudden strong headaches. As later on i was 
learning about Hans Boehms GC for C, then about functional programming 
in OCaml (GC-ed language), it became evident to me, that such structures 
as proposed by TAD can be very easily and efficiently implemented with 
GC and only this way. Ref-counting ultimately fails in many cases.

To the question of efficiency. Though the current implementation of D GC 
is said to be very simple-minded and thus not very efficient, there 
exists a very efficient implementation for C which is farly popular. 
I've read of experiences of creating a simple IDE (Wedit) for LCC-Win32, 
a very simple C compiler. The author claims, that plugging in the Hans 
Boehm GC into the project reduced the time requiered to load a project 
to 1/8 of original, because GC-malloc is simply better optimized than 
the normal one.

The manual says that Boehm GC proves to be significantly faster for 
almost any application managing many tiny objects, but tends to be 
slower for applications working with large objects only. The reason for 
the first claim is that it performs a number of optimisations (heap 
compaction), helping keep objects which are used together near each 
other, which drastically reduces cache misses, and as you might know the 
cache is about as fast as the CPU, while the RAM speed has been 
improving much slower than the CPU/Cache speed and the impact is already 
huge, and is likely to grow in the future. I guess D GC doesn't do such 
things yet, but they are to come. These optimisations can only be 
efficiently implemented in conjuction with a GC, that is, if you do 
them, you've got a GC basically for free.

The second claim is based upon the fact, that Boehm GC tends to spend 
too much time figuring out what is a pointer and what not by content. 
You can limit such pointless search for large pointerless (pun intended) 
objects, but it doesn't solve the issue in general. D GC uses type 
information and *never* searches large masses without use, so that you 
can assume that it should be faster than Boehm GC for large objects. 
That is, D gc should also work cleaner.

And if you're still not convinced, go on and turn it off. You're just 
not doing yourself a favor. D GC does the same amount of work what 
ref-counting does once in a while, while ref-counting has to do that at 
almost any operation.

You can also cause GC cycles by timer once every now and then, keeping 
memory usage constantly low and cycle times short, at the cost of making 
them longer in general. The way i'm currently aware of is to disable and 
then enable the GC, not sure it works.

regards,
-i./midiclub
Jan 13 2003
next sibling parent Burton Radons <loth users.sourceforge.net> writes:
Ilya Minkov wrote:
 The second claim is based upon the fact, that Boehm GC tends to spend 
 too much time figuring out what is a pointer and what not by content. 
 You can limit such pointless search for large pointerless (pun intended) 
 objects, but it doesn't solve the issue in general. D GC uses type 
 information and *never* searches large masses without use, so that you 
 can assume that it should be faster than Boehm GC for large objects. 
 That is, D gc should also work cleaner.

Type-based searching isn't implemented yet. DMD should produce a super-optimised scanning function for each allocated and public type in this module. D's GC could also do block allocation for small objects or those which have proper characteristics in previous runs (good but fairly constant use, with high turnover of allocation and freeing). That makes allocation and freeing take just a few cycles and reduces overhead. The effect of this is incalculable at this time as it gets into cache issues, but I've seen the content-to-pointers ratio go in the hundreds when dealing with mesh vertices (textures were uploaded and then freed, otherwise it would be one pointer for 700Kb data). I'm very optimistic that I'll be able to continually run the GC on any game. But I'm even more sure that it's worth working for. If the GC delay is still too great, then all games have sequence points. When the player changes a level or a discrete segment, or is dead, or is clearly AFK, or if the player is swapping out, loading, saving, switching from the menu, or is not being attacked, then is your time.
Jan 13 2003
prev sibling next sibling parent reply rafael baptista <rafael_member pathlink.com> writes:
In article <avv7ve$2tos$1 digitaldaemon.com>, Ilya Minkov says...
Please, believe me garbage collection (GC) is the future of game design.

I doubt it.
"Realtime programming" doesn't really include games and other things 
called "realtime", since they have to cope with disc access and such, 
graphic card, other devices, which are  NOT REALTIME! 

You are right. That is why game programmers avoid accessing slow devices like disks in the middle of a game. If you absolutely must get something off disk you normally stream it in. You set up an interrupt driven process or something like that. So say I need a new model off the disk in a few seconds, I would place a request to the disk streaming subsystem to start getting the stuff off the disk. If you sit there like fool waiting for your CD drive to spin up, then your game is very choppy.
You don't even have to believe me. 

I don't have to believe you. I write games for a living. ;)
The author claims, that plugging in the Hans 
Boehm GC into the project reduced the time requiered to load a project 
to 1/8 of original, because GC-malloc is simply better optimized than 
the normal one.
The manual says that Boehm GC proves to be significantly faster for 
almost any application managing many tiny objects, but tends to be 
slower for applications working with large objects only.

I don't doubt it. Asking the operating system for a buffer is very very slow. On windows it actually pauses your thread, and does an OS context switch. If you are allocating tiny buffers straight from the OS all the time, you need to go back to programming school. Memory should be allocated from the OS in large chunks and then doled out inside your progam as needed - using your own optimized allocators. This is what gets you cache coherency. This is what guarantees that all the vertex data for a model is in cache when you transform it, etc. It is not fair to compare using the most highly optimized GC you have ever heard of against bad manual memory management practices.
The reason for 
the first claim is that it performs a number of optimisations (heap 
compaction), 

There is no heap compaction better than me placing my data structures in memory explicitly for cache coherency. Plus there is no penalty for me having holes in the memory space that are bigger than a memory page anyway. If a whole page is unused it will never be loaded into the cache and will never trouble me. Compacting it away is just wasting bandwidth on the bus that I could be using to load textures.
And if you're still not convinced, go on and turn it off. You're just 
not doing yourself a favor. D GC does the same amount of work what 
ref-counting does once in a while, while ref-counting has to do that at 
almost any operation.

The alternative to GC is not ref counting. Few objects have such an unpredictable life cycle that they cannot be managed any other way. GC essentially embed the ref counting inside the GC subsystem and makes you do some kind of reference tracking for all memory buffer wether you need it or not. I ref count very few objects in a given project. And when I do I'm typically ref counting entire objects, not every single buffer. Plus I can manipulate the ref counts only when it matters. I don't have a GC system internally twiddling pointer ref counts mindlessly when all that is happening is that a temporary pointer was made for a function call or some other limited scope.
Jan 13 2003
next sibling parent reply Ross Judson <d soletta.com> writes:
rafael baptista wrote:
 
 I don't doubt it. Asking the operating system for a buffer is very very slow.
On
 windows it actually pauses your thread, and does an OS context switch. If you
 are allocating tiny buffers straight from the OS all the time, you need to go
 back to programming school. Memory should be allocated from the OS in large
 chunks and then doled out inside your progam as needed - using your own
 optimized allocators. This is what gets you cache coherency. This is what
 guarantees that all the vertex data for a model is in cache when you transform
 it, etc.
 

Dude, what do you think Java (and most other GC-enabled environments) do? They allocate big buffers from the OS, then divide it into various categories of storage (in Java's case, various generations). Highly efficient strategies are then used to quickly allocate chunks of memory that used and discarded quickly. More stable allocations are transferred into storage that has strategies appropriate to that. If you're doing your own memory allocation code, you have to do a LOT of work, and write a LOT of bug free code, to get to where the Java allocator is. D needs GC. You're doing realtime work; stick with C, which is highly appropriate to that task! RJ
Jan 13 2003
next sibling parent reply Evan McClanahan <evan dontSPAMaltarinteractive.com> writes:
Ross Judson wrote:
 Dude, what do you think Java (and most other GC-enabled environments) 
 do?  They allocate big buffers from the OS, then divide it into various 
 categories of storage (in Java's case, various generations).  Highly 
 efficient strategies are then used to quickly allocate chunks of memory 
 that used and discarded quickly.  More stable allocations are 
 transferred into storage that has strategies appropriate to that.
 
 If you're doing your own memory allocation code, you have to do a LOT of 
 work, and write a LOT of bug free code, to get to where the Java 
 allocator is.
 
 D needs GC.  You're doing realtime work; stick with C, which is highly 
 appropriate to that task!

But D is nicer than C, and I'd rather use it, and have it be a general usage language, rather than a rapid development language for apps that need more speed than VB. I like GC, don't get me wrong. But without hardware and OS support, GC is always going to have some problems with certain kinds of applications. Since that support doesn't seem to be forthcoming in the near future, D should be ready to accept those cases and have a simple method for *transparently* swapping out what happens with someone writes new or delete. I shouldn't have to be doing anything complicated to switch it all out. Evan
Jan 14 2003
parent reply Bill Cox <bill viasic.com> writes:
Hi.

I'm new to this news group, but not to language design.  D can't be all 
things to all people, so we might as well live with it's focus.  Garbage 
collection is already there, and that defines the scope of the D 
language about as much as any other feature.

That said, here's my thoughts about GC in general:

GC in general buys very little.  Anyone writing complicated programs 
already have to write recursive destructors to remove objects from 
relationships with other objects.  The value of GC is simply that one 
extra line (delete myObject;) doesn't have to be typed.

People argue that GC helps reduce dangling pointer bugs, but rather than 
a dangling pointer, you've got a memory leak.

While GC doesn't really hurt a language like D for most applications, it 
trashes it's use (as D's author points out) for real time programming 
(not going to see any kernels written in D any time soon).

There's also another weak reason I dislike GC: synthesis of 
datastructures into hardware really doesn't work in a GC based language.

 If you're doing your own memory allocation code, you have to do a LOT 
 of work, and write a LOT of bug free code, to get to where the Java 
 allocator is.


In any current language, you'd be right. However, highly optimized memory management can and should be generated by compilers. We do this now where I work. I type no extra code, but get automatically generated free-lists, and bug-free recursive destructors. It's kind of nice. Bill Cox
Jan 24 2003
parent reply Ilya Minkov <midiclub 8ung.at> writes:
Hello.

Mind what you say.

Though recursive destruction is sufficent for a great number of 
problems, for some it is not. I hve caught myself organising data nicely 
in a tree, then giving the pointer to some small branch somewhere and 
destroying the tree. Done. The program doesn't spend even a bit too much 
memory. Even less than requiered. Perfect, right?

Furthermore, i've accidentally deleted an object which is still in use, 
but the holder doesn't get notified of it anyway. It just *has* to crash 
as soon as any access is done.

Such domains lead to some necessity to keep track of how many users an 
object has, and this was *sometimes hard* in Delphi. And error-prone.

This manner of organising data in recursively-destroyed structures is 
typical of experienced programmers, to have some central point to keep 
it all, and is not requiered in most cases where GC is used. It simply 
requeres a different management for data.

Avoid common roots. Only actual users may have references to objects. 
Then they shall be collected if they're no use and are in a way.

Those people like you with their old habits may try GC someday on some 
program of theirs, which was working well, and notice that GC has 
spoiled it, but the usual cause for it is not obeying these simple 
rules, which only simplify the design and not complicate it.

For some uses though, a GC appears to be of no use. For these cases, i 
see a necessity of a separate heap for by-design non-GCed objects, since 
there still might be objects which requiere GC in a multi-module 
programme, like strings and such. This restricts nothing, and the "mark" 
phase time would decimate vastly if a significant portion is not GCed, 
not to mention it would already be very short as soon as mark-methods 
are implemented.

A simple implementation would be that "non-GCed" objects have an empty 
or programmer-written "mark" method (that is, a method which is called 
by a GC, so that an object can mark the objects it's using), and should 
not be deleted if it has not been marked by any other objects. Simply by 
marking itself, or by more sophisticated means.

I admit that my idea, that every object which provedes a corresponding 
method would be thus informed of GC-collected information is not always 
implementable. In some implementations it is quite possible, that the 
only thing which it would get is whether it was marked at all or not, 
but not how often it has been done. But i'd wish to know how many 
references to "me" exist at the time of mark phase, if it's possible.

-i.

PS. Embedded comments follow

  |
  |
\|/
  '

Bill Cox wrote:
 Hi.
 
 I'm new to this news group, but not to language design.  D can't be all 
 things to all people, so we might as well live with it's focus.  Garbage 
 collection is already there, and that defines the scope of the D 
 language about as much as any other feature.
 
 That said, here's my thoughts about GC in general:
 
 GC in general buys very little.  Anyone writing complicated programs 
 already have to write recursive destructors to remove objects from 
 relationships with other objects.  The value of GC is simply that one 
 extra line (delete myObject;) doesn't have to be typed.

It is a vast help in simple cases. Don't tell me that noone writes simple programs. Or doesn't want to write a complex programme in a simple manner. Simple units can very well evolve to complex programs, i hope in D to a much further extent as in C++.
 People argue that GC helps reduce dangling pointer bugs, but rather than 
 a dangling pointer, you've got a memory leak.

Only without a GC, or with non-GC-friendly structures, which in my experience tend not to appear by themselves, but rather on purpose of conventional memory management. And don't tell me you've nevel leaked memory... You may have leaked tiny objects and wouldn't notice it. I did notice sometimes, but only ocassionaly. A good language supported GC (eg Caml) *can't* leak memory at a full mark cycle, because it colects all memory not being referenced. Simply NULL out your pointers and you're done. As soon as your application or possibly some other application needs memory, they would be collected. Do you really care what your "mem-meter" shows you if you don't intend to use it? Boehm GC can leak memory, but the reason is the absense of language support, and that it's thus unable to be sure of some things.
 While GC doesn't really hurt a language like D for most applications, it 
 trashes it's use (as D's author points out) for real time programming 
 (not going to see any kernels written in D any time soon).

Why? D is not Java. GC is not forced. And else it is inherently not even a bit less efficient than C. Rather more, if Walter decides to get rid of C calling convention for usual fuctions. :>
 There's also another weak reason I dislike GC: synthesis of 
 datastructures into hardware really doesn't work in a GC based language.

???
 If you're doing your own memory allocation code, you have to do a LOT 
 of work, and write a LOT of bug free code, to get to where the Java 
 allocator is.


In any current language, you'd be right. However, highly optimized memory management can and should be generated by compilers. We do this now where I work. I type no extra code, but get automatically generated free-lists, and bug-free recursive destructors. It's kind of nice.

Bug-free recursive destruction is not a problem in my eyes. But keeping an eye that you always recursively copy objects and not give away references to them, or that you manipulate the refcounter correctly is tedious. GC would improve performance in certain cases. Copy-on-write? Then you need a ref-counter. Or a garbage collection. Generational GC would do very well in such case. Like Caml shows, which (almost) always uses copy-on-write, eg when you change something in a linked list. I'd rather be for a system-wide GC usage. Example: data passed to OpenGL is being copied. It is rediculous, simpler would be to use copy-on-change (and OpenGL usually doesn't change data), and to GC the leftovers. It is very probable that the data doesn't have to be copied at all then, because applications rarely change most kinds of data during some frame is drawn. Well, OpenGL inludes ways to make data transfers as rare as possible, but their use becomes tedious.
 
 Bill Cox
 

Jan 24 2003
parent reply Bill Cox <bill viasic.com> writes:
Hi.

 Though recursive destruction is sufficent for a great number of 
 problems, for some it is not. I hve caught myself organising data nicely 
 in a tree, then giving the pointer to some small branch somewhere and 
 destroying the tree. Done. The program doesn't spend even a bit too much 
 memory. Even less than requiered. Perfect, right?
 Furthermore, i've accidentally deleted an object which is still in use, 
 but the holder doesn't get notified of it anyway. It just *has* to crash 
 as soon as any access is done.

I didn't make myself clear. I don't write recursive destructors. Our data management CASE tool generates them for us, 100% automaticallay. It gets it right every time. We never have any unaccounted for pointers. It's fast and productive. In fact, we don't even buy Purify, even though we program in C. We just don't need it. Seg-v's are mostly a thing of the past for us. People using GC are still writing their own destructors, and still getting them wrong. A great way to improve productivity is to use CASE tools to do it for you. I really don't see the need for GC. As case tools are often a sign that the language needs to be extended, I've recently written a language that eliminates the need for the CASE tools we use at work. You have to go further. One goal of the language was optimized to work well with code generators. Every relationship (container class) generates code in both the parent and child classes, and updates the recursive destructor in each class. You can't do that with templates. You don't need a new language to do this, though. As I said, we do this at work in C with code generators.
 There's also another weak reason I dislike GC: synthesis of 
 datastructures into hardware really doesn't work in a GC based language.

???

Basically, GC makes it harder to implement a program on hardware. The field of reconfigurable computing is dedicated to doing this. GC complicates things. GC in D is fine. You make your choices in language design, and obviously smart people often include GC. It just makes the language less attractive for the few of us who need to code on the bleeding edge of speed (we write chip design software). Bill Cox
Jan 25 2003
parent reply Ilya Minkov <midiclub 8ung.at> writes:
Bill Cox wrote:
 Hi.
 
 I didn't make myself clear.  I don't write recursive destructors.  Our 
 data management CASE tool generates them for us, 100% automaticallay. It 
 gets it right every time.  We never have any unaccounted for pointers.  
 It's fast and productive.  In fact, we don't even buy Purify, even 
 though we program in C.  We just don't need it.  Seg-v's are mostly a 
 thing of the past for us.

What further work does it do? Recursive destruction is by far not the only problem. How is the rest of memory management dealt with? I mean, to prevent destruction of useful and how to track down objects not used anymore.
 People using GC are still writing their own destructors, and still 
 getting them wrong.  A great way to improve productivity is to use CASE 
 tools to do it for you.  I really don't see the need for GC.

Destructors are not requiered anymore, except for the cases where you need to destroy some hardware together with an object. :> Well, there are a few special cases. People make mistakes with any model. But IMO making no mistakes with GC is very easy. Although i'm an advocate of GC, i'm open to every other solution.
 As case tools are often a sign that the language needs to be extended, 
 I've recently written a language that eliminates the need for the CASE 
 tools we use at work.  You have to go further.  One goal of the language 
 was optimized to work well with code generators.  Every relationship 
 (container class) generates code in both the parent and child classes, 
 and updates the recursive destructor in each class.  You can't do that 
 with templates.  You don't need a new language to do this, though.  As I 
 said, we do this at work in C with code generators.

Describe the extentions you consider useful in complete detail, else it's no use for Walter. Or give a link. I appreciate that you have *done* something (esp. something that works), but what you've written here so far gives me no idea of what it is and how it should work. I'm not well-informed about CASE in general, and i bet the most of the newsgroup isn't as well. There are currently multiple directions, where D should go. First, it has to become solid in its current form. Second, backends have to be written for different kinds of use, eg. a very portable VM. And then, i'll be making an effort to make writing compilers in D and language extentions to D easy, and i'll gladly take your proposal into account. Currently i'm looking at writing of COCOM-like library for D, enabling fast and flexible parsing and syntax tree transformation.
 Basically, GC makes it harder to implement a program on hardware.  The 
 field of reconfigurable computing is dedicated to doing this.  GC 
 complicates things.

Wouldn't you want to write a program in VHDL or such if you wanted to get hardware as output? Well, i can imagine that nowadays it is possible with higher-level languages. But isn't D an overkill? Oh, you're working for an ASIC company. Then obviously "hard" management is better for *you* than a GC. But how should it work? A ref-counter can be a drop-in replacement for a GC (less reliable though), and how do you make the inegration of CASE? -i.
Jan 25 2003
parent reply Bill Cox <bill viasic.com> writes:
HI, Ilya.

Ilya Minkov wrote:
 Bill Cox wrote:
 
 Hi.

 I didn't make myself clear.  I don't write recursive destructors.  Our 
 data management CASE tool generates them for us, 100% automaticallay. 
 It gets it right every time.  We never have any unaccounted for 
 pointers.  It's fast and productive.  In fact, we don't even buy 
 Purify, even though we program in C.  We just don't need it.  Seg-v's 
 are mostly a thing of the past for us.

What further work does it do? Recursive destruction is by far not the only problem. How is the rest of memory management dealt with? I mean, to prevent destruction of useful and how to track down objects not used anymore.

Ok. Here's a description of our CASE tool: The case tool (DataDraw) is pretty simple. It's open-source and compiles under Visual C++ on Windows. I use it on Linux with wine (a Windows emulator). You draw your class schema much as you would with a UML editor, or a database entity-relationship schema editor. You add properties to classes by double clicking on them and adding the fields. Then, to add relationships (container classes) between classes, you select a relationship drawing tool and draw arrows between the classes. By double clicking on the relationship, you get to set it's type: one-to-one, linked list, tail-linked-list (linked list with a pointer to the last element), doubly-linked list, static array, dynamic array, name-based hash table, or static inheritance (standard C++ style inheritance). You also get to specify whether or not the children should be deleted when the parent is deleted. There are a couple other option: relationship names, and whether or not there is a back pointer to the parent embedded in the child. After describing your data structures, it will generate code for you. There are several code generators to choose from, mostly variations of C coding styles, and one C++ style. We use on of the C styles because it gives us dynamic inheritance, which is a big deal for the kind of software we write (integrated chip design tools that run off of a common database). Static inheritance is great when you are constructing objects from scratch. Our objects already exist in the database, and we need to extend them on-the-fly. The code generator we use does this with no memory or speed penalty, since it represents all the class data in individual arrays of data. To extend all the objects of a given class, the memory manager just allocates more arrays. Instead of using C style pointers, we the code generator typedefs an index as the reference type for the class. We get strong type checking in debug mode by typedefing the handles as pointers to dummy classes, and casting them to ints when accessed. That sounds ugly, but it's all buried under the hood by automatically generated access macros that hide it all. The memory managers in two of the code generators are quite advanced. First, each class has options you can select for how to manage it's memory. Objects of some classes are never individually deleted, only all at once. For those classes, the structures are put into a dynmic array. Allocating an object typically is just a compare and an incremt (thee machine instructions, I think). For most classes, however, you need both create and delete on the fly. The code generator that generates more normal C code alloates memory one page at a time, and then sub-allocates your objects of a given class out of them, which optimizes both paging and cache hits. Free lists are automatically generated and maintained. Structures are also automatically alligned, with fields sorted by size. The funkier code generator we use (the one that uses indexes into arrays instead of pointers to structures) uses dynamic arrays for all properties of all classes. We've tested the performance of our code with both generators, and generally see a 10-20% speed increase using the dynamic array representation instead of structures. This memory manager uses the trick of allocating 50% more memory each time a dynamic array needs more memory. Recursive destructors are also generated, as are some other useful functions. The recursive destructors delete each child in any relationship marked as needing the children deleted. Other relationships are patched. For example, code to remove the object from linked lists is generated. Of particular interest to us is the very efficeint code for our container classes (relationships). A linked list relationship generated by DataDraw embeds the next pointer in the child class, which is a far more efficeint implementation, in terms of both speed and memory. In debug mode, all memory accesses are checked, including array indexing. Some of the code generators also generate binary load/save, and full database integrety self-checks. However, we've come to the conclusion that simple binary dumps make a poor save format, since it's hard to remain backwards compatible. Also, since our pointers almost never get trashed, the database integrety self check is also no longer used. There are ways our database can be trashed. Users of fscanf run into it. We sometimes get seg-v's with too few parameters to printf. All in all, our coding pretty safe from these kinds of bugs. ...
 As case tools are often a sign that the language needs to be extended, 
 I've recently written a language that eliminates the need for the CASE 
 tools we use at work.  You have to go further.  One goal of the 
 language was optimized to work well with code generators.  Every 
 relationship (container class) generates code in both the parent and 
 child classes, and updates the recursive destructor in each class.  
 You can't do that with templates.  You don't need a new language to do 
 this, though.  As I said, we do this at work in C with code generators.

Describe the extentions you consider useful in complete detail, else it's no use for Walter. Or give a link.

I've written a simple code translator from a language that incorporates many of the featues of the DataDraw CASE tool. To add the full power of code generators, you need to be able to compile and run code that generates code to be compiled. This is basically templates on steriods. I'm not familiar with the basic container class packages in D yet, so I'll just write out psuedo-code for an example code generator to add fields for a linked list to two classes. generateRelFields(Relationship rel) { Class parent = rel.parentClass; Class child = rel.childClass; if(rel.type == LINKED_LIST) { generate(parentName:parent.name, childName:child.name, relName:rel.name) { extend class <parentName> { struct <relName> { <childName> first; } add(<childName> child) { child.<relName>.next = this.<relName>.first; this.<relName>.first = child; } remove(<childName> child) { ... } extend class <childName> { struct <relName> { <childName> next; <parentName> owner; } } } if(rel.deleteChildren) { generate(parentName:parent.name, childName:child.name, relName:rel.name) { extend class <parentName> { extend destroy() { <childName> child, nextChild; for(child = this.<relName>.first; child != null; child = nextChild) { nextChild = child.<relName>.next; child.destroy(); } } } extend class <childName> { extend destroy() { if(child.<relName>.owner != null) { child.<relName>.owner.remove(child); } } } } } } } Note the addition of the generate statement, which is similar to templates. Also notice that classes, methods, and functions can all be extended. This is a code-generator friendly syntax.
 Basically, GC makes it harder to implement a program on hardware.  The 
 field of reconfigurable computing is dedicated to doing this.  GC 
 complicates things.

Wouldn't you want to write a program in VHDL or such if you wanted to get hardware as output? Well, i can imagine that nowadays it is possible with higher-level languages. But isn't D an overkill? Oh, you're working for an ASIC company. Then obviously "hard" management is better for *you* than a GC. But how should it work? A ref-counter can be a drop-in replacement for a GC (less reliable though), and how do you make the inegration of CASE?

VHDL (or Verilog) is the standard way to describe hardware. Note that VHDL has a generate statement for generating hardware. IMO, it's VHDL's single most important advantage over Verilog. The reconfigurable computing guys are trying to compile normal programs into hardware. It shouldn't be a goal of D, but it is something I dabble in. Bill Cox
Jan 25 2003
parent reply Ilya Minkov <ilminkov planet-interkom.de> writes:
Hi.

Bill Cox wrote:
 The case tool (DataDraw) is pretty simple.  It's open-source and 
 compiles under Visual C++ on Windows. 

Seems to be kept secret though. I was searching for it, and was unable to find. The description is though very interesting, i still don't seem to understand some mechanics about object deletion. This all seems to be about having a central object repository this or another way, and scope of objects has to be known explicitly. That is, AFAIU it automates the normal C/C++/Delphi data management ideoms? I still don't understand how this simple (and for D vital) example is handled: string operations. Imagine, some function has created a string, and returns it. Now, some operations were done on the string: copying, slicing, and so on. Copying and some cases of slicing (i guess) don't change the string, so it is not being copied for efficiency reasons. Now, the same piece of data has multiple users. Now some user makes a change in data, desiring to keep the changes private. Then there are two cases: - Noone else owns the data. It can be changed in-place; - Data has another users. It has to be copied. This means it needs to know whether the piece of data has further users. Delphi solves this for strings by maintainig the reference counter. GC simplifies it, by allowing for "late" copying of data and then it would find out what data needs to be removed. It was possible to prove, that GC is more efficient that reference counting in a general case of tiny objects and strings. Come to think of it: why did the X-Window servers constantly leak memory before they became a GC attached? Because they handle small structures, not cenralized. And how do you get to know that you don't need a certain part in a centralized data storage? Or explain me what i get wrong. -i.
Jan 31 2003
next sibling parent reply "Sean L. Palmer" <seanpalmer directvinternet.com> writes:
This is why functional languages always copy data.

I would think that the correct semantics is to always copy on write unless
the compiler can tell for absolute sure that there are no other users of the
data, in which case it can optimize to do the operation in place.  An array
passed by reference would be exempt from this;  however D currently has no
way at all to pass arrays by value, so we're forced to deal with the "is it
mine, is it yours, is it ours" mess on every front (user code, library code,
system code).

Ownership is the kind of thing that can't be verified in a precondition or
postcondition;  there's no way to tell in general without knowing the entire
program.  That is a job better suited to a machine than a human programmer.

That is why I would like the default to be pass by value, copy on write,
*unless* either the programmer requests pass by reference or the compiler
can tell for sure, due to scoping/data visibility/data flow analysis that
it's safe not to copy on write.  Err on the side of safety.

Sean

"Ilya Minkov" <ilminkov planet-interkom.de> wrote in message
news:b1ek0e$u42$1 digitaldaemon.com...
 Hi.

 Bill Cox wrote:
 The case tool (DataDraw) is pretty simple.  It's open-source and
 compiles under Visual C++ on Windows.

Seems to be kept secret though. I was searching for it, and was unable to find. The description is though very interesting, i still don't seem to understand some mechanics about object deletion. This all seems to be about having a central object repository this or another way, and scope of objects has to be known explicitly. That is, AFAIU it automates the normal C/C++/Delphi data management ideoms? I still don't understand how this simple (and for D vital) example is handled: string operations. Imagine, some function has created a string, and returns it. Now, some operations were done on the string: copying, slicing, and so on. Copying and some cases of slicing (i guess) don't change the string, so it is not being copied for efficiency reasons. Now, the same piece of data has multiple users. Now some user makes a change in data, desiring to keep the changes private. Then there are two cases: - Noone else owns the data. It can be changed in-place; - Data has another users. It has to be copied. This means it needs to know whether the piece of data has further users. Delphi solves this for strings by maintainig the reference counter. GC simplifies it, by allowing for "late" copying of data and then it would find out what data needs to be removed. It was possible to prove, that GC is more efficient that reference counting in a general case of tiny objects and strings. Come to think of it: why did the X-Window servers constantly leak memory before they became a GC attached? Because they handle small structures, not cenralized. And how do you get to know that you don't need a certain part in a centralized data storage? Or explain me what i get wrong. -i.

Feb 01 2003
next sibling parent "Mike Wynn" <mike.wynn l8night.co.uk> writes:
"Sean L. Palmer" <seanpalmer directvinternet.com> wrote in message
news:b1ge7e$29kd$1 digitaldaemon.com...
 This is why functional languages always copy data.

 I would think that the correct semantics is to always copy on write unless
 the compiler can tell for absolute sure that there are no other users of

 data, in which case it can optimize to do the operation in place.  An

 passed by reference would be exempt from this;  however D currently has no
 way at all to pass arrays by value, so we're forced to deal with the "is

 mine, is it yours, is it ours" mess on every front (user code, library

 system code).

it is, but there is an issue with threading, if the function has a big delay, say it signal one event and then waits on an other event, before reading or writing the item, which it expects is a copy, and another thread is waiting for that first signal modifies the original (which is was passed by ref or owns) and signals the second event, the value with be the changed and not the previous. this can be avoided by copying in the function prolog if a write is detected any where in the function, or programmer understanding the effects of lazy copy on write. arrays already have odd 'in' behaviour, so I guess its o.k. to perform a lazy copy as I doubt the condition would occur often and as long as there is a way to force a copy array.length += 0 ; // for instance (safe to do on a 0 length array unlike array[0] = array[0];) Mike.
Feb 01 2003
prev sibling next sibling parent "Mike Wynn" <mike.wynn l8night.co.uk> writes:
"Sean L. Palmer" <seanpalmer directvinternet.com> wrote in message
news:b1ge7e$29kd$1 digitaldaemon.com...
 This is why functional languages always copy data.

 I would think that the correct semantics is to always copy on write unless
 the compiler can tell for absolute sure that there are no other users of

 data, in which case it can optimize to do the operation in place.  An

 passed by reference would be exempt from this;  however D currently has no
 way at all to pass arrays by value, so we're forced to deal with the "is

 mine, is it yours, is it ours" mess on every front (user code, library

 system code).

it is, but there is an issue with threading, if the function has a big delay, say it signal one event and then waits on an other event, before reading or writing the item, which it expects is a copy, and another thread is waiting for that first signal modifies the original (which is was passed by ref or owns) and signals the second event, the value with be the changed and not the previous. this can be avoided by copying in the function prolog if a write is detected any where in the function, or programmer understanding the effects of lazy copy on write. arrays already have odd 'in' behaviour, so I guess its o.k. to perform a lazy copy as I doubt the condition would occur often and as long as there is a way to force a copy array.length += 0 ; // for instance (safe to do on a 0 length array unlike array[0] = array[0];) Mike.
Feb 01 2003
prev sibling parent reply "Walter" <walter digitalmars.com> writes:
"Sean L. Palmer" <seanpalmer directvinternet.com> wrote in message
news:b1ge7e$29kd$1 digitaldaemon.com...
 I would think that the correct semantics is to always copy on write unless
 the compiler can tell for absolute sure that there are no other users of

 data, in which case it can optimize to do the operation in place.  An

 passed by reference would be exempt from this;  however D currently has no
 way at all to pass arrays by value, so we're forced to deal with the "is

 mine, is it yours, is it ours" mess on every front (user code, library

 system code).

The best solution is, as you suggest, copy on write. In D, you don't really need to know who owns it. Just practice copy on write. Any extra copies will get cleaned up by the gc.
Feb 05 2003
parent reply Ilya Minkov <midiclub 8ung.at> writes:
Walter wrote:
 
 The best solution is, as you suggest, copy on write. In D, you don't really
 need to know who owns it. Just practice copy on write. Any extra copies will
 get cleaned up by the gc.
 

Granted the GC is on. If it's off, these all will pollute the memory. That's why a was arguing for 2 separate heaps, so that objects are created on GCed heap by default, or can (if the programmer is confident) created on a heap completely ignored by a GC. I know that being on GC heap shouldn't be a performance hit, after GC is improved. Except for the cases when objects are very pointer-rich (trees, lists and such), but in some cases it just doesn't make sense to scan objects, because the programmer has decided to follow his old C habits and they are all referenced from somewhere anyway. These should also be leak- and dangler-guarded at debug compile, and completely ignored by GC at release compile.
Feb 05 2003
parent "Walter" <walter digitalmars.com> writes:
"Ilya Minkov" <midiclub 8ung.at> wrote in message
news:b1rrn0$gps$1 digitaldaemon.com...
 Granted the GC is on. If it's off, these all will pollute the memory.
 That's why a was arguing for 2 separate heaps, so that objects are
 created on GCed heap by default, or can (if the programmer is confident)
 created on a heap completely ignored by a GC.
 I know that being on GC heap shouldn't be a performance hit, after GC is
 improved. Except for the cases when objects are very pointer-rich
 (trees, lists and such), but in some cases it just doesn't make sense to
 scan objects, because the programmer has decided to follow his old C
 habits and they are all referenced from somewhere anyway.
 These should also be leak- and dangler-guarded at debug compile, and
 completely ignored by GC at release compile.

I hear you, but I really think it's worth just taking advantage of the GC and not worrying about it.
Feb 05 2003
prev sibling parent Bill Cox <bill viasic.com> writes:
Hi, Ilya.

Ilya Minkov wrote:
 Hi.
 
 Bill Cox wrote:
 
 The case tool (DataDraw) is pretty simple.  It's open-source and 
 compiles under Visual C++ on Windows. 

Seems to be kept secret though. I was searching for it, and was unable to find.

Try: www.viasic.com/download/datadraw.tar.gz There ins't any active community working on it. The code dates back to 1991, and is a bit dated, and it's gotten pretty hacked up over the years. Tools like these generally don't get used much. I find it really odd. It seems to me that programmers aren't really focused on productivity. The entire field of CASE is full of good tools that no one ever uses.
 The description is though very interesting, i still don't seem to
 understand some mechanics about object deletion. This all seems to be
 about having a central object repository this or another way, and scope
 of objects has to be known explicitly. That is, AFAIU it automates the 
 normal C/C++/Delphi data management ideoms?

Most of the code generators generate simple custom object databases. There are ways, for example, of traversing all the objects of a given class. However, there's really no magic here. It just outputs code that you might be tempted to write, given the class definitions and relationships. The nice thing is that it automates writing code that you might not have time to get right every time, like allocating memory for objects of a given class in pages, and then sub-allocating individual objects out of each page, and maintaining free lists for the class.
 I still don't understand how this simple (and for D vital) example is
 handled: string operations.
 
 Imagine, some function has created a string, and returns it. Now, some
 operations were done on the string: copying, slicing, and so on. Copying
 and some cases of slicing (i guess) don't change the string, so it is
 not being copied for efficiency reasons. Now, the same piece of data has
 multiple users. Now some user makes a change in data, desiring to keep
 the changes private. Then there are two cases:
   - Noone else owns the data. It can be changed in-place;
   - Data has another users. It has to be copied.
 This means it needs to know whether the piece of data has further users.
 Delphi solves this for strings by maintainig the reference counter.
 GC simplifies it, by allowing for "late" copying of data and then it
 would find out what data needs to be removed. It was possible to prove,
 that GC is more efficient that reference counting in a general case of
 tiny objects and strings.

Strings fall into a difficult space. They're not really quit full blown objects that represent something concrete. DataDraw doesn't deal with them, other than to allow you to copy them in and out of objects in the database. Strings, vectors, and matricies are "value" classes, rather than "object" classes. "Values" tend to live on the stack, and go away at the end of scope. "Objects" tend to live on the heap and have complex relationships to other objects. The problem with strings is having to copy all that data around as you would a simple int value. Yuk.
 Come to think of it: why did the X-Window servers constantly leak memory 
 before they became a GC attached? Because they handle small structures, 
 not cenralized. And how do you get to know that you don't need a certain 
 part in a centralized data storage? Or explain me what i get wrong.

Objects in the centralized storage go away when deleted. It's really not much different than in a GC based system, since you still need to remove objects from their relationships to other objects when you want them to go away. The anoying value objects with associated memory are a problem, but not for the centralized storage if you don't allow them there. String manipulation is typically a significant focus for both language design and benchmarks. Strings are a tricky issues that I feel tends to distract us from other important language applications. D seems to have strings nailed. Bill
Feb 02 2003
prev sibling parent reply rafael baptista <rafael_member pathlink.com> writes:
rafael baptista wrote:
 
 Asking the operating system for a buffer is very very slow. On
 windows it actually pauses your thread, and does an OS context switch. If you
 are allocating tiny buffers straight from the OS all the time, you need to go
 back to programming school. Memory should be allocated from the OS in large
 chunks and then doled out inside your progam as needed - using your own
 optimized allocators. 
 

Ross Judson says...

Dude, what do you think Java (and most other GC-enabled environments) 
do?  They allocate big buffers from the OS, then divide it into various 
categories of storage (in Java's case, various generations).  Highly 
efficient strategies are then used to quickly allocate chunks of memory 
that used and discarded quickly.  More stable allocations are 
transferred into storage that has strategies appropriate to that.

If you're doing your own memory allocation code, you have to do a LOT of 
work, and write a LOT of bug free code, to get to where the Java 
allocator is.

D needs GC.  You're doing realtime work; stick with C, which is highly 
appropriate to that task!

RJ

Thats right, a GC will normally allocate memory in large blocks, which is a good practice, and so will anyone doing proper memory management. So what I mean is just that you can't say GC is faster by comparing it any number of bad practices when managing memory manually. There is no doubt about it, that you can make systems to automatically manage memory that are resonably efficient in a given domain... which is why I want the freedom to write my own. Its not that hard to do and has lots of advantages. When necessary I will write my own GC-like manager for certain classes with unpredictable life cycles ( and only for those ). I will write fixed size allocators for other things. I may have tens of allocators going at once guaranteeing that certain types of data will end up on the same page of memory - for cache reasons. Sometimes I want to replace my heap manager with a debug version that lets me track memory leaks, or lets me graph what subsystems are using up the most memory, or lets me see what subsystems are giving me the most cache misses etc. I may want to receive a warning when certain systems are above their memory budget. I may want to know about any buffers I have overwritten. I want low level control of the memory management all the time. I want maximum flexibility there. I will accept a slow system defined "printf" function, or a slow file open dialog, or a fat and slow directory listing function. Those are called infrequently. But I don't want anything between me and the processor, and I don't want anything between me and system memory. Professional programmers know that their customers only care about performance. No one buys software because the code is reusable, or easily to maintain, or simple for newbie programmers to understand. Your customers buy what is the fastest, has the most features, is easiest to use and makes best use of their hardware. They don't care about anything that makes your life easier. I won't be satisfied with the middle of the road, lowest common denominator, useful for all purposes libraries that ship with my tools. I don't want to cede even one cycle, one byte... nothing to my competitor. Fast, proper, custom written memory management is not just for games. Also, ever wonder why you never see any shrink wrapped software written in Java?
Jan 14 2003
parent Ross Judson <d soletta.com> writes:
rafael baptista wrote:

 Professional programmers know that their customers only care about performance.
 No one buys software because the code is reusable, or easily to maintain, or
 simple for newbie programmers to understand. Your customers buy what is the
 fastest, has the most features, is easiest to use and makes best use of their
 hardware. They don't care about anything that makes your life easier.
 
 I won't be satisfied with the middle of the road, lowest common denominator,
 useful for all purposes libraries that ship with my tools. I don't want to cede
 even one cycle, one byte... nothing to my competitor.
 
 Fast, proper, custom written memory management is not just for games. 
 
 Also, ever wonder why you never see any shrink wrapped software written in
Java?
 

Two things come to mind...first, some customers care about performance. Most care about _correctness_. The more a programmer has to do himself, the more opportunity there is to blow the correctness goal. There's a reason we don't do much work in assembly any more. I'd say that there are a decent number of Java client-side applications out there, but they're not all that well known. Writing good client-side Java is hard work. Check out http://www.jgoodies.com for some very polished work by a talented guy. Ever wonder why you see less and less server-side software written in C? Most of it is done in Java, these days...there's a reason for that! RJ
Jan 14 2003
prev sibling parent Ilya Minkov <midiclub 8ung.at> writes:
rafael baptista wrote:
 In article <avv7ve$2tos$1 digitaldaemon.com>, Ilya Minkov says...
 
 Please, believe me garbage collection (GC) is the future of game
 design.

I doubt it.

OK. Let me see. I've got an OpenGL game somewhere which is written in OCaml. I'll try to compile it and tell you what it yields. I haven't seen it yet, it's not necessarily a complicated one where that all would matter. OCaml is a 100% GC language with mixed powerful functional and imperative features, combined with OO, and there exists a GCC-based native compiler for it.
 and such, graphic card, other devices, which are  NOT REALTIME!

You are right. That is why game programmers avoid accessing slow devices like disks in the middle of a game. If you absolutely must get something off disk you normally stream it in. You set up an interrupt driven process or something like that. So say I need a new model off the disk in a few seconds, I would place a request to the disk streaming subsystem to start getting the stuff off the disk. If you sit there like fool waiting for your CD drive to spin up, then your game is very choppy.

How do you know the OS doesn't want to access the disk for some purpose? It frequently does. :) BTW, i might be writing a game in a couple of erhh... years, and i'll try to remember what you said. ;)
 I don't have to believe you. I write games for a living. ;)

OK, OK, i respect Your opinion a lot, you old rusty... :>
 The author claims, that plugging in the Hans Boehm GC into ...

very slow. On windows it actually pauses your thread, and does an OS context switch. If you are allocating tiny buffers straight from the OS all the time, you need to go back to programming school. Memory should be allocated from the OS in large chunks and then doled out inside your progam as needed - using your own optimized allocators. This is what gets you cache coherency. This is what guarantees that all the vertex data for a model is in cache when you transform it, etc.

Naturally, i won't be keeping vertices one by one. ;)
 It is not fair to compare using the most highly optimized GC you have
 ever heard of against bad manual memory management practices.

OK.
 There is no heap compaction better than me ...

Strong Ego. :>
 in memory explicitly for cache coherency. Plus there is no penalty
 for me having holes in the memory space that are bigger than a memory
 page anyway. If a whole page is unused it will never be loaded into
 the cache and will never trouble me. Compacting it away is just
 wasting bandwidth on the bus that I could be using to load textures.

It might just as well only compact the heap if it makes sense :) (probably late though). And it might avoid moving something if not really needed. It's more an implementation issue than of the principle.
 The alternative to GC is not ref counting. Few objects have such an 
 unpredictable life cycle that they cannot be managed any other way.

You probably mean unmanaged. :) Better GCs avoid bothering with old objects anyway.
 GC essentially embed the ref counting inside the GC subsystem and
 makes you do some kind of reference tracking for all memory buffer
 wether you need it or not. I ref count very few objects in a given
 project. 

 And when I do I'm typically ref counting entire objects, not
 every single buffer. Plus I can manipulate the ref counts only when
 it matters. 

I know. I thought about it a lot. Of course even the structures i mentioned can be used with an efficient reference counting. I even came to a conclusion, that it's requiered and GC would be shut off unless either weak pointers, or object notification is implemented. under the last i mean, the GC could run a designated method of every object which provides it, to notify it that the GC cycle has just run and that they have so many "parents" and here's the handle to the rest of GC statistics and so on. So you could teach your object to commit a suicide after it's had only one parent for a while. This parent would be the central object repository, cache.
 I don't have a GC system internally twiddling pointer ref
 counts mindlessly when all that is happening is that a temporary 
 pointer was made for a function call or some other limited scope.

I don't think it's *that* important. It's just worth an experiment i'll be sure to do someday. -i.
Jan 14 2003
prev sibling parent Ilya Minkov <midiclub tiscali.de> writes:
Ilya Minkov wrote:
 which are highly complex meshes of (concave) objects. Now, as I read 

Sorry, it should read "convex". Please remind me if i ever make another mistake like that, and i'll blow my brain out. ;) It has to do with the fact, that a ray hits a forward surface in a convex only once, so that back surface culling is sufficent for correct display, no sorting or z-buffer requiered. You simply start with a block you're in and draw the neighbors clipped through the portals. Though many aspects are not important for modern hardware anymore, the overdraw would get completely eliminated, and simlar techniques eliminate overdraw in the other cases. -i.
Jan 14 2003
prev sibling next sibling parent "Walter" <walter digitalmars.com> writes:
"rafael baptista" <rafael_member pathlink.com> wrote in message
news:avsfoq$6nc$1 digitaldaemon.com...
 The GC is worse though - because its in there operating - arbitrarily

 my program whether I use it or not.

In D, the GC *only* gets called when a call is made to allocate memory. It is not in a separate thread arbitrarilly and randomly running. To deal with time sensitive code, if you preallocate all the data you'll need for the time sensitive portion, you'll never run the risk of the GC running in the middle of it. Note that a similar problem exists if you use malloc - the library can arbitrarilly decide to call the operating system to allocate more memory to the process, which can take arbitrarilly long amounts of time. The only way to guarantee latency, even in C++, is to preallocate the data.
Jan 23 2003
prev sibling parent reply "Achillefs Margaritis" <axilmar b-online.gr> writes:
C++ can be fully garbage-collected. Just make a GC super class of all
garbage collected classes that upon construction adds the object to a global
internal list, then clear the list from time to time.

For example:


class GC
{
public:
    GC();
    virtual ~GC();
    static void Clear();
};


static CArray<GC*> _gc;


GC::GC()
{
   _gc += this;
}


GC::~GC()
{
    _gc -= this;
}

void GC::Clear()
{
    for(i = 0; i < _gc.GetSize(); i++)
    {
        GC *gc = _gc[i];
        _gc[i] = 0;
        delete gc;
    }
    _gc.SetSize(0);
}


class Foo : public GC
{
};


int main()
{
    //foo is garbage collected
    Foo *foo = new Foo;

    //clear dangling references; will delete 'foo'
    GC::Clear();
}


C++ is so flexible!!! D should be that way also. Garbage collection should
be an option of the programmer.


"rafael baptista" <rafael_member pathlink.com> wrote in message
news:avsfoq$6nc$1 digitaldaemon.com...
 I just saw the D spec today... and it is quite wonderful. There are so

 that C and C++ could be better and D addresses them. But I think D is a

 too ambitious to its detriment. It has a good side: it fixes syntactic and
 semantic problems with C/C++. It eliminates all kinds of misfeatures and

 in C++.

 And D has a bad side: it attempts to implement all kinds of features that

 best left to the programmer - not the language spec: built in string

 dynamic arrays, associative arrays, but the worst in my mind is the

 garbage collection.

 As is pointed out in the D overview garbage collection makes D unusable

 time programming - this makes it unusable for Kernel programming, device
 drivers, computer games, any real time graphics, industrial control etc.

 limits the language in a way that is really not necessary. Why not just

 language features that make it easier to integrate GC without making it a

 of the language.

 For my part I wouldn't use GC on any project in any case. My experience

 that GC is inferior to manually managing memory in every case anyway. Why?

 Garbage collection is slower: There is an argument in the Garbage

 section of the D documents that GC is somehow faster than manually

 memory. Essentially the argument boils down to that totally bungling

 memory management is slower than GC. Ok that is true. If I program with

 redundant destructors, and have massive memory leaks, and spend lots of

 manually allocating and deallocating small buffers, and constantly

 thousands of ref counts then managing memory manually is a disaster. Most
 programs manage memory properly and the overhead of memory management is

 problem.

 The argument that GC does not suffer from memory leaks is also invalid. In

 system, if I leave around references to memory that I never intend to use

 I am leaking memory just as surely as if I did not call delete on an

 buffer in c++.

 Finally there is the argument that manually managing memory is somehow

 and error prone. In C++ managing memory is simple - you create a few

 container classes at the start of a project. They will allocate and free

 of the memory in your program. You verify that they work correctly and

 write the rest of your program using these classes. If you are calling

 over your C++ code you are asking for trouble. The advantage is that you

 full control over how memory is managed in your program. I can plan

 I will deal with the different types of memory, where my data structures

 placed in memory. I can code explicity to make sure that certain data

 will be in cache memory at the same time.

 I had a similar reaction of having the compiler automatically build me

 tables, string classes and dynamic arrays. I normally code these things

 with particular attention to how they are going to be used. The unit test

 is also redundant. I can chose to implement that stuff whether the

 supports it or not. I normally make unit test programs that generate

 I can regress against earlier builds - but I wouldn't want to saddle

 programs with having to run my often quite extensive and time consuming

 tests at startup. Imaginary numbers and bit fields should similarly be

 to the programmer.

 I suppose I can just chose not to use these features if I don't want to -

 strikes me as unnecessary cruft - and I subscribe to the idea beautifully
 expressed in the D overview that removing cruft makes a language better.

 The GC is worse though - because its in there operating - arbitrarily

 my program whether I use it or not.

 I urge the implementor of D to think about it this way: you have lots of

 Many of them are really good - but it is likely that despite your best

 not all of them are. It would make your project much more likely to

 you decoupled the ideas as much as possible such that the good ones can

 without being weighed down by the bad. So wherever possible I think you

 move functionality from the language spec to a standard library. This way

 library can continue to improve, while the language proper could stabilize
 early.

 This is probably old ground - I assume lots of people have probably

 commented on the GC. I looked for a thread on GC and could not find one.

 I for one would find D a lot more attractive if it did not have GC.

Jan 30 2003
next sibling parent Bill Cox <bill viasic.com> writes:
HI, Achillefs.

Achillefs Margaritis wrote:
 C++ can be fully garbage-collected. Just make a GC super class of all
 garbage collected classes that upon construction adds the object to a global
 internal list, then clear the list from time to time.
 
 For example:
 
 
 class GC
 {
 public:
     GC();
     virtual ~GC();
     static void Clear();
 };
 
 
 static CArray<GC*> _gc;
 
 
 GC::GC()
 {
    _gc += this;
 }
 
 
 GC::~GC()
 {
     _gc -= this;
 }
 
 void GC::Clear()
 {
     for(i = 0; i < _gc.GetSize(); i++)
     {
         GC *gc = _gc[i];
         _gc[i] = 0;
         delete gc;
     }
     _gc.SetSize(0);
 }
 
 
 class Foo : public GC
 {
 };
 
 
 int main()
 {
     //foo is garbage collected
     Foo *foo = new Foo;
 
     //clear dangling references; will delete 'foo'
     GC::Clear();
 }
 
 
 C++ is so flexible!!! D should be that way also. Garbage collection should
 be an option of the programmer.

This looks like fairly useful code in some cases, but it's not real GC. It simply deletes ALL GC derived objects that have every been allocated but not manually deleted. To make C++ do real GC is both very slow and very hard. You can use handle classes that are actual class objects. They get added to the root set when they are declared as local variables. When they're destroyed, they are removed from the root set. The container classes need to use handles that aren't in the root set, but which can be traversed given a pointer to the containing a GC derived object. Yuk. The more common way in C++ is reference counting, which can be done with handle classes easily, but at a significant performance cost. Also, it doens't really work. It can GC tree structures, but it fails to delete anything with a loop in it, which most data structures have. Even a linked list typically has the child objects pointing back at the parent. It's a loop. If you want good GC, it needs to be supported directly in the language. I just don't have any use for GC. As I've said before, it doesn't really buy you much. All it does is change a delete statement into a direct call to the destructor, which you have to write anyway, or you won't get the object out of it's relationships with other objects. A better way is to automatically generate recursive destructors. That's how it's done in relational databases (well, almost -- they do it in an interpreted way). Just declare that a relationship delete's it children when the parent is deleted. It couldn't be easier. I've written languages that use GC (a Scheme interpreter). I even used the cool in-place GC (not mark-and-sweep), that runs just as fast as stop-and-copy (faster if you take caching and paging into account). Sure GC can be fast. But why complicate the language? It just isn't worth it. Bill
Jan 31 2003
prev sibling next sibling parent reply Ilya Minkov <midiclub 8ung.at> writes:
Achillefs Margaritis wrote:
 C++ is so flexible!!! D should be that way also. Garbage collection should
 be an option of the programmer.

Walter considers C++ too flexible. And he doesn't rob you of C++, he gives you another language. And you can't implement an efficient GC in C. However, i can't see why such a C++ one can't be implemented on D - if you consider it better of course. A GC can profit a lot from modified code generation. A compiler implementation could choose one of automatic memory management solutions, which would not requiere any changes in code, that is are equivalent to the programmer but having different efficiency characteristics. - simple mark-and-sweep collector, like D is using now is just the beginning - i have heard that it doesn't profit from code generation yet and hence keeps executable size small; - mark phase can be accelerated, if the compiler automatically generates an optimised marking method for each class. This is going to have a small size overhead, but reduce the total cost of GC to almost zero; - coloring collector: can work in background, and has very good run-time characteristics that it doesn't stop the main process, doesn't necessarily cause wild paging, and keeps memory footprint rather tight. It might slow down an application a bit, in part because it requieres wrapping pointer writes in a GC method. Might also increase the code size; - smart reference counting: i have stumbled over a document describing a reference counting memory manager, which can also deal with self-referencing structures. Its advantage is that the application uses exactly as much memory as it needs, but the slowdown is to be much higher than with a coloring GC, pauses might be the case, and a code size would increase significantly. I'd say an implementation may be free to provide these or other automatic memory management systems, so far they are compliant with a GC interface. I suppose that the memory manager interface is minimal on purpose, since things i have proposed before, like "mark-notification", are not possible with any managers except for mark-and-sweep. I've read yet another, and this time better done comparison of "static" and "scripting" languages, and it turns out that the decisions of Perl and Python, to allow the programmer to do complex tasks in a simple way, usually shaves off half (or more) of the time requiered for writing an application. It may not be the most optimal way, but it is generally good for programme correctness. There, hundreds of programmers wrote a simple application in C, C++, Perl, Python, Ruby, Java, and TCL. It shows that though C and C++ might lead to an optimal solution, most often they lead to a very bad one since the programmer tries to accomplish a task in a limited time and simplifies the job. In that example, all entries done in languages with native, simple support for associative arrays were correct and run reasonbly fast, while the others lead to a number of broken programmes. It shows again that Walter's decision makes sense, since it makes it possible to write utilities in D about as simply and with more confidence, and resulting in faster execution, than in Perl. I'm pretty much sure D would devastatingly win this comparison. -i.
Jan 31 2003
parent reply Burton Radons <loth users.sourceforge.net> writes:
Ilya Minkov wrote:
 A GC can profit a lot from modified code generation. A compiler 
 implementation could choose one of automatic memory management 
 solutions, which would not requiere any changes in code, that is are 
 equivalent to the programmer but having different efficiency 
 characteristics.

Let's see what happens when type-based scanning is in first; nobody knows what effect it'll have on scanning, but in a small utility I'm making (fractal browsing sample for dig) scanning takes 60 milliseconds but is scanning... 802 pointers over 1120001 dwords of data, so 1396 times too much data. Once that is pared down I predict I'll be able to call a full collection every second without having to worry about delays in almost every program. The exception will be VERY heavily pointer-oriented programs. We're talking Lisp level and programs where you have an "Int" class. I don't care about those.
Jan 31 2003
parent reply "Walter" <walter digitalmars.com> writes:
"Burton Radons" <loth users.sourceforge.net> wrote in message
news:b1eo7q$15m6$1 digitaldaemon.com...
 Let's see what happens when type-based scanning is in first; nobody
 knows what effect it'll have on scanning, but in a small utility I'm
 making (fractal browsing sample for dig) scanning takes 60 milliseconds
 but is scanning... 802 pointers over 1120001 dwords of data, so 1396
 times too much data.  Once that is pared down I predict I'll be able to
 call a full collection every second without having to worry about delays
 in almost every program.

 The exception will be VERY heavily pointer-oriented programs.  We're
 talking Lisp level and programs where you have an "Int" class.  I don't
 care about those.

I have a lot of ideas about improving GC, such as making it a copying generational collector, implementing hardware write barriers, and using type info to mark whole regions as "not containing any pointers", etc. There is a LOT that can be done to boost performance.
Feb 03 2003
parent reply Evan McClanahan <evan dontSPAMaltarinteractive.com> writes:
Walter wrote:
 "Burton Radons" <loth users.sourceforge.net> wrote in message
 news:b1eo7q$15m6$1 digitaldaemon.com...
 
Let's see what happens when type-based scanning is in first; nobody
knows what effect it'll have on scanning, but in a small utility I'm
making (fractal browsing sample for dig) scanning takes 60 milliseconds
but is scanning... 802 pointers over 1120001 dwords of data, so 1396
times too much data.  Once that is pared down I predict I'll be able to
call a full collection every second without having to worry about delays
in almost every program.

The exception will be VERY heavily pointer-oriented programs.  We're
talking Lisp level and programs where you have an "Int" class.  I don't
care about those.

I have a lot of ideas about improving GC, such as making it a copying generational collector, implementing hardware write barriers, and using type info to mark whole regions as "not containing any pointers", etc. There is a LOT that can be done to boost performance.

Implementing hardware write barriers? I didn't think that the hardware was there on most memory controllers. Evan
Feb 09 2003
parent reply "Mike Wynn" <mike.wynn l8night.co.uk> writes:
 I have a lot of ideas about improving GC, such as making it a copying
 generational collector, implementing hardware write barriers, and using


 info to mark whole regions as "not containing any pointers", etc. There


 LOT that can be done to boost performance.

Implementing hardware write barriers? I didn't think that the hardware was there on most memory controllers.

fault.
Feb 09 2003
parent reply Evan McClanahan <evan dontSPAMaltarinteractive.com> writes:
Mike Wynn wrote:
Implementing hardware write barriers?  I didn't think that the hardware
was there on most memory controllers.

I took that to be; marking the page as read only. and catching the page fault.

From what I have read, can't find the link right now, that usually results in a decrease in performance, because the whole page needs to be scanned once the barrer is faulted. You only know that the page is dirty, not how dirty, and then have to scan 4MB for what is possibly only one allocated object. I for one would rather go with real software read and write barriers, which can then be replaced with hooks to real hardware read and write barriers once they finally start being reincluded in memory chips, which I figure that they will be before too long, now that microsoft is pressing on the whole garbage collected C#/.NET thing. Evan
Feb 09 2003
next sibling parent "Mike Wynn" <mike.wynn l8night.co.uk> writes:
"Evan McClanahan" <evan dontSPAMaltarinteractive.com> wrote in message
news:b25v5t$2bms$1 digitaldaemon.com...
 Mike Wynn wrote:
Implementing hardware write barriers?  I didn't think that the hardware
was there on most memory controllers.

I took that to be; marking the page as read only. and catching the page fault.

From what I have read, can't find the link right now, that usually results in a decrease in performance, because the whole page needs to be scanned once the barrer is faulted. You only know that the page is dirty, not how dirty, and then have to scan 4MB for what is possibly only one allocated object. I for one would rather go with real software read and write barriers, which can then be replaced with hooks to real hardware read and write barriers once they finally start being reincluded in memory chips, which I figure that they will be before too long, now that microsoft is pressing on the whole garbage collected C#/.NET thing.

and I believe there is a 2M page too (May be PPro) and I agree that it well designed software write barriers are most likely cheaper (performance, maintance and development) 3 colour non copying concurrent collectors only actually require write and return barriers the people I used to know who write GC's don't believe in h/w gc, partly because it restrict what you can do with the object layouts and partly because you can't change or upgrade the gc. h/w that would assist one type of GC might hinder another. I think it will be a while before hardware gc becomes standard, there are CPU's that don't even have basic MM most old consoles and pda's (although I think WinCE devices have to have MM, I know PalmOS does not require it). and I think the PS2 has MM 'cos it can run linux (not uLinux) and of course if your device has a JavaVM, why bother with hardware MM, when you can do it all in software and save on sillicon. or use Tao intent or similar system. I would expect a CLR bytecode cpu, like the Sun and Arm attempts with Java Bytecode aware CPU's before h/w GC Mike.
Feb 09 2003
prev sibling parent C <blackmarlin asean-mail.com> writes:
Evan McClanahan wrote:
 Mike Wynn wrote:
 
 Implementing hardware write barriers?  I didn't think that the hardware
 was there on most memory controllers.

I took that to be; marking the page as read only. and catching the page fault.

From what I have read, can't find the link right now, that usually results in a decrease in performance, because the whole page needs to be scanned once the barrer is faulted. You only know that the page is dirty, not how dirty, and then have to scan 4MB for what is possibly only one allocated object. I for one would rather go with real software read and write barriers, which can then be replaced with hooks to real hardware read and write barriers once they finally start being reincluded in memory chips, which I figure that they will be before too long, now that microsoft is pressing on the whole garbage collected C#/.NET thing. Evan

Every time the page status is changed the TLB cache would need to be flushed (an expensive operation), this change would also have to be reflected across multiple processors on a SMP system. All of this would probably require support from the operating system, at least with respect to hooks to preset/reset/get the read/write flag of the page. The scanning of the page could be limited by picking a smaller page size, the 80x86 supports 4Kb as well as 4Mb pages (and 2Mb pages with page addressing extensions - see CR4.5 & CPUID(2)/EDX.6) C 2003/2/9
Feb 08 2003
prev sibling parent "Walter" <walter digitalmars.com> writes:
"Achillefs Margaritis" <axilmar b-online.gr> wrote in message
news:b1ce36$1ir3$1 digitaldaemon.com...
 C++ can be fully garbage-collected. Just make a GC super class of all
 garbage collected classes that upon construction adds the object to a

 internal list, then clear the list from time to time.

I know, I use GC in my C++ programs. There are some problems, though, because the C++ compiler provides no type information to the runtime. The way a C++ compiler is implemented restricts the kind of GC algorithms that can be used.
Feb 03 2003