www.digitalmars.com         C & C++   DMDScript  

D - Java vs C Benchmarks

reply Mark Evans <Mark_member pathlink.com> writes:
From a guy who does lots of number crunching; interesting results.

http://www.idiom.com/~zilla/Computer/javaCbenchmark.html
Jun 17 2003
next sibling parent reply "Walter" <walter digitalmars.com> writes:
"Mark Evans" <Mark_member pathlink.com> wrote in message
news:bcnoeo$1npv$1 digitaldaemon.com...
 From a guy who does lots of number crunching; interesting results.

 http://www.idiom.com/~zilla/Computer/javaCbenchmark.html

It's an interesting article. The author is right that there is no reason that Java numerics code should be slower than in C, except when the Java vm is forced to use suboptimal floating point to conform to the Java way of doing floating point that diverges from FPU hardware. Here are the cases where Java is slower than C: 1) Java doesn't support lightweight objects, or lightweight arrays. This means that Java allocates and deallocates an excessive number of objects compared to C code. 2) The Java startup time can be very slow, due to loading the huge VM and the necessity of compiling it before use. 3) Anytime you need to access some feature not found in the Java library, you need to go through the JNI translation layer. 4) Doing much of anything with arrays in Java requires lots of allocation/deallocation cycles. 5) Java chars are 16 bits, and dealing with a lot of them winds up being slow compared with C dealing with byte sized chars. 6) Java has no out parameters, sometimes necessitating creating lots of dummy classes for returning multiple values. Again, this adds to the allocation/deallocation pressure. 7) The string functions are all synchronized, costing speed. Strings are copy-on-write, which gets expensive. 8) Some of the library functions are inefficiently designed, such as file I/O. 9) There are inefficiencies in the library caused by it trying to present a common interface across diverse platforms. Note that only (2) is related to Java being byte code. None of these are related much to integer manipulation and numerics code, which is why Java can do well with them. These difficulties are pretty hard to overcome. Java does have one important advantage over C. Being compiled at runtime, the generated code can be customized to the idiosyncracies of the particular processor it happens to be running on. C, on the other hand, tends to be compiled to the lowest-common-denominator chip. A lot of the difference between CPUs of the same family is how they schedule instructions, extra instructions for floating point, etc. Optimal code generation for one CPU may be very different from that of another CPU in the same family (Intel is famous for doing that).
Jun 17 2003
parent reply Mark Evans <Mark_member pathlink.com> writes:
Walter,

Glad you like the article.  For the record I do not agree with everything that I
post, but benchmarks interest me more than flame fests.  The author of this
article really just cited other benchmark authors, such as the university study
comparing Java/C++/FORTRAN for numerics work.  It has very nice charts and wide
coverage, and is of recent vintage.
http://www.philippsen.com/JGI2001/finalpapers/18500097.pdf

Personally I think Java is dumbed-down.  Some of the Java variants interest me,
like Nice.  I am however open-minded about the future of Java, e.g.
http://www.sun.com/microelectronics/picoJava/overview.html

Certainly I would rather see picoJava take off than M$oft imposing an
encrypted-instruction-set .NET Windows-only chip on our little planet.

But my real purpose is simply to stir the pot a bit so that the stew cooks
evenly.  The computer field moves very rapidly and "what everybody knows" may
not be as true now as it once was, or ever was.  I try to keep an open mind and
pay attention to real data.

Mark
Jun 21 2003
parent reply "Jeff Peil" <jpeil bigfoot.com> writes:
"Mark Evans" <Mark_member pathlink.com> wrote in message
news:bd2vvh$2u7e$1 digitaldaemon.com
 Certainly I would rather see picoJava take off than M$oft imposing an
 encrypted-instruction-set .NET Windows-only chip on our little planet.

Mark, Sun and Microsoft are both corporations, I don't see why you'd blindly prefer one over the other. In this case you are comparing a proprietary system (Java) with one which is largely an open standard (the ECMA CLI covers the instruction set and much of the class library used in the .NET runtime.) Sun is not some heroic entity that only does good and Microsoft is not some vile entity seeking to do evil. -- Jeff Peil
Jun 22 2003
parent Mark Evans <Mark_member pathlink.com> writes:
Mark,

Sun and Microsoft are both corporations, I don't see why you'd blindly
prefer one over the other.  In this case you are comparing a proprietary
system (Java) with one which is largely an open standard (the ECMA CLI
covers the instruction set and much of the class library used in the .NET
runtime.)  Sun is not some heroic entity that only does good and Microsoft
is not some vile entity seeking to do evil.

-- 
Jeff Peil

Jeff, I don't see why anyone would blindly do anything. I did not compare the two companies. The point of the remark was "Java may have a future in spite of itself." As for Sun and M$oft, I do have comments. Sun has given away millions of lines of source code (OpenOffice) while M$oft is still debating whether to open up MS Word format. The .NET SDK license explicitly prohibits non-Windows development. And what's worse, M$oft engages in serious stock fraud (see billparish.com). I won't bore you with the DOJ findings of fact against M$oft, you can do your own research. Mark
Jun 22 2003
prev sibling parent reply "Achilleas Margaritis" <axilmar b-online.gr> writes:
"Mark Evans" <Mark_member pathlink.com> wrote in message
news:bcnoeo$1npv$1 digitaldaemon.com...
 From a guy who does lots of number crunching; interesting results.

 http://www.idiom.com/~zilla/Computer/javaCbenchmark.html

More Java hype. Please read the following piece taken from the site: "In cases where the compiler cannot determine the necessary information at compile time, the C pointer problem may actually be the bigger performance hit. In the java case, the loop bound(s) can be kept in registers, and the index is certainly in a register, so a register-register test is needed. In the C/C++ case a load from cache is needed" There are major errors with this statement: 1) C/C++ does not do bounds checking on simple arrays. Buffer overrun anyone ? 2) The actual address of the pointer is kept in a register, too. Basically, the guy says that since Java does not have pointer arithmetic and uses indices to process arrays, it is faster, because the pointer arithmetic in C must access the memory for the pointer. That's plain wrong. Consider the following: //index-based processing for (int i = 0; i < max; i++) { a[i] = ...; } and then consider this: //pointer-based processing for(int *i = a; i < pmax; i++) { *i = ...; } The article says that the index-based one is faster, because it is pure register-based (except from the final assignment, of course), because the pointer-based processing requires pointer access. That's stupid. *i may be a pointer, but the value of the pointer is kept in a a register. By pointer arithmetic, the array address calculation before the final assignment is eliminated. There is no cache access involved. In reality, Java is much slower than C/C++. Another foul part: "Most everyone says GC is or should be slower, with no given reason- it's assumed but never discussed. Some computer language researchers say otherwise. Consider what happens when you do a new/malloc: a) the allocator wanders through some lists looking for a slot of the right size, then returns you a pointer. b) This pointer is pointing to some pretty random place. With GC, a) the allocator doesn't need to look for memory, it knows where it is, b) the memory it returns is adjacent to the last bit of memory you requested. The wandering around part happens not all the time but only at garbage collection. And then of course (and depending on the GC algorithm) things get moved of course as well" The guy does not know the basics: 1) a GC does not run all the time, so the next 'new' might happen when the memory is not yet compacted. A GC thread runs periodically and collects resources. In this case, the memory is walked as in C. 2) An implicit thread makes determinism difficult to achieve. You don't know when the GC will kick in and freeze the app. That's a mighty problem when you have timers. 3) In order to make memory compacting possible, Java uses triple indirection, i.e. a pointer to a pointer to the object. The GC knows the 2nd pointer's location, and thus it can freely change memory locations by simply change the 2nd pointer. All pointers to the 2nd pointer will automatically point to the new object's location. Example: //java code byte[] myByteArray = new byte[10]; myByteArray[0] = 1; //the equivalent C++ code: char *p2 = new char[10]; char **p1 = &p2; (**p1)[0] = 1; A GC may move the byte array around, and change the 2nd pointer to point to the new location. You have several costs: 1) data copied around every now and then. 2) triple indirection. Need we say more ? 3) the thread itself. We can see from the above example that arrays are completely different in Java than in C++. Not only that, but when arrays are passed around in methods, Java passes 2 arguments instead of one. The first argument passed is the array's 'environment', i.e. a struct that describes the array. This makes possible to find the array size. If you don't believe me, look at JNI. Another error-prone statement: "The big benefit of GC is memory locality. Because newly allocated memory is adjacent to the memory recently used, it is more likely to already be in the cache". Memory locality and memory adjacent blocks are two completely unrelated things. Even if two memory blocks are in far away addresses, if they are constantly used, they would be already in cache. For example: //array1 int *array1 = new int[200]; ...bla bla bla more allocations... int *array2 = new int [200]; now the two arrays are in wildly different memory locations, but, does it matter ? no!!! Check out the following loop: for(int i = 0; i < 10000000; i++) { for(j = 0; j < 200; j++) { array1[j] = rand(); array2[j] = rand(); } } In the loop above, the first access to the array1 and array2 will bring those arrays to the cache. After that, they would be in the cache for as long as the loop takes place. But they are not adjustent!!! The big bottleneck of Java is that everything is allocated on the heap. So, the following type of code is quite often met: //setting a point by using new Point ? what a waste of resources!!! widget->setPosition(new Point(10, 10)); And this is because Java can only allocate objects on the heap and not on the stack. This makes the Java memory fragmented much more than in C++. Another bunch of lies: "Modern Java virtual machines do some amazing things: they profile the program while it is running, determine which parts to optimize, compile them, then uncompile and recompile them if new classes are loaded that override methods that were inlined! The people working on hotspot in particular have hinted that Java should sometime be faster than C because of this. Why? -Because the compiler knows which classes are actually loaded and being called, it can remove the overhead of virtual methods. - dynamic compiler may also get the branch prediction hints right more often than a static compiler. " But those two things are also done on the static compilers too!!! the only difference between static and dynamic compiling is that dynamic happens while the program is running. There are no more information available dynamically than statically!!! In detail: 1) a virtual method can be made non-virtual if the parent or ancestor class is never used(i.e, the interface is never virtual). In C++, the programmer uses virtual methods because they are actually gonna be used in some place as virtual methods! that means, that the compiler should not check for a method being called virtually!!! In a few simple words, in C++ we make a method 'virtual' because our design says it's gonna be called virtually, otherwise we would not make it virtual!!! 2) branch prediction is done entirely by the hardware, except on EPIC(Itanium, Merced etc). C/C++ compilers don't do branch prediction. Actually, the virtual machine has great overhead. That's why Java compilers exist. If a JVM can do an equal job, then there would be no need for a static compiler. Conclusion: "Java is now nearly on parity with C++ in performance. This means we'll see cases where new releases of C++ compilers do better than Java, and vice-versa, just as different C/C++ compilers often have performance differences of up to a factor two or so. On the other hand, performance does depend (a lot) on how the program is written." I work for a company that makes defense applications for THALES (ex Thomson), in C++, ADA and Java. Java is slow!!! Believe me. Maybe in the future it won't be. Maybe in some extreme cases a Java compiler will do better. But...right now, it is slow as hell. By far the slowest of the 3 languages!!!
Jun 21 2003
next sibling parent Mark T <Mark_member pathlink.com> writes:
More Java hype.

http://www.coyotegulch.com/reviews/almabench.html
Jun 21 2003
prev sibling parent reply Alisdair Meredith <alisdair.meredith uk.renaultf1.com> writes:
Achilleas Margaritis wrote:

You quote the following statement that you go on to disagree with:

 "Most everyone says GC is or should be slower, with no given reason- it's
 assumed but never discussed. Some computer language researchers say
 otherwise."

Just an observation from 12 months back: Bjarne Stroustrup made some references to GC during one of his sessions at the ACCU conference last year. In particular, he did buy into the "a well written GC should be free" theory (apart from obvious degenerate cases) However, when he tried to produce such a "free" collection the best he could do was a 70% performance hit compared to his regular non-GC code. He either had to admit he couldn't write a decent GC, or find some other explaination <g> Given no-one was going to hit him with (i), his conclusion came back to one of your observations - C++ is a stack base language. Moving everything onto the heap in order to support garbage collection has a cost, and that covered the majority of his 70% cost. I have certainly seen the classic 'flawed' benchmarchs where GC blows away explicit memory management (allocate 1,000,000 unused pointers, GC recycles the same pointer) so have no doubt there are memory-use patterns that will be well suited to GC and may well see a benefit from it. I am yet to see any real demonstration of this effect beyond simple bechmarks as above though. It would be nice to understand what real-world profiles GC is well suited to, but that is not as sexy as an all-conquering redundant benchmark, so if the work has been done it has not had sufficient press for me to notice : ( -- AlisdairM
Jun 22 2003
parent reply "Fabian Giesen" <rygNO SPAMgmx.net> writes:
 He either had to admit he couldn't write a decent GC, or find some
 other explaination <g>  Given no-one was going to hit him with (i),
 his conclusion came back to one of your observations - C++ is a stack
 base language.  Moving everything onto the heap in order to support
 garbage collection has a cost, and that covered the majority of his
 70% cost.

GC is bound to lose against stack allocation - everything that can be properly allocated on the stack will be up to several orders of magnitude faster than what any more sophisticated allocator can do - because both allocation and deallocation is, for all it's worth, free on current machines, where the part of one clock cycle actively occupied with performing the addition (or subtraction) required will not even notice, especially compared to the cost of e.g. a branch misprediction or a cache miss. However, that is by no means an interesting measure. Anything that can be allocated on the stack *should* be allocated there - there's no way around that. The interesting question is whether GC is faster than traditional C-style heap allocation. I do believe it atleast won't lose much in most actual applications, which makes it a worthwhile change no matter what, but I don't have actual benchmarks to backup this. Moreover, I have the feeling that most benchmarks ever done on the issue are actually flawed. According to the paper "Dynamic Storage Allocation: A survey and Critical Review" by Wilson, Johnstone et al, allocator performance has traditionally been evaluated using synthetic traces, rather than using actual allocation traces from actual programs. Similarly, comparision between GC and manual memory management is often done by taking one program employing one of the two mechanisms, preparing a version that uses the other approach, and benchmarking the two. No matter the result, I don't think that strategy makes sense. No matter how transparent (or non-transparent), memory management IS an issue that affects the design of any large program, so an actually meaningful benchmark would have to state a given program and then compare its implementation for a non- GC environment with a GC version. Care must be taken that the core algorithms and their rough implementation techniques are the same - however, the same cannot be said for data structures; those are heavily influenced by the way memory is managed, no matter what. I don't know of any such Benchmark, but this is the only kind of empirical study on the issue I could really accept :) -fg
Jun 22 2003
parent reply Helmut Leitner <helmut.leitner chello.at> writes:
Fabian Giesen wrote:
 ... The interesting question is whether GC is faster than traditional
 C-style heap allocation. I do believe it atleast won't lose much in most
 actual applications, which makes it a worthwhile change no matter what, but
 I don't have actual benchmarks to backup this.

Sorry, I have to comment on this. Talking about performance or optimization a believe don't count, only what you can measure and prove does count. GC gives up knowledge about the memory blocks needed or unneeded. To regain this knowledge has a cost. That's my theoretical basis why I think that GC will always (statistically) be slower than good tradition memory management. But: I would never trust this theory. -- Helmut Leitner leitner hls.via.at Graz, Austria www.hls-software.com
Jun 22 2003
next sibling parent reply Georg Wrede <Georg_member pathlink.com> writes:
I think we all agree on that an expert on assembler
programming can write code that is faster than anything
written in any other language. And that interpreted code
mostly is slower than e.g. C code. Still a lot of code is
written in interpreted languages, and more code is written
in C than assembler.

Of course the answer is programmer productivity. Writing in
C gives you almost the same power as writing in assembler,
but you get a lot more done in given time. The same holds
true when comparing interpreted languages with C.

Some applications need to get more bang from the iron, and
those are usually written in C or assembler in spite of the
smaller programmer productivity.

Since in real-world programming the speed demands don't
cover the entire program, only the most essential parts are
usually written in assembler.

Additionally, one could discuss endlessly the practical
results of individual programmers writing assembler, C or an
interpreted language. It should not be too hard to show that
differences in programmer's abilities have a higher impact
on program speed than the chosen language -- even if the
choice is between assembler and an interpreted language.
(That is, a badly written bad algorithm in assembler can't
catch up with a well written excellent algorithm in, say,
Perl, Lua, or even GW-Basic.)

In light of all this, it seems marginal to discuss the
merits of garbage collecting vs. explicit allocation. 

What I am saying is, a good programmer with excellent
experience with (say, D) should be able to write quite fast
code in spite of GC. Also, the speed demands do not fall
evenly on the program, as anyone who's done profiling knows.
This lets the programmer make GC happen at suitable times
and not happen when there's need for speed.

I think we all agree on that it would be very nice if both D
and (Pascal, C, C++, ...) were to have garbage collectors
that you can use in some parts of your program, and not use
in other parts of the same program. And once the science of
GC advances a little, we could use several different GCs in
different parts of our program.

----

The above [ :-( ] became a (too) lenthy way of saying:

If the impact of GC were 10% in speed, and if the choice of
programmer meant a 20% difference in speed, shouldn't we
choose and educate our staff better, instead of splitting
hairs over GC/no GC.

----

Still, I wouldn't feel comfortable writing the Linux kernel
in D. Or a real-time system. But I sure wish I'll see the
day when something convinces me I can do either or both in
D with peace of mind.
Jun 23 2003
next sibling parent Alisdair Meredith <alisdair.meredith uk.renaultf1.com> writes:
Georg Wrede wrote:

 Of course the answer is programmer productivity. Writing in
 C gives you almost the same power as writing in assembler,
 but you get a lot more done in given time. The same holds
 true when comparing interpreted languages with C.
 
 Some applications need to get more bang from the iron, and
 those are usually written in C or assembler in spite of the
 smaller programmer productivity.

Productivity also plays a big part in outright speed of course, as it is easier to experiment and find the optimal algorithm in the more productive tool. You can generally optimise a more complex form of algorithm tailored to the problem in said productive tool. Once you have the final solution, you can probably realise an even more efficient implementation in the lower level tool. Starting from cold, unless the high level language has a huge abstraction penalty or the problem is trivial/already well understood, you may well get a more efficient solution using the higher level language. -- AlisdairM
Jun 23 2003
prev sibling next sibling parent reply "Fabian Giesen" <rygNO SPAMgmx.net> writes:
 Still, I wouldn't feel comfortable writing the Linux kernel
 in D. Or a real-time system. But I sure wish I'll see the
 day when something convinces me I can do either or both in
 D with peace of mind.

"Non-Compacting Memory Allocation and Real-Time Garbage Collection" Mark S. Johnstone, University of Texas, 1996 http://citeseer.nj.nec.com/255424.html May atleast convince you of the possibility (and gritty details :) of real-time garbage collection. -fg
Jun 23 2003
parent Georg Wrede <Georg_member pathlink.com> writes:
In article <bd7aji$16q4$1 digitaldaemon.com>, Fabian Giesen says...
 Still, I wouldn't feel comfortable writing the Linux kernel
 in D. Or a real-time system. But I sure wish I'll see the
 day when something convinces me I can do either or both in
 D with peace of mind.

"Non-Compacting Memory Allocation and Real-Time Garbage Collection" Mark S. Johnstone, University of Texas, 1996 http://citeseer.nj.nec.com/255424.html May atleast convince you of the possibility (and gritty details :) of real-time garbage collection.

Yes! Thanks a lot! It did convince me of the possibility! And, reading the paper, it did not seem much harder to write such an RT-GC than a regular GC. Most of the idea was about doing the Right Thing, and not doing something hairily elaborate. This should be required reading for a lot of people. Having this kind of GC in D would kick some butt when folks start seriously comparing languages for important projects. I'd also be pretty sure that the folks at NASA would then use D for the Mars rovers and stuff, instead of OO-plain-C. And rumor of this would be enough to crash Walter's D site. :-)
Jun 24 2003
prev sibling next sibling parent anonymous <anonymous_member pathlink.com> writes:
In article <bd692p$4f2$1 digitaldaemon.com>, Georg Wrede says...
I think we all agree on that an expert on assembler
programming can write code that is faster than anything
written in any other language. And that interpreted code
mostly is slower than e.g. C code. Still a lot of code is
written in interpreted languages, and more code is written
in C than assembler.

Of course the answer is programmer productivity. Writing in
C gives you almost the same power as writing in assembler,
but you get a lot more done in given time. The same holds
true when comparing interpreted languages with C.

Some applications need to get more bang from the iron, and
those are usually written in C or assembler in spite of the
smaller programmer productivity.

Since in real-world programming the speed demands don't
cover the entire program, only the most essential parts are
usually written in assembler.

Additionally, one could discuss endlessly the practical
results of individual programmers writing assembler, C or an
interpreted language. It should not be too hard to show that
differences in programmer's abilities have a higher impact
on program speed than the chosen language -- even if the
choice is between assembler and an interpreted language.
(That is, a badly written bad algorithm in assembler can't
catch up with a well written excellent algorithm in, say,
Perl, Lua, or even GW-Basic.)

In light of all this, it seems marginal to discuss the
merits of garbage collecting vs. explicit allocation. 

What I am saying is, a good programmer with excellent
experience with (say, D) should be able to write quite fast
code in spite of GC. Also, the speed demands do not fall
evenly on the program, as anyone who's done profiling knows.
This lets the programmer make GC happen at suitable times
and not happen when there's need for speed.

I think we all agree on that it would be very nice if both D
and (Pascal, C, C++, ...) were to have garbage collectors
that you can use in some parts of your program, and not use
in other parts of the same program. And once the science of
GC advances a little, we could use several different GCs in
different parts of our program.

----

The above [ :-( ] became a (too) lenthy way of saying:

If the impact of GC were 10% in speed, and if the choice of
programmer meant a 20% difference in speed, shouldn't we
choose and educate our staff better, instead of splitting
hairs over GC/no GC.

----

Still, I wouldn't feel comfortable writing the Linux kernel
in D. Or a real-time system. But I sure wish I'll see the
day when something convinces me I can do either or both in
D with peace of mind.

Actually the Linux Kernel does use manual garbage collection. It utilizes malloc to dynamically allocate system tables and perform kernel operations. If everything was statically allocated it would be much more limited. Garbage collection and memory management is important to an operating system. Many of the tree functions and lists depend on cleaning up the memory regardless of whether it is garbage collected or not. You could argue as in the case of C that C can generate better assembly than a programmer. And that D could create a better garbage collector or memory manager than a programmer. For extremely complex programs having the compiler manage the memory is much better. The more complex architectures such as Itanium are much more difficult to write assembly in. An assembly programmer would be hard pressed writing a parellel assembly program let alone a parellel garbage collector on the NEC Earth Simulator or an 8000 processor computer.
Jun 23 2003
prev sibling parent Helmut Leitner <helmut.leitner chello.at> writes:
Georg Wrede wrote:
 If the impact of GC were 10% in speed, and if the choice of
 programmer meant a 20% difference in speed, shouldn't we
 choose and educate our staff better, instead of splitting
 hairs over GC/no GC.

People are so used to the GC/no GC conflict, that any argument is immediately seen as a pro or con argument. My "GC has a cost" meant only what it says. I agree with you. If we have a GC that has an average cost of 10% and a worst case cost of 30%, then I would be happy. I don't think that Java GC comes near, I know that D GC doesn't come near. There's a lot to do in GC development. In this situation I just hate to hear "GC is free" and even "GC may even be better". -- Helmut Leitner leitner hls.via.at Graz, Austria www.hls-software.com
Jun 23 2003
prev sibling next sibling parent reply Alisdair Meredith <alisdair.meredith uk.renaultf1.com> writes:
Helmut Leitner wrote:

 GC gives up knowledge about the memory blocks needed or unneeded.
 To regain this knowledge has a cost. That's my theoretical basis
 why I think that GC will always (statistically) be slower than good
 tradition memory management.

Depends on whether tracking the knowledge of memory blocks is gaining you anything useful for the given context. With GC you don't need to explicitly track and manage memory through the life of your algorithm. Not doing un-necessary work is a fairly traditional optimisation <g> So if you have a problem that is particularly heavy on your memory manager, it may be that GC is a valid optimisation. Of course, you could always write a custom allocator for that specific problem, but that holds for any problem. I don't know about you, but I don't write custom allocators that often. Switching to an alternate GC allocator (if provided) and measuring again would be a nice option to have available. How does GC work in 'D'? [I lurk and jump on intersting sounding threads, but don't currently have the time available to play with the language itself. That's why its nice when threads wander off, they come back full circle to 'D' <g>] -- AlisdairM
Jun 23 2003
parent reply Ilya Minkov <midiclub 8ung.at> writes:
Alisdair Meredith wrote:

 How does GC work in 'D'?
 
 [I lurk and jump on intersting sounding threads, but don't currently
 have the time available to play with the language itself.  That's why
 its nice when threads wander off, they come back full circle to 'D' <g>]

As far as i can remember from the rumors and some glazing at the source, the current GC works slowly. It's clearly a prototype and should be replaced by one improved by orders of magnitude - Walter has his plans and tons of ideas on it. As of now, it appears to be a rather simple conservative mark-sweep-collector, similar yet in some cases slower than Boehm. It *does* allocate small structures of same size together in larger blocks though. -i.
Jun 23 2003
parent reply Alisdair Meredith <alisdair.meredith uk.renaultf1.com> writes:
Ilya Minkov wrote:

 As far as i can remember from the rumors and some glazing at the source,
 the current GC works slowly. It's clearly a prototype and should be
 replaced by one improved by orders of magnitude - Walter has his plans
 and tons of ideas on it.

I asking more about how it is used. Is it opt-in, opt-out or mandatory? How does it interact with the RAII idiom, and things like std::auto_ptr? -- AlisdairM
Jun 23 2003
parent reply Ilya Minkov <midiclub 8ung.at> writes:
Alisdair Meredith wrote:
 I asking more about how it is used.  Is it opt-in, opt-out or
 mandatory?  How does it interact with the RAII idiom, and things like
 std::auto_ptr?

std::auto_ptr??? Bwaaahahahaa! You're in the wrong newsgroup! I suggest you read: http://www.digitalmars.com/d/garbage.html http://www.digitalmars.com/d/memory.html Gargbage collection engine is automatically started with the application. Then, you just allocate objects as usual, and need not take care of their deallocation, which may happen whenever the runtime decides so. The distructors are called whenever actual deallocation takes place. You can also deallocate objects by hand if you like, and you can temporily disable garbage collection. It has nothing to do with RAII at all, since RAII is conserned with allocation, not deallocation. -i.
Jun 23 2003
parent reply Alisdair Meredith <alisdair.meredith uk.renaultf1.com> writes:
Ilya Minkov wrote:

 std::auto_ptr??? Bwaaahahahaa! You're in the wrong newsgroup!

Well, I could have suggested boost::shared_ptr <g>
 I suggest you read:
 http://www.digitalmars.com/d/garbage.html
 http://www.digitalmars.com/d/memory.html

Thanks, I think that's what I was looking for.
 Gargbage collection engine is automatically started with the
 application. Then, you just allocate objects as usual, and need not take
 care of their deallocation, which may happen whenever the runtime
 decides so. The distructors are called whenever actual deallocation
 takes place. You can also deallocate objects by hand if you like, and
 you can temporily disable garbage collection.

 It has nothing to do with RAII at all, since RAII is conserned with
 allocation, not deallocation.

I disagree, RAII is useless if the resource acquisition in the constructor is not matched by resource release in the destructor. The symmetry is what makes the idiom work, and avoids introducing all that nasty manual exception handling with try/finally. RAII matters especially when you are dealing with resources other than memory, which are often more limitted such as file handles, mutex locks etc. In these cases you really do want destruction to occur at the specified time and not when the garbage collector gets to it. Given D supports RAII, it is important that garbage collection can be turned off for these reasons. OTOH, garbage collection can be very useful in its own right, and certainly makes memory management with cyclic dependencies much easier (something traditionally quite problematic in C++) Not that it is easy to write a garbage collector that collects cycles, but once written client code need never see the problem again <g> I'm still not sure about mixing GC with non-GC languages, as it makes code-reading much more labour intensive. If everything is GC, no problems, ignore those resources that aren't really leaking. If there is no GC, then chase down everything that looks like a leak. With a mix, you need to be alert for leaks all the time, and all that GC code will be sending alarm bells to the human parser as they read the code and continually reassure themself that a given 'leak' will in fact be collected. OTOH, it might be really neat to simply pull the right tool out the toolbox to handle a given problem <g> Until we have more experienced of mixed-mode environments on large scale projects, I'm trying to keep an open mind. -- AlisdairM
Jun 23 2003
parent reply "Fabian Giesen" <rygNO SPAMgmx.net> writes:
 I disagree, RAII is useless if the resource acquisition in the
 constructor is not matched by resource release in the destructor.  The
 symmetry is what makes the idiom work, and avoids introducing all that
 nasty manual exception handling with try/finally.

There are normal classes and auto classes in D. Normal classes are always allocated on the heap, and auto classes always on the stack (with proper exact lifetime semantics). You can do RAII using them. -fg
Jun 23 2003
parent reply Farmer <itsFarmer. freenet.de> writes:
 There are normal classes and auto classes in D. Normal classes are always
 allocated on the heap, and auto classes always on the stack (with proper
 exact lifetime semantics). You can do RAII using them.
 
 -fg
 

Auto classes are *not always* allocated on the stack. At least the spec doesn't say so. <quote> When an auto class reference goes out of scope, the destructor (if any) for it is automatically called. This holds true even if the scope was exited via a thrown exception. <quote/> In fact, current DMD always allocates auto classes on the gc-heap. The destructor is called when the reference goes out of scope, but the memory of auto classes is not freed immediately, instead it is released during normal garbage collection. Farmer.
Jun 24 2003
parent "Walter" <walter digitalmars.com> writes:
"Farmer" <itsFarmer. freenet.de> wrote in message
news:Xns93A4C9961DCBFitsFarmer 63.105.9.61...
 In fact, current DMD always allocates auto classes on the gc-heap. The
 destructor is called when the reference goes out of scope, but the memory

 auto classes is not freed immediately, instead it is released during

 garbage collection.

Correct, but it is within the semantics of the language to put them on the stack.
Jun 24 2003
prev sibling next sibling parent "Fabian Giesen" <rygNO SPAMgmx.net> writes:
 GC gives up knowledge about the memory blocks needed or unneeded.
 To regain this knowledge has a cost. That's my theoretical basis
 why I think that GC will always (statistically) be slower than good
 tradition memory management.

 But: I would never trust this theory.

That knowledge also has a cost - it doesn't come by itself. A traditional C memory manager doesn't just know where there are free memory blocks, it has to constantly update data structures to do so (and mostly data structures scattered over the whole of memory, which isn't exactly great in terms of cache usage etc.). A program using manual memory management *does* obviously perform more free operations, so its memory footprint is relatively small, but then, a GCed program has the tendency to free memory in larger blocks, which reduces memory fragmentation, and thus, in the long term, memory usage. Also, GCed programs usually have less to worry about memory allocation issues, which makes the code overall simpler and shorter, thus saving both execution time and code space. GCs have been shown to have better locality of reference in allocated memory blocks, which again improves cache usage. And both collections and malloc/free are definitely *not* free in terms of execution time. GC tends to clump memory management costs while malloc/free tend to spread them fairly evenly, but that by itself doesn't tell anything about the overall performance. So the issue is hardly clear-cut :) -fg
Jun 23 2003
prev sibling parent reply "Walter" <walter digitalmars.com> writes:
"Helmut Leitner" <helmut.leitner chello.at> wrote in message
news:3EF69401.F1EC2F81 chello.at...
 GC gives up knowledge about the memory blocks needed or unneeded.
 To regain this knowledge has a cost. That's my theoretical basis
 why I think that GC will always (statistically) be slower than good
 tradition memory management.

Many programs rarely ever need to free memory, gc can be a big win for them. Also, there is overhead involved in keeping track of who owns each chunk of memory.
Jun 23 2003
parent reply Helmut Leitner <helmut.leitner chello.at> writes:
Walter wrote:
 
 "Helmut Leitner" <helmut.leitner chello.at> wrote in message
 news:3EF69401.F1EC2F81 chello.at...
 GC gives up knowledge about the memory blocks needed or unneeded.
 To regain this knowledge has a cost. That's my theoretical basis
 why I think that GC will always (statistically) be slower than good
 tradition memory management.

Many programs rarely ever need to free memory, gc can be a big win for them. Also, there is overhead involved in keeping track of who owns each chunk of memory.

I know this, but the majority of software needs freeing memory. It makes little sense to talk about special cases. I don't want to be misunderstood: "GC has a cost" doesn't mean "I don't want GC" it means: "don't tell me that GC is free". I don't think that current GC in D is good enough. If you think it is, I'll start benchmarking it. -- Helmut Leitner leitner hls.via.at Graz, Austria www.hls-software.com
Jun 23 2003
next sibling parent Alisdair Meredith <alisdair.meredith uk.renaultf1.com> writes:
Helmut Leitner wrote:

 I know this, but the majority of software needs freeing memory.
 It makes little sense to talk about special cases.

The interesting cases are those that need to free unused memory (eg running 24/7) but maybe not right away.
 I don't want to be misunderstood:
    "GC has a cost"
 doesn't mean
    "I don't want GC"
 it means:
    "don't tell me that GC is free".

Scheduled memory bookkeeping for a time the computer is less busy does not make GC free (overall) but it is effectively 'free-at-point-of-use', or actually better-than-free as there is less bookkeeping <g> I agree that GC is not free, it is simply a matter of deferring the cost to when you can afford to pay it. GC comes on credit <g> -- AlisdairM Team Thai Kingdom
Jun 24 2003
prev sibling parent reply "Fabian Giesen" <rygNO SPAMgmx.net> writes:
 I don't want to be misunderstood:
    "GC has a cost"
 doesn't mean
    "I don't want GC"
 it means:
    "don't tell me that GC is free".

As said, so does manual heap-based memory management :) (And a considerable one at that, some papers on allocator performance show that data-structure intensitive programs such as compilers tend to spend >20% of their overall runtime allocating and deallocating memory).
 I don't think that current GC in D is good enough.

I agree with that, and I also think the bad image of GC is partly due to the fact of most garbage collectors actually used in practice are far from state of the art. But then, Walter has already mentioned several times that his focus is currently on the compiler and not the runtime support. -fg
Jun 24 2003
parent reply "Walter" <walter digitalmars.com> writes:
"Fabian Giesen" <rygNO SPAMgmx.net> wrote in message
news:bd9rqm$1mde$1 digitaldaemon.com...
 I don't think that current GC in D is good enough.

due to the fact of most garbage collectors actually used in practice are far from state of the art. But then, Walter has already mentioned several times that his focus is currently on the compiler and not the runtime support.

You don't really need to wait for me, write a better gc! The one that's in there now aimed to simply work correctly and get D off the ground.
Jun 24 2003
next sibling parent reply Helmut Leitner <leitner hls.via.at> writes:
Walter wrote:
 
 "Fabian Giesen" <rygNO SPAMgmx.net> wrote in message
 news:bd9rqm$1mde$1 digitaldaemon.com...
 I don't think that current GC in D is good enough.

due to the fact of most garbage collectors actually used in practice are far from state of the art. But then, Walter has already mentioned several times that his focus is currently on the compiler and not the runtime support.

You don't really need to wait for me, write a better gc! The one that's in there now aimed to simply work correctly and get D off the ground.

Is it possible to replace or extend the current GC module? If yes, what are the interfaces to the current system? -- Helmut Leitner leitner hls.via.at Graz, Austria www.hls-software.com
Jun 24 2003
parent reply "Walter" <walter digitalmars.com> writes:
"Helmut Leitner" <leitner hls.via.at> wrote in message
news:3EF88CB3.80A35237 hls.via.at...
 Walter wrote:
 "Fabian Giesen" <rygNO SPAMgmx.net> wrote in message
 news:bd9rqm$1mde$1 digitaldaemon.com...
 I don't think that current GC in D is good enough.

due to the fact of most garbage collectors actually used in practice are far from state of the art. But then, Walter has already mentioned several times that his focus is currently on the compiler and not the runtime support.



 there now aimed to simply work correctly and get D off the ground.


Yes.
 If yes, what are the interfaces to the current system?

It's designed to be replaced. Check out gc.d, all it does is forward the calls to another, hidden class. Those calls would be the interface.
Jun 24 2003
parent reply Helmut Leitner <leitner hls.via.at> writes:
Walter wrote:
 Is it possible to replace or extend the current GC module?

Yes.
 If yes, what are the interfaces to the current system?

It's designed to be replaced. Check out gc.d, all it does is forward the calls to another, hidden class. Those calls would be the interface.

I think that the GC is one of the vital part of D and it would be a really good thing to have a workgroup dealing with all related issues. Possible goals for the start (please extend list): - analyze, understand and document the current collector - make the collector plugable (document the necessary steps to replace it) - build a GC testsuite for benchmarking and language comparisons - building, testing and suggesting alternative collectors - ... Who is interested to collaborate? -- Helmut Leitner leitner hls.via.at Graz, Austria www.hls-software.com
Jun 24 2003
next sibling parent reply Georg Wrede <Georg_member pathlink.com> writes:
In article <3EF947F1.5BAD3D2D hls.via.at>, Helmut Leitner says...
I think that the GC is one of the vital part of D and it would be 
a really good thing to have a workgroup dealing with all related 
issues.

Possible goals for the start (please extend list):
  - analyze, understand and document the current collector
  - make the collector plugable (document the necessary steps to replace it)
  - build a GC testsuite for benchmarking and language comparisons
  - building, testing and suggesting alternative collectors
  - ...
Who is interested to collaborate?

While it's unlikely that I'll submit code, I could contribute the same way as I've done in this newsgroup. If this is welcome, then I'm in.
Jun 25 2003
parent reply Helmut Leitner <helmut.leitner chello.at> writes:
Georg Wrede wrote:
 
 In article <3EF947F1.5BAD3D2D hls.via.at>, Helmut Leitner says...
I think that the GC is one of the vital part of D and it would be
a really good thing to have a workgroup dealing with all related
issues.

Possible goals for the start (please extend list):
  - analyze, understand and document the current collector
  - make the collector plugable (document the necessary steps to replace it)
  - build a GC testsuite for benchmarking and language comparisons
  - building, testing and suggesting alternative collectors
  - ...
Who is interested to collaborate?

While it's unlikely that I'll submit code, I could contribute the same way as I've done in this newsgroup. If this is welcome, then I'm in.

That certainly *is* welcome. Thinking is more important than coding. -- Helmut Leitner leitner hls.via.at Graz, Austria www.hls-software.com
Jun 25 2003
parent reply "Walter" <walter digitalmars.com> writes:
"Helmut Leitner" <helmut.leitner chello.at> wrote in message
news:3EFA1167.B4912224 chello.at...
 That certainly *is* welcome. Thinking is more important than coding.

The hardest part of a gc is making sure it works. I.e. the test suite for it is the most important piece. Subtle bugs in the gc can have a disastrous effect on D programs, making for very hard to pin down bugs. The \dmd\src\phobos\gc2\testgc.d is a good starting point. I also suggest the "Garbage Collection" book on www.digitalmars.com/bibliography.html. It covers all the main algorithms, their variants, and strengths and weaknesses. It's essential reading, if you don't want to reinvent the wheel <g>. What I see as ideal for D is a two-generation, partially copying collector, with write barriers on the old generation done in hardware with the paging system.
Jun 25 2003
next sibling parent Helmut Leitner <leitner hls.via.at> writes:
Walter wrote:
 
 "Helmut Leitner" <helmut.leitner chello.at> wrote in message
 news:3EFA1167.B4912224 chello.at...
 That certainly *is* welcome. Thinking is more important than coding.

The hardest part of a gc is making sure it works. I.e. the test suite for it is the most important piece. Subtle bugs in the gc can have a disastrous effect on D programs, making for very hard to pin down bugs. The \dmd\src\phobos\gc2\testgc.d is a good starting point. I also suggest the "Garbage Collection" book on www.digitalmars.com/bibliography.html. It covers all the main algorithms, their variants, and strengths and weaknesses. It's essential reading, if you don't want to reinvent the wheel <g>. What I see as ideal for D is a two-generation, partially copying collector, with write barriers on the old generation done in hardware with the paging system.

I've no set opinion on the algorithm that should be used. I know that most people that use malloc/free don't have a faint idea about the behaviour of the system they are using. I agree that a testsuite must guarantee (as far as this is possible) the flawless operation of the GC system. Given that, it would be nice to have an architecture that allows to plug in different GC systems (e. g. by linking to different GC libraries) at will and depending on the problem. Some user might need "hard real time GC" while some other might prefer "copying GC to avoid fragmentation". Pluggable GC might even be interesting to the GC scientists and might inspire them use D for some of their explorations. -- Helmut Leitner leitner hls.via.at Graz, Austria www.hls-software.com
Jun 26 2003
prev sibling next sibling parent reply Helmut Leitner <leitner hls.via.at> writes:
Walter wrote:
 I also suggest the "Garbage Collection" book on
 www.digitalmars.com/bibliography.html. It covers all the main algorithms,
 their variants, and strengths and weaknesses. It's essential reading, if you
 don't want to reinvent the wheel <g>.

I've just ordered the book. But usually I don't believe in the quality of prefabricated and published solutions. -- Helmut Leitner leitner hls.via.at Graz, Austria www.hls-software.com
Jun 26 2003
parent "Walter" <walter digitalmars.com> writes:
"Helmut Leitner" <leitner hls.via.at> wrote in message
news:3EFAB36B.C0014A83 hls.via.at...
 Walter wrote:
 I also suggest the "Garbage Collection" book on
 www.digitalmars.com/bibliography.html. It covers all the main


 their variants, and strengths and weaknesses. It's essential reading, if


 don't want to reinvent the wheel <g>.

the quality of prefabricated and published solutions.

That book is better than most at being practical.
Jun 26 2003
prev sibling parent reply Patrick Down <Patrick_member pathlink.com> writes:
In article <bddfrm$a29$1 digitaldaemon.com>, Walter says...
"Helmut Leitner" <helmut.leitner chello.at> wrote in message
news:3EFA1167.B4912224 chello.at...
 That certainly *is* welcome. Thinking is more important than coding.

The hardest part of a gc is making sure it works. I.e. the test suite for it is the most important piece. Subtle bugs in the gc can have a disastrous effect on D programs, making for very hard to pin down bugs. The \dmd\src\phobos\gc2\testgc.d is a good starting point. I also suggest the "Garbage Collection" book on www.digitalmars.com/bibliography.html. It covers all the main algorithms, their variants, and strengths and weaknesses. It's essential reading, if you don't want to reinvent the wheel <g>. What I see as ideal for D is a two-generation, partially copying collector, with write barriers on the old generation done in hardware with the paging system.

The GC does need information from the compiler about data types. Some benafit can be gained from marking or allocating on a separate heap data types that don't have any internal pointers so that they don't need to be scanned. Boehm has GC_malloc_atomic for this purpose. Any plugable GC architecture should contain information about the types being allocated.
Jun 26 2003
next sibling parent reply "Fabian Giesen" <rygNO SPAMgmx.net> writes:
 The GC does need information from the compiler about data types.
 Some benafit can be gained from marking or allocating on a
 separate heap data types that don't have any internal pointers
 so that they don't need to be scanned.  Boehm has GC_malloc_atomic
 for this purpose.

 Any plugable GC architecture should contain information about the
 types being allocated.

Agree. Conservative GC is a kludge to make GC work in environments that weren't really meant for it, but I see no point in making the task unnecessarily more complex (and less reliable :) in a language that is designed from ground up with GC in mind. -fg
Jun 26 2003
next sibling parent reply Juarez Rudsatz <Juarez_member pathlink.com> writes:
A interesting link :

http://www-106.ibm.com/developerworks/ibm/library/i-incrcomp/?ca=dgr-lnxw01TrashCompactor
Jun 26 2003
parent Helmut Leitner <helmut.leitner chello.at> writes:
Juarez Rudsatz wrote:
 
 A interesting link :
 
 http://www-106.ibm.com/developerworks/ibm/library/i-incrcomp/?ca=dgr-lnxw01TrashCompactor

I've started a wiki page for the GC workgroup: <http://www.wikiservice.at/wiki4d/wiki.cgi?GCW> and added your link. -- Helmut Leitner leitner hls.via.at Graz, Austria www.hls-software.com
Jun 26 2003
prev sibling parent reply "Sean L. Palmer" <palmer.sean verizon.net> writes:
It seems what is needed is something like:

In the typeinfo for every type there should be a function,    bool
ScanObjectForPointer(void* object, void* p);   which could be used by the GC
to scan blocks;  obviously it'd not be recursive thru pointers so we would
also need   struct OPI { void* objptr; TypeInfo* type }  OPI[]
GetNonNullPointersFromObject(void* object);    .. both functions would use
the typeinfo size to determine the extent to scan.  Probably need something
to scan arrays of objects as well..

If the compiler provided a framework like this, a non-conservative GC could
be built off of it.

At very least some tables would have to be emitted with typeinfo* and offset
and size for the members relevant to the GC.

Sean

"Fabian Giesen" <rygNO SPAMgmx.net> wrote in message
news:bdf77j$20sb$1 digitaldaemon.com...
 The GC does need information from the compiler about data types.
 Some benafit can be gained from marking or allocating on a
 separate heap data types that don't have any internal pointers
 so that they don't need to be scanned.  Boehm has GC_malloc_atomic
 for this purpose.

 Any plugable GC architecture should contain information about the
 types being allocated.

Agree. Conservative GC is a kludge to make GC work in environments that weren't really meant for it, but I see no point in making the task unnecessarily more complex (and less reliable :) in a language that is designed from ground up with GC in mind. -fg

Jun 27 2003
parent "Fabian Giesen" <rygNO SPAMgmx.net> writes:
 In the typeinfo for every type there should be a function,    bool
 ScanObjectForPointer(void* object, void* p);   which could be used by
 the GC to scan blocks;  obviously it'd not be recursive thru pointers
 so we would also need   struct OPI { void* objptr; TypeInfo* type }
 OPI[] GetNonNullPointersFromObject(void* object);    .. both
 functions would use the typeinfo size to determine the extent to
 scan.  Probably need something to scan arrays of objects as well..

 If the compiler provided a framework like this, a non-conservative GC
 could be built off of it.

 At very least some tables would have to be emitted with typeinfo* and
 offset and size for the members relevant to the GC.

I'd go for a very simple variant. You only need to know where pointers are, so a normal ~0-terminated array of offsets relative to the struct/class beginning (for structs and classes only, obviously) should suffice. All pointers not inside structs/classes are roots by definition, which the compiler obviously also needs to tell the GC about. -fg
Jun 27 2003
prev sibling parent reply "Walter" <walter digitalmars.com> writes:
"Patrick Down" <Patrick_member pathlink.com> wrote in message
news:bdf2j3$1rui$1 digitaldaemon.com...
 The GC does need information from the compiler about data types.
 Some benafit can be gained from marking or allocating on a
 separate heap data types that don't have any internal pointers
 so that they don't need to be scanned.  Boehm has GC_malloc_atomic
 for this purpose.

Yes, that can help.
 Any plugable GC architecture should contain information about the
 types being allocated.

Jun 26 2003
parent reply Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
Walter wrote:
 "Patrick Down" <Patrick_member pathlink.com> wrote in message
 news:bdf2j3$1rui$1 digitaldaemon.com...
 
The GC does need information from the compiler about data types.
Some benafit can be gained from marking or allocating on a
separate heap data types that don't have any internal pointers
so that they don't need to be scanned.  Boehm has GC_malloc_atomic
for this purpose.

Yes, that can help.
Any plugable GC architecture should contain information about the
types being allocated.


I know that many people are not big fans of reference counting collectors, but if we're implementing a generic GC interface, then we should include reference/dereference handlers so that refcounters could work. Or, it occurs to me, you could just have a callback to tell refcounters that "object has changed one or more times." The refcounter could do lazy refcounting that way. That callback would be useful as a software write barrier for copying collectors, too.... Of course, any such callback should be inlined (if at all possible); thus, GCs that don't need the callback will simply implement a no-op stub that will vanish when it is inlined. Russ
Jun 27 2003
parent reply "Fabian Giesen" <rygNO SPAMgmx.net> writes:
 I know that many people are not big fans of reference counting
 collectors, but if we're implementing a generic GC interface, then we
 should include reference/dereference handlers so that refcounters
 could work.

Refcounting cannot even work in the presence of data structures with cyclical references, which includes very basic things like double-linked lists and trees where nodes have pointers to their parents. Making ref-counting work in that case requires either a complete redesign of the datastructures or so-called "weak references", i.e. references that don't "own" the object they're referencing. This can silently break existing code in a myriad of ways, and it's especially bad because it won't cause any suspicious behavior, it'll just cause memory leaks to pile. As the whole idea was to make memory management *easier* and not more involved and error-prone, I really don't think it's worth it. -fg
Jun 27 2003
next sibling parent reply Helmut Leitner <helmut.leitner chello.at> writes:
Fabian Giesen wrote:
 
 I know that many people are not big fans of reference counting
 collectors, but if we're implementing a generic GC interface, then we
 should include reference/dereference handlers so that refcounters
 could work.

Refcounting cannot even work in the presence of data structures with cyclical references, which includes very basic things like double-linked lists and trees where nodes have pointers to their parents. Making ref-counting work in that case requires either a complete redesign of the datastructures or so-called "weak references", i.e. references that don't "own" the object they're referencing. This can silently break existing code in a myriad of ways, and it's especially bad because it won't cause any suspicious behavior, it'll just cause memory leaks to pile. As the whole idea was to make memory management *easier* and not more involved and error-prone, I really don't think it's worth it.

I always wondered why these approaches are not combined. Use refcounts, refcount=0 means free (90-95% of all objects) For all others do your chosen GC algorithm. -- Helmut Leitner leitner hls.via.at Graz, Austria www.hls-software.com
Jun 28 2003
parent reply "Fabian Giesen" <rygNO SPAMgmx.net> writes:
 I always wondered why these approaches are not combined.

 Use refcounts, refcount=0 means free (90-95% of all objects)
 For all others do your chosen GC algorithm.

Because reference counting is quite expensive in terms of CPU time (yes, I know, it's counterintuitive). -fg
Jun 28 2003
parent reply Helmut Leitner <leitner hls.via.at> writes:
Fabian Giesen wrote:
 
 I always wondered why these approaches are not combined.

 Use refcounts, refcount=0 means free (90-95% of all objects)
 For all others do your chosen GC algorithm.

Because reference counting is quite expensive in terms of CPU time (yes, I know, it's counterintuitive).

I know. But in optimization you must often try things that are counterintuitive. One good paper that was cited (and Walters gc2) use size classes 2^N which give at least 50% memory size overhead. Nobody talks about this. But with this amount of memory you can try a lot of promising algorithms. -- Helmut Leitner leitner hls.via.at Graz, Austria www.hls-software.com
Jun 28 2003
parent reply "Fabian Giesen" <rygNO SPAMgmx.net> writes:
 Because reference counting is quite expensive in terms of CPU
 time (yes, I know, it's counterintuitive).

I know. But in optimization you must often try things that are counterintuitive.

Yeah, but I absolutely don't see how any garbage collector could profit from the reference counts. You are right in the observation that most objects do have a very short lifetime, but you don't need reference counting to know that. And a GC doesn't even touch the dead objects (that is, the ones with refcount 0) - it won't even find them for the precise reason that all references to them are dead :) I really don't see any way how the reference counts could be useful to a collector. Maybe you could explain your idea?
 One good paper that was cited (and Walters gc2) use size classes
 2^N which give at least 50% memory size overhead. Nobody talks
 about this. But with this amount of memory you can try a lot of
 promising algorithms.

...on the cost of 50% additional memory overhead. Which pretty much rules them out of you ask me. We're talking memory management here, not memory wasting. As long as 50% of additional memory requirement is no issue because memory usage is low, there's little reason to worry about GC performance anyway. And for programs that are memory intensive, ANYTHING you can gain in the allocator will be easily dwarfed by the cost incurred by negative cache effects and, even worse, virtual memory - 50% more memory usage will make swapping both start earlier and have more to do. And anyone who has ever been in that situation knows that paging, once it begins, can easily slow down a program by an order of magnitude. Don't get me wrong, I'm certainly in for new ideas, but I will easily drop them if I think their cost is too high or not well-spent :) -fg
Jun 28 2003
parent reply Helmut Leitner <helmut.leitner chello.at> writes:
Fabian Giesen wrote:
 
 Because reference counting is quite expensive in terms of CPU
 time (yes, I know, it's counterintuitive).

I know. But in optimization you must often try things that are counterintuitive.

Yeah, but I absolutely don't see how any garbage collector could profit from the reference counts. You are right in the observation that most objects do have a very short lifetime, but you don't need reference counting to know that. And a GC doesn't even touch the dead objects (that is, the ones with refcount 0) - it won't even find them for the precise reason that all references to them are dead :) I really don't see any way how the reference counts could be useful to a collector. Maybe you could explain your idea?
 One good paper that was cited (and Walters gc2) use size classes
 2^N which give at least 50% memory size overhead. Nobody talks
 about this. But with this amount of memory you can try a lot of
 promising algorithms.

...on the cost of 50% additional memory overhead. Which pretty much rules them out of you ask me. We're talking memory management here, not memory wasting. As long as 50% of additional memory requirement is no issue because memory usage is low, there's little reason to worry about GC performance anyway. And for programs that are memory intensive, ANYTHING you can gain in the allocator will be easily dwarfed by the cost incurred by negative cache effects and, even worse, virtual memory - 50% more memory usage will make swapping both start earlier and have more to do. And anyone who has ever been in that situation knows that paging, once it begins, can easily slow down a program by an order of magnitude. Don't get me wrong, I'm certainly in for new ideas, but I will easily drop them if I think their cost is too high or not well-spent :)

Currently I need my brain for the ICFP contest, but I'd like to comment on that in a few days. It's inevitable because I started that Wiki page. One shouldn't rule out things too early. Only few people are aware that malloc/free typically has a 12-20 byte overhead per allocation. And we don't yet talk about fragmentation. Anyway I don't think that "state of the art GC" whatever that may be is good enough for D. If D is to replace C (and I hope it will, I'll put some trust and work into it) then it must have an absolutely flawless memory management system. That's the winning point - not some fancy syntax or OO feature. Any small linear overhead on allocation or deallocation is no problem. But a 10 sec collection pause is. I would even say, that memory mangement is so important, that all other features must be constructed around it to support its efficiency. But, as I said, let's discuss this a bit later. -- Helmut Leitner leitner hls.via.at Graz, Austria www.hls-software.com
Jun 28 2003
parent reply "Fabian Giesen" <rygNO SPAMgmx.net> writes:
 Any small linear overhead on allocation or deallocation is no problem.
 But a 10 sec collection pause is.

As I said before, there are collectors that can satisfy hard realtime requirements. -fg
Jun 28 2003
parent reply Helmut Leitner <leitner hls.via.at> writes:
Fabian Giesen wrote:
 
 Any small linear overhead on allocation or deallocation is no problem.
 But a 10 sec collection pause is.

As I said before, there are collectors that can satisfy hard realtime requirements.

I think I read the link you mentioned. But did you also read that it has an 4-6 (worst case 10) factor between "live object" data and maximum heap size? That is: when a program at a point has 2 MB of objects it may need at the same time 20MB (ok typical 10 MB) heap to do this real time? -- Helmut Leitner leitner hls.via.at Graz, Austria www.hls-software.com
Jun 29 2003
parent "Fabian Giesen" <rygNO SPAMgmx.net> writes:
 I think I read the link you mentioned. But did you also read that it
 has an 4-6 (worst case 10) factor between "live object" data and
 maximum heap size?

 That is: when a program at a point has 2 MB of objects it may need
 at the same time 20MB (ok typical 10 MB) heap to do this real time?

There are no "typical" values in that table, it's strictly worst-case analysis, because the absolute upper bounds are what counts for hard real time requirements. Also note that the statistics given are assuming the use of a segregated storage allocator - presumably because worst-case bounds for segregated storage are far easier to compute than for e.g. best fit :). Finally, the actual GC overhead is "twice the amount of memory that is allocated per full garbage collection cycle" (per size class; page 26) - the rest is due to the memory allocator used. Finally, all this is for *hard* realtime requirements. Relaxing this to soft real-time (i.e. trading somewhat worse worst-case complexity for far better average-complexity) should give quite competitive results (even though it won't show in the worst-case statistics, of course). -fg
Jun 30 2003
prev sibling parent Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
Fabian Giesen wrote:
I know that many people are not big fans of reference counting
collectors, but if we're implementing a generic GC interface, then we
should include reference/dereference handlers so that refcounters
could work.

Refcounting cannot even work in the presence of data structures with cyclical references, which includes very basic things like double-linked lists and trees where nodes have pointers to their parents. Making ref-counting work in that case requires either a complete redesign of the datastructures or so-called "weak references", i.e. references that don't "own" the object they're referencing. This can silently break existing code in a myriad of ways, and it's especially bad because it won't cause any suspicious behavior, it'll just cause memory leaks to pile. As the whole idea was to make memory management *easier* and not more involved and error-prone, I really don't think it's worth it. -fg

This problem has been solved. http://www.research.ibm.com/people/d/dfb/papers.html Read the links on "Recycler"
Jun 30 2003
prev sibling parent Mark Evans <Mark_member pathlink.com> writes:
http://www.cs.kent.ac.uk/people/staff/rej/gc.html
Jul 02 2003
prev sibling parent reply "Fabian Giesen" <rygNO SPAMgmx.net> writes:
 You don't really need to wait for me, write a better gc! The one
 that's in there now aimed to simply work correctly and get D off the
 ground.

Hm, the problem is that most of the more interesting GC algorithms require changes to the compiler/runtime environment. Conservative GC is actually a bad idea with a language/environment that's designed for use with garbage collection, simply because one can do without any scanning for pointers, *given that the compiler provides the necessary metadata about data structures in the program* :) Some algorithms (like the Realtime GC one for which I posted a link to the paper yesterday) also require read or write barriers, which also need to be implemented in code generation. Overall I doubt that a truly efficient GC can be done as just a pluggable library module - it requires very close cooperation with the compiler. -fg
Jun 24 2003
parent reply "Walter" <walter digitalmars.com> writes:
"Fabian Giesen" <rygNO SPAMgmx.net> wrote in message
news:bdabu4$62p$1 digitaldaemon.com...
 Hm, the problem is that most of the more interesting GC algorithms
 require changes to the compiler/runtime environment. Conservative GC
 is actually a bad idea with a language/environment that's designed for
 use with garbage collection, simply because one can do without any
 scanning for pointers, *given that the compiler provides the necessary
 metadata about data structures in the program* :)

 Some algorithms (like the Realtime GC one for which I posted a link to
 the paper yesterday) also require read or write barriers, which also
 need to be implemented in code generation.

 Overall I doubt that a truly efficient GC can be done as just a
 pluggable library module - it requires very close cooperation with the
 compiler.

Write barriers can be done entirely within the gc, if the paging hardware is used to implement them.
Jun 24 2003
parent reply "Fabian Giesen" <rygNO SPAMgmx.net> writes:
 Write barriers can be done entirely within the gc, if the paging
 hardware is used to implement them.

I doubt that using paging to detect pointer writes is a good idea. You'd get literally tons of page faults using that technique to detect pointer writes for data structures that often change. And a page fault is far from cheap :) -fg
Jun 24 2003
parent reply "Walter" <walter digitalmars.com> writes:
"Fabian Giesen" <rygNO SPAMgmx.net> wrote in message
news:bdajmr$d0s$1 digitaldaemon.com...
 Write barriers can be done entirely within the gc, if the paging
 hardware is used to implement them.

You'd get literally tons of page faults using that technique to detect pointer writes for data structures that often change. And a page fault is far from cheap :)

For a generational collector, how it works is for the 'old' generation, the pages are marked read-only. If anyone writes to a page, that generates a fault. The gc catches the fault, marks the page as "needs to be rescanned", turns on write for that page, and restarts the offending write instruction. The reason it is effective is because statistically old generation pages are rarely written to, and the overhead of a page fault only happens once per page, not for every write. The scheme would work even better if the paging hardware was set up to work with the gc and collect the dirty bits automatically. Unfortunately, no operating system I know of provides an API for that.
Jun 24 2003
parent "Fabian Giesen" <rygNO SPAMgmx.net> writes:
 For a generational collector, how it works is for the 'old'
 generation, the pages are marked read-only. If anyone writes to a
 page, that generates a fault. The gc catches the fault, marks the
 page as "needs to be rescanned", turns on write for that page, and
 restarts the offending write instruction.

 The reason it is effective is because statistically old generation
 pages are rarely written to, and the overhead of a page fault only
 happens once per page, not for every write.

Hm okay, now that I think of it, that *would* make sense. Definitely worth a try :) -fg
Jun 25 2003