D - Java vs C Benchmarks

Mark Evans (2/2) Jun 17 2003 From a guy who does lots of number crunching; interesting results.

Walter (38/40) Jun 17 2003 It's an interesting article. The author is right that there is no reason

Mark Evans (17/17) Jun 21 2003 Walter,

Jeff Peil (11/13) Jun 22 2003 Mark,

Mark Evans (10/19) Jun 22 2003 Jeff, I don't see why anyone would blindly do anything. I did not compa...

Achilleas Margaritis (139/141) Jun 21 2003 More Java hype. Please read the following piece taken from the site:

Mark T (2/3) Jun 21 2003 This website seems to support your observations:
Alisdair Meredith (25/28) Jun 22 2003 Just an observation from 12 months back:

Fabian Giesen (32/38) Jun 22 2003 GC is bound to lose against stack allocation - everything that can be

Helmut Leitner (12/16) Jun 22 2003 Sorry, I have to comment on this.

Georg Wrede (50/50) Jun 23 2003 I think we all agree on that an expert on assembler

Alisdair Meredith (12/20) Jun 23 2003 Productivity also plays a big part in outright speed of course, as it is
Fabian Giesen (6/10) Jun 23 2003 "Non-Compacting Memory Allocation and Real-Time Garbage Collection"

Georg Wrede (13/22) Jun 24 2003 Yes! Thanks a lot! It did convince me of the possibility!

anonymous (14/64) Jun 23 2003 Actually the Linux Kernel does use manual garbage collection. It utiliz...
Helmut Leitner (13/17) Jun 23 2003 People are so used to the GC/no GC conflict, that any argument

Alisdair Meredith (18/22) Jun 23 2003 Depends on whether tracking the knowledge of memory blocks is gaining

Ilya Minkov (10/15) Jun 23 2003 As far as i can remember from the rumors and some glazing at the source,...

Alisdair Meredith (6/10) Jun 23 2003 I asking more about how it is used. Is it opt-in, opt-out or

Ilya Minkov (14/17) Jun 23 2003 std::auto_ptr??? Bwaaahahahaa! You're in the wrong newsgroup!

Alisdair Meredith (32/44) Jun 23 2003 Thanks, I think that's what I was looking for.

Fabian Giesen (4/8) Jun 23 2003 There are normal classes and auto classes in D. Normal classes are alway...

Farmer (12/18) Jun 24 2003 Auto classes are *not always* allocated on the stack. At least the spec

Walter (6/10) Jun 24 2003 of

Fabian Giesen (20/25) Jun 23 2003 That knowledge also has a cost - it doesn't come by itself. A
Walter (5/9) Jun 23 2003 Many programs rarely ever need to free memory, gc can be a big win for t...

Helmut Leitner (14/25) Jun 23 2003 I know this, but the majority of software needs freeing memory.

Alisdair Meredith (11/19) Jun 24 2003 The interesting cases are those that need to free unused memory (eg
Fabian Giesen (11/18) Jun 24 2003 As said, so does manual heap-based memory management :) (And a

Walter (4/10) Jun 24 2003 You don't really need to wait for me, write a better gc! The one that's ...

Helmut Leitner (6/18) Jun 24 2003 Is it possible to replace or extend the current GC module?

Walter (6/19) Jun 24 2003 in

Helmut Leitner (13/21) Jun 24 2003 I think that the GC is one of the vital part of D and it would be a real...

Georg Wrede (4/14) Jun 25 2003 While it's unlikely that I'll submit code, I could contribute

Helmut Leitner (5/23) Jun 25 2003 That certainly *is* welcome. Thinking is more important than coding.

Walter (13/14) Jun 25 2003 The hardest part of a gc is making sure it works. I.e. the test suite fo...

Helmut Leitner (15/33) Jun 26 2003 I've no set opinion on the algorithm that should be used. I know that mo...
Helmut Leitner (6/10) Jun 26 2003 I've just ordered the book. But usually I don't believe in

Walter (5/12) Jun 26 2003 algorithms,

Patrick Down (8/22) Jun 26 2003 The GC does need information from the compiler about data types.

Fabian Giesen (5/12) Jun 26 2003 Agree. Conservative GC is a kludge to make GC work in environments that

Juarez Rudsatz (2/2) Jun 26 2003 A interesting link :

Helmut Leitner (7/11) Jun 26 2003 I've started a wiki page for the GC workgroup:

Sean L. Palmer (15/28) Jun 27 2003 It seems what is needed is something like:

Fabian Giesen (6/17) Jun 27 2003 I'd go for a very simple variant. You only need to know where pointers

Walter (3/10) Jun 26 2003 Yes, that can help.

Russ Lewis (13/28) Jun 27 2003 I know that many people are not big fans of reference counting

Fabian Giesen (12/16) Jun 27 2003 Refcounting cannot even work in the presence of data structures with

Helmut Leitner (7/25) Jun 28 2003 I always wondered why these approaches are not combined.

Fabian Giesen (3/6) Jun 28 2003 Because reference counting is quite expensive in terms of CPU

Helmut Leitner (10/18) Jun 28 2003 I know. But in optimization you must often try things that are

Fabian Giesen (22/30) Jun 28 2003 Yeah, but I absolutely don't see how any garbage collector could

Helmut Leitner (19/55) Jun 28 2003 Currently I need my brain for the ICFP contest, but I'd like to comment

Fabian Giesen (3/5) Jun 28 2003 As I said before, there are collectors that can satisfy hard realtime

Helmut Leitner (9/15) Jun 29 2003 I think I read the link you mentioned. But did you also read that it

Fabian Giesen (13/18) Jun 30 2003 There are no "typical" values in that table, it's strictly worst-case

Russ Lewis (4/24) Jun 30 2003 This problem has been solved.

Mark Evans (1/1) Jul 02 2003 http://www.cs.kent.ac.uk/people/staff/rej/gc.html

Fabian Giesen (13/16) Jun 24 2003 Hm, the problem is that most of the more interesting GC algorithms

Walter (4/16) Jun 24 2003 Write barriers can be done entirely within the gc, if the paging hardwar...

Fabian Giesen (5/7) Jun 24 2003 I doubt that using paging to detect pointer writes is a good idea.

Walter (12/18) Jun 24 2003 For a generational collector, how it works is for the 'old' generation, ...

Fabian Giesen (3/11) Jun 25 2003 Hm okay, now that I think of it, that *would* make sense. Definitely

Mark Evans (10/10) Jul 22 2003 This was another interesting find. -M.

Mark Evans <Mark_member pathlink.com> writes:

From a guy who does lots of number crunching; interesting results.

http://www.idiom.com/~zilla/Computer/javaCbenchmark.html

Jun 17 2003

"Walter" <walter digitalmars.com> writes:

"Mark Evans" <Mark_member pathlink.com> wrote in message
news:bcnoeo$1npv$1 digitaldaemon.com...
 From a guy who does lots of number crunching; interesting results.

 http://www.idiom.com/~zilla/Computer/javaCbenchmark.html

It's an interesting article. The author is right that there is no reason
that Java numerics code should be slower than in C, except when the Java vm
is forced to use suboptimal floating point to conform to the Java way of
doing floating point that diverges from FPU hardware.

Here are the cases where Java is slower than C:

1) Java doesn't support lightweight objects, or lightweight arrays. This
means that Java allocates and deallocates an excessive number of objects
compared to C code.

2) The Java startup time can be very slow, due to loading the huge VM and
the necessity of compiling it before use.

3) Anytime you need to access some feature not found in the Java library,
you need to go through the JNI translation layer.

4) Doing much of anything with arrays in Java requires lots of
allocation/deallocation cycles.

5) Java chars are 16 bits, and dealing with a lot of them winds up being
slow compared with C dealing with byte sized chars.

6) Java has no out parameters, sometimes necessitating creating lots of
dummy classes for returning multiple values. Again, this adds to the
allocation/deallocation pressure.

7) The string functions are all synchronized, costing speed. Strings are
copy-on-write, which gets expensive.

8) Some of the library functions are inefficiently designed, such as file
I/O.

9) There are inefficiencies in the library caused by it trying to present a
common interface across diverse platforms.

Note that only (2) is related to Java being byte code. None of these are
related much to integer manipulation and numerics code, which is why Java
can do well with them. These difficulties are pretty hard to overcome.

Java does have one important advantage over C. Being compiled at runtime,
the generated code can be customized to the idiosyncracies of the particular
processor it happens to be running on. C, on the other hand, tends to be
compiled to the lowest-common-denominator chip. A lot of the difference
between CPUs of the same family is how they schedule instructions, extra
instructions for floating point, etc. Optimal code generation for one CPU
may be very different from that of another CPU in the same family (Intel is
famous for doing that).

Jun 17 2003

Mark Evans <Mark_member pathlink.com> writes:

Walter,

Glad you like the article.  For the record I do not agree with everything that I
post, but benchmarks interest me more than flame fests.  The author of this
article really just cited other benchmark authors, such as the university study
comparing Java/C++/FORTRAN for numerics work.  It has very nice charts and wide
coverage, and is of recent vintage.
http://www.philippsen.com/JGI2001/finalpapers/18500097.pdf

Personally I think Java is dumbed-down.  Some of the Java variants interest me,
like Nice.  I am however open-minded about the future of Java, e.g.
http://www.sun.com/microelectronics/picoJava/overview.html

Certainly I would rather see picoJava take off than M$oft imposing an
encrypted-instruction-set .NET Windows-only chip on our little planet.

But my real purpose is simply to stir the pot a bit so that the stew cooks
evenly.  The computer field moves very rapidly and "what everybody knows" may
not be as true now as it once was, or ever was.  I try to keep an open mind and
pay attention to real data.

Mark

Jun 21 2003

"Jeff Peil" <jpeil bigfoot.com> writes:

"Mark Evans" <Mark_member pathlink.com> wrote in message
news:bd2vvh$2u7e$1 digitaldaemon.com
 Certainly I would rather see picoJava take off than M$oft imposing an
 encrypted-instruction-set .NET Windows-only chip on our little planet.

Mark,

Sun and Microsoft are both corporations, I don't see why you'd blindly
prefer one over the other.  In this case you are comparing a proprietary
system (Java) with one which is largely an open standard (the ECMA CLI
covers the instruction set and much of the class library used in the .NET
runtime.)  Sun is not some heroic entity that only does good and Microsoft
is not some vile entity seeking to do evil.

-- 
Jeff Peil

Jun 22 2003

Mark Evans <Mark_member pathlink.com> writes:

Mark,

Sun and Microsoft are both corporations, I don't see why you'd blindly
prefer one over the other.  In this case you are comparing a proprietary
system (Java) with one which is largely an open standard (the ECMA CLI
covers the instruction set and much of the class library used in the .NET
runtime.)  Sun is not some heroic entity that only does good and Microsoft
is not some vile entity seeking to do evil.

-- 
Jeff Peil

Jeff, I don't see why anyone would blindly do anything.  I did not compare the
two companies.  The point of the remark was "Java may have a future in spite of
itself."

As for Sun and M$oft, I do have comments.  Sun has given away millions of lines
of source code (OpenOffice) while M$oft is still debating whether to open up MS
Word format.  The .NET SDK license explicitly prohibits non-Windows development.
And what's worse, M$oft engages in serious stock fraud (see billparish.com).  I
won't bore you with the DOJ findings of fact against M$oft, you can do your own
research.

Mark

Jun 22 2003

"Achilleas Margaritis" <axilmar b-online.gr> writes:

"Mark Evans" <Mark_member pathlink.com> wrote in message
news:bcnoeo$1npv$1 digitaldaemon.com...
 From a guy who does lots of number crunching; interesting results.

 http://www.idiom.com/~zilla/Computer/javaCbenchmark.html

More Java hype. Please read the following piece taken from the site:

"In cases where the compiler cannot determine the necessary information at
compile time, the C pointer problem may actually be the bigger performance
hit. In the java case, the loop bound(s) can be kept in registers, and the
index is certainly in a register, so a register-register test is needed. In
the C/C++ case a load from cache is needed"

There are major errors with this statement:

1) C/C++ does not do bounds checking on simple arrays. Buffer overrun anyone
?

2) The actual address of the pointer is kept in a register, too. Basically,
the guy says that since Java does not have pointer arithmetic and uses
indices to process arrays, it is faster, because the pointer arithmetic in C
must access the memory for the pointer. That's plain wrong. Consider the
following:

//index-based processing
for (int i = 0; i < max; i++) {
    a[i] = ...;
}

and then consider this:

//pointer-based processing
for(int *i = a; i < pmax; i++) {
    *i = ...;
}

The article says that the index-based one is faster, because it is pure
register-based (except from the final assignment, of course), because the
pointer-based processing requires pointer access.

That's stupid. *i may be a pointer, but the value of the pointer is kept in
a a register. By pointer arithmetic, the array address calculation before
the final assignment is eliminated. There is no cache access involved.

In reality, Java is much slower than C/C++.

Another foul part:

"Most everyone says GC is or should be slower, with no given reason- it's
assumed but never discussed. Some computer language researchers say
otherwise.
Consider what happens when you do a new/malloc: a) the allocator wanders
through some lists looking for a slot of the right size, then returns you a
pointer. b) This pointer is pointing to some pretty random place.

With GC, a) the allocator doesn't need to look for memory, it knows where it
is, b) the memory it returns is adjacent to the last bit of memory you
requested. The wandering around part happens not all the time but only at
garbage collection. And then of course (and depending on the GC algorithm)
things get moved of course as well"

The guy does not know the basics:

1) a GC does not run all the time, so the next 'new' might happen when the
memory is not yet compacted. A GC thread runs periodically and collects
resources. In this case, the memory is walked as in C.

2) An implicit thread makes determinism difficult to achieve. You don't know
when the GC will kick in and freeze the app. That's a mighty problem when
you have timers.

3) In order to make memory compacting possible, Java uses triple
indirection, i.e. a pointer to a pointer to the object. The GC knows the 2nd
pointer's location, and thus it can freely change memory locations by simply
change the 2nd pointer. All pointers to the 2nd pointer will automatically
point to the new object's location. Example:

//java code

byte[] myByteArray = new byte[10];

myByteArray[0] = 1;



//the equivalent C++ code:

char *p2 = new char[10];

char **p1 = &p2;

(**p1)[0] = 1;



A GC may move the byte array around, and change the 2nd pointer to point to
the new location. You have several costs:

1) data copied around every now and then.

2) triple indirection. Need we say more ?

3) the thread itself.

We can see from the above example that arrays are completely different in
Java than in C++. Not only that, but when arrays are passed around in
methods, Java passes 2 arguments instead of one. The first argument passed
is the array's 'environment', i.e. a struct that describes the array. This
makes possible to find the array size. If you don't believe me, look at JNI.

Another error-prone statement:

"The big benefit of GC is memory locality. Because newly allocated memory is
adjacent to the memory recently used, it is more likely to already be in the
cache".

Memory locality and memory adjacent blocks are two completely unrelated
things. Even if two memory blocks are in far away addresses, if they are
constantly used, they would be already in cache.  For example:

//array1

int *array1 = new int[200];

...bla bla bla more allocations...

int *array2 = new int [200];

now the two arrays are in wildly different memory locations, but, does it
matter ? no!!! Check out the following loop:

for(int i = 0; i < 10000000; i++) {

    for(j = 0; j < 200; j++) {

        array1[j] = rand();

        array2[j] = rand();

    }

}

In the loop above, the first access to the array1 and array2 will bring
those arrays to the cache. After that, they would be in the cache for as
long as the loop takes place. But they are not adjustent!!!

The big bottleneck of  Java is that everything is allocated on the heap. So,
the following type of code is quite often met:

//setting a point by using new Point ? what a waste of resources!!!

widget->setPosition(new Point(10, 10));

And this is because Java can only allocate objects on the heap and not on
the stack. This makes the Java memory fragmented much more than in C++.



Another bunch of lies:

"Modern Java virtual machines do some amazing things: they profile the
program while it is running, determine which parts to optimize, compile
them, then uncompile and recompile them if new classes are loaded that
override methods that were inlined!

The people working on hotspot in particular have hinted that Java should
sometime be faster than C because of this. Why?

-Because the compiler knows which classes are actually loaded and being
called, it can remove the overhead of virtual methods.

- dynamic compiler may also get the branch prediction hints right more often
than a static compiler. "

But those two things are also done on the static compilers too!!! the only
difference between static and dynamic compiling is that dynamic happens
while the program is running. There are no more information available
dynamically than statically!!! In detail:

1) a virtual method can be made non-virtual if the parent or ancestor class
is never used(i.e, the interface is never virtual). In C++, the programmer
uses virtual methods because they are actually gonna be used in some place
as virtual methods! that means, that the compiler should not check for a
method being called virtually!!! In a few simple words, in C++ we make a
method 'virtual' because our design says it's gonna be called virtually,
otherwise we would not make it virtual!!!

2) branch prediction is done entirely by the hardware, except on
EPIC(Itanium, Merced etc). C/C++ compilers don't do branch prediction.

Actually, the virtual machine has great overhead. That's why Java compilers
exist. If a JVM can do an equal job, then there would be no need for a
static compiler.

Conclusion:

"Java is now nearly on parity with C++ in performance. This means we'll see
cases where new releases of C++ compilers do better than Java, and
vice-versa, just as different C/C++ compilers often have performance
differences of up to a factor two or so. On the other hand, performance does
depend (a lot) on how the program is written."

I work for a company that makes defense applications for THALES (ex
Thomson), in C++, ADA and Java. Java is slow!!! Believe me. Maybe in the
future it won't be. Maybe in some extreme cases a Java compiler will do
better. But...right now, it is slow as hell. By far the slowest of the 3
languages!!!

Jun 21 2003

Mark T <Mark_member pathlink.com> writes:

More Java hype.

This website seems to support your observations:
http://www.coyotegulch.com/reviews/almabench.html

Jun 21 2003

Alisdair Meredith <alisdair.meredith uk.renaultf1.com> writes:

Achilleas Margaritis wrote:

You quote the following statement that you go on to disagree with:

 "Most everyone says GC is or should be slower, with no given reason- it's
 assumed but never discussed. Some computer language researchers say
 otherwise."

Just an observation from 12 months back:
Bjarne Stroustrup made some references to GC during one of his sessions
at the ACCU conference last year.  In particular, he did buy into the "a
well written GC should be free" theory (apart from obvious degenerate
cases)

However, when he tried to produce such a "free" collection the best he
could do was a 70% performance hit compared to his regular non-GC code.

He either had to admit he couldn't write a decent GC, or find some other
explaination <g>  Given no-one was going to hit him with (i), his
conclusion came back to one of your observations - C++ is a stack base
language.  Moving everything onto the heap in order to support garbage
collection has a cost, and that covered the majority of his 70% cost.

I have certainly seen the classic 'flawed' benchmarchs where GC blows
away explicit memory management (allocate 1,000,000 unused pointers, GC
recycles the same pointer) so have no doubt there are memory-use
patterns that will be well suited to GC and may well see a benefit from
it.  I am yet to see any real demonstration of this effect beyond simple
bechmarks as above though.  It would be nice to understand what
real-world profiles GC is well suited to, but that is not as sexy as an
all-conquering redundant benchmark, so if the work has been done it has
not had sufficient press for me to notice :� (

-- 
AlisdairM

Jun 22 2003

"Fabian Giesen" <rygNO SPAMgmx.net> writes:

 He either had to admit he couldn't write a decent GC, or find some
 other explaination <g>  Given no-one was going to hit him with (i),
 his conclusion came back to one of your observations - C++ is a stack
 base language.  Moving everything onto the heap in order to support
 garbage collection has a cost, and that covered the majority of his
 70% cost.

GC is bound to lose against stack allocation - everything that can be
properly allocated on the stack will be up to several orders of magnitude
faster than what any more sophisticated allocator can do - because both
allocation and deallocation is, for all it's worth, free on current machines,
where the part of one clock cycle actively occupied with performing the
addition (or subtraction) required will not even notice, especially compared
to the cost of e.g. a branch misprediction or a cache miss.

However, that is by no means an interesting measure. Anything that can be
allocated on the stack *should* be allocated there - there's no way around
that. The interesting question is whether GC is faster than traditional
C-style heap allocation. I do believe it atleast won't lose much in most
actual applications, which makes it a worthwhile change no matter what, but
I don't have actual benchmarks to backup this.

Moreover, I have the feeling that most benchmarks ever done on the issue are
actually flawed. According to the paper "Dynamic Storage Allocation: A survey
and Critical Review" by Wilson, Johnstone et al, allocator performance has
traditionally been evaluated using synthetic traces, rather than using actual
allocation traces from actual programs. Similarly, comparision between GC
and manual memory management is often done by taking one program employing
one of the two mechanisms, preparing a version that uses the other approach,
and benchmarking the two.

No matter the result, I don't think that strategy makes sense. No matter how
transparent (or non-transparent), memory management IS an issue that affects
the design of any large program, so an actually meaningful benchmark would
have to state a given program and then compare its implementation for a non-
GC environment with a GC version. Care must be taken that the core
algorithms and their rough implementation techniques are the same - however,
the same cannot be said for data structures; those are heavily influenced by
the way memory is managed, no matter what.

I don't know of any such Benchmark, but this is the only kind of empirical
study on the issue I could really accept :)

-fg

Jun 22 2003

Helmut Leitner <helmut.leitner chello.at> writes:

Fabian Giesen wrote:
 ... The interesting question is whether GC is faster than traditional
 C-style heap allocation. I do believe it atleast won't lose much in most
 actual applications, which makes it a worthwhile change no matter what, but
 I don't have actual benchmarks to backup this.

Sorry, I have to comment on this.

Talking about performance or optimization a believe don't count,
only what you can measure and prove does count.

GC gives up knowledge about the memory blocks needed or unneeded.
To regain this knowledge has a cost. That's my theoretical basis
why I think that GC will always (statistically) be slower than good 
tradition memory management. 

But: I would never trust this theory. 

--
Helmut Leitner    leitner hls.via.at   
Graz, Austria   www.hls-software.com

Jun 22 2003

Georg Wrede <Georg_member pathlink.com> writes:

I think we all agree on that an expert on assembler
programming can write code that is faster than anything
written in any other language. And that interpreted code
mostly is slower than e.g. C code. Still a lot of code is
written in interpreted languages, and more code is written
in C than assembler.

Of course the answer is programmer productivity. Writing in
C gives you almost the same power as writing in assembler,
but you get a lot more done in given time. The same holds
true when comparing interpreted languages with C.

Some applications need to get more bang from the iron, and
those are usually written in C or assembler in spite of the
smaller programmer productivity.

Since in real-world programming the speed demands don't
cover the entire program, only the most essential parts are
usually written in assembler.

Additionally, one could discuss endlessly the practical
results of individual programmers writing assembler, C or an
interpreted language. It should not be too hard to show that
differences in programmer's abilities have a higher impact
on program speed than the chosen language -- even if the
choice is between assembler and an interpreted language.
(That is, a badly written bad algorithm in assembler can't
catch up with a well written excellent algorithm in, say,
Perl, Lua, or even GW-Basic.)

In light of all this, it seems marginal to discuss the
merits of garbage collecting vs. explicit allocation. 

What I am saying is, a good programmer with excellent
experience with (say, D) should be able to write quite fast
code in spite of GC. Also, the speed demands do not fall
evenly on the program, as anyone who's done profiling knows.
This lets the programmer make GC happen at suitable times
and not happen when there's need for speed.

I think we all agree on that it would be very nice if both D
and (Pascal, C, C++, ...) were to have garbage collectors
that you can use in some parts of your program, and not use
in other parts of the same program. And once the science of
GC advances a little, we could use several different GCs in
different parts of our program.

----

The above [ :-( ] became a (too) lenthy way of saying:

If the impact of GC were 10% in speed, and if the choice of
programmer meant a 20% difference in speed, shouldn't we
choose and educate our staff better, instead of splitting
hairs over GC/no GC.

----

Still, I wouldn't feel comfortable writing the Linux kernel
in D. Or a real-time system. But I sure wish I'll see the
day when something convinces me I can do either or both in
D with peace of mind.

Jun 23 2003

Alisdair Meredith <alisdair.meredith uk.renaultf1.com> writes:

Georg Wrede wrote:

 Of course the answer is programmer productivity. Writing in
 C gives you almost the same power as writing in assembler,
 but you get a lot more done in given time. The same holds
 true when comparing interpreted languages with C.
 
 Some applications need to get more bang from the iron, and
 those are usually written in C or assembler in spite of the
 smaller programmer productivity.

Productivity also plays a big part in outright speed of course, as it is
easier to experiment and find the optimal algorithm in the more
productive tool.  You can generally optimise a more complex form of
algorithm tailored to the problem in said productive tool.

Once you have the final solution, you can probably realise an even more
efficient implementation in the lower level tool.  Starting from cold,
unless the high level language has a huge abstraction penalty or the
problem is trivial/already well understood, you may well get a more
efficient solution using the higher level language.

-- 
AlisdairM

Jun 23 2003

"Fabian Giesen" <rygNO SPAMgmx.net> writes:

 Still, I wouldn't feel comfortable writing the Linux kernel
 in D. Or a real-time system. But I sure wish I'll see the
 day when something convinces me I can do either or both in
 D with peace of mind.

"Non-Compacting Memory Allocation and Real-Time Garbage Collection"
Mark S. Johnstone, University of Texas, 1996
http://citeseer.nj.nec.com/255424.html

May atleast convince you of the possibility (and gritty details :)
of real-time garbage collection.

-fg

Jun 23 2003

Georg Wrede <Georg_member pathlink.com> writes:

In article <bd7aji$16q4$1 digitaldaemon.com>, Fabian Giesen says...
 Still, I wouldn't feel comfortable writing the Linux kernel
 in D. Or a real-time system. But I sure wish I'll see the
 day when something convinces me I can do either or both in
 D with peace of mind.

"Non-Compacting Memory Allocation and Real-Time Garbage Collection"
Mark S. Johnstone, University of Texas, 1996
http://citeseer.nj.nec.com/255424.html

May atleast convince you of the possibility (and gritty details :)
of real-time garbage collection.

Yes! Thanks a lot! It did convince me of the possibility!

And, reading the paper, it did not seem much harder to write
such an RT-GC than a regular GC. Most of the idea was about
doing the Right Thing, and not doing something hairily
elaborate.

This should be required reading for a lot of people.

Having this kind of GC in D would kick some butt when folks
start seriously comparing languages for important projects.

I'd also be pretty sure that the folks at NASA would then
use D for the Mars rovers and stuff, instead of OO-plain-C.

And rumor of this would be enough to crash Walter's 
D site. :-)

Jun 24 2003

anonymous <anonymous_member pathlink.com> writes:

In article <bd692p$4f2$1 digitaldaemon.com>, Georg Wrede says...
I think we all agree on that an expert on assembler
programming can write code that is faster than anything
written in any other language. And that interpreted code
mostly is slower than e.g. C code. Still a lot of code is
written in interpreted languages, and more code is written
in C than assembler.

Of course the answer is programmer productivity. Writing in
C gives you almost the same power as writing in assembler,
but you get a lot more done in given time. The same holds
true when comparing interpreted languages with C.

Some applications need to get more bang from the iron, and
those are usually written in C or assembler in spite of the
smaller programmer productivity.

Since in real-world programming the speed demands don't
cover the entire program, only the most essential parts are
usually written in assembler.

Additionally, one could discuss endlessly the practical
results of individual programmers writing assembler, C or an
interpreted language. It should not be too hard to show that
differences in programmer's abilities have a higher impact
on program speed than the chosen language -- even if the
choice is between assembler and an interpreted language.
(That is, a badly written bad algorithm in assembler can't
catch up with a well written excellent algorithm in, say,
Perl, Lua, or even GW-Basic.)

In light of all this, it seems marginal to discuss the
merits of garbage collecting vs. explicit allocation. 

What I am saying is, a good programmer with excellent
experience with (say, D) should be able to write quite fast
code in spite of GC. Also, the speed demands do not fall
evenly on the program, as anyone who's done profiling knows.
This lets the programmer make GC happen at suitable times
and not happen when there's need for speed.

I think we all agree on that it would be very nice if both D
and (Pascal, C, C++, ...) were to have garbage collectors
that you can use in some parts of your program, and not use
in other parts of the same program. And once the science of
GC advances a little, we could use several different GCs in
different parts of our program.

----

The above [ :-( ] became a (too) lenthy way of saying:

If the impact of GC were 10% in speed, and if the choice of
programmer meant a 20% difference in speed, shouldn't we
choose and educate our staff better, instead of splitting
hairs over GC/no GC.

----

Still, I wouldn't feel comfortable writing the Linux kernel
in D. Or a real-time system. But I sure wish I'll see the
day when something convinces me I can do either or both in
D with peace of mind.

Actually the Linux Kernel does use manual garbage collection.  It utilizes
malloc to dynamically allocate system tables and perform kernel operations.  If
everything was statically allocated it would be much more limited.  Garbage
collection and memory management is important to an operating system.

Many of the tree functions and lists depend on cleaning up the memory regardless
of whether it is garbage collected or not.  You could argue as in the case of C
that C can generate better assembly than a programmer.  And that D could create
a better garbage collector or memory manager than a programmer.

For extremely complex programs having the compiler manage the memory is much
better.  The more complex architectures such as Itanium are much more difficult
to write assembly in.  An assembly programmer would be hard pressed writing a
parellel assembly program let alone a parellel garbage collector on the NEC
Earth Simulator or an 8000 processor computer.

Jun 23 2003

Helmut Leitner <helmut.leitner chello.at> writes:

Georg Wrede wrote:
 If the impact of GC were 10% in speed, and if the choice of
 programmer meant a 20% difference in speed, shouldn't we
 choose and educate our staff better, instead of splitting
 hairs over GC/no GC.

People are so used to the GC/no GC conflict, that any argument
is immediately seen as a pro or con argument. 

My "GC has a cost" meant only what it says.

I agree with you. If we have a GC that has an average cost
of 10% and a worst case cost of 30%, then I would be happy.
I don't think that Java GC comes near, I know that D GC
doesn't come near. There's a lot to do in GC development.

In this situation I just hate to hear "GC is free" and
even "GC may even be better". 

--
Helmut Leitner    leitner hls.via.at   
Graz, Austria   www.hls-software.com

Jun 23 2003

Alisdair Meredith <alisdair.meredith uk.renaultf1.com> writes:

Helmut Leitner wrote:

 GC gives up knowledge about the memory blocks needed or unneeded.
 To regain this knowledge has a cost. That's my theoretical basis
 why I think that GC will always (statistically) be slower than good
 tradition memory management.

Depends on whether tracking the knowledge of memory blocks is gaining
you anything useful for the given context.  With GC you don't need to
explicitly track and manage memory through the life of your algorithm. 
Not doing un-necessary work is a fairly traditional optimisation <g>

So if you have a problem that is particularly heavy on your memory
manager, it may be that GC is a valid optimisation.  Of course, you
could always write a custom allocator for that specific problem, but
that holds for any problem.  I don't know about you, but I don't write
custom allocators that often.  Switching to an alternate GC allocator
(if provided) and measuring again would be a nice option to have
available.

How does GC work in 'D'?

[I lurk and jump on intersting sounding threads, but don't currently
have the time available to play with the language itself.  That's why
its nice when threads wander off, they come back full circle to 'D' <g>]

-- 
AlisdairM

Jun 23 2003

Ilya Minkov <midiclub 8ung.at> writes:

Alisdair Meredith wrote:

 How does GC work in 'D'?
 
 [I lurk and jump on intersting sounding threads, but don't currently
 have the time available to play with the language itself.  That's why
 its nice when threads wander off, they come back full circle to 'D' <g>]

As far as i can remember from the rumors and some glazing at the source, 
the current GC works slowly. It's clearly a prototype and should be 
replaced by one improved by orders of magnitude - Walter has his plans 
and tons of ideas on it.

As of now, it appears to be a rather simple conservative 
mark-sweep-collector, similar yet in some cases slower than Boehm. It 
*does* allocate small structures of same size together in larger blocks 
though.

-i.

Jun 23 2003

Alisdair Meredith <alisdair.meredith uk.renaultf1.com> writes:

Ilya Minkov wrote:

 As far as i can remember from the rumors and some glazing at the source,
 the current GC works slowly. It's clearly a prototype and should be
 replaced by one improved by orders of magnitude - Walter has his plans
 and tons of ideas on it.

I asking more about how it is used.  Is it opt-in, opt-out or
mandatory?  How does it interact with the RAII idiom, and things like
std::auto_ptr?

-- 
AlisdairM

Jun 23 2003

Ilya Minkov <midiclub 8ung.at> writes:

Alisdair Meredith wrote:
 I asking more about how it is used.  Is it opt-in, opt-out or
 mandatory?  How does it interact with the RAII idiom, and things like
 std::auto_ptr?

std::auto_ptr??? Bwaaahahahaa! You're in the wrong newsgroup!

I suggest you read:
http://www.digitalmars.com/d/garbage.html
http://www.digitalmars.com/d/memory.html

Gargbage collection engine is automatically started with the 
application. Then, you just allocate objects as usual, and need not take 
care of their deallocation, which may happen whenever the runtime 
decides so. The distructors are called whenever actual deallocation 
takes place. You can also deallocate objects by hand if you like, and 
you can temporily disable garbage collection.

It has nothing to do with RAII at all, since RAII is conserned with 
allocation, not deallocation.

-i.

Jun 23 2003

Alisdair Meredith <alisdair.meredith uk.renaultf1.com> writes:

Ilya Minkov wrote:

 std::auto_ptr??? Bwaaahahahaa! You're in the wrong newsgroup!

Well, I could have suggested boost::shared_ptr <g>

 I suggest you read:
 http://www.digitalmars.com/d/garbage.html
 http://www.digitalmars.com/d/memory.html

Thanks, I think that's what I was looking for.

 Gargbage collection engine is automatically started with the
 application. Then, you just allocate objects as usual, and need not take
 care of their deallocation, which may happen whenever the runtime
 decides so. The distructors are called whenever actual deallocation
 takes place. You can also deallocate objects by hand if you like, and
 you can temporily disable garbage collection.

 It has nothing to do with RAII at all, since RAII is conserned with
 allocation, not deallocation.

I disagree, RAII is useless if the resource acquisition in the
constructor is not matched by resource release in the destructor.  The
symmetry is what makes the idiom work, and avoids introducing all that
nasty manual exception handling with try/finally.

RAII matters especially when you are dealing with resources other than
memory, which are often more limitted such as file handles, mutex locks
etc.  In these cases you really do want destruction to occur at the
specified time and not when the garbage collector gets to it.  Given D
supports RAII, it is important that garbage collection can be turned off
for these reasons.

OTOH, garbage collection can be very useful in its own right, and
certainly makes memory management with cyclic dependencies much easier
(something traditionally quite problematic in C++)  Not that it is easy
to write a garbage collector that collects cycles, but once written
client code need never see the problem again <g>

I'm still not sure about mixing GC with non-GC languages, as it makes
code-reading much more labour intensive.  If everything is GC, no
problems, ignore those resources that aren't really leaking.  If there
is no GC, then chase down everything that looks like a leak.  With a
mix, you need to be alert for leaks all the time, and all that GC code
will be sending alarm bells to the human parser as they read the code
and continually reassure themself that a given 'leak' will in fact be
collected.

OTOH, it might be really neat to simply pull the right tool out the
toolbox to handle a given problem <g>

Until we have more experienced of mixed-mode environments on large scale
projects, I'm trying to keep an open mind.

-- 
AlisdairM

Jun 23 2003

"Fabian Giesen" <rygNO SPAMgmx.net> writes:

 I disagree, RAII is useless if the resource acquisition in the
 constructor is not matched by resource release in the destructor.  The
 symmetry is what makes the idiom work, and avoids introducing all that
 nasty manual exception handling with try/finally.

There are normal classes and auto classes in D. Normal classes are always
allocated on the heap, and auto classes always on the stack (with proper
exact lifetime semantics). You can do RAII using them.

-fg

Jun 23 2003

Farmer <itsFarmer. freenet.de> writes:

 There are normal classes and auto classes in D. Normal classes are always
 allocated on the heap, and auto classes always on the stack (with proper
 exact lifetime semantics). You can do RAII using them.
 
 -fg
 

Auto classes are *not always* allocated on the stack. At least the spec 
doesn't say so. 

<quote>
When an auto class reference goes out of scope, the destructor (if any) for 
it is automatically called. This holds true even if the scope was exited via 
a thrown exception. 
<quote/>

In fact, current DMD always allocates auto classes on the gc-heap. The 
destructor is called when the reference goes out of scope, but the memory of 
auto classes is not freed immediately, instead it is released during normal 
garbage collection. 



Farmer.

Jun 24 2003

"Walter" <walter digitalmars.com> writes:

"Farmer" <itsFarmer. freenet.de> wrote in message
news:Xns93A4C9961DCBFitsFarmer 63.105.9.61...
 In fact, current DMD always allocates auto classes on the gc-heap. The
 destructor is called when the reference goes out of scope, but the memory

of
 auto classes is not freed immediately, instead it is released during

normal
 garbage collection.

Correct, but it is within the semantics of the language to put them on the
stack.

Jun 24 2003

"Fabian Giesen" <rygNO SPAMgmx.net> writes:

 GC gives up knowledge about the memory blocks needed or unneeded.
 To regain this knowledge has a cost. That's my theoretical basis
 why I think that GC will always (statistically) be slower than good
 tradition memory management.

 But: I would never trust this theory.

That knowledge also has a cost - it doesn't come by itself. A
traditional C memory manager doesn't just know where there are free
memory blocks, it has to constantly update data structures to do so
(and mostly data structures scattered over the whole of memory, which
isn't exactly great in terms of cache usage etc.).

A program using manual memory management *does* obviously perform more
free operations, so its memory footprint is relatively small, but then,
a GCed program has the tendency to free memory in larger blocks, which
reduces memory fragmentation, and thus, in the long term, memory usage.
Also, GCed programs usually have less to worry about memory allocation
issues, which makes the code overall simpler and shorter, thus saving
both execution time and code space.

GCs have been shown to have better locality of reference in allocated
memory blocks, which again improves cache usage. And both collections
and malloc/free are definitely *not* free in terms of execution time.
GC tends to clump memory management costs while malloc/free tend to
spread them fairly evenly, but that by itself doesn't tell anything
about the overall performance.

So the issue is hardly clear-cut :)

-fg

Jun 23 2003

"Walter" <walter digitalmars.com> writes:

"Helmut Leitner" <helmut.leitner chello.at> wrote in message
news:3EF69401.F1EC2F81 chello.at...
 GC gives up knowledge about the memory blocks needed or unneeded.
 To regain this knowledge has a cost. That's my theoretical basis
 why I think that GC will always (statistically) be slower than good
 tradition memory management.

Many programs rarely ever need to free memory, gc can be a big win for them.
Also, there is overhead involved in keeping track of who owns each chunk of
memory.

Jun 23 2003

Helmut Leitner <helmut.leitner chello.at> writes:

Walter wrote:
 
 "Helmut Leitner" <helmut.leitner chello.at> wrote in message
 news:3EF69401.F1EC2F81 chello.at...
 GC gives up knowledge about the memory blocks needed or unneeded.
 To regain this knowledge has a cost. That's my theoretical basis
 why I think that GC will always (statistically) be slower than good
 tradition memory management.

 
 Many programs rarely ever need to free memory, gc can be a big win for them.
 Also, there is overhead involved in keeping track of who owns each chunk of
 memory.

I know this, but the majority of software needs freeing memory.
It makes little sense to talk about special cases.

I don't want to be misunderstood:
   "GC has a cost"
doesn't mean
   "I don't want GC"
it means:
   "don't tell me that GC is free".

I don't think that current GC in D is good enough. 
If you think it is, I'll start benchmarking it. 

--
Helmut Leitner    leitner hls.via.at   
Graz, Austria   www.hls-software.com

Jun 23 2003

Alisdair Meredith <alisdair.meredith uk.renaultf1.com> writes:

Helmut Leitner wrote:

 I know this, but the majority of software needs freeing memory.
 It makes little sense to talk about special cases.

The interesting cases are those that need to free unused memory (eg
running 24/7) but maybe not right away.  

 I don't want to be misunderstood:
    "GC has a cost"
 doesn't mean
    "I don't want GC"
 it means:
    "don't tell me that GC is free".

Scheduled memory bookkeeping for a time the computer is less busy does
not make GC free (overall) but it is effectively 'free-at-point-of-use',
or actually better-than-free as there is less bookkeeping <g>

I agree that GC is not free, it is simply a matter of deferring the cost
to when you can afford to pay it.  GC comes on credit <g>

-- 
AlisdairM
Team Thai Kingdom

Jun 24 2003

"Fabian Giesen" <rygNO SPAMgmx.net> writes:

 I don't want to be misunderstood:
    "GC has a cost"
 doesn't mean
    "I don't want GC"
 it means:
    "don't tell me that GC is free".

As said, so does manual heap-based memory management :) (And a
considerable one at that, some papers on allocator performance
show that data-structure intensitive programs such as compilers
tend to spend >20% of their overall runtime allocating and
deallocating memory).

 I don't think that current GC in D is good enough.

I agree with that, and I also think the bad image of GC is partly
due to the fact of most garbage collectors actually used in practice
are far from state of the art. But then, Walter has already mentioned
several times that his focus is currently on the compiler and not the
runtime support.

-fg

Jun 24 2003

"Walter" <walter digitalmars.com> writes:

"Fabian Giesen" <rygNO SPAMgmx.net> wrote in message
news:bd9rqm$1mde$1 digitaldaemon.com...
 I don't think that current GC in D is good enough.

 I agree with that, and I also think the bad image of GC is partly
 due to the fact of most garbage collectors actually used in practice
 are far from state of the art. But then, Walter has already mentioned
 several times that his focus is currently on the compiler and not the
 runtime support.

You don't really need to wait for me, write a better gc! The one that's in
there now aimed to simply work correctly and get D off the ground.

Jun 24 2003

Helmut Leitner <leitner hls.via.at> writes:

Walter wrote:
 
 "Fabian Giesen" <rygNO SPAMgmx.net> wrote in message
 news:bd9rqm$1mde$1 digitaldaemon.com...
 I don't think that current GC in D is good enough.

 I agree with that, and I also think the bad image of GC is partly
 due to the fact of most garbage collectors actually used in practice
 are far from state of the art. But then, Walter has already mentioned
 several times that his focus is currently on the compiler and not the
 runtime support.

 
 You don't really need to wait for me, write a better gc! The one that's in
 there now aimed to simply work correctly and get D off the ground.

Is it possible to replace or extend the current GC module?

If yes, what are the interfaces to the current system?

-- 
Helmut Leitner    leitner hls.via.at
Graz, Austria   www.hls-software.com

Jun 24 2003

"Walter" <walter digitalmars.com> writes:

"Helmut Leitner" <leitner hls.via.at> wrote in message
news:3EF88CB3.80A35237 hls.via.at...
 Walter wrote:
 "Fabian Giesen" <rygNO SPAMgmx.net> wrote in message
 news:bd9rqm$1mde$1 digitaldaemon.com...
 I don't think that current GC in D is good enough.

 I agree with that, and I also think the bad image of GC is partly
 due to the fact of most garbage collectors actually used in practice
 are far from state of the art. But then, Walter has already mentioned
 several times that his focus is currently on the compiler and not the
 runtime support.

 You don't really need to wait for me, write a better gc! The one that's


in
 there now aimed to simply work correctly and get D off the ground.

 Is it possible to replace or extend the current GC module?

Yes.

 If yes, what are the interfaces to the current system?

It's designed to be replaced. Check out gc.d, all it does is forward the
calls to another, hidden class. Those calls would be the interface.

Jun 24 2003

Helmut Leitner <leitner hls.via.at> writes:

Walter wrote:
 Is it possible to replace or extend the current GC module?

 
 Yes.
 
 If yes, what are the interfaces to the current system?

 
 It's designed to be replaced. Check out gc.d, all it does is forward the
 calls to another, hidden class. Those calls would be the interface.

I think that the GC is one of the vital part of D and it would be a really
good thing to have a workgroup dealing with all related issues.

Possible goals for the start (please extend list):
  - analyze, understand and document the current collector
  - make the collector plugable (document the necessary steps to replace it)
  - build a GC testsuite for benchmarking and language comparisons
  - building, testing and suggesting alternative collectors
  - ...

Who is interested to collaborate?

-- 
Helmut Leitner    leitner hls.via.at
Graz, Austria   www.hls-software.com

Jun 24 2003

Georg Wrede <Georg_member pathlink.com> writes:

In article <3EF947F1.5BAD3D2D hls.via.at>, Helmut Leitner says...
I think that the GC is one of the vital part of D and it would be 
a really good thing to have a workgroup dealing with all related 
issues.

Possible goals for the start (please extend list):
  - analyze, understand and document the current collector
  - make the collector plugable (document the necessary steps to replace it)
  - build a GC testsuite for benchmarking and language comparisons
  - building, testing and suggesting alternative collectors
  - ...
Who is interested to collaborate?

While it's unlikely that I'll submit code, I could contribute
the same way as I've done in this newsgroup. If this is welcome,
then I'm in.

Jun 25 2003

Helmut Leitner <helmut.leitner chello.at> writes:

Georg Wrede wrote:
 
 In article <3EF947F1.5BAD3D2D hls.via.at>, Helmut Leitner says...
I think that the GC is one of the vital part of D and it would be
a really good thing to have a workgroup dealing with all related
issues.

Possible goals for the start (please extend list):
  - analyze, understand and document the current collector
  - make the collector plugable (document the necessary steps to replace it)
  - build a GC testsuite for benchmarking and language comparisons
  - building, testing and suggesting alternative collectors
  - ...
Who is interested to collaborate?

 
 While it's unlikely that I'll submit code, I could contribute
 the same way as I've done in this newsgroup. If this is welcome,
 then I'm in.

That certainly *is* welcome. Thinking is more important than coding.

--
Helmut Leitner    leitner hls.via.at   
Graz, Austria   www.hls-software.com

Jun 25 2003

"Walter" <walter digitalmars.com> writes:

"Helmut Leitner" <helmut.leitner chello.at> wrote in message
news:3EFA1167.B4912224 chello.at...
 That certainly *is* welcome. Thinking is more important than coding.

The hardest part of a gc is making sure it works. I.e. the test suite for it
is the most important piece. Subtle bugs in the gc can have a disastrous
effect on D programs, making for very hard to pin down bugs. The
\dmd\src\phobos\gc2\testgc.d is a good starting point.

I also suggest the "Garbage Collection" book on
www.digitalmars.com/bibliography.html. It covers all the main algorithms,
their variants, and strengths and weaknesses. It's essential reading, if you
don't want to reinvent the wheel <g>.

What I see as ideal for D is a two-generation, partially copying collector,
with write barriers on the old generation done in hardware with the paging
system.

Jun 25 2003

Helmut Leitner <leitner hls.via.at> writes:

Walter wrote:
 
 "Helmut Leitner" <helmut.leitner chello.at> wrote in message
 news:3EFA1167.B4912224 chello.at...
 That certainly *is* welcome. Thinking is more important than coding.

 
 The hardest part of a gc is making sure it works. I.e. the test suite for it
 is the most important piece. Subtle bugs in the gc can have a disastrous
 effect on D programs, making for very hard to pin down bugs. The
 \dmd\src\phobos\gc2\testgc.d is a good starting point.
 
 I also suggest the "Garbage Collection" book on
 www.digitalmars.com/bibliography.html. It covers all the main algorithms,
 their variants, and strengths and weaknesses. It's essential reading, if you
 don't want to reinvent the wheel <g>.
 
 What I see as ideal for D is a two-generation, partially copying collector,
 with write barriers on the old generation done in hardware with the paging
 system.

I've no set opinion on the algorithm that should be used. I know that most
people that use malloc/free don't have a faint idea about the behaviour of
the system they are using.

I agree that a testsuite must guarantee (as far as this is possible) the
flawless operation of the GC system.

Given that, it would be nice to have an architecture that allows to plug
in different GC systems (e. g. by linking to different GC libraries) at will
and depending on the problem. Some user might need "hard real time GC"
while some other might prefer "copying GC to avoid fragmentation". 

Pluggable GC might even be interesting to the GC scientists and might
inspire them use D for some of their explorations.

-- 
Helmut Leitner    leitner hls.via.at
Graz, Austria   www.hls-software.com

Jun 26 2003

Helmut Leitner <leitner hls.via.at> writes:

Walter wrote:
 I also suggest the "Garbage Collection" book on
 www.digitalmars.com/bibliography.html. It covers all the main algorithms,
 their variants, and strengths and weaknesses. It's essential reading, if you
 don't want to reinvent the wheel <g>.

I've just ordered the book. But usually I don't believe in 
the quality of prefabricated and published solutions.

-- 
Helmut Leitner    leitner hls.via.at
Graz, Austria   www.hls-software.com

Jun 26 2003

"Walter" <walter digitalmars.com> writes:

"Helmut Leitner" <leitner hls.via.at> wrote in message
news:3EFAB36B.C0014A83 hls.via.at...
 Walter wrote:
 I also suggest the "Garbage Collection" book on
 www.digitalmars.com/bibliography.html. It covers all the main


algorithms,
 their variants, and strengths and weaknesses. It's essential reading, if


you
 don't want to reinvent the wheel <g>.

 I've just ordered the book. But usually I don't believe in
 the quality of prefabricated and published solutions.

That book is better than most at being practical.

Jun 26 2003

Patrick Down <Patrick_member pathlink.com> writes:

In article <bddfrm$a29$1 digitaldaemon.com>, Walter says...
"Helmut Leitner" <helmut.leitner chello.at> wrote in message
news:3EFA1167.B4912224 chello.at...
 That certainly *is* welcome. Thinking is more important than coding.

The hardest part of a gc is making sure it works. I.e. the test suite for it
is the most important piece. Subtle bugs in the gc can have a disastrous
effect on D programs, making for very hard to pin down bugs. The
\dmd\src\phobos\gc2\testgc.d is a good starting point.

I also suggest the "Garbage Collection" book on
www.digitalmars.com/bibliography.html. It covers all the main algorithms,
their variants, and strengths and weaknesses. It's essential reading, if you
don't want to reinvent the wheel <g>.

What I see as ideal for D is a two-generation, partially copying collector,
with write barriers on the old generation done in hardware with the paging
system.

The GC does need information from the compiler about data types.
Some benafit can be gained from marking or allocating on a 
separate heap data types that don't have any internal pointers
so that they don't need to be scanned.  Boehm has GC_malloc_atomic 
for this purpose.

Any plugable GC architecture should contain information about the
types being allocated.

Jun 26 2003

"Fabian Giesen" <rygNO SPAMgmx.net> writes:

 The GC does need information from the compiler about data types.
 Some benafit can be gained from marking or allocating on a
 separate heap data types that don't have any internal pointers
 so that they don't need to be scanned.  Boehm has GC_malloc_atomic
 for this purpose.

 Any plugable GC architecture should contain information about the
 types being allocated.

Agree. Conservative GC is a kludge to make GC work in environments that
weren't really meant for it, but I see no point in making the task
unnecessarily more complex (and less reliable :) in a language that is
designed from ground up with GC in mind.

-fg

Jun 26 2003

Juarez Rudsatz <Juarez_member pathlink.com> writes:

A interesting link :

http://www-106.ibm.com/developerworks/ibm/library/i-incrcomp/?ca=dgr-lnxw01TrashCompactor

Jun 26 2003

Helmut Leitner <helmut.leitner chello.at> writes:

Juarez Rudsatz wrote:
 
 A interesting link :
 
 http://www-106.ibm.com/developerworks/ibm/library/i-incrcomp/?ca=dgr-lnxw01TrashCompactor

I've started a wiki page for the GC workgroup:
   <http://www.wikiservice.at/wiki4d/wiki.cgi?GCW>
and added your link.

--
Helmut Leitner    leitner hls.via.at   
Graz, Austria   www.hls-software.com

Jun 26 2003

"Sean L. Palmer" <palmer.sean verizon.net> writes:

It seems what is needed is something like:

In the typeinfo for every type there should be a function,    bool
ScanObjectForPointer(void* object, void* p);   which could be used by the GC
to scan blocks;  obviously it'd not be recursive thru pointers so we would
also need   struct OPI { void* objptr; TypeInfo* type }  OPI[]
GetNonNullPointersFromObject(void* object);    .. both functions would use
the typeinfo size to determine the extent to scan.  Probably need something
to scan arrays of objects as well..

If the compiler provided a framework like this, a non-conservative GC could
be built off of it.

At very least some tables would have to be emitted with typeinfo* and offset
and size for the members relevant to the GC.

Sean

"Fabian Giesen" <rygNO SPAMgmx.net> wrote in message
news:bdf77j$20sb$1 digitaldaemon.com...
 The GC does need information from the compiler about data types.
 Some benafit can be gained from marking or allocating on a
 separate heap data types that don't have any internal pointers
 so that they don't need to be scanned.  Boehm has GC_malloc_atomic
 for this purpose.

 Any plugable GC architecture should contain information about the
 types being allocated.

 Agree. Conservative GC is a kludge to make GC work in environments that
 weren't really meant for it, but I see no point in making the task
 unnecessarily more complex (and less reliable :) in a language that is
 designed from ground up with GC in mind.

 -fg

Jun 27 2003

"Fabian Giesen" <rygNO SPAMgmx.net> writes:

 In the typeinfo for every type there should be a function,    bool
 ScanObjectForPointer(void* object, void* p);   which could be used by
 the GC to scan blocks;  obviously it'd not be recursive thru pointers
 so we would also need   struct OPI { void* objptr; TypeInfo* type }
 OPI[] GetNonNullPointersFromObject(void* object);    .. both
 functions would use the typeinfo size to determine the extent to
 scan.  Probably need something to scan arrays of objects as well..

 If the compiler provided a framework like this, a non-conservative GC
 could be built off of it.

 At very least some tables would have to be emitted with typeinfo* and
 offset and size for the members relevant to the GC.

I'd go for a very simple variant. You only need to know where pointers
are, so a normal ~0-terminated array of offsets relative to the struct/class
beginning (for structs and classes only, obviously) should suffice. All pointers
not inside structs/classes are roots by definition, which the compiler obviously
also needs to tell the GC about.

-fg

Jun 27 2003

"Walter" <walter digitalmars.com> writes:

"Patrick Down" <Patrick_member pathlink.com> wrote in message
news:bdf2j3$1rui$1 digitaldaemon.com...
 The GC does need information from the compiler about data types.
 Some benafit can be gained from marking or allocating on a
 separate heap data types that don't have any internal pointers
 so that they don't need to be scanned.  Boehm has GC_malloc_atomic
 for this purpose.

Yes, that can help.

 Any plugable GC architecture should contain information about the
 types being allocated.

Jun 26 2003

Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:

Walter wrote:
 "Patrick Down" <Patrick_member pathlink.com> wrote in message
 news:bdf2j3$1rui$1 digitaldaemon.com...
 
The GC does need information from the compiler about data types.
Some benafit can be gained from marking or allocating on a
separate heap data types that don't have any internal pointers
so that they don't need to be scanned.  Boehm has GC_malloc_atomic
for this purpose.

 
 
 Yes, that can help.
 
 
Any plugable GC architecture should contain information about the
types being allocated.


I know that many people are not big fans of reference counting 
collectors, but if we're implementing a generic GC interface, then we 
should include reference/dereference handlers so that refcounters could 
work.

Or, it occurs to me, you could just have a callback to tell refcounters 
that "object has changed one or more times."  The refcounter could do 
lazy refcounting that way.  That callback would be useful as a software 
write barrier for copying collectors, too....

Of course, any such callback should be inlined (if at all possible); 
thus, GCs that don't need the callback will simply implement a no-op 
stub that will vanish when it is inlined.

Russ

Jun 27 2003

"Fabian Giesen" <rygNO SPAMgmx.net> writes:

 I know that many people are not big fans of reference counting
 collectors, but if we're implementing a generic GC interface, then we
 should include reference/dereference handlers so that refcounters
 could work.

Refcounting cannot even work in the presence of data structures with
cyclical references, which includes very basic things like double-linked
lists and trees where nodes have pointers to their parents. Making
ref-counting work in that case requires either a complete redesign of the
datastructures or so-called "weak references", i.e. references that don't
"own" the object they're referencing.

This can silently break existing code in a myriad of ways, and it's
especially bad because it won't cause any suspicious behavior, it'll just
cause memory leaks to pile. As the whole idea was to make memory management
*easier* and not more involved and error-prone, I really don't think it's
worth it.

-fg

Jun 27 2003

Helmut Leitner <helmut.leitner chello.at> writes:

Fabian Giesen wrote:
 
 I know that many people are not big fans of reference counting
 collectors, but if we're implementing a generic GC interface, then we
 should include reference/dereference handlers so that refcounters
 could work.

 
 Refcounting cannot even work in the presence of data structures with
 cyclical references, which includes very basic things like double-linked
 lists and trees where nodes have pointers to their parents. Making
 ref-counting work in that case requires either a complete redesign of the
 datastructures or so-called "weak references", i.e. references that don't
 "own" the object they're referencing.
 
 This can silently break existing code in a myriad of ways, and it's
 especially bad because it won't cause any suspicious behavior, it'll just
 cause memory leaks to pile. As the whole idea was to make memory management
 *easier* and not more involved and error-prone, I really don't think it's
 worth it.

I always wondered why these approaches are not combined.

Use refcounts, refcount=0 means free (90-95% of all objects)
For all others do your chosen GC algorithm.

--
Helmut Leitner    leitner hls.via.at   
Graz, Austria   www.hls-software.com

Jun 28 2003

"Fabian Giesen" <rygNO SPAMgmx.net> writes:

 I always wondered why these approaches are not combined.

 Use refcounts, refcount=0 means free (90-95% of all objects)
 For all others do your chosen GC algorithm.

Because reference counting is quite expensive in terms of CPU
time (yes, I know, it's counterintuitive).

-fg

Jun 28 2003

Helmut Leitner <leitner hls.via.at> writes:

Fabian Giesen wrote:
 
 I always wondered why these approaches are not combined.

 Use refcounts, refcount=0 means free (90-95% of all objects)
 For all others do your chosen GC algorithm.

 
 Because reference counting is quite expensive in terms of CPU
 time (yes, I know, it's counterintuitive).

I know. But in optimization you must often try things that are
counterintuitive.

One good paper that was cited (and Walters gc2) use size classes
2^N which give at least 50% memory size overhead. Nobody talks
about this. But with this amount of memory you can try a lot of
promising algorithms. 

-- 
Helmut Leitner    leitner hls.via.at
Graz, Austria   www.hls-software.com

Jun 28 2003

"Fabian Giesen" <rygNO SPAMgmx.net> writes:

 Because reference counting is quite expensive in terms of CPU
 time (yes, I know, it's counterintuitive).

 I know. But in optimization you must often try things that are
 counterintuitive.

Yeah, but I absolutely don't see how any garbage collector could
profit from the reference counts. You are right in the observation
that most objects do have a very short lifetime, but you don't need
reference counting to know that. And a GC doesn't even touch the
dead objects (that is, the ones with refcount 0) - it won't even find
them for the precise reason that all references to them are dead :)

I really don't see any way how the reference counts could be useful
to a collector. Maybe you could explain your idea?

 One good paper that was cited (and Walters gc2) use size classes
 2^N which give at least 50% memory size overhead. Nobody talks
 about this. But with this amount of memory you can try a lot of
 promising algorithms.

...on the cost of 50% additional memory overhead. Which pretty much
rules them out of you ask me. We're talking memory management here,
not memory wasting. As long as 50% of additional memory requirement
is no issue because memory usage is low, there's little reason to
worry about GC performance anyway. And for programs that are memory
intensive, ANYTHING you can gain in the allocator will be easily
dwarfed by the cost incurred by negative cache effects and, even worse,
virtual memory - 50% more memory usage will make swapping both start
earlier and have more to do. And anyone who has ever been in that
situation knows that paging, once it begins, can easily slow down a
program by an order of magnitude.

Don't get me wrong, I'm certainly in for new ideas, but I will easily
drop them if I think their cost is too high or not well-spent :)

-fg

Jun 28 2003

Helmut Leitner <helmut.leitner chello.at> writes:

Fabian Giesen wrote:
 
 Because reference counting is quite expensive in terms of CPU
 time (yes, I know, it's counterintuitive).

 I know. But in optimization you must often try things that are
 counterintuitive.

 
 Yeah, but I absolutely don't see how any garbage collector could
 profit from the reference counts. You are right in the observation
 that most objects do have a very short lifetime, but you don't need
 reference counting to know that. And a GC doesn't even touch the
 dead objects (that is, the ones with refcount 0) - it won't even find
 them for the precise reason that all references to them are dead :)
 
 I really don't see any way how the reference counts could be useful
 to a collector. Maybe you could explain your idea?
 
 One good paper that was cited (and Walters gc2) use size classes
 2^N which give at least 50% memory size overhead. Nobody talks
 about this. But with this amount of memory you can try a lot of
 promising algorithms.

 
 ...on the cost of 50% additional memory overhead. Which pretty much
 rules them out of you ask me. We're talking memory management here,
 not memory wasting. As long as 50% of additional memory requirement
 is no issue because memory usage is low, there's little reason to
 worry about GC performance anyway. And for programs that are memory
 intensive, ANYTHING you can gain in the allocator will be easily
 dwarfed by the cost incurred by negative cache effects and, even worse,
 virtual memory - 50% more memory usage will make swapping both start
 earlier and have more to do. And anyone who has ever been in that
 situation knows that paging, once it begins, can easily slow down a
 program by an order of magnitude.
 
 Don't get me wrong, I'm certainly in for new ideas, but I will easily
 drop them if I think their cost is too high or not well-spent :)

Currently I need my brain for the ICFP contest, but I'd like to comment
on that in a few days. It's inevitable because I started that Wiki page.

One shouldn't rule out things too early. Only few people are aware that
malloc/free typically has a 12-20 byte overhead per allocation. And we
don't yet talk about fragmentation.

Anyway I don't think that "state of the art GC" whatever that may be is
good enough for D. If D is to replace C (and I hope it will, I'll put
some trust and work into it) then it must have an absolutely flawless
memory management system. That's the winning point - not some fancy
syntax or OO feature. 

Any small linear overhead on allocation or deallocation is no problem.
But a 10 sec collection pause is. 

I would even say, that memory mangement is so important, that all other
features must be constructed around it to support its efficiency. 

But, as I said, let's discuss this a bit later.

--
Helmut Leitner    leitner hls.via.at   
Graz, Austria   www.hls-software.com

Jun 28 2003

"Fabian Giesen" <rygNO SPAMgmx.net> writes:

 Any small linear overhead on allocation or deallocation is no problem.
 But a 10 sec collection pause is.

As I said before, there are collectors that can satisfy hard realtime
requirements.

-fg

Jun 28 2003

Helmut Leitner <leitner hls.via.at> writes:

Fabian Giesen wrote:
 
 Any small linear overhead on allocation or deallocation is no problem.
 But a 10 sec collection pause is.

 
 As I said before, there are collectors that can satisfy hard realtime
 requirements.

I think I read the link you mentioned. But did you also read that it
has an 4-6 (worst case 10) factor between "live object" data and
maximum heap size?

That is: when a program at a point has 2 MB of objects it may need
at the same time 20MB (ok typical 10 MB) heap to do this real time?

-- 
Helmut Leitner    leitner hls.via.at
Graz, Austria   www.hls-software.com

Jun 29 2003

"Fabian Giesen" <rygNO SPAMgmx.net> writes:

 I think I read the link you mentioned. But did you also read that it
 has an 4-6 (worst case 10) factor between "live object" data and
 maximum heap size?

 That is: when a program at a point has 2 MB of objects it may need
 at the same time 20MB (ok typical 10 MB) heap to do this real time?

There are no "typical" values in that table, it's strictly worst-case
analysis, because the absolute upper bounds are what counts for hard
real time requirements. Also note that the statistics given are assuming
the use of a segregated storage allocator - presumably because worst-case
bounds for segregated storage are far easier to compute than for e.g. best
fit :). Finally, the actual GC overhead is "twice the amount of memory that
is allocated per full garbage collection cycle" (per size class; page 26) - the
rest is due to the memory allocator used.

Finally, all this is for *hard* realtime requirements. Relaxing this to soft
real-time (i.e. trading somewhat worse worst-case complexity for far better
average-complexity) should give quite competitive results (even though it won't
show in the worst-case statistics, of course).

-fg

Jun 30 2003

Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:

Fabian Giesen wrote:
I know that many people are not big fans of reference counting
collectors, but if we're implementing a generic GC interface, then we
should include reference/dereference handlers so that refcounters
could work.

 
 
 Refcounting cannot even work in the presence of data structures with
 cyclical references, which includes very basic things like double-linked
 lists and trees where nodes have pointers to their parents. Making
 ref-counting work in that case requires either a complete redesign of the
 datastructures or so-called "weak references", i.e. references that don't
 "own" the object they're referencing.
 
 This can silently break existing code in a myriad of ways, and it's
 especially bad because it won't cause any suspicious behavior, it'll just
 cause memory leaks to pile. As the whole idea was to make memory management
 *easier* and not more involved and error-prone, I really don't think it's
 worth it.
 
 -fg

This problem has been solved.

http://www.research.ibm.com/people/d/dfb/papers.html

Read the links on "Recycler"

Jun 30 2003

Mark Evans <Mark_member pathlink.com> writes:

http://www.cs.kent.ac.uk/people/staff/rej/gc.html

Jul 02 2003

"Fabian Giesen" <rygNO SPAMgmx.net> writes:

 You don't really need to wait for me, write a better gc! The one
 that's in there now aimed to simply work correctly and get D off the
 ground.

Hm, the problem is that most of the more interesting GC algorithms
require changes to the compiler/runtime environment. Conservative GC
is actually a bad idea with a language/environment that's designed for
use with garbage collection, simply because one can do without any
scanning for pointers, *given that the compiler provides the necessary
metadata about data structures in the program* :)

Some algorithms (like the Realtime GC one for which I posted a link to
the paper yesterday) also require read or write barriers, which also
need to be implemented in code generation.

Overall I doubt that a truly efficient GC can be done as just a
pluggable library module - it requires very close cooperation with the
compiler.

-fg

Jun 24 2003

"Walter" <walter digitalmars.com> writes:

"Fabian Giesen" <rygNO SPAMgmx.net> wrote in message
news:bdabu4$62p$1 digitaldaemon.com...
 Hm, the problem is that most of the more interesting GC algorithms
 require changes to the compiler/runtime environment. Conservative GC
 is actually a bad idea with a language/environment that's designed for
 use with garbage collection, simply because one can do without any
 scanning for pointers, *given that the compiler provides the necessary
 metadata about data structures in the program* :)

 Some algorithms (like the Realtime GC one for which I posted a link to
 the paper yesterday) also require read or write barriers, which also
 need to be implemented in code generation.

 Overall I doubt that a truly efficient GC can be done as just a
 pluggable library module - it requires very close cooperation with the
 compiler.

Write barriers can be done entirely within the gc, if the paging hardware is
used to implement them.

Jun 24 2003

"Fabian Giesen" <rygNO SPAMgmx.net> writes:

 Write barriers can be done entirely within the gc, if the paging
 hardware is used to implement them.

I doubt that using paging to detect pointer writes is a good idea.
You'd get literally tons of page faults using that technique to
detect pointer writes for data structures that often change. And
a page fault is far from cheap :)

-fg

Jun 24 2003

"Walter" <walter digitalmars.com> writes:

"Fabian Giesen" <rygNO SPAMgmx.net> wrote in message
news:bdajmr$d0s$1 digitaldaemon.com...
 Write barriers can be done entirely within the gc, if the paging
 hardware is used to implement them.

 I doubt that using paging to detect pointer writes is a good idea.
 You'd get literally tons of page faults using that technique to
 detect pointer writes for data structures that often change. And
 a page fault is far from cheap :)

For a generational collector, how it works is for the 'old' generation, the
pages are marked read-only. If anyone writes to a page, that generates a
fault. The gc catches the fault, marks the page as "needs to be rescanned",
turns on write for that page, and restarts the offending write instruction.

The reason it is effective is because statistically old generation pages are
rarely written to, and the overhead of a page fault only happens once per
page, not for every write.

The scheme would work even better if the paging hardware was set up to work
with the gc and collect the dirty bits automatically. Unfortunately, no
operating system I know of provides an API for that.

Jun 24 2003

"Fabian Giesen" <rygNO SPAMgmx.net> writes:

 For a generational collector, how it works is for the 'old'
 generation, the pages are marked read-only. If anyone writes to a
 page, that generates a fault. The gc catches the fault, marks the
 page as "needs to be rescanned", turns on write for that page, and
 restarts the offending write instruction.

 The reason it is effective is because statistically old generation
 pages are rarely written to, and the overhead of a page fault only
 happens once per page, not for every write.

Hm okay, now that I think of it, that *would* make sense. Definitely
worth a try :)

-fg

Jun 25 2003

Mark Evans <Mark_member pathlink.com> writes:

This was another interesting find. -M.

"The story of Jython begins in the spring of 1997 while I was working on my PhD
at MIT. While doing some benchmark work comparing the performance of Numeric
Python to a variety of other programming languages, I was amazed to discover
that Java was as fast as C-code for some simple numeric benchmarks. Previously,
I had been uninterested in Java because I saw it as an inferior replacement for
Python. Now I saw the possibility that Java could be a replacement for the ugly
and error-prone C code that I was writing for the performance intensive parts of
my systems."

http://hugunin.net/story_of_jython.html

Jul 22 2003

D Programming

C/C++ Programming

Other

D - Java vs C Benchmarks