
digitalmars.D - General Problems for GC'ed Applications?

reply Karen Lanrap <karen digitaldaemon.com> writes:
I see three problems:

1) The typical behaviour of a GC'ed application is to claim more 
and more main memory without actually needing it. Hence every GC'ed 
application forces the OS to shrink the system cache held in main 
memory until the GC of the application kicks in.

2) If the available main memory is insufficient for the true memory 
requirements of the application, and the OS provides virtual memory 
by swapping out to secondary storage, every run of the GC forces 
the OS to slowly swap back all of this application's data from 
secondary storage; and runs of the GC occur frequently, because 
main memory is tight.

3) If there is more than one GC'ed application running, those 
applications compete for the available main memory.


I see four risks:

a) from 1: The overall responsiveness of the system degrades in 
favour of the GC'ed application.

b) from 2: Projects decomposed into several subtasks may face 
severe runtime problems when integrating the independently and 
successfully tested modules.

c) from 2 and b: The reduction of developer time in the development 
and maintenance phases, gained by not being forced to avoid memory 
leaks, may be more than offset by an increase of machine time by a 
factor of 50 or more.

d) from 1 and 3: A more complicated version of the dining 
philosophers problem is introduced. In this version every 
philosopher is allowed to rush around the table and grab all unused 
forks and declare them used, before he starts to eat---and nobody 
can force him to put them back on the table.


Conclusion:

I know that solving the original dining philosophers problem took 
several years, and I do not see any awareness of this more 
complicated version, which arises from using a GC.
Risks c) and d) are true killers.
Therefore GC'ed applications currently seem to be suitable only if 
they run as a single instance on a machine well equipped with 
main memory and no other GC'ed applications are used.
To assure that these conditions hold, the GC should maintain 
statistics on the duration of its runs and the frequency of calls. 
This would allow the GC to throw an "almost out of memory" error.
Jul 22 2006
next sibling parent reply "Unknown W. Brackets" <unknown simplemachines.org> writes:
Personally, I see it as good coding practice to, in D, use delete on 
those things you know you need/want to delete immediately.  Just don't 
go nuts about those things you don't.

Typically, in a well programmed piece of software, the number of 
allocations you can't be entirely sure about deleting will be much less 
than those you are sure about.  Without a GC, this small percentage 
causes a huge amount of headache.  With it, you can ignore these and 
worry only about the easily-determined cases.
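As a sketch of that practice in D (using the language's `delete` statement, which existed at the time; newer D replaces it with `destroy` plus `GC.free` - the `Connection` class here is made up for illustration):

```d
class Connection
{
    // a resource whose lifetime is well known
}

void main()
{
    auto conn = new Connection();
    scope(exit) delete conn;   // we know exactly when this dies: free it ourselves

    int[] scratch = new int[1000];
    // the lifetime of scratch is unclear (it may be sliced, stored, shared),
    // so we simply leave it for the garbage collector
}
```
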

Throwing caution to the wind and using the GC for everything isn't, in 
my opinion, a good idea.  If anyone disagrees with me, I'd like to hear 
it, but I know of no reason why you'd want to do that.  I realize many 
people do throw caution to the wind, but so do many people drive drunk. 
  It doesn't mean anyone suggests it, necessarily.

#1 does not seem to be a severe problem to me.  Memory usage will be 
higher, but that's why there's flex for cache and buffers.

#2 would be a problem for any application.  I estimate that my 
application, even if I did not track every allocation down (but only 
deleted when I knew for sure), would not use even 50% more memory.

As such, I think it would be difficult to show a piece of well-written 
software that runs into this problem using the GC, but does not 
otherwise.  IMHO, if a program requires so much memory that swapping 
starts happening frequently, it's a bug, a design flaw, or a fact of life. 
It's not the garbage collector.

Yes, a collect will cause swapping - if you have that much memory used. 
  Ideally, collects won't happen often (since they can't just happen 
whenever anyway, they happen when you use up milestones of memory) and 
you can disable/enable the GC and run collects manually when it makes 
the most sense for your software.
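For what it's worth, in D that looks roughly like this - a sketch against the D1-era `std.gc` module (later runtimes expose the same knobs as `core.memory.GC.disable/enable/collect`):

```d
import std.gc;

void loadHugeDataSet()
{
    std.gc.disable();             // no collections during this allocation burst
    scope(exit) std.gc.enable();

    // ... allocate lots of objects here ...
}

void main()
{
    loadHugeDataSet();
    std.gc.fullCollect();         // collect once, at a moment we choose
}
```
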

Failing that, software which is known to use a large amount of memory 
may need to use manual memory management.  Likely said software will 
perform poorly anyway.

#3 depends on the above cases being true.  Some competition may happen 
indeed, but I think it would take more analysis to see how strongly this 
would affect things.

Neither is this a new problem; programs will always compete for main 
memory.  #3 is only a problem when taking into account the higher memory 
usage, and is negligible if the memory usage is not much higher.

I would say these problems are heavily dependent on application design, 
code quality, and purpose.  In other words, these are more problems for 
the programmer than for the garbage collector.  But this is only my opinion.

As for the risks... a I see as the OS's problem where #1 is an issue; #2 
I see as a problem regardless of garbage collection; c I agree with, as 
mentioned above, these should be balanced - only when it is not of 
benefit should the "leak" be left for the GC.

I'm afraid I'm not terribly familiar with the dining philosopher's 
problem, but again I think this is a problem only somewhat aggravated by 
garbage collection.

Most of your post seems to be wholly concerned with applications that 
use at least the exact figure of Too Much Memory (tm).  While I realize 
there are several special-cases where such usage is necessary or 
acceptable, I seem at a loss to think of any general or practical 
reasons, aside from poor code quality... or database systems.

The software on my home computer which uses the most memory uses nothing 
even close to the amount of system memory I have.  Indeed, the sum of 
memory use on my machine with several programs running is still less 
than that typically shipped with machines even a few years ago (I'm 
putting that number at 512 megabytes.)

The servers I manage have fairly standard amounts of ram (average, let's 
say, 2 gigabytes.)  Typically, even in periods of high traffic use, they 
do not take much swap.  In fact, Apache is garbage collected 
(pools/subpools, that is) and doesn't seem to be a problem at all.  PHP, 
which is used on many of these servers, is not garbage collected 
(traditional memory management) and tends to hog memory just a bit.

MySQL and other database systems obviously take the largest chunk.  For 
such systems, you don't want any of your data paged, ever.  You 
typically have large, static cache areas which you don't even want 
garbage collected, and you never realloc/free until the end of the 
process.  These areas would not be garbage collected and the data in 
them would not be scanned by the garbage collector.

In fact, you'd normally want your database on a dedicated machine with 
extra helpings of memory for these reasons.  Whether or not it was 
garbage collected wouldn't affect whether it was suitable only as a 
single instance on a single machine.  As above, this is a matter of the 
software, not of the GC.

So my suggestion is that you look at the limitations of your software, 
design, and development team and make your decisions from there.  A 
sweeping statement that garbage collection causes a dining philosopher's 
problem just doesn't seem correct to me.

Thanks,
-[Unknown]


Jul 23 2006
next sibling parent reply "Andrew Fedoniouk" <news terrainformatica.com> writes:
I agree 100%.

I would only add here:
GC is one of several possible memory management techniques.
Explicit new/delete (heap) is another; reference counting is
here too.
Each memory management technique is optimal in its
own area and use cases.
None of them is a mega-super-silver-bullet for all problems.

As D allows you to use both worlds, it is potentially
suitable for engineering optimal or suboptimal applications
(in respect of memory consumption).

Yes, D does not allow you to use smart pointers/reference counting
(as a memory management technique), but this is not a fundamental
problem and I believe it will be implemented in D one day.

Andrew Fedoniouk.
http://terrainformatica.com
Jul 23 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Andrew Fedoniouk wrote:
 Yes, D does not allow you to use smart pointers/reference counting
 (as a memory management technique), but this is not a fundamental
 problem and I believe it will be implemented in D one day.

You can do reference counting in D programming; you just have to do it manually (I've been doing it manually for years, and it isn't that bad).
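A minimal sketch of what that manual reference counting can look like in D; the `addRef`/`release` names are made up for illustration:

```d
class Resource
{
    private int refs = 1;   // the creator holds the first reference

    void addRef() { ++refs; }

    void release()
    {
        if (--refs == 0)
            delete this;    // deterministic destruction, no GC involvement
    }
}

void main()
{
    auto r = new Resource(); // refs == 1
    r.addRef();              // a second owner appears
    r.release();             // first owner done
    r.release();             // last owner done: the object is deleted here
}
```
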
Jul 23 2006
parent "Andrew Fedoniouk" <news terrainformatica.com> writes:
"Walter Bright" <newshound digitalmars.com> wrote in message 
news:ea0e80$bkg$2 digitaldaemon.com...
 Andrew Fedoniouk wrote:
 Yes, D does not allow you to use smart pointers/reference counting
 (as a memory management technique), but this is not a fundamental
 problem and I believe it will be implemented in D one day.

You can do reference counting in D programming, you just have to do it manually (I've been doing it manually for years, it isn't that bad).

It is about smart pointers - automatic reference counting. I mean the ideal thing shall allow automatic GC *and* automatic RC, both as first-class citizens. Currently you can do automatic GC but manual RC in D, and automatic RC but manual GC in C++. This situation does not make either of them significantly better, at least for tasks in my area of competence.

I am personally using both GC and RC, so I am making decisions (C++ or D) based on other criteria (the next step in the decision-making graph). The problem is that the market is dichotomic now: either do everything non-managed (RC) or everything managed (GC), but not both. (That ugly attempt named MC++ is just a design disaster, IMO.)

Having a tool that allows using the best of both worlds will put such a tool on the next step of the ladder.

Andrew.
Jul 23 2006
prev sibling next sibling parent "Unknown W. Brackets" <unknown simplemachines.org> writes:
Oops.... I meant, of course:

As for the risks... (a) I see as the OS's problem, where #1 is an issue; 
(b) I see as a problem regardless of garbage collection; (c) I agree 
with, as I mentioned above, and these should be balanced - only when it 
is not of more use to track down the memory should it be left to the GC.

That's what I get for typing such a response at that time of night :/. 
"a", "#2", "c"... man...

-[Unknown]


 As for the risks... a I see as the OS's problem where #1 is an issue; #2 
 I see as a problem regardless of garbage collection; c I agree with, as 
 mentioned above, these should be balanced - only when it is not of 
 benefit should the "leak" be left for the GC.

Jul 23 2006
prev sibling parent reply Karen Lanrap <karen digitaldaemon.com> writes:
Unknown W. Brackets wrote:

 Yes, a collect will cause swapping - if you have that much
 memory used. 
   Ideally, collects won't happen often (since they can't just
   happen 
 whenever anyway, they happen when you use up milestones of
 memory) and you can disable/enable the GC and run collects
 manually when it makes the most sense for your software.
 
 Failing that, software which is known to use a large amount of
 memory may need to use manual memory management.  Likely said
 software will perform poorly anyway.

I disagree. Assume a non-GC'ed program that allocates 1.5 GB to 1.7 GB of memory, of which 0.7 GB to 0.9 GB are vital data. If you run this program on a machine equipped with 1 GB, the OS will swap out the roughly 0.8 GB of data that is accessed infrequently. Therefore this program causes swapping only if it accesses data from the swapped-out part, and the size of the swapped data will be approximately bounded by double the size of the data that needs to be swapped back.

This changes dramatically if you GC it, because on every allocation the available main memory is exhausted and the GC requires the OS to swap all 0.8 GB back in, doesn't it?
 I'm afraid I'm not terribly familiar with the dining
 philosopher's problem, but again I think this is a problem only
 somewhat aggravated by garbage collection.
 
 Most of your post seems to be wholly concerned with applications
 that use at least the exact figure of Too Much Memory (tm). 

It is not only somewhat aggravated. Assume the example given above is doubled by two instances of that program, and the main memory is not only doubled to 2 GB but increased to 4 GB or even more. Again both non-GC'ed versions of the program run without any performance problems, but the GC'ed versions do not, although the memory size is increased by a factor that enables the OS to avoid swapping out any allocated data in the non-GC'ed case.

This is because both programs at least slowly increase their allocations of main memory. This goes without performance problems until the available main memory is exhausted. The first program that hits the limit starts GC'ing its allocated memory and forces the OS to swap it all in. Hence this first program runs the danger that all memory freed by its GC is immediately eaten up by the other instance, which continues running unaffected because its thirst for main memory is satisfied by the GC of the other instance, if that GC frees memory as it recognizes it.

At the time when this GC run ends, at least two cases are distinguishable:
a) The main memory at the end of the run is still insufficient, because the other application ate it all up. Then this instance stops with "out of memory".
b) The main memory at the end of the run is by chance sufficient, because the other application was not that hungry. Then this instance will be performant again, but only for the short time until the limit is reached again.

This is a simple example with only one processor and two competing applications, and I believe that case a) can happen. So I feel unable to prove that on multi-core machines running several GC'ed applications case a) will never happen. And even if case a) never happens, there might always be at least one application that is running its GC. Hence swapping is always on the run.
 A sweeping statement that garbage collection causes
 a dining philosopher's problem just doesn't seem correct to me.

Then prove me wrong.
Jul 24 2006
next sibling parent reply Dave <Dave_member pathlink.com> writes:
Karen Lanrap wrote:
 Unknown W. Brackets wrote:
 
 Yes, a collect will cause swapping - if you have that much
 memory used. 
   Ideally, collects won't happen often (since they can't just
   happen 
 whenever anyway, they happen when you use up milestones of
 memory) and you can disable/enable the GC and run collects
 manually when it makes the most sense for your software.

 Failing that, software which is known to use a large amount of
 memory may need to use manual memory management.  Likely said
 software will perform poorly anyway.

I disagree. Assume a non-GC'ed program that allocates 1.5 GB to 1.7 GB of memory, of which 0.7 GB to 0.9 GB are vital data. If you run this program on a machine equipped with 1 GB, the OS will swap out the roughly 0.8 GB of data that is accessed infrequently. Therefore this program causes swapping only if it accesses data from the swapped-out part, and the size of the swapped data will be approximately bounded by double the size of the data that needs to be swapped back.

This changes dramatically if you GC it, because on every allocation the available main memory is exhausted and the GC requires the OS to swap all 0.8 GB back in, doesn't it?
 I'm afraid I'm not terribly familiar with the dining
 philosopher's problem, but again I think this is a problem only
 somewhat aggravated by garbage collection.

 Most of your post seems to be wholly concerned with applications
 that use at least the exact figure of Too Much Memory (tm). 

It is not only somewhat aggravated. Assume the example given above is doubled by two instances of that program, and the main memory is not only doubled to 2 GB but increased to 4 GB or even more. Again both non-GC'ed versions of the program run without any performance problems, but the GC'ed versions do not, although the memory size is increased by a factor that enables the OS to avoid swapping out any allocated data in the non-GC'ed case.

This is because both programs at least slowly increase their allocations of main memory. This goes without performance problems until the available main memory is exhausted. The first program that hits the limit starts GC'ing its allocated memory and forces the OS to swap it all in. Hence this first program runs the danger that all memory freed by its GC is immediately eaten up by the other instance, which continues running unaffected because its thirst for main memory is satisfied by the GC of the other instance, if that GC frees memory as it recognizes it.

At the time when this GC run ends, at least two cases are distinguishable:
a) The main memory at the end of the run is still insufficient, because the other application ate it all up. Then this instance stops with "out of memory".
b) The main memory at the end of the run is by chance sufficient, because the other application was not that hungry. Then this instance will be performant again, but only for the short time until the limit is reached again.

This is a simple example with only one processor and two competing applications, and I believe that case a) can happen. So I feel unable to prove that on multi-core machines running several GC'ed applications case a) will never happen. And even if case a) never happens, there might always be at least one application that is running its GC. Hence swapping is always on the run.

Someone else pointed out earlier that "stupid is as stupid does" with regard to memory mgmt., whether or not you're using a GC. This has historically been a big problem with how Java programs are written.

D OTOH allows manual deletion, and the combination of the GC along with things like easy array slicing should allow for memory-efficient design patterns that are not only feasible but efficient to develop and maintain. Plus of course D allows CRT memory mgmt. should you really need that.
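As a small illustration of the slicing point: a D slice references part of an existing buffer, so carving up one allocation costs no further GC allocations at all.

```d
void main()
{
    char[] buffer = new char[1024];   // a single allocation
    char[] header = buffer[0 .. 64];  // slices alias the same memory:
    char[] rest   = buffer[64 .. $];  // no copies, no new GC allocations

    header[0] = 'H';
    assert(buffer[0] == 'H');         // header shares buffer's storage
}
```
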
  
 A sweeping statement that garbage collection causes
 a dining philosopher's problem just doesn't seem correct to me.

Then prove me wrong.

You're making the original assertion that it's a problem - I believe the onus is on you to prove that it would apply to efficient design patterns using D <g>
Jul 24 2006
parent reply Karen Lanrap <karen digitaldaemon.com> writes:
Dave wrote:
 You're making the original assertion that it's a problem - I
 believe the onus is on you to prove that it would apply to
 efficient design patterns using D <g>

If efficient design patterns using D forbid memory leaks, then no garbage will ever occur in any application. This would raise the question of why the GC is enabled by default.

If on the other hand memory leaks do appear, I have a running example that shows a side effect of enabling the GC similar to what I have brainstormed. As far as I can see, this side effect is neither documented here nor have I found any mention of it in other resources on the net. But according to those Unknown guys here, it's of no interest here anyway.
Jul 25 2006
parent reply Dave <Dave_member pathlink.com> writes:
Karen Lanrap wrote:
 Dave wrote:
 You're making the original assertion that it's a problem - I
 believe the onus is on you to prove that it would apply to
 efficient design patterns using D <g>

If efficient design patterns using D forbid memory leaks, then no garbage will ever occur in any application. This would raise the question of why the GC is enabled by default.

If on the other hand memory leaks do appear, I have a running example that shows a side effect of enabling the GC similar to what I have brainstormed. As far as I can see, this side effect is neither documented here nor have I found any mention of it in other resources on the net. But according to those Unknown guys here, it's of no interest here anyway.

Don't assume that - I don't think any of us are trying to shut the door on anything. If you have some code to post, that'd be great.

It's just that (as I read it) you made some strong, general and sweeping assertions about GC that I don't think reflect general use of the GC. Many of the long-term contributors to this group are aware of some issues with the "first generation" GC, and no one's ever claimed GC is a panacea, especially for a systems language like D. Only that using the GC shouldn't be ruled out for general programming chores unless proved otherwise.

Yes, the primary mode of memory mgmt. for D is GC, but of course it's not the only one, precisely because it will never be perfect for every job.
Jul 25 2006
parent Karen Lanrap <karen digitaldaemon.com> writes:
Dave wrote:
 you made some strong general and sweeping assertions about GC in
 general that I don't think reflect general use of the GC.

You may be right, because I introduced at least two faults:

1) I used the words "typical behaviour" where I would have been better off with "behaviour in general" or "not excludable behaviour".

2) I still have not found an example where D's memory management allows for steadily growing allocated memory only interrupted by the start of some GC sweeps. I would be glad if someone could point to an argument that such behaviour is impossible with this GC - then I can stop that search.

The side effects I detected are rooted in the fact that the sweeps of the GC break the locality of data accesses.

1. Observation: There are cases with only one application where a memory leak causes severe performance degradation, although the GC is enabled.

2. Observation: If more than one application is poisoned by memory leaks, and although the GC is enabled, there are cases where, caused by the memory leaks,
2.a. all but one application are so slow that they seem to be dead.
2.b. the capability of a system to run a number of applications decreases by approximately the factor "online data" / "all data".

If these cases are of no interest, then it is useless to post any code.
Jul 25 2006
prev sibling next sibling parent reply "Unknown W. Brackets" <unknown simplemachines.org> writes:
Karen,

Your response seems to indicate a lack of knowledge about garbage 
collection, but perhaps I'm only reading what you said wrong.

First of all, let's get this clear:

1. Not every call to malloc will cause a collect.
2. It is, in fact, unlikely that two subsequent calls should ever 
trigger two subsequent collects - because of pooling.
3. Pooling does increase memory use, but it also means fewer collects.

Any program which triggers collections frequently is written badly.  If 
you must ACTIVELY and continuously allocate chunks of RAM larger than 
64k, you either:

   - need to avoid using the GC for those allocations.
   - need to disable the GC while allocating that data.
   - have a serious design flaw.
   - are not a reputable or skilled programmer.

Assuming you won't agree with the above, though, clearly garbage 
collection simply does not work for the *uncommon* and *impractical* 
case of constant and large allocations.  If you do not agree that this 
is an uncommon thing in computer programming, please say so.  I will not 
bother responding to you any further.

Furthermore, it is entirely practical to write generational garbage 
collectors, or other garbage collectors utilizing different methods or 
processes.  This is not done in the current implementation of D.  Yet, 
if it were then this problem could be avoided.

Regardless, I maintain that such a program would perform poorly.  I 
don't care if you have 20 gigs of main system memory.  Any program that 
is constantly allocating and filling large amounts of memory WILL BE 
SLOW, at least in my experience.

Please understand that the garbage collector, at least in D, works 
something like this (as far as I understand):

1. A batch of memory is allocated.  I believe this happens in fixed 
chunks of 64k, but it may scale the size of them.

2. From this memory, parts are doled out.

3. If a "large" allocation happens, there is special code to handle this.

For more information, please see the source code to Phobos' garbage 
collector, available in src/phobos/internal/gc/gcx.d.

You could, theoretically, tell your garbage collector not to scan the 
memory range you allocated so it would never get swapped in (unless this 
range also contains pointers.)  In such a case, I again point to the 
programmer as the one at fault.
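As a sketch of that (the 512 MB figure is made up): one way to keep a large pointer-free buffer entirely out of the collector's reach is to allocate it on the C heap, which the GC neither scans during collections nor frees behind your back.

```d
import std.c.stdlib;   // the D1-era C runtime binding (core.stdc.stdlib today)

void main()
{
    // 512 MB of pointer-free data: allocated with malloc, so collections
    // never scan it and it is never swapped back in on the GC's account
    size_t len = 512 * 1024 * 1024;
    ubyte* big = cast(ubyte*) malloc(len);
    scope(exit) free(big);

    // ... fill and use big[0 .. len] ...
}
```
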

Thus you could avoid swapping in those special cases where you need 
large amounts of memory.  Again, I do not believe such things are 
common.  If you are unable to program with efficiency in respect to 
memory, I suggest you find a new occupation.

I hope you do not take offense to that, but I truly believe too many 
people these days try to force themselves into things they just aren't 
any good at.  Some people would make wonderful lawyers but they think 
being a doctor is cooler, so they make their lives horrible.

Honestly, I feel like I'm debating how dangerous it would be to be hit 
by a sedan or an SUV.  I really don't care, it's going to hurt either 
way.  A lot.  The answer is not to get hit, not to say that we should 
all break our bones with sedans because it's not as bad.

I mean, really.  It's one thing to argue about theoretical problems but 
it's quite another to argue about impractical ones and accuse a 
methodology of being flawed because it could fail in these impractical 
cases.  That's just not the logic I was taught.  Doesn't jibe.

I really don't care to prove you wrong.  I've said what I'm going to 
say.  I may respond again if you seem reasonable and bring up something 
new; but if you bring nothing else new in (as with this post)... you've 
lost my interest.

Of course, this is only my opinion and understanding.

-[Unknown]


Jul 24 2006
parent reply Karen Lanrap <karen digitaldaemon.com> writes:
Unknown W. Brackets wrote:

 I've said what I'm going to say.

Yes. Next to nothing about the brainstormed problem, but much talking down from the throne you have built under yourself.
Jul 25 2006
parent "Unknown W. Brackets" <unknown simplemachines.org> writes:
I have no throne, Karen, but if you want to argue effectively with 
people, repeating yourself just won't work.

If you don't believe I've addressed your comments, I'm sorry.  I believe 
I have, in any practical and useful application.  I really don't have 
the spare time to theorize about completely impractical cases.

Obviously, I'm only one person.  I'm just letting you know that you've 
basically lost my interest.  You really shouldn't care about that, since 
I'm only one person.

That said, if you've lost my interest or not gotten your point through 
to me such that I have addressed it, it's not impossible that your 
argument is unconvincing.  I suggest you strengthen it, if you don't (as 
is clear) agree with any of my assertions.  There are many people who 
prefer to deal in the practical and not the impractical.

Again, I'm sorry if you feel I've been condescending.  The only things I 
said that could make you feel that are that I don't consider people who 
have poor memory management people who should be programmers (I meant 
that in general, and was not saying you did or did not have said skill) 
and that I tire of arguing about what I feel to be impractical issues.

If I thought I was better than you, I probably wouldn't have spent the 
time typing responses to you.  After all, my time (just as yours) is 
valuable and I could be doing other productive things with my time, just 
as you could.

In another comment in this thread you said, "according to those Unknown 
guys here, its of no interest".  It is simply of no further interest to 
me.  I've heard your argument, I stand unconvinced, and you have not 
added anything new or addressed what I feel are flaws in your argument.

But, please understand, that is only me.  I really hope I haven't hurt 
your or anyone's feelings.  It sounds like I have.

-[Unknown]


 Unknown W. Brackets wrote:
 
 I've said what I'm going to say.

Yes. Next to nothing about the problem being brainstormed, but plenty from the throne you have built under yourself.

Jul 25 2006
prev sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Karen Lanrap wrote:
 I disagree. Assume a non GC'ed program that allocates 1.5 GB to 1.7 
 GB memory, from which 0.7 GB to 0.9 GB are vital data. If you run 
 this program on a machine equipped with 1 GB, the OS will swap out 
 the 0.8 GB data that is accessed infrequently. Therefore this 
 program cause swapping only if it accesses data from the swapped 
 out part of data and the size of the swapped data will be 
 approximately bounded by doubling the size of the data needed to be 
 swapped back.
 
 This changes dramatically if you GC it, because on every allocation 
 the available main memory is exhausted and the GC requires the OS 
 to swap all 0.8 GB back, doesn't it. 

No, it doesn't require it to all be swapped in. In fact, it doesn't require any of it to be swapped in, unless a full collect is done. Full collects are not performed on every allocation - that would be a terrible design if they were.
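The collect-before-grow policy described here can be sketched as follows. This is a hypothetical Python model, not D's actual allocator: the class, its capacity numbers, and the assumption that half the pool is garbage on each pass are all made up for illustration. The point it shows is that a collection is attempted only when the current pool is exhausted, so collections are far rarer than allocations.

```python
# Hypothetical model of a collect-before-grow allocator (not D's
# actual implementation; all names and numbers are illustrative).

class Pool:
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.collections = 0

    def collect(self):
        # stand-in for a mark/sweep pass; assume half the pool was garbage
        self.collections += 1
        self.used //= 2

    def allocate(self, size):
        if self.used + size > self.capacity:
            self.collect()                      # try to reclaim first
        if self.used + size > self.capacity:
            self.capacity = self.used + size    # grow only as a last resort
        self.used += size

pool = Pool(capacity=8)
for _ in range(100):
    pool.allocate(1)
# collections are triggered far less often than once per allocation
```

Under this model the pool never needs to grow at all in the run above, and only a fraction of the 100 allocations trigger a collection.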
Jul 27 2006
next sibling parent reply Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 Karen Lanrap wrote:
 I disagree. Assume a non GC'ed program that allocates 1.5 GB to 1.7 GB 
 memory, from which 0.7 GB to 0.9 GB are vital data. If you run this 
 program on a machine equipped with 1 GB, the OS will swap out the 0.8 
 GB data that is accessed infrequently. Therefore this program cause 
 swapping only if it accesses data from the swapped out part of data 
 and the size of the swapped data will be approximately bounded by 
 doubling the size of the data needed to be swapped back.

 This changes dramatically if you GC it, because on every allocation 
 the available main memory is exhausted and the GC requires the OS to 
 swap all 0.8 GB back, doesn't it. 

 No, it doesn't require it to all be swapped in. In fact, it doesn't require any of it to be swapped in, unless a full collect is done. Full collects are not performed on every allocation - that would be a terrible design if they were.

By the way, there's no reason that even a full collect must swap all 0.8 GB back in. Some GCs use an alternate approach where pages are scanned and marked stale when the VMM swaps them to disk, so no page faults occur on collection runs. Sean
Jul 27 2006
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Sean Kelly wrote:
 By the way, there's no reason that even a full collect must swap all 0.8 
 GB back in.  Some GCs use an alternate approach where pages are scanned 
 and marked stale when the VMM swaps them to disk, so no page faults 
 occur on collection runs.

I only know of one such collector, and it required a specially patched (Linux) kernel that notifies a process before it swaps its pages to disk and allows it to specify which pages to swap (the collector touches pages on receiving the swap warning, which under a normal LRU-like scheme would prevent them from being swapped). So one reason a full collect must swap all 0.8 GB back in might be the absence of such a patched kernel, for one :). (I wouldn't want to require a user to patch his OS just to run my GC'ed program.) Also, that particular GC would sometimes do an actual full collect of the memory, since otherwise swapped-out garbage might never be collected.
Jul 27 2006
parent reply Sean Kelly <sean f4.ca> writes:
Frits van Bommel wrote:
 Sean Kelly wrote:
 By the way, there's no reason that even a full collect must swap all 
 0.8 GB back in.  Some GCs use an alternate approach where pages are 
 scanned and marked stale when the VMM swaps them to disk, so no page 
 faults occur on collection runs.

I only know of one such collector, and it required a specially patched (Linux) kernel that notifies a process before it swaps its pages to disk and allows it to specify which pages to swap (since it touches the pages on receiving the swap warning, which for a normal LRU-like scheme stops it from being swapped).

Yes, the scheme isn't supported everywhere, though I had thought it was possible on Linux without a kernel patch.
 Also, that particular GC would sometimes do an actual full collect of 
 the memory, since otherwise swapped-out garbage might never be collected.

True. It would be somewhat similar to a generational GC in some respects, with stale pages representing "mature" data. Sean
Jul 27 2006
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Sean Kelly wrote:
 Frits van Bommel wrote:
 Sean Kelly wrote:
 By the way, there's no reason that even a full collect must swap all 
 0.8 GB back in.  Some GCs use an alternate approach where pages are 
 scanned and marked stale when the VMM swaps them to disk, so no page 
 faults occur on collection runs.

 I only know of one such collector, and it required a specially patched 
 (Linux) kernel that notifies a process before it swaps its pages to 
 disk and allows it to specify which pages to swap (since it touches 
 the pages on receiving the swap warning, which for a normal LRU-like 
 scheme stops it from being swapped).

 Yes, the scheme isn't supported everywhere, though I had thought it 
 was possible on Linux without a kernel patch.
The one I was talking about is described in http://www.cs.umass.edu/~emery/pubs/04-16.pdf (and at least one other paper at http://www.cs.umass.edu/~emery/pubs/). A quote from page 5:

-----
4. Kernel Support

The Hippocratic collector improves garbage collection paging performance primarily by cooperating with the virtual memory manager. In this section, we describe our extensions to the Linux kernel that enable cooperative garbage collection. [...]
-----

Of course, it is entirely possible that their patch or one with similar effects has been accepted in the kernel since then.
 Also, that particular GC would sometimes do an actual full collect of 
 the memory, since otherwise swapped-out garbage might never be collected.

 True.  It would be somewhat similar to a generational GC in some 
 respects, with stale pages representing "mature" data.
Yep, except instead of the objects in them having existed for a long time, they haven't been touched in a while (which I suppose implies they've existed for all that time :) ).

One of the other cool things this collector does, IIRC: it tries to move all objects out of a page that's scheduled to be swapped out if there's space in other (memory-resident) pages. If that's successful, it then tells the kernel to forget about saving the contents to disk, since they won't be needed anymore.
Jul 27 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Frits van Bommel wrote:
 One of the other cool thing this collector does IIRC: it tries to move 
 all objects out of a page that's scheduled to be swapped out if there's 
 space in other (memory resident) pages. If that's successful it then 
 tells the kernel to forget about saving the contents to disk since they 
 won't be needed anymore.

The GC can be improved if it can cooperate with/use the VM hardware. I've thought for some time that GC ought to be an OS system service for that reason, rather than an app library.
Jul 27 2006
parent Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 Frits van Bommel wrote:
 One of the other cool thing this collector does IIRC: it tries to move 
 all objects out of a page that's scheduled to be swapped out if 
 there's space in other (memory resident) pages. If that's successful 
 it then tells the kernel to forget about saving the contents to disk 
 since they won't be needed anymore.

The GC can be improved if it can cooperate with/use the VM hardware. I've thought for some time that GC ought to be an OS system service for that reason, rather than an app library.

Same here. Between thread scheduling and memory management, having the GC as an OS service seems a natural fit. I'll admit to being somewhat curious about whether MS tries this to improve .NET performance. Sean
Jul 27 2006
prev sibling parent reply Karen Lanrap <karen digitaldaemon.com> writes:
Walter Bright wrote:

 that would be a terrible design if it did.

Then have a look at what the code below, compiled (dmd 0.163; win), is doing under XPSP2.

1) Start it with, for example, the command "<exeName> 400 100" in a shell
2) wait until initializing is done
3) verify in task manager that allocated memory is about 400M
4) minimize the shell
5) observe in task manager that allocated memory reduces to 100M
6) Start it with, for example, the command "<exeName> 400 100 -lg" in a shell
7) wait until initializing is done
8) verify in task manager that allocated memory is about 400M
9) minimize the shell
10) observe in task manager that allocated memory does not reduce

Note: -lg means leak and use GC
Note: If main memory has 1GB, the call "<exeName> 1200 400 -lg" will cause thrashing.
Note: If main memory has 1GB you can have several shells running the command "<exeName> 500 100" but not the command "<exeName> 500 100 -lg"
Jul 28 2006
next sibling parent Sean Kelly <sean f4.ca> writes:
Karen Lanrap wrote:
 Walter Bright wrote:
 
 that would be a terrible design if it did.

Then have a look what the code below compiled (dmd 0.163; win) is doing under XPSP2.

Assuming this is actually a problem, it's likely just with the allocator strategy, not with GC as a technique. For example, I have a small program to test multithreading in D, and XP reports its memory use climbing steadily as it runs, even though it should remain roughly constant insofar as the code itself is concerned. I've wondered whether some leapfrogging is going on, but haven't bothered to look into it. Sean
Jul 28 2006
prev sibling next sibling parent BCS <BCS pathlink.com> writes:
Karen Lanrap wrote:
 Walter Bright wrote:
 
 
that would be a terrible design if it did.

Then have a look what the code below compiled (dmd 0.163; win) is doing under XPSP2.

 3) veriify in task manager, that allocated memory is about 400M
 4) minimize shell
 5) observe in task manager, that allocated memory reduces to 100M
 

I don't see how this demonstrates that a GC'd app will cause more thrashing than a non-GC'd app. First of all, is that 400/100M quote the actual physical memory usage, or the virtual memory usage?

Even if the program IS using more (virtual) RAM it won't inherently thrash the system. Thrashing is caused by /access/ to more memory than is available, not by having more memory /allocated/ than is available. Having lots of "lost" memory around is quite unlikely to cause thrashing because, in all likelihood, the extra RAM is inaccessible due to lack of pointers to it. As such it won't get accessed and will never get swapped back in. An exception to this is if the GC maintains some sort of tagging of a memory block that is located at the block itself. Then some kinds of GC actions will cause the "lost" blocks to be read, and thus swapped in (and probably deallocated a short time later).

The accessible memory, on the other hand, _will_ get scanned by the GC on a full collect. This could cause some thrashing. However, unless the GC only frees up enough memory for each allocation (which would not only be a bad design, but a stupid one as well), most allocations won't result in any collection at all.

As to swapping, the same logic applies to the OS's swapping algorithm. Typically OSes try to keep a pool of free memory available by swapping stuff out when they have the time. Most of the time, allocations and page faults don't require anything to be swapped out.
Jul 28 2006
prev sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Karen Lanrap wrote:
 Walter Bright wrote:
 
 that would be a terrible design if it did.

Then have a look what the code below compiled (dmd 0.163; win) is doing under XPSP2.

I think you omitted the code.
Jul 28 2006
parent reply Karen Lanrap <karen digitaldaemon.com> writes:
Walter Bright wrote:

 I think you omitted the code.

Sorry, I did not want to send that out with all those mistakes. Anyway, here is the code:

import std.gc, std.random, gcstats;
import std.stdio, std.string, std.outofmemory, std.conv, std.path;

static this()
{
    writefln("This code demonstrates some behaviour of the GC");
    writefln("Author: Karen Lanrap");
    writefln("License: public domain");
}

void usage(char[] name)
{
    writefln("usage: %s alloc vital ([-lg] | [-l] [-g])", getBaseName(name));
    writefln("  alloc: size of all allocated memory");
    writefln("  vital: size of vital memory");
    writefln("  -l: leak memory");
    writefln("  -g: use GC");
}

const uint chunksize = 1_000_000;
const uint chainsize = 4;

class Hold
{
    byte[chunksize / chainsize] data;
    Hold[chainsize] next;
}

Hold[] h;

Hold strongConnect()
{
    for (int i = 0; i < chainsize; i++)
        h[i] = new Hold;
    for (int i = 0; i < chainsize; i++)
        for (int j = 0; j < chainsize; j++)
            h[i].next[j] = h[j];
    return h[rnd(0, chainsize)];
}

void main(char[][] args)
{
    h.length = chainsize;
    if (args.length < 3)
        usage(args[0]);
    assert(args.length > 2, "needs sizes of all and vital data");
    auto all = atoi(args[1]);
    auto vital = atoi(args[2]);
    assert(all >= vital, "vital data cannot be greater than all data");
    args.length = 5;
    auto leak = args[3] == "-l" || args[4] == "-l" || args[3] == "-lg";
    auto useGC = args[3] == "-g" || args[4] == "-g" || args[3] == "-lg";
    if (!useGC)
        std.gc.disable();
    fwritef(stderr, "initializing... ");
    Hold[] arr;
    arr.length = all;
    for (int i = 1; i < all; i++)
        arr[i] = strongConnect();
    fwritefln(stderr, "done.");
    do
    {
        fwritef(stderr, "[");
        for (int v = 0; v < vital; v++)
        {
            uint inx = rnd(all - vital, vital);
            if (!leak)
            {
                // delete the strongConnect in arr[inx]
                for (int i = 0; i < chainsize; i++)
                    if (arr[inx] != arr[inx].next[i])
                        delete arr[inx].next[i];
                delete arr[inx];
            }
            arr[inx] = strongConnect();
            fwritef(stderr, ".");
        }
        fwritef(stderr, "]");
    } while (true);
}

uint rnd(uint base, uint range)
{
    return cast(uint)(base + (1.0 * range * rand()) / (uint.max + 1.0));
}
Jul 28 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Karen Lanrap wrote:
 Anyway, here ist the code:

I'm not sure what this code represents. ("vital" is not any memory allocation term I'm familiar with.)

1) Certainly, allocating huge numbers of megabyte arrays all pointing to each other is not at all normal use.

2) Memory is not going to be recycled if there exist pointers to it from other memory blocks that are in use.

3) D's GC doesn't return memory to the operating system; it keeps the "high water mark" allocated to the process. But this doesn't actually matter, since it is only consuming *virtual* address space if it is unused. Physical memory is only consumed if it is actually and actively referenced.

4) Most C malloc/free and C++ new/delete implementations don't return memory to the operating system after the free, either.

5) When you use -lg, what the code appears to do is allocate new memory blocks. But since the old blocks are *still actively pointed to*, they won't be released by the GC.
Jul 28 2006
parent reply Karen Lanrap <karen digitaldaemon.com> writes:
Walter Bright wrote:

 5) When you use -lg, what the code appears to do is allocate new
 memory blocks. But since the old blocks are *still actively
 pointed to*, they won't be released by the GC.

They are not *actively pointed to* from any location, and the GC is releasing them. Otherwise the program would acquire more and more memory, as one can see with the "-l" option, which enables only the leak but not the GC. There are strong connected components (scc) of about 1MB size. The only pointer to them is assigned to with a new scc, thereby insulating this scc to garbage, ready to be collected in case they are not deleted manually.

If they are not deleted manually and the GC is enabled and collects them, then the OS does not swap out the other data anymore. Why? That is exactly the scheme that was said not to happen. In case of "<exeName> <mem> 100" there are only 100MB of data touched, but the OS holds all <mem> data. That is why I believe the GC touches all of the <mem> data in search for blocks to collect. If I am wrong, why is the OS prevented from swapping out all untouched data, as soon as the GC is enabled?
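The reassignment scheme described above can be sketched with a Python analogue (a hypothetical model, not the D program itself): overwriting the only external pointer to a strongly connected component makes the whole component unreachable, i.e. collectable garbage, despite its internal cycles.

```python
# Hypothetical analogue of the program's strongConnect/reassign cycle.
import gc
import weakref

class Node:
    def __init__(self):
        self.next = []

def strong_connect(n=4):
    # build a strongly connected component: every node points at every node
    nodes = [Node() for _ in range(n)]
    for a in nodes:
        a.next = list(nodes)
    return nodes[0]

handle = strong_connect()
probe = weakref.ref(handle)   # watch one node of the old component
handle = strong_connect()     # reassign the only external pointer
gc.collect()                  # the cycle collector reclaims the insulated scc
assert probe() is None        # the old component was garbage despite its cycles
```

The internal pointers inside the component do not keep it alive; only reachability from outside does.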
Jul 28 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Karen Lanrap wrote:
 Walter Bright wrote:
 
 5) When you use -lg, what the code appears to do is allocate new
 memory blocks. But since the old blocks are *still actively
 pointed to*, they won't be released by the GC.

They are not *actively pointed to* from any location and the GC is releasing them. Otherwise the program would acquire more and more memory as one can see with the "-l" option which enables only the leak but not the GC. There are strong connected components(scc) of about 1MB size. The only pointer to them is assigned to with a new scc, thereby insulating this scc to garbage,

"insulating this scc to garbage" ??
 ready to be collected in case they 
 are not deleted manually.
 
 If they are not deleted manualy and the GC is enabled and collects 
 them, then the OS does not swap out the other data anymore. Why?

The statistics you mentioned do not contain any swapping information. Perhaps I'm not understanding what you mean by swapping.
 That is exactly the scheme that was said not to happen.

I'm having a very hard time understanding you.
 In case of "<exeName> <mem> 100" there are only 100MB of data touched 
 but the OS holds all <mem> data. That is why I believe the GC touches 
 all of the <mem> data in search for blocks to collect.

The GC scans the static data, the registers, and the stack for pointers. Any pointers in those to GC allocated data are called the 'root set'. Any GC allocated data that is pointed to by the 'root set' is also scanned for pointers to GC allocated data, recursively, until there are no more memory blocks to scan. Any GC allocated data that is not so pointed to is *not* scanned, and is added to the available pool of memory. (Such blocks are *not* returned to the operating system, thus the 'high water mark' I mentioned previously.) It does not scan all the memory.

This is not a matter of belief; you can check the code yourself (it comes with Phobos), and you can turn on various logging features of it. I suggest doing that, I think you'll find it very interesting.
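The scan described above amounts to a reachability traversal over the heap. A minimal sketch in Python (hypothetical, not the Phobos collector; the toy heap maps a block id to the ids of blocks it points to):

```python
# Mark phase as a reachability traversal over a toy heap.

def mark(roots, heap):
    marked = set()
    stack = list(roots)          # the 'root set'
    while stack:                 # scan reachable blocks, recursively
        block = stack.pop()
        if block in marked:
            continue
        marked.add(block)
        stack.extend(heap[block])
    # blocks never marked were never scanned; they become available memory
    return {b: ptrs for b, ptrs in heap.items() if b in marked}

heap = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
live = mark(roots=["a"], heap=heap)
# the c<->d cycle is unreachable from the root set, so it is garbage
```

Note that the unreachable c<->d blocks are never even visited, which is the point being made: the collector does not touch all allocated memory, only the reachable part.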
 If I am wrong, why is the OS prevented from swapping out all untouched 
 data, as soon as the GC is enabled?

I think you have a very different idea of what the word "swapping" means with regard to virtual memory and GC than I do. For one thing, nothing at all in your program disables OS swapping. I don't think it is even possible to disable it - it's a very low level service.

You're obviously very interested in GC - why not pick up the book on it referenced in www.digitalmars.com/bibliography.html? It's a very good read.
Jul 28 2006
parent reply Karen Lanrap <karen digitaldaemon.com> writes:
Walter Bright wrote:

 The GC scans the static data, the registers, and the stack for
 pointers. Any pointers in that to GC allocated data is called
 the 'root set'. Any GC allocated data that is pointed to by the
 'root set' is also scanned for pointers to GC allocated data,
 recursively, until there are no more memory blocks to scan.

I believed that the GC is working somehow that way. Therefore I raised the question as a "general problem". I repeat your words:
 until there are no more memory blocks to scan.

And this scanning seems to prevent the OS from paging out the data held in the first <mem> - <vital> locations in my code---I finally understand that he who told me that there is no difference between "swapping" and "paging" was wrong.
 why not pick up the book

No, thanks. I am not interested in theoretical foundations. I observed a nasty behaviour of the code I wrote for the contest mentioned in <e9p2qg$18j8$1 digitaldaemon.com>: heavily paging. After detecting and eliminating the memory leak, which lowered the "high water mark" from 2.2GB to 1.8GB (my machine holds 2GB of main memory), that paging was gone. I was puzzled why this could happen although a GC is used. Thanks for all the patience.
Jul 28 2006
next sibling parent Walter Bright <newshound digitalmars.com> writes:
Karen Lanrap wrote:
 Walter Bright wrote:
 
 The GC scans the static data, the registers, and the stack for
 pointers. Any pointers in that to GC allocated data is called
 the 'root set'. Any GC allocated data that is pointed to by the
 'root set' is also scanned for pointers to GC allocated data,
 recursively, until there are no more memory blocks to scan.

I believed that the GC is working somehow that way.

GC's do not "work somehow". They work exactly the way they are programmed to.
 Therefore I 
 raised the question as a "general problem". I repeat your words:
 until there are no more memory blocks to scan.


The only memory blocks scanned are those that have pointers to them. Nothing else.
 
 And this scanning seems to prevent the OS from paging out the data 
 held in the first <mem> - <vital> locations in my code---I finally 
 understand that he who told me that there is no difference between 
 "swapping" and "paging" was wrong.
 
 why not pick up the book

No, thanks. I am not interested in theoretical foundations.

It's not very fruitful for me to try helping you understand GC if you aren't interested in doing a little homework. The GC book is a lot more than theoretical mumbo-jumbo. It is well worth the effort to pick up and look at. I guarantee you'll be able to write much more effective programs that use GC if you understand it.

At a minimum, at least you and I will be using the same language. We'll have the same understanding of what "virtual" vs "physical" memory is, what "swapping" is, and that "vital" has no meaning for this topic.
 I observed a nasty behaviour of the code I wrote for the contest 
 mentioned in <e9p2qg$18j8$1 digitaldaemon.com>: heavily paging.
 
 After detecting and eliminating the memory leak, which lowered the 
 "high water mark" from 2.2GB to 1.8GB (my machine holds 2GB of main 
 memory), that paging was gone.
 
 I was puzzled why this could happen although a GC is used.

I can't help you when you say you're not interested in learning about GC.
Jul 29 2006
prev sibling next sibling parent Sean Kelly <sean f4.ca> writes:
Karen Lanrap wrote:
 Walter Bright wrote:
 
 why not pick up the book

No, thanks. I am not interested in theoretical foundations.

No offense, but if you intend to criticize GC as a general technique, then it might be useful to do so from an informed perspective.
 I observed a nasty behaviour of the code I wrote for the contest 
 mentioned in <e9p2qg$18j8$1 digitaldaemon.com>: heavily paging.
 
 After detecting and eliminating the memory leak, which lowered the 
 "high water mark" from 2.2GB to 1.8GB (my machine holds 2GB of main 
 memory), that paging was gone.
 
 I was puzzled why this could happen although a GC is used.

Perhaps it would help to look at the GC code? Sean
Jul 29 2006
prev sibling parent reply "Lionello Lunesu" <lio lunesu.remove.com> writes:
"Karen Lanrap" <karen digitaldaemon.com> wrote in message 
news:Xns980F583A91B2Fdigitaldaemoncom 63.105.9.61...
 I observed a nasty behaviour of the code I wrote for the contest
 mentioned in <e9p2qg$18j8$1 digitaldaemon.com>: heavily paging.

 After detecting and eliminating the memory leak, which lowered the
 "high water mark" from 2.2GB to 1.8GB (my machine holds 2GB of main
 memory), that paging was gone.

 I was puzzled why this could happen although a GC is used.

I'm having similar problems with the UM implementation. It runs slower and slower, memory use keeps growing, and the scans keep taking longer and longer.. L.
Jul 31 2006
parent Karen Lanrap <karen digitaldaemon.com> writes:
Lionello Lunesu wrote:

 the scans keep taking longer and longer..

At least one who admits that there might be a problem.

That's a funny community here. They seem to dislike theory and prefer coded examples. But when they have coded examples, they pretend not to be able to understand them fully and do not report their results. Instead they start to nitpick on words and on the practical and theoretical background of the contributor, declare the example non-standard---and point to some theory of GC implementations. The fact stays unexplored that if there is no general problem with GC's, the implementation of the GC must be at fault.
Aug 01 2006
prev sibling next sibling parent Dave <Dave_member pathlink.com> writes:
Karen Lanrap wrote:
 I see three problems:
 
 1) The typical behaviour of a GC'ed application is to require more 
 and more main memory but not to need it. Hence every GC'ed 
 application forces the OS to diminish the size of the system cache 
 held in main memory until the GC of the application kicks in.
 

That's a pretty big assertion - have you some evidence, particularly w.r.t. mark/sweep, which the current D implementation uses? Have you looked at the current implementation to see if your assertion holds? Have you (well) written D applications that show this to be the case?

http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all&calc=Calculate&xfullcpu=1&xmem=1

Yes these are trivial benchmarks, but they were not written to be especially memory efficient (they are more concerned with wall clock performance, although of course the two are often one and the same): D does pretty well there, and all of the tests use the GC. Plus, with some form of well implemented moving/compacting collector, D would do even better with regard to both speed and space.

The reason the implementation is important is that when D requests more memory from the GC, the GC will first try to satisfy that request by collecting. It's not a real "greedy" allocator. So, unless a program holds onto a lot of memory for its whole lifetime (e.g. using a lot of global reference vars), what you describe should not really happen much more often than with crt-managed memory.
 2) If the available main memory is unsufficient for the true memory 
 requirements of the application and the OS provides virtual memory 
 by swapping out to secondary storage, every run of the GC forces 
 the OS to slowly swap back all data for this application from 
 secondary storage and runs of the GC occur frequently, because main 
 memory is tight.
 

This will and does happen, but it's no different than if you cache a series of large files (or whatever) in memory using space allocated by malloc and then repeatedly access widely different offsets of those chunks of memory. Actually in that case if you'd used the GC you may be _less_ likely to repeatedly swap. In other words, if you're judicious with your memory usage w/ either the GC or crt you may be as likely to run into the problem with either.
 3) If there is more than one GC'ed application running, those 
 applications compete for the available main memory.
 

Yes but if assertion #1 is not necessarily true, then malloc/free is no magic bullet here over a GC either (obviously).
 
 I see four risks:
 
 a) from 1: The overall reaction of the system gets slower in favor 
 for the GC'ed application.
 
 b) from 2: Projects decomposed into several subtasks may face 
 severe runtime problems when integrating the independent and 
 succesful tested modules.
 
 c) from 2 and b: The reduction of man time in the development and 
 maintenance phases for not being forced to avoid memory leaks may 
 be overly compensated by an increase of machine time by a factor of 
 50 or more.
 
 d) from 1 and 3: A more complicated version of the dining 
 philosophers problem is introduced. In this version every 
 philosopher is allowed to rush around the table and grab all unused 
 forks and declare them used, before he starts to eat---and nobody 
 can force him to put them back on the table.
 
 
 Conclusion:
 
 I know that solving the original dining philosophers problem took 
 several years and I do not see any awareness towards this more 
 complicated version arising by using a GC.
 Risks c) and d) are true killers.
 Therefore GC'ed applications currently seem to be suitable only if 
 they are running single instance on a machine well equipped with 
 main memory and no other GC'ed applications are used.
 To assure that these conditions hold, the GC should maintain 
 statistics on the duration of its runs and frequency of calls. This 
 would allows the GC to throw an "Almost out of memory".

Jul 24 2006
prev sibling next sibling parent renox <renosky free.fr> writes:
Karen Lanrap wrote:

 I see three problems:
 
 1) The typical behaviour of a GC'ed application is to require more 
 and more main memory but not to need it. 

I think you're going overboard here, but it's nonetheless true that there are conflicting needs: the fewer GC passes there are, the more efficiently your application runs (for non-interactive applications of course, otherwise 'pause time' is a problem).. until you run out of memory and either swapping starts to be a big problem or disk access efficiency is reduced.

But there is research on making the OS and the GC cooperate, for example: http://www.cs.umass.edu/~emery/pubs/04-16.pdf

Now the problem is obviously that the OS won't implement this until a GC needs it, and vice versa. With Linux this could probably be done (that's what the researchers used); for Windows, forget it (unless .NET implements this of course, and even in that case it's not obvious that it would be an 'open' API).

Regards,
RenoX
Jul 24 2006
prev sibling next sibling parent reply Tommie Gannert <tomime gannert.se> writes:
Karen Lanrap wrote:
 I see three problems:
 
 1) The typical behaviour of a GC'ed application is to require more 
 and more main memory but not to need it. Hence every GC'ed 
 application forces the OS to diminish the size of the system cache 
 held in main memory until the GC of the application kicks in.
 
 2) If the available main memory is unsufficient for the true memory 
 requirements of the application and the OS provides virtual memory 
 by swapping out to secondary storage, every run of the GC forces 
 the OS to slowly swap back all data for this application from 
 secondary storage and runs of the GC occur frequently, because main 
 memory is tight.
 
 3) If there is more than one GC'ed application running, those 
 applications compete for the available main memory.
 

Just a philosophical thought: Perhaps we should look at the GC as a RAD tool for initial development, where the goal is to replace it with manual memory management (M3?). Then you could do it piece by piece. This might be attractive to two types of coders:

Corporate coders in need of quick deliveries, but without many performance issues (because they can always tell the customer to buy new hardware...).

Performance coders doing alpha blending a billion times per second. Their first priority would be to use the GC as little as possible.

To aid in M3, some way of tracking allocations would be needed. Maybe a program like coverage or profiling, where you could see which allocations are (most often) freed by the GC and which are manually deallocated. This would probably appeal to a third kind of people: academic coders who want everything to be proven, but do not feel they have the time to write deallocating calls as priority one. But they are driven by the urge, or feeling, that beautiful code is code that does not depend on GC and does not leak memory.

</philosophy>
Jul 26 2006
parent reply Karen Lanrap <karen digitaldaemon.com> writes:
Tommie Gannert wrote:
 Perhaps we should look at the GC as a RAD tool for initial
 development

Yes, but then the GC has to be disabled by default for the release versions---and for non-releases there has to be a runtime error message if a sweep starts but the GC is not enabled explicitly.
Jul 27 2006
parent Tommie Gannert <tomime gannert.se> writes:
Karen Lanrap wrote:
 Tommie Gannert wrote:
 Perhaps we should look at the GC as a RAD tool for initial
 development

Yes, but then the GC has to be disabled by default for the release versions---and for non-releases there has to be a runtime error message if a sweep starts but the GC is not enabled explicitly.

That wasn't my intention. More like "we should work towards removing the GC'd objects", but not enforcing it. The corporate guys won't mind the GC; they'll mind the extra two days it takes to fix deletes everywhere.

The message should be available (after GC), including which objects were collected (the profiling tool mentioned later in my post).

/T
Jul 27 2006
prev sibling parent Walter Bright <newshound digitalmars.com> writes:
Karen Lanrap wrote:
 I see three problems:
 
 1) The typical behaviour of a GC'ed application is to require more 
 and more main memory but not to need it.

No, that is not at all how GC works. The whole idea behind GC is to "collect" unneeded memory so it can be reused.
 Hence every GC'ed 
 application forces the OS to diminish the size of the system cache 
 held in main memory until the GC of the application kicks in.

This has nothing to do with the system cache. The system cache is simply the most recently accessed memory; memory that is allocated but not referenced is flushed from it.
 2) If the available main memory is insufficient for the true memory 
 requirements of the application and the OS provides virtual memory 
 by swapping out to secondary storage, every run of the GC forces 
 the OS to slowly swap back all data for this application from 
 secondary storage

What you're describing is called 'thrashing', and happens on any system where the sum of the applications running regularly access more memory than exists on the system, regardless of what kind of memory management system is used.
 and runs of the GC occur frequently, because main
 memory is tight.

This is incorrect, as GC keys off of virtual memory available, not main memory available.
 3) If there is more than one GC'ed application running, those 
 applications compete for the available main memory.

No, they compete for virtual memory. The most recently accessed pages (4k resolution) are swapped into main memory.
 I see four risks:
 
 a) from 1: The overall reaction of the system gets slower in favor 
 of the GC'ed application.

No - GC uses *virtual* address space, it doesn't use any more *physical* memory than any other app would. Remember, physical memory only gets actually used if the memory is referenced. Unused virtual memory, even if allocated, is not put in physical memory.
 
 b) from 2: Projects decomposed into several subtasks may face 
 severe runtime problems when integrating the independent and 
 successfully tested modules.

I seriously doubt that, unless the subtasks all want all of memory, which seems like an extreme, highly unusual case.
 c) from 2 and b: The reduction of man time in the development and 
 maintenance phases for not being forced to avoid memory leaks may 
 be overly compensated by an increase of machine time by a factor of 
 50 or more.

GC apps often run faster than the equivalent explicitly managed apps. Why? Because:

1) GC apps need to do far less allocation
2) GC apps don't consume memory needed for <shared_ptr> or equivalent memory management objects
3) GC apps don't need to spend time doing synchronization of reference counts
 Therefore GC'ed applications currently seem to be suitable only if 
 they are running single instance on a machine well equipped with 
 main memory and no other GC'ed applications are used.

GC is quite mainstream now, and the technology has progressed far beyond such a primitive state. I believe your concerns are unfounded. I suggest the book "Garbage Collection", which you can get from www.digitalmars.com/bibliography.html.
Jul 27 2006