
digitalmars.D - Should D focus on pointer memory?

reply Manfred Nowak <svv1999 hotmail.com> writes:
In the last year of my pre-master's time at university, one of our 
Ph.D. students told me about an algorithm he had adapted to work on a 
pointer machine.

I did not know what he was talking about and asked what he meant.

"A machine without arrays." was the reply.

After he assured me that he was not teasing, I recognized that main 
memory is in fact just a special case of all memory available.

The special case is represented by the ability to have arrays, i.e. 
in essence the ability to implement, for a given natural number n, a 
function f from the range [1 .. n] to e1, e2 ... en, with f(i) = ei 
and retrieval in constant time, independent of the value of n.
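The distinction being drawn can be sketched in a few lines (Python standing in for any language; `PointerCell` and `pointer_retrieve` are invented names for this illustration): an array realizes f(i) = e_i with constant-time retrieval, while a pure pointer machine must chase links, so retrieval cost grows with i.

```python
class PointerCell:
    """One cell of a pointer machine: a value plus a link to the next cell."""
    def __init__(self, value, next_cell=None):
        self.value = value
        self.next = next_cell

def pointer_retrieve(head, i):
    """Retrieve element i (1-based) by chasing pointers: O(i) steps."""
    cell = head
    for _ in range(i - 1):
        cell = cell.next
    return cell.value

# Build e1..e5 both ways.
elements = [10, 20, 30, 40, 50]        # the "array": constant-time f(i)
head = None
for v in reversed(elements):           # the pointer machine: a linked chain
    head = PointerCell(v, head)

# f(3): one index operation vs. two link hops.
assert elements[3 - 1] == pointer_retrieve(head, 3) == 30
```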

The fact that there exists an upper bound N for the value of n is 
usually left aside, and with it the fact that computers in general 
must handle memory incapable of supporting arrays: secondary storage, 
nearline storage, offline storage, or pointer memory for short.

Note that virtual memory is another pointer memory with a special 
case.

In the thread starting with
digitalmars.D/30773
I made the suggestion that every language with a GC must also do 
the management of virtual memory. This was not disagreed with. 
Instead the hint was given that some GCs interact with the VMM of the 
OS, which seems wrong to me.

A quick query to a favorite search engine turned up this link, which 
shows that some people are already aware of the problems arising: 
http://www.onlamp.com/pub/a/onlamp/2005/10/27/memory-management.html

Will D be capable of creating an "array" consisting of 2^34 elements 
on a 32-bit machine, and as a special case "plain arrays" that fit 
into the real memory?

-manfred  
Dec 05 2005
parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Manfred Nowak wrote:

 In the thread starting with
 digitalmars.D/30773
 I made up the suggestion, that every language with a GC must also do 
 the management for the virtual memory. This is not disagreed upon. 
 Instead the hint was given, that some GC interact with the VMM of the 
 OS, which seems wrong to me.

Why should every language with a GC have to manage virtual memory? Wasn't virtual memory introduced in operating systems precisely to add an abstraction that allows programs to ignore this issue? For instance, no OutOfMemory error is thrown when the physical memory runs out. To the program, all there is is virtual memory. The physical memory is nothing more than a cache.

Of course, a GC that was aware of the VMM would be able to perform better when memory is tight. What is wrong with the GC using this possibility?

A GC in the OS would IMHO be the ideal case, but this may not be preferred for programs with very specific needs. But in the same way, virtual memory is a trade-off: simplicity vs. the specific needs of specific programs.
 A quick question to a favorite search engine gave this link, which 
 shows that some are already aware of the arising problems: 
 http://www.onlamp.com/pub/a/onlamp/2005/10/27/memory-management.html
 
 Will D be capable of creating an "array" consisting of 2^34 elements 
 on a 32-bit machine and as a special case "plain arrays", that fit 
 into the real meory?

In general, a 32-bit platform is only able to index 32 bits of memory; this is more or less the definition. You could of course make a very specific implementation on a 32-bit platform that had a larger address range, but I would say this belongs in the OS.

On a 64-bit machine, on the other hand, there is enough space to index the entire available memory within the same address space (hard disk, physical memory, virtual memory, etc.) and at the same time probably also give each process its own memory range, and more. This would be great for databases, persistent objects, etc. But as I said, I believe this belongs in the OS.

The solution is to switch to 64-bit platforms. :)

Regards,

/Oskar
Dec 05 2005
next sibling parent Sean Kelly <sean f4.ca> writes:
Oskar Linde wrote:
 
 In general, a 32 bit platform is only able to index 32 bits of memory. 
 This is more or less the definition. You could of course make a very 
 specific implementation on a 32 bit platform that had a larger address 
 range, but I would say this belongs in the OS.

Agreed. Windows has support for ~34 bits of addressable space using some fancy tricks, but I don't expect them to be very popular now that Win64 is out.

Sean
Dec 05 2005
prev sibling parent reply Manfred Nowak <svv1999 hotmail.com> writes:
Oskar Linde wrote:

[...]
 Why should every language with a GC have to manage virtual
 memory?

I understand the work of the GC to be that it has to check every possible anchor for whether it still points to the piece of memory that is a candidate for release. If an anchor is swapped out by the OS without any information going to the GC, the OS must swap the content of the anchor back in. In the worst case, all virtual memory must be swapped back, which will result in an overall stop of action.

On the other hand, if the VMM informs the GC of swapping out and back, the GC can keep track of the anchors currently swapped out, eliminating the need to swap them back. This information might be passed to the GC by a standardized interface; otherwise the GC is bound to the specific implementation of the VMM.

But even if there is a standardized interface, the VMM can only have decision rules for swapping out that are heuristics, because the VMM does not know anything about the algorithms running. This knowledge is a property of the implementer, partly discovered by the compiler. Therefore only the implementer can efficiently override the heuristics of the VMM. Again, this information might be given through a standard interface to the VMM, but what is left for the VMM then?

In addition, if anything is left up to the VMM, the implementer must acquire knowledge of the heuristics of every VMM available in order to be able to override them. But the future is unforeseeable ;-)

In total this is a proof by contradiction. The assumption that there is an independent VMM raises the need to totally control it, thereby making it dependent... and if it is dependent, it belongs to the language having the GC.
 Wasn't virtual memory introduced in operating systems
 just to add an abstraction that would allow programs to ignore
 this issue?

Maybe machines capable of virtual memory are sold with such an argument, but who wants to hear that their losses are grounded in the fact that somebody trusted an abstraction?
 The physical memory is nothing more than a cache.

Very true, and the reason why I started this thread. Nobody writes programs in order to have them execute entirely within the L1 or L2 cache of a processor. Why should one focus on programs that manipulate only the cache of the virtual memory?
 What is wrong with the GC using this possibility?

Nothing is wrong with GCs working together with the VMM, as I said above. But VMMs doing so eliminate their own usefulness.
 A GC in the OS would IMHO be the ideal case, but this may not be
 preferred for programs with very specific needs.

Now you are at the critical point:

1) if GCs are best placed in the OS, then

1a) D, as a language having its own GC, does not fit under that OS, unless you are able to show that more than one GC is of any benefit, thereby disqualifying D for only having one GC, or

1b) D is a special language suitable only for building OSs, thereby disqualifying D as a general-purpose language and at the same time raising the need for a VMM in the language, because under your assumption VMMs are also best placed in an OS.

Therefore, the assumption that GCs are best placed in OSs also leads to the need for D to have a VMM.

[...]
 The solution is to switch to 64 bit platforms. :)

... and then ignore every upcoming problem until every machine has a PB of RAM installed ;-)

-manfred
Dec 05 2005
parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Manfred Nowak wrote:
 Oskar Linde wrote:
 
 [...]
 
Why should every language with a GC have to manage virtual
memory?


[The following is a convincing argument showing that a GC unable to cooperate with the VMM will perform poorly for programs using lots of memory.]
 
 I understand the work of the GC in that it has to check every 
 possible anchor for still pointing to the peace of memory which is 
 a candiate for a release. If the anchor is swapped out by the OS 
 without any information to the GC, the OS must swap the content of 
 the anchor back. In the worst case all virtual memory must be 
 swapped back which will result in an overall stop of action.
 
 On the other hand if the VMM informs the GCC of swapping out and 
 back, the GC can keep track of the anchors currently swapped out, 
 eliminating the need to swap them back.

All right. Assume we have a mark-and-sweep style GC. If the GC could get information about whether a page is swapped out or not, and could also be notified whenever a page is about to be swapped out and when a page is swapped in, it could, by adding a counter to each GC region, keep a ref-count of swapped-out references to each GC region. Those ref-counts would probably have to be stored in a never-swapped-out area of RAM. (This would transform the GC into a semi-ref-counting GC. I see no other way of "keeping track of the anchors currently swapped out".)

There are still potential problems. Allocating many small objects would mean that the ref counters in non-swappable memory would in the worst case take up to half the amount of virtual memory, unless counters referred to larger chunks of memory. If counters did that, the GC would be unable to free many areas, because they were potentially referred to by something in swapped-out memory. This could lead to fragmentation. At some point, the GC would probably have to look at the swapped-out memory anyway.

When does a GC currently need to sweep the memory?

1. When the program runs out of virtual memory.

2. When requested by the user. (The user would expect this to be a potentially expensive operation and only done at strategic moments.)

3. Potentially: when told to by the OS.

The solution is very simple: increase the size of the virtual memory. This makes (1) happen less often. Unreferenced regions that a GC would normally free would only occupy disk space. They would not be swapped in unless the GC got run, at which point they would be freed. Such memory would only be swapped out and in once.

If your program has a working set of memory that occupies less than the physical memory left free by other programs, its performance would only suffer at the few moments the GC sweeps the memory. Those sweeps would only happen when you run out of virtual memory, which can be made (almost) arbitrarily rare by increasing the size of virtual memory.
If, on the other hand, your program has a working set of memory larger than the free physical memory, its performance would suffer with or without a GC.
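The counting scheme described above might be sketched roughly as follows (a toy model in Python; `Region`, `collect`, and the region graph are all invented for illustration, not an actual GC implementation):

```python
class Region:
    """One GC-managed region; `swapped_out_refcount` would live in a
    never-swapped-out area of RAM, as suggested above."""
    def __init__(self):
        self.refs = set()              # names of regions this one points to
        self.swapped_out_refcount = 0  # refs into this region from pages on disk

def collect(regions, roots):
    """Mark from resident roots; a nonzero swapped-out refcount also pins
    a region, so evicted pages never need to be swapped back in."""
    marked, stack = set(), list(roots)
    while stack:
        name = stack.pop()
        if name not in marked:
            marked.add(name)
            stack.extend(regions[name].refs)
    return {name for name, r in regions.items()
            if name not in marked and r.swapped_out_refcount == 0}

regions = {n: Region() for n in ("a", "b", "c", "d")}
regions["a"].refs.add("b")
regions["c"].swapped_out_refcount = 1   # only a swapped-out page points to c
garbage = collect(regions, roots={"a"})
assert garbage == {"d"}   # b is reachable, c is pinned, d can be freed
```

The point of the counter is visible in the last line: region `c` is retained without ever touching the on-disk page that references it, which is exactly what makes `c` uncollectable until that page is examined.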
The physical memory is nothing more than a cache.

Very true and the cause why I started this thread. Nobody writes programs in order to be executed in the L1- or L2-cache of a processor. Why should one focus on programs manipulating the cache of the virtual memory?

One often tries to optimize code so that the current working set of memory fits within the L1 or L2 caches. This can give dramatic speed-ups. In a similar way, one should write programs that use a working set of memory that fits within the free physical memory. Performance will suffer severely otherwise.
A GC in the OS would IMHO be the ideal case, but this may not be
preferred for programs with very specific needs.

Now you are at the critical point: 1) if GC's are best placed in the OS then 1a) D, as a language having an own GC, does not fit under that OS---or you are able to show, that more than one GC is of any benefit, thereby disqualifying D for only having one GC, or 1b) D is a special language suitable only for building OSs, thereby disqualifying D to be a general purpose language and at the same time rising the need for a VMM in the language, because also VMM's are under your assumption best placed in an OS.

or 1c) D would fit perfectly in such an OS by utilizing the OS's built-in GC.
 Therefore, also the assumption that GC's are best placed in OS's 
 leads to the need for D to have also a VMM.

I am unable to follow this conclusion.
 
 [...]
 
The solution is to switch to 64 bit platforms. :)

... and then ignore every upcoming problem until every machine has a PB of RAM installed ;-)

Which by the bastardized version of Moore's law would take at least 40 years. And at that point, I guess the solution is obvious for the next 104 years? ;) (Storing one bit per proton would mean the entire storage capacity of Jupiter would be byte-indexable on a 177-bit computer. All miscalculations reserved.)

/Oskar
Dec 05 2005
next sibling parent reply Sean Kelly <sean f4.ca> writes:
Oskar Linde wrote:
 
 All right. Assume we have a mark-and-sweep style GC. If the GC could get 
 information about a page beeing swapped out or not and also be notified 
 whenever a page is about to be swapped out and when a page is swapped 
 in, it could, by adding a counter to each gc-region, keep a ref-count of 
 swapped out references to each gc region. Those ref-counts would 
 probably have to be stored in a never swapped out area of ram. (This 
 would transform the GC into a semi-ref-counting GC. I see no other way 
 of "keeping track of the ancors currently swapped out".)

I think the common technique is to maintain a list of memory blocks and to store meta information about those blocks separately. The only thing you need for paging is a flag that specifies whether a given block is in RAM or not.
 There are still potential problems. Allocating many small objects would 
 mean that the ref counters in non-swappable memory would in worst case 
 take up to half the amount of virtual memory, unless counters referred 
 to larger chunks of memory. If counters did that, the GC would be unable 
 to free many areas because they were potentially referred to by 
 something in swapped out memory. This could lead to fragmentation. At 
 some point, the GC would probably have to look at the swapped out memory 
 anyway.

See above. You don't need a counter per object, just a flag per block. And since block size is typically equal to page size, there's little risk of tracking this information using tons of memory.
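A toy model of the per-block metadata Sean describes (all names hypothetical; a real collector would keep this in page-aligned pools, not a dict): the bookkeeping lives outside the heap, with one residency flag per page-sized block, so tracking costs only a few bytes per 4 KB.

```python
PAGE_SIZE = 4096

class BlockInfo:
    """Out-of-band metadata for one page-sized block."""
    __slots__ = ("addr", "resident", "marked")
    def __init__(self, addr):
        self.addr = addr
        self.resident = True   # flipped by (hypothetical) VMM notifications
        self.marked = False

class BlockTable:
    def __init__(self):
        self.blocks = {}       # addr -> BlockInfo, stored outside the heap

    def allocate(self, addr):
        self.blocks[addr] = BlockInfo(addr)
        return self.blocks[addr]

    def on_page_out(self, addr):
        """Called when the OS pages a block out: just clear the flag."""
        self.blocks[addr].resident = False

    def resident_blocks(self):
        """Blocks the GC can scan without forcing a swap-in."""
        return [b for b in self.blocks.values() if b.resident]

table = BlockTable()
table.allocate(0x1000)
table.allocate(0x2000)
table.on_page_out(0x2000)
assert [b.addr for b in table.resident_blocks()] == [0x1000]
```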
 When does a GC currently need to sweep the memory?
 
 1. When the program runs out of virtual memory.
 
 2. When requested by the user. (The user would expect this to be a 
 potentially expensive operation and only be done at strategic moments.)
 
 3. Potentially: when told to by the OS.

4. At idle times. I've considered calling incremental sweeps from within Thread.sleep() if the thread is supposed to sleep for a sufficient time period. Beyond that, it's not uncommon to run a low-priority GC thread to sweep in the background, or to do mini-sweeps during each allocation call.

The tough part is making this all concurrent so threads avoid blocking one another, and still manage to clean up memory in a timely manner. I still have to do some research to find out how this is addressed in modern GCs. Plain old memory allocators seem pretty straightforward (Hoard, for instance), but the GC run throws a wrench in the gears.

This is my current area of research with Ares, so I'll have more to say once I do a bit more reading (GC theory is still a relatively new topic for me). Next papers on the stack are "garbage collection without paging" and "a generational mostly-concurrent garbage collector." Fun stuff!

Sean
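Idea (4) could be sketched like this (a simplified model; `IncrementalGC`, `gc_aware_sleep`, and the cost constant are invented for the example and are not Ares or D APIs):

```python
import time

class IncrementalGC:
    """Holds a queue of pending sweep work that can be done in small steps."""
    def __init__(self, work_items):
        self.pending = list(work_items)   # e.g. blocks left to sweep

    def step(self, budget):
        """Do at most `budget` units of sweep work; return the number done."""
        done = min(budget, len(self.pending))
        for _ in range(done):
            self.pending.pop()
        return done

def gc_aware_sleep(gc, seconds, cost_per_item=0.001):
    """Spend part of a requested sleep doing GC steps, then sleep the rest.
    `cost_per_item` is an assumed per-unit sweep cost, purely illustrative."""
    budget = round(seconds / cost_per_item)
    done = gc.step(budget)
    remaining = seconds - done * cost_per_item
    if remaining > 0:
        time.sleep(remaining)

gc = IncrementalGC(range(50))
gc_aware_sleep(gc, 0.01)       # a 10 ms sleep buys ~10 units of sweep work
assert len(gc.pending) == 40
```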
Dec 05 2005
parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Sean Kelly wrote:
 Oskar Linde wrote:
 
 All right. Assume we have a mark-and-sweep style GC. If the GC could 
 get information about a page beeing swapped out or not and also be 
 notified whenever a page is about to be swapped out and when a page is 
 swapped in, it could, by adding a counter to each gc-region, keep a 
 ref-count of swapped out references to each gc region. Those 
 ref-counts would probably have to be stored in a never swapped out 
 area of ram. (This would transform the GC into a semi-ref-counting GC. 
 I see no other way of "keeping track of the ancors currently swapped 
 out".)

I think the common technique is to maintain a list of memory blocks and to store meta information about those blocks separately. The only thing you need for paging is a flag that specifies whether a given block is in RAM or not.

Yes. I was commenting on Manfred's suggestion that a GC could keep track of all references held by swapped-out memory blocks, so the GC would not have to sweep swapped-out areas. The only realistic solution I can think of is some kind of ref-counting. I need to read up on how VM-aware GCs handle this.
 There are still potential problems. Allocating many small objects 
 would mean that the ref counters in non-swappable memory would in 
 worst case take up to half the amount of virtual memory, unless 
 counters referred to larger chunks of memory. If counters did that, 
 the GC would be unable to free many areas because they were 
 potentially referred to by something in swapped out memory. This could 
 lead to fragmentation. At some point, the GC would probably have to 
 look at the swapped out memory anyway.

See above. You don't need a counter per object, just a flag per block. And since block size is typically equal to page size, there's little risk of tracking this information using tons of memory.

Yes. Even a counter per block would not use lots of memory. But if the block size is equal to the page size, couldn't a GC potentially waste lots of memory? (Once again, I need to do some reading.)
 When does a GC currently need to sweep the memory?

 1. When the program runs out of virtual memory.

 2. When requested by the user. (The user would expect this to be a 
 potentially expensive operation and only be done at strategic moments.)

 3. Potentially: when told to by the OS.

4. At idle times. I've considered calling incremental sweeps from within Thread.sleep() if the thread is supposed to sleep for a sufficient time period. Beyond that, it's not uncommon to run a low-priority GC thread to sweep in the background or to do mini sweeps during each allocation call. The tough part is making this all concurrent so threads avoid blocking one another, and still manage to clean up memory in a timely manner. I still have to do some research to find out how this is addressed in modern GCs. Plain old memory allocators seems pretty straightforward (Hoard, for instance), but the GC run throws a wrench in the gears.

 This is my current area of research with Ares, so I'll have more to say 
 once I do a bit more reading (GC theory is still a relatively new topic 
 for me).  Next papers on the stack are "garbage collection without 
 paging" and "a generational mostly-concurrent garbage collector."  Fun 
 stuff!

I just skimmed through "garbage collection without paging", an interesting paper. I will try to read it fully when I get some time. It looks like they use a ref-counting method similar to the above (the bookmark counter), but not for each object.

Interestingly, they needed to hack the Linux kernel to be able to communicate with the VMM. I guess this shows that most current GCs are not very VM-aware.

/Oskar
Dec 06 2005
parent Sean Kelly <sean f4.ca> writes:
Oskar Linde wrote:
 Sean Kelly wrote:
 Oskar Linde wrote:

 All right. Assume we have a mark-and-sweep style GC. If the GC could 
 get information about a page beeing swapped out or not and also be 
 notified whenever a page is about to be swapped out and when a page 
 is swapped in, it could, by adding a counter to each gc-region, keep 
 a ref-count of swapped out references to each gc region. Those 
 ref-counts would probably have to be stored in a never swapped out 
 area of ram. (This would transform the GC into a semi-ref-counting 
 GC. I see no other way of "keeping track of the ancors currently 
 swapped out".)

I think the common technique is to maintain a list of memory blocks and to store meta information about those blocks separately. The only thing you need for paging is a flag that specifies whether a given block is in RAM or not.

Yes. I was commenting on Manfred's suggestion that a GC could keep track of all references held by swapped-out memory blocks, so the GC would not have to sweep swapped-out areas. The only realistic solution I can think of is some kind of ref-counting. I need to read up on how VM-aware GCs handle this.

I just read a paper on this today. The technique they describe is to "bookmark" all objects that are referenced by a page being swapped to disk. This is done by setting a bit in the object metadata. And while this suggests that some dead objects may remain uncollected (because there is no way to tell whether the stuff in the swapped-out page is alive when tracking references), it doesn't seem to be much of a problem from their testing.

Performance for this mechanism is amazing, though the GC design is fairly complex and requires a small kernel patch on Linux to allow the GC to touch pages on their way out without confusing the VMM about whether the page should be swapped or not. But they've said they're trying to get the feature into the Linux kernel as an official feature, so this may no longer be an issue.
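The bookmarking mechanism as summarized here might look roughly like this toy model (invented structure in Python, not the paper's actual implementation): eviction sets a bit on every object the departing page references, and the collector then traces only resident memory, conservatively keeping anything bookmarked.

```python
class Obj:
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)
        self.bookmarked = False    # the metadata bit set at eviction time

def evict_page(page_objects):
    """Called as a page goes to disk: bookmark everything it points at."""
    for obj in page_objects:
        for target in obj.refs:
            target.bookmarked = True

def collect(resident_objects, roots):
    """Trace only resident memory; bookmarked objects survive even when no
    resident reference is found (so some dead objects may linger)."""
    marked, stack = set(), list(roots)
    while stack:
        o = stack.pop()
        if o not in marked:
            marked.add(o)
            stack.extend(o.refs)
    return [o for o in resident_objects
            if o not in marked and not o.bookmarked]

a, b, c = Obj("a"), Obj("b"), Obj("c")
p = Obj("p", refs=[b])     # p lives on the page about to be evicted
evict_page([p])            # b gets its bookmark bit set
garbage = collect([a, b, c], roots=[a])
assert [o.name for o in garbage] == ["c"]   # b kept by its bookmark alone
```

Note how the imprecision mentioned above shows up: if `p` later dies on disk, `b` stays bookmarked and uncollected until the page is examined again.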
 See above.  You don't need a counter per object, just a flag per 
 block.  And since block size is typically equal to page size, there's 
 little risk of tracking this information using tons of memory.

Yes. Even a counter per block would not use lots of memory. But if the block size is equal to the page size, a GC could potentially waste lots of memory? (Once again, I need to do some reading.)

The allocators I've seen set aside different superblocks for different sized allocations in an attempt to reduce internal fragmentation. And the minimum allocation size is equivalent to a cache line (64 bytes on most systems from what I understand--I need to read the Intel/AMD tech docs to make sure). This also serves to avoid false sharing at the slight expense of wasted memory.
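A minimal sketch of such a size-class allocator (the numbers other than the 64-byte minimum are made up, and real allocators like Hoard are far more sophisticated): requests are rounded up to a size class, and each superblock serves exactly one class, which bounds internal fragmentation per block.

```python
MIN_ALLOC = 64          # one cache line, per the discussion above
SUPERBLOCK = 4096       # assumed superblock size for this sketch

def size_class(n):
    """Round a request up to the next power-of-two class, >= 64 bytes."""
    size = MIN_ALLOC
    while size < n:
        size *= 2
    return size

class Superblock:
    """A chunk carved into equal slots of one size class."""
    def __init__(self, cls):
        self.cls = cls
        self.free_slots = SUPERBLOCK // cls

class Allocator:
    def __init__(self):
        self.superblocks = {}   # size class -> current superblock

    def alloc(self, n):
        cls = size_class(n)
        sb = self.superblocks.get(cls)
        if sb is None or sb.free_slots == 0:
            sb = self.superblocks[cls] = Superblock(cls)
        sb.free_slots -= 1
        return cls              # return the rounded size for illustration

a = Allocator()
assert a.alloc(10) == 64    # a tiny request still gets a full cache line
assert a.alloc(100) == 128
assert a.alloc(65) == 128   # both land in the same 128-byte superblock
```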
 I just skimmed through "garbage collection without paging", an 
 interesting paper. I will try to read it fully when I get some time.
 It looks like they use a ref-counting method similar to above (the 
 Bookmark counter) but not for each object.

Yup. That's the one I read today.
 Interestingly, they needed to hack the kernel in linux to be able to 
 communicate with the VMM. I guess this shows that most current GCs are 
 not very VM aware.

A few of the newest ones are, but most are simply generational, as that's a good general approach to reduce paging. The bookmarking idea seems far more robust IMO, though the need for a kernel patch in Linux implies it's not terribly portable.

Sean
Dec 06 2005
prev sibling parent reply Manfred Nowak <svv1999 hotmail.com> writes:
Oskar Linde wrote:

[...]
 Assume we have a mark-and-sweep style GC.

Isn't the current D GC unable to distinguish between anchors and normal data? I remember that one of the reasons all data is initialized is that otherwise, especially in the data sections of arrays, there might be patterns that would be interpreted as pointers to release candidates, thereby preventing the release.

This would mean that the current GC must inspect all memory, regardless of whether it is swapped out or not. In turn, this would be a reason for the massive swapping I observed: the program was doing nearly nothing because the GC was constantly inspecting all memory, in vain searching for memory to release?

-manfred
Dec 08 2005
parent Sean Kelly <sean f4.ca> writes:
Manfred Nowak wrote:
 Oskar Linde wrote:
 
 [...]
 Assume we have a mark-and-sweep style GC.

Isn't the current D-GC unable to distinguish between anchors and normal data? I remember that one of the reasons all data is initialized, is that otherwise especially in the data sections of arrays might be patterns to be interpreted as pointers to release candidates. Then preventing the release. This would mean, that the current GC must inspect all memory, regardless of beeing swapped out or not. In turn this would be a reason for the massive swapping I observed: the program was doing nearly nothing because the GC was constantly inspecting all memory, in vain searching for memory to release?

Possibly, yes. Because D is a systems language, it lacks a great deal of the language support for GC that you might find in a language like Java. I think things will improve once we get better type info, but it's likely that the best case will still be a conservative collector "with hints." Thus there will always be blocks that the GC must scan simply because they *may* contain pointer data. Here, I'm thinking of anything added to the GC via addRange, allocated via gc.malloc, etc.

I have a feeling that a traditional moving GC will never work with D, though perhaps a partially moving GC would. That said, I don't see this as much of an issue, as I've seen examples where moving GCs perform quite badly. And there are in-place generational GCs available as well (Boehm et al. wrote one for C/C++).

Sean
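The conservative-scanning cost discussed here can be illustrated with a toy example (Python standing in for the runtime internals; the heap bounds and addresses are invented): any word whose value happens to fall on an allocated heap address must be treated as a pointer, so plain integer data can falsely retain garbage.

```python
HEAP_START, HEAP_END = 0x10000, 0x20000
allocated = {0x10040: "object A", 0x10080: "object B"}  # live heap objects

def conservative_roots(words):
    """Every word that *looks like* a heap pointer pins an object,
    whether it really is a pointer or just unlucky data."""
    pinned = set()
    for w in words:
        if HEAP_START <= w < HEAP_END and w in allocated:
            pinned.add(w)
    return pinned

real_pointer = 0x10040
unlucky_int = 0x10080       # just data, but it looks like a pointer to B
pinned = conservative_roots([real_pointer, unlucky_int, 42])
assert pinned == {0x10040, 0x10080}   # B is falsely retained
```

This is also why the default-initialization Manfred mentions helps: zeroed memory cannot accidentally hold values that look like heap addresses.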
Dec 08 2005