
digitalmars.D - Should D focus on pointer memory?

reply Manfred Nowak <svv1999 hotmail.com> writes:
In the last year of my pre-master's time at university, one of our 
Ph.D. students told me about an algorithm he had adapted to work on a 
pointer machine.

I did not know what he was talking about and asked what he meant.

"A machine without arrays." was the reply.

After he assured me that he was not teasing, I recognized that main 
memory is in fact just a special case of all memory available.

The special case is represented by the ability to have arrays, i.e. 
in essence the ability to implement, for a given natural number n, a 
function f from the range [1 .. n] to e1, e2 ... en, with f(i) = ei 
and retrieval in constant time, independent of the value of n.
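The distinction being drawn can be sketched in a few lines (Python standing in for any language; `PointerCell` and `pointer_retrieve` are invented names for this illustration): an array realizes f(i) = e_i with constant-time retrieval, while a pure pointer machine must chase links, so retrieval cost grows with i.

```python
class PointerCell:
    """One cell of a pointer machine: a value plus a link to the next cell."""
    def __init__(self, value, next_cell=None):
        self.value = value
        self.next = next_cell

def pointer_retrieve(head, i):
    """Retrieve element i (1-based) by chasing pointers: O(i) steps."""
    cell = head
    for _ in range(i - 1):
        cell = cell.next
    return cell.value

# Build e1..e5 both ways.
elements = [10, 20, 30, 40, 50]        # the "array": constant-time f(i)
head = None
for v in reversed(elements):           # the pointer machine: a linked chain
    head = PointerCell(v, head)

# f(3): one index operation vs. two link hops.
assert elements[3 - 1] == pointer_retrieve(head, 3) == 30
```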

The fact that there exists an upper bound N for the value of n is 
usually left aside, and with it the fact that computers in general 
must handle memory incapable of supporting arrays: secondary storage, 
nearline storage, offline storage, or pointer memory for short.

Note that virtual memory is another pointer memory with a special 
case.

In the thread starting with
digitalmars.D/30773
I made the suggestion that every language with a GC must also do 
the management of virtual memory. This was not disagreed with. 
Instead the hint was given that some GCs interact with the VMM of the 
OS, which seems wrong to me.

A quick query to a favorite search engine turned up this link, which 
shows that some people are already aware of the problems arising: 
http://www.onlamp.com/pub/a/onlamp/2005/10/27/memory-management.html

Will D be capable of creating an "array" consisting of 2^34 elements 
on a 32-bit machine, and as a special case "plain arrays" that fit 
into the real memory?

-manfred  
Dec 05 2005
parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Manfred Nowak wrote:

 In the thread starting with
 digitalmars.D/30773
 I made up the suggestion, that every language with a GC must also do 
 the management for the virtual memory. This is not disagreed upon. 
 Instead the hint was given, that some GC interact with the VMM of the 
 OS, which seems wrong to me.

Why should every language with a GC have to manage virtual memory? Wasn't virtual memory introduced in operating systems precisely to add an abstraction that allows programs to ignore this issue? For instance, no OutOfMemory error is thrown when the physical memory runs out. To the program, all there is is virtual memory. The physical memory is nothing more than a cache.

Of course, a GC that was aware of the VMM would be able to perform better when memory is tight. What is wrong with the GC using this possibility?

A GC in the OS would IMHO be the ideal case, but this may not be preferred for programs with very specific needs. But in the same way, virtual memory is a trade-off: simplicity vs. the specific needs of specific programs.
 A quick question to a favorite search engine gave this link, which 
 shows that some are already aware of the arising problems: 
 http://www.onlamp.com/pub/a/onlamp/2005/10/27/memory-management.html
 
 Will D be capable of creating an "array" consisting of 2^34 elements 
 on a 32-bit machine and as a special case "plain arrays", that fit 
 into the real meory?

In general, a 32-bit platform is only able to index 32 bits of memory; this is more or less the definition. You could of course make a very specific implementation on a 32-bit platform that had a larger address range, but I would say this belongs in the OS.

On a 64-bit machine, on the other hand, there is enough space to index the entire available memory within the same address space (hard disk, physical memory, virtual memory, etc.) and at the same time probably also give each process its own memory range, and more. This would be great for databases, persistent objects, etc. But as I said, I believe this belongs in the OS.

The solution is to switch to 64-bit platforms. :)

Regards,

/Oskar
Dec 05 2005
next sibling parent Sean Kelly <sean f4.ca> writes:
Oskar Linde wrote:
 
 In general, a 32 bit platform is only able to index 32 bits of memory. 
 This is more or less the definition. You could of course make a very 
 specific implementation on a 32 bit platform that had a larger address 
 range, but I would say this belongs in the OS.

Agreed. Windows has support for ~34 bits of addressable space using some fancy tricks, but I don't expect them to be very popular now that Win64 is out.

Sean
Dec 05 2005
prev sibling parent reply Manfred Nowak <svv1999 hotmail.com> writes:
Oskar Linde wrote:

[...]
 Why should every language with a GC have to manage virtual
 memory?

I understand the work of the GC to be that it has to check every possible anchor for whether it still points to the piece of memory that is a candidate for release. If an anchor is swapped out by the OS without any information going to the GC, the OS must swap the content of the anchor back in. In the worst case, all virtual memory must be swapped back, which will result in an overall stop of action.

On the other hand, if the VMM informs the GC of swapping out and back, the GC can keep track of the anchors currently swapped out, eliminating the need to swap them back. This information might be passed to the GC by a standardized interface; otherwise the GC is bound to the specific implementation of the VMM.

But even if there is a standardized interface, the VMM can only have decision rules for swapping out that are heuristics, because the VMM does not know anything about the algorithms running. This knowledge is a property of the implementer, partly discovered by the compiler. Therefore only the implementer can efficiently override the heuristics of the VMM. Again, this information might be given through a standard interface to the VMM, but what is left for the VMM then?

In addition, if anything is left up to the VMM, the implementer must acquire knowledge of the heuristics of every VMM available in order to be able to override them. But the future is unforeseeable ;-)

In total this is a proof by contradiction. The assumption that there is an independent VMM raises the need to totally control it, thereby making it dependent... and if it is dependent, it belongs to the language having the GC.
 Wasn't virtual memory introduced in operating systems
 just to add an abstraction that would allow programs to ignore
 this issue?

Maybe machines capable of virtual memory are sold with such an argument, but who wants to hear that their losses are grounded in the fact that somebody trusted an abstraction?
 The physical memory is nothing more than a cache.

Very true, and the reason why I started this thread. Nobody writes programs in order to have them execute entirely within the L1 or L2 cache of a processor. Why should one focus on programs that manipulate only the cache of the virtual memory?
 What is wrong with the GC using this possibility?

Nothing is wrong with GCs working together with the VMM, as I said above. But VMMs doing so eliminate their own usefulness.
 A GC in the OS would IMHO be the ideal case, but this may not be
 preferred for programs with very specific needs.

Now you are at the critical point:

1) if GCs are best placed in the OS, then

1a) D, as a language having its own GC, does not fit under that OS, unless you are able to show that more than one GC is of any benefit, thereby disqualifying D for only having one GC, or

1b) D is a special language suitable only for building OSs, thereby disqualifying D as a general-purpose language and at the same time raising the need for a VMM in the language, because under your assumption VMMs are also best placed in an OS.

Therefore, the assumption that GCs are best placed in OSs also leads to the need for D to have a VMM.

[...]
 The solution is to switch to 64 bit platforms. :)

... and then ignore every upcoming problem until every machine has a PB of RAM installed ;-)

-manfred
Dec 05 2005
parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Manfred Nowak wrote:
 Oskar Linde wrote:
 
 [...]
 
Why should every language with a GC have to manage virtual
memory?


[The following is a convincing argument showing that a GC unable to cooperate with the VMM will perform poorly for programs using lots of memory.]
 
 I understand the work of the GC in that it has to check every 
 possible anchor for still pointing to the peace of memory which is 
 a candiate for a release. If the anchor is swapped out by the OS 
 without any information to the GC, the OS must swap the content of 
 the anchor back. In the worst case all virtual memory must be 
 swapped back which will result in an overall stop of action.
 
 On the other hand if the VMM informs the GCC of swapping out and 
 back, the GC can keep track of the anchors currently swapped out, 
 eliminating the need to swap them back.

All right. Assume we have a mark-and-sweep style GC. If the GC could get information about whether a page is swapped out or not, and could also be notified whenever a page is about to be swapped out and when a page is swapped in, it could, by adding a counter to each GC region, keep a ref-count of swapped-out references to each GC region. Those ref-counts would probably have to be stored in a never-swapped-out area of RAM. (This would transform the GC into a semi-ref-counting GC. I see no other way of "keeping track of the anchors currently swapped out".)

There are still potential problems. Allocating many small objects would mean that the ref counters in non-swappable memory would in the worst case take up to half the amount of virtual memory, unless counters referred to larger chunks of memory. If counters did that, the GC would be unable to free many areas, because they were potentially referred to by something in swapped-out memory. This could lead to fragmentation. At some point, the GC would probably have to look at the swapped-out memory anyway.

When does a GC currently need to sweep the memory?

1. When the program runs out of virtual memory.

2. When requested by the user. (The user would expect this to be a potentially expensive operation and only done at strategic moments.)

3. Potentially: when told to by the OS.

The solution is very simple: increase the size of the virtual memory. This makes (1) happen less often. Unreferenced regions that a GC would normally free would only occupy disk space. They would not be swapped in unless the GC got run, at which point they would be freed. Such memory would only be swapped out and in once.

If your program has a working set of memory that occupies less than the physical memory left free by other programs, its performance would only suffer at the few moments the GC sweeps the memory. Those sweeps would only happen when you run out of virtual memory, which can be made (almost) arbitrarily rare by increasing the size of virtual memory.
If, on the other hand, your program has a working set of memory larger than the free physical memory, its performance would suffer with or without a GC.
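The counting scheme described above might be sketched roughly as follows (a toy model in Python; `Region`, `collect`, and the region graph are all invented for illustration, not an actual GC implementation):

```python
class Region:
    """One GC-managed region; `swapped_out_refcount` would live in a
    never-swapped-out area of RAM, as suggested above."""
    def __init__(self):
        self.refs = set()              # names of regions this one points to
        self.swapped_out_refcount = 0  # refs into this region from pages on disk

def collect(regions, roots):
    """Mark from resident roots; a nonzero swapped-out refcount also pins
    a region, so evicted pages never need to be swapped back in."""
    marked, stack = set(), list(roots)
    while stack:
        name = stack.pop()
        if name not in marked:
            marked.add(name)
            stack.extend(regions[name].refs)
    return {name for name, r in regions.items()
            if name not in marked and r.swapped_out_refcount == 0}

regions = {n: Region() for n in ("a", "b", "c", "d")}
regions["a"].refs.add("b")
regions["c"].swapped_out_refcount = 1   # only a swapped-out page points to c
garbage = collect(regions, roots={"a"})
assert garbage == {"d"}   # b is reachable, c is pinned, d can be freed
```

The point of the counter is visible in the last line: region `c` is retained without ever touching the on-disk page that references it, which is exactly what makes `c` uncollectable until that page is examined.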
The physical memory is nothing more than a cache.

Very true and the cause why I started this thread. Nobody writes programs in order to be executed in the L1- or L2-cache of a processor. Why should one focus on programs manipulating the cache of the virtual memory?

One often tries to optimize code so that the current working set of memory fits within the L1 or L2 caches. This can give dramatic speed-ups. In a similar way, one should write programs that use a working set of memory that fits within the free physical memory. Performance will suffer severely otherwise.
A GC in the OS would IMHO be the ideal case, but this may not be
preferred for programs with very specific needs.

Now you are at the critical point: 1) if GC's are best placed in the OS then 1a) D, as a language having an own GC, does not fit under that OS---or you are able to show, that more than one GC is of any benefit, thereby disqualifying D for only having one GC, or 1b) D is a special language suitable only for building OSs, thereby disqualifying D to be a general purpose language and at the same time rising the need for a VMM in the language, because also VMM's are under your assumption best placed in an OS.

or 1c) D would fit perfectly in such an OS by utilizing the OS's built-in GC.
 Therefore, also the assumption that GC's are best placed in OS's 
 leads to the need for D to have also a VMM.

I am unable to follow this conclusion.
 
 [...]
 
The solution is to switch to 64 bit platforms. :)

... and then ignore every upcoming problem until every machine has a PB of RAM installed ;-)

Which by the bastardized version of Moore's law would take at least 40 years. And at that point, I guess the solution is obvious for the next 104 years? ;) (Storing one bit per proton would mean the entire storage capacity of Jupiter would be byte-indexable on a 177-bit computer. All miscalculations reserved.)

/Oskar
Dec 05 2005
next sibling parent reply Sean Kelly <sean f4.ca> writes:
Oskar Linde wrote:
 
 All right. Assume we have a mark-and-sweep style GC. If the GC could get 
 information about a page beeing swapped out or not and also be notified 
 whenever a page is about to be swapped out and when a page is swapped 
 in, it could, by adding a counter to each gc-region, keep a ref-count of 
 swapped out references to each gc region. Those ref-counts would 
 probably have to be stored in a never swapped out area of ram. (This 
 would transform the GC into a semi-ref-counting GC. I see no other way 
 of "keeping track of the ancors currently swapped out".)

I think the common technique is to maintain a list of memory blocks and to store meta information about those blocks separately. The only thing you need for paging is a flag that specifies whether a given block is in RAM or not.
 There are still potential problems. Allocating many small objects would 
 mean that the ref counters in non-swappable memory would in worst case 
 take up to half the amount of virtual memory, unless counters referred 
 to larger chunks of memory. If counters did that, the GC would be unable 
 to free many areas because they were potentially referred to by 
 something in swapped out memory. This could lead to fragmentation. At 
 some point, the GC would probably have to look at the swapped out memory 
 anyway.

See above. You don't need a counter per object, just a flag per block. And since block size is typically equal to page size, there's little risk of tracking this information using tons of memory.
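A toy model of the per-block metadata Sean describes (all names hypothetical; a real collector would keep this in page-aligned pools, not a dict): the bookkeeping lives outside the heap, with one residency flag per page-sized block, so tracking costs only a few bytes per 4 KB.

```python
PAGE_SIZE = 4096

class BlockInfo:
    """Out-of-band metadata for one page-sized block."""
    __slots__ = ("addr", "resident", "marked")
    def __init__(self, addr):
        self.addr = addr
        self.resident = True   # flipped by (hypothetical) VMM notifications
        self.marked = False

class BlockTable:
    def __init__(self):
        self.blocks = {}       # addr -> BlockInfo, stored outside the heap

    def allocate(self, addr):
        self.blocks[addr] = BlockInfo(addr)
        return self.blocks[addr]

    def on_page_out(self, addr):
        """Called when the OS pages a block out: just clear the flag."""
        self.blocks[addr].resident = False

    def resident_blocks(self):
        """Blocks the GC can scan without forcing a swap-in."""
        return [b for b in self.blocks.values() if b.resident]

table = BlockTable()
table.allocate(0x1000)
table.allocate(0x2000)
table.on_page_out(0x2000)
assert [b.addr for b in table.resident_blocks()] == [0x1000]
```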
 When does a GC currently need to sweep the memory?
 
 1. When the program runs out of virtual memory.
 
 2. When requested by the user. (The user would expect this to be a 
 potentially expensive operation and only be done at strategic moments.)
 
 3. Potentially: when told to by the OS.

4. At idle times. I've considered calling incremental sweeps from within Thread.sleep() if the thread is supposed to sleep for a sufficient time period. Beyond that, it's not uncommon to run a low-priority GC thread to sweep in the background, or to do mini-sweeps during each allocation call.

The tough part is making this all concurrent so threads avoid blocking one another, and still manage to clean up memory in a timely manner. I still have to do some research to find out how this is addressed in modern GCs. Plain old memory allocators seem pretty straightforward (Hoard, for instance), but the GC run throws a wrench in the gears.

This is my current area of research with Ares, so I'll have more to say once I do a bit more reading (GC theory is still a relatively new topic for me). Next papers on the stack are "garbage collection without paging" and "a generational mostly-concurrent garbage collector." Fun stuff!

Sean
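Idea (4) could be sketched like this (a simplified model; `IncrementalGC`, `gc_aware_sleep`, and the cost constant are invented for the example and are not Ares or D APIs):

```python
import time

class IncrementalGC:
    """Holds a queue of pending sweep work that can be done in small steps."""
    def __init__(self, work_items):
        self.pending = list(work_items)   # e.g. blocks left to sweep

    def step(self, budget):
        """Do at most `budget` units of sweep work; return the number done."""
        done = min(budget, len(self.pending))
        for _ in range(done):
            self.pending.pop()
        return done

def gc_aware_sleep(gc, seconds, cost_per_item=0.001):
    """Spend part of a requested sleep doing GC steps, then sleep the rest.
    `cost_per_item` is an assumed per-unit sweep cost, purely illustrative."""
    budget = round(seconds / cost_per_item)
    done = gc.step(budget)
    remaining = seconds - done * cost_per_item
    if remaining > 0:
        time.sleep(remaining)

gc = IncrementalGC(range(50))
gc_aware_sleep(gc, 0.01)       # a 10 ms sleep buys ~10 units of sweep work
assert len(gc.pending) == 40
```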
Dec 05 2005
parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Sean Kelly wrote:
 Oskar Linde wrote:
 
 All right. Assume we have a mark-and-sweep style GC. If the GC could 
 get information about a page beeing swapped out or not and also be 
 notified whenever a page is about to be swapped out and when a page is 
 swapped in, it could, by adding a counter to each gc-region, keep a 
 ref-count of swapped out references to each gc region. Those 
 ref-counts would probably have to be stored in a never swapped out 
 area of ram. (This would transform the GC into a semi-ref-counting GC. 
 I see no other way of "keeping track of the ancors currently swapped 
 out".)

I think the common technique is to maintain a list of memory blocks and to store meta information about those blocks separately. The only thing you need for paging is a flag that specifies whether a given block is in RAM or not.

Yes. I was commenting on Manfred's suggestion that a GC could keep track of all references held by swapped-out memory blocks, so the GC would not have to sweep swapped-out areas. The only realistic solution I can think of is some kind of ref-counting. I need to read up on how VM-aware GCs handle this.
 There are still potential problems. Allocating many small objects 
 would mean that the ref counters in non-swappable memory would in 
 worst case take up to half the amount of virtual memory, unless 
 counters referred to larger chunks of memory. If counters did that, 
 the GC would be unable to free many areas because they were 
 potentially referred to by something in swapped out memory. This could 
 lead to fragmentation. At some point, the GC would probably have to 
 look at the swapped out memory anyway.

See above. You don't need a counter per object, just a flag per block. And since block size is typically equal to page size, there's little risk of tracking this information using tons of memory.

Yes. Even a counter per block would not use lots of memory. But if the block size is equal to the page size, couldn't a GC potentially waste lots of memory? (Once again, I need to do some reading.)
 When does a GC currently need to sweep the memory?

 1. When the program runs out of virtual memory.

 2. When requested by the user. (The user would expect this to be a 
 potentially expensive operation and only be done at strategic moments.)

 3. Potentially: when told to by the OS.

4. At idle times. I've considered calling incremental sweeps from within Thread.sleep() if the thread is supposed to sleep for a sufficient time period. Beyond that, it's not uncommon to run a low-priority GC thread to sweep in the background or to do mini sweeps during each allocation call. The tough part is making this all concurrent so threads avoid blocking one another, and still manage to clean up memory in a timely manner. I still have to do some research to find out how this is addressed in modern GCs. Plain old memory allocators seems pretty straightforward (Hoard, for instance), but the GC run throws a wrench in the gears.

 This is my current area of research with Ares, so I'll have more to say 
 once I do a bit more reading (GC theory is still a relatively new topic 
 for me).  Next papers on the stack are "garbage collection without 
 paging" and "a generational mostly-concurrent garbage collector."  Fun 
 stuff!

I just skimmed through "garbage collection without paging", an interesting paper. I will try to read it fully when I get some time. It looks like they use a ref-counting method similar to the above (the bookmark counter), but not for each object.

Interestingly, they needed to hack the Linux kernel to be able to communicate with the VMM. I guess this shows that most current GCs are not very VM-aware.

/Oskar
Dec 06 2005
parent Sean Kelly <sean f4.ca> writes:
Oskar Linde wrote:
 Sean Kelly wrote:
 Oskar Linde wrote:

 All right. Assume we have a mark-and-sweep style GC. If the GC could 
 get information about a page beeing swapped out or not and also be 
 notified whenever a page is about to be swapped out and when a page 
 is swapped in, it could, by adding a counter to each gc-region, keep 
 a ref-count of swapped out references to each gc region. Those 
 ref-counts would probably have to be stored in a never swapped out 
 area of ram. (This would transform the GC into a semi-ref-counting 
 GC. I see no other way of "keeping track of the ancors currently 
 swapped out".)

I think the common technique is to maintain a list of memory blocks and to store meta information about those blocks separately. The only thing you need for paging is a flag that specifies whether a given block is in RAM or not.

Yes. I was commenting on Manfred's suggestion that a GC could keep track of all references held by swapped-out memory blocks, so the GC would not have to sweep swapped-out areas. The only realistic solution I can think of is some kind of ref-counting. I need to read up on how VM-aware GCs handle this.

I just read a paper on this today. The technique they describe is to "bookmark" all objects that are referenced by a page being swapped to disk. This is done by setting a bit in the object metadata. And while this suggests that some dead objects may remain uncollected (because there is no way to tell whether the stuff in the swapped-out page is alive when tracking references), it doesn't seem to be much of a problem from their testing.

Performance for this mechanism is amazing, though the GC design is fairly complex and requires a small kernel patch on Linux to allow the GC to touch pages on their way out without confusing the VMM about whether the page should be swapped or not. But they've said they're trying to get the feature into the Linux kernel as an official feature, so this may no longer be an issue.
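The bookmarking mechanism as summarized here might look roughly like this toy model (invented structure in Python, not the paper's actual implementation): eviction sets a bit on every object the departing page references, and the collector then traces only resident memory, conservatively keeping anything bookmarked.

```python
class Obj:
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)
        self.bookmarked = False    # the metadata bit set at eviction time

def evict_page(page_objects):
    """Called as a page goes to disk: bookmark everything it points at."""
    for obj in page_objects:
        for target in obj.refs:
            target.bookmarked = True

def collect(resident_objects, roots):
    """Trace only resident memory; bookmarked objects survive even when no
    resident reference is found (so some dead objects may linger)."""
    marked, stack = set(), list(roots)
    while stack:
        o = stack.pop()
        if o not in marked:
            marked.add(o)
            stack.extend(o.refs)
    return [o for o in resident_objects
            if o not in marked and not o.bookmarked]

a, b, c = Obj("a"), Obj("b"), Obj("c")
p = Obj("p", refs=[b])     # p lives on the page about to be evicted
evict_page([p])            # b gets its bookmark bit set
garbage = collect([a, b, c], roots=[a])
assert [o.name for o in garbage] == ["c"]   # b kept by its bookmark alone
```

Note how the imprecision mentioned above shows up: if `p` later dies on disk, `b` stays bookmarked and uncollected until the page is examined again.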
 See above.  You don't need a counter per object, just a flag per 
 block.  And since block size is typically equal to page size, there's 
 little risk of tracking this information using tons of memory.

Yes. Even a counter per block would not use lots of memory. But if the block size is equal to the page size, a GC could potentially waste lots of memory? (Once again, I need to do some reading.)

The allocators I've seen set aside different superblocks for different sized allocations in an attempt to reduce internal fragmentation. And the minimum allocation size is equivalent to a cache line (64 bytes on most systems from what I understand--I need to read the Intel/AMD tech docs to make sure). This also serves to avoid false sharing at the slight expense of wasted memory.
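A minimal sketch of such a size-class allocator (the numbers other than the 64-byte minimum are made up, and real allocators like Hoard are far more sophisticated): requests are rounded up to a size class, and each superblock serves exactly one class, which bounds internal fragmentation per block.

```python
MIN_ALLOC = 64          # one cache line, per the discussion above
SUPERBLOCK = 4096       # assumed superblock size for this sketch

def size_class(n):
    """Round a request up to the next power-of-two class, >= 64 bytes."""
    size = MIN_ALLOC
    while size < n:
        size *= 2
    return size

class Superblock:
    """A chunk carved into equal slots of one size class."""
    def __init__(self, cls):
        self.cls = cls
        self.free_slots = SUPERBLOCK // cls

class Allocator:
    def __init__(self):
        self.superblocks = {}   # size class -> current superblock

    def alloc(self, n):
        cls = size_class(n)
        sb = self.superblocks.get(cls)
        if sb is None or sb.free_slots == 0:
            sb = self.superblocks[cls] = Superblock(cls)
        sb.free_slots -= 1
        return cls              # return the rounded size for illustration

a = Allocator()
assert a.alloc(10) == 64    # a tiny request still gets a full cache line
assert a.alloc(100) == 128
assert a.alloc(65) == 128   # both land in the same 128-byte superblock
```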
 I just skimmed through "garbage collection without paging", an 
 interesting paper. I will try to read it fully when I get some time.
 It looks like they use a ref-counting method similar to above (the 
 Bookmark counter) but not for each object.

Yup. That's the one I read today.
 Interestingly, they needed to hack the kernel in linux to be able to 
 communicate with the VMM. I guess this shows that most current GCs are 
 not very VM aware.

A few of the newest ones are, but most are simply generational, as that's a good general approach to reduce paging. The bookmarking idea seems far more robust IMO, though the need for a kernel patch in Linux implies it's not terribly portable.

Sean
Dec 06 2005
prev sibling parent reply Manfred Nowak <svv1999 hotmail.com> writes:
Oskar Linde wrote:

[...]
 Assume we have a mark-and-sweep style GC.

Isn't the current D GC unable to distinguish between anchors and normal data? I remember that one of the reasons all data is initialized is that otherwise, especially in the data sections of arrays, there might be patterns that would be interpreted as pointers to release candidates, thereby preventing the release.

This would mean that the current GC must inspect all memory, regardless of whether it is swapped out or not. In turn, this would be a reason for the massive swapping I observed: the program was doing nearly nothing because the GC was constantly inspecting all memory, in vain searching for memory to release?

-manfred
Dec 08 2005
parent Sean Kelly <sean f4.ca> writes:
Manfred Nowak wrote:
 Oskar Linde wrote:
 
 [...]
 Assume we have a mark-and-sweep style GC.

Isn't the current D-GC unable to distinguish between anchors and normal data? I remember that one of the reasons all data is initialized, is that otherwise especially in the data sections of arrays might be patterns to be interpreted as pointers to release candidates. Then preventing the release. This would mean, that the current GC must inspect all memory, regardless of beeing swapped out or not. In turn this would be a reason for the massive swapping I observed: the program was doing nearly nothing because the GC was constantly inspecting all memory, in vain searching for memory to release?

Possibly, yes. Because D is a systems language, it lacks a great deal of the language support for GC that you might find in a language like Java. I think things will improve once we get better type info, but it's likely that the best case will still be a conservative collector "with hints." Thus there will always be blocks that the GC must scan simply because they *may* contain pointer data. Here, I'm thinking of anything added to the GC via addRange, allocated via gc.malloc, etc.

I have a feeling that a traditional moving GC will never work with D, though perhaps a partially moving GC would. That said, I don't see this as much of an issue, as I've seen examples where moving GCs perform quite badly. And there are in-place generational GCs available as well (Boehm et al. wrote one for C/C++).

Sean
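The conservative-scanning cost discussed here can be illustrated with a toy example (Python standing in for the runtime internals; the heap bounds and addresses are invented): any word whose value happens to fall on an allocated heap address must be treated as a pointer, so plain integer data can falsely retain garbage.

```python
HEAP_START, HEAP_END = 0x10000, 0x20000
allocated = {0x10040: "object A", 0x10080: "object B"}  # live heap objects

def conservative_roots(words):
    """Every word that *looks like* a heap pointer pins an object,
    whether it really is a pointer or just unlucky data."""
    pinned = set()
    for w in words:
        if HEAP_START <= w < HEAP_END and w in allocated:
            pinned.add(w)
    return pinned

real_pointer = 0x10040
unlucky_int = 0x10080       # just data, but it looks like a pointer to B
pinned = conservative_roots([real_pointer, unlucky_int, 42])
assert pinned == {0x10040, 0x10080}   # B is falsely retained
```

This is also why the default-initialization Manfred mentions helps: zeroed memory cannot accidentally hold values that look like heap addresses.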
Dec 08 2005