
D - Big Downside of Mark & Sweep

Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
As I understand OS architecture, SOME operating systems will allocate
memory for you without actually allocating the pages.  I know AIX does
this, and I think Linux does as well.  Basically, the idea is that the
OS will allow a malloc() to succeed and record that you now have x pages
legally usable at a given address, but it will not modify
the page table (or allocate physical or virtual memory for you) until
you actually touch the page (and get a page fault on it).
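
To make the lazy-commit behavior concrete, here is a minimal sketch in D
(calling the C malloc through core.stdc.stdlib).  The exact overcommit
behavior is OS-dependent and is an assumption here: the 100 MB reservation
costs almost nothing until each page is first written.

import core.stdc.stdlib : free, malloc;
import core.stdc.stdio : printf;

void main()
{
    enum size_t size = 100 * 1024 * 1024;    // 100 MB
    auto p = cast(ubyte*) malloc(size);      // address range reserved, pages not yet committed
    if (p is null) return;

    // Touching one byte per 4 KB page faults that page in; resident
    // memory grows with the pages actually used, not with `size`.
    foreach (i; 0 .. size / 4096)
        p[i * 4096] = 1;

    printf("touched %zu pages\n", size / 4096);
    free(p);
}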

This works very well for situations where you might allocate big blocks
of memory and not end up using all of them.  However, a mark &
sweep collector ruins this.  The mark & sweep collector, as I understand
it, must scan all allocated memory for references to other memory
blocks; thus, even if you allocate 100 MB of memory and never touch it,
the next time that the GC runs every byte in that block will be
examined, and so thousands of useless page faults will occur.  The OS
may even run out of memory trying to allocate pages for those useless
hits.  (At least on AIX, the OS will allow you to allocate more memory
than its virtual paging can actually handle...but if you use all of it
(and nobody else deallocates in the meantime), you will start to get
"low memory" signals rampaging through the system.)

This problem gets worse if the GC runs periodically in the background.
Imagine a large program, with these large blocks of data kept around for
a long time.  The GC decides to run, and suddenly the whole computer
bogs down as the mark-and-sweep causes a storm of page swapping in the
virtual memory system.  When it's done, all of those pages it touched
have to be paged back out to swap, and the pages that were actually
useful have to be brought back into physical memory.

--
The Villagers are Online! villagersonline.com

.[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
.[ (a version.of(English).(precise.more)) is(possible) ]
?[ you want.to(help(develop(it))) ]
May 15 2002
parent reply "Walter" <walter digitalmars.com> writes:
Those are good points, and they show why gc is not a panacea for all memory
problems.  There are several workarounds (the first two are sketched below):

1) Mark the large allocation as "atomic", which means it doesn't get scanned.
2) Allocate & manage the large object separately from the gc.  Nothing in D
requires use of the gc, unlike Java.  The gc itself uses memory mapped files
for its pools.
3) Internal improvements can be made to the gc itself, once D supports
reflection, to avoid unnecessary scanning.
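
As an illustration of 1) and 2), here is a minimal sketch using the
core.memory API from today's D runtime.  That API did not exist at the
time of this post, so treat the names GC.malloc and GC.BlkAttr.NO_SCAN
as an assumption about how the idea maps onto current D:

import core.memory : GC;
import core.stdc.stdlib : free, malloc;

void example()
{
    enum size_t big = 100 * 1024 * 1024;

    // 1) An "atomic" (pointer-free) allocation: NO_SCAN tells the
    //    collector the block holds no pointers, so the mark phase
    //    never reads (or faults in) its pages.
    auto noScan = cast(ubyte*) GC.malloc(big, GC.BlkAttr.NO_SCAN);

    // 2) Keep the large object outside the gc entirely: a plain C
    //    malloc/free block is neither scanned nor collected.
    auto raw = cast(ubyte*) malloc(big);
    scope(exit) free(raw);

    // ... use noScan[0 .. big] and raw[0 .. big] as needed ...
}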

"Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message
news:3CE29544.1AAC2863 deming-os.org...
 [...]
May 15 2002
parent reply "Matthew Wilson" <mwilson nextgengaming.com> writes:
Walter

Can you point me to a resource describing the details of D's GC
architecture? I had not realised it was using mark and sweep, and would be
very interested in reading around your current design (and any previous
incarnations).

Matthew

"Walter" <walter digitalmars.com> wrote in message
news:abudjd$evt$1 digitaldaemon.com...
 [...]
May 15 2002
parent "Walter" <walter digitalmars.com> writes:
Even better, the complete source comes with the D compiler.

"Matthew Wilson" <mwilson nextgengaming.com> wrote in message
news:abvdkj$19d3$1 digitaldaemon.com...
 [...]
May 15 2002