D - Big Downside of Mark & Sweep
- Russ Lewis, May 15 2002: As I understand OS architecture, SOME operating systems will allocate
- Walter, May 15 2002: Those are good points, and why gc is not a panacea for all memory proble...
- Matthew Wilson, May 15 2002: Walter
- Walter, May 15 2002: Even better, the complete source comes with the D compiler.
As I understand OS architecture, SOME operating systems will allocate memory for you without actually allocating the pages. I know AIX does this, and I think Linux does as well. Basically, the idea is that the OS will allow a malloc() to succeed, and record that you now have x pages legally usable at a given address, but it will not modify the page table (or allocate physical or virtual memory for you) until you actually touch a page (and take a page fault on it). This works very well for situations where you might allocate big blocks of memory and, in some situations, never use them all.

However, a mark & sweep collector ruins this. The mark & sweep collector, as I understand it, must scan all allocated memory for references to other memory blocks; thus, even if you allocate 100 MB of memory and never touch it, the next time the GC runs every byte in that block will be examined, and so thousands of useless page faults will occur. The OS may even run out of memory trying to allocate pages for those useless hits. (At least on AIX, the OS will allow you to allocate more memory than its virtual paging can actually handle... but if you use all of it, and nobody else deallocates in the meantime, you will start to get "low memory" signals rampaging through the system.)

This problem gets worse if the GC runs periodically in the background. Imagine a large program, with these large blocks of data kept around for a long time. The GC decides to run, and suddenly the whole computer bogs down as the mark-and-sweep causes a storm of page swapping in the virtual memory system. When it's done, all of the pages it touched have to be paged back out to disk and the useful pages returned to physical memory.

--
The Villagers are Online! villagersonline.com
.[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
.[ (a version.of(English).(precise.more)) is(possible) ]
?[ you want.to(help(develop(it))) ]
May 15 2002
Those are good points, and why gc is not a panacea for all memory problems. There are several workarounds:

1) Mark the large allocation as "atomic", which means it doesn't get scanned.
2) Allocate & manage the large object separately from the gc. Nothing in D requires use of the gc, unlike Java. The gc itself uses memory mapped files for its pools.
3) Internal improvements can be made to the gc itself, when D supports reflection, to avoid unnecessary scanning.

"Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message news:3CE29544.1AAC2863 deming-os.org... [quoted text snipped]
May 15 2002
Walter,

Can you point me to a resource describing the details of D's GC architecture? I had not realised it was using mark and sweep, and would be very interested in reading about your current design (and any previous incarnations).

Matthew

"Walter" <walter digitalmars.com> wrote in message news:abudjd$evt$1 digitaldaemon.com... [quoted text snipped]
May 15 2002
Even better, the complete source comes with the D compiler.

"Matthew Wilson" <mwilson nextgengaming.com> wrote in message news:abvdkj$19d3$1 digitaldaemon.com... [quoted text snipped]
May 15 2002