digitalmars.D - Analysis of D GC

reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
My take on D's GC problem, also spoiler - I'm going to build a 
new one soonish.

http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

---
Dmitry Olshansky
Jun 19
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
What is it about Windows that makes you call it a distant 
possibility? Is it just that you are unfamiliar with it or is 
there some specific OS level feature you plan on needing?
Jun 19
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Monday, 19 June 2017 at 22:50:05 UTC, Adam D. Ruppe wrote:
 What is it about Windows that makes you call it a distant 
 possibility? Is it just that you are unfamiliar with it or is 
 there some specific OS level feature you plan on needing?
This is mostly because I wanted to abuse lazy commit of POSIX. Now that I think of it Windows is mostly ok, except for the fork trick used in concurrent GC. As Vladimir pointed out on Windows there are other ways to do it but they are more involved.

---
Dmitry Olshansky
Jun 20
parent Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 20 June 2017 at 07:11:10 UTC, Dmitry Olshansky wrote:
 On Monday, 19 June 2017 at 22:50:05 UTC, Adam D. Ruppe wrote:
 What is it about Windows that makes you call it a distant 
 possibility? Is it just that you are unfamiliar with it or is 
 there some specific OS level feature you plan on needing?
This is mostly because I wanted to abuse lazy commit of POSIX. Now that I think of it Windows is mostly ok, except for the fork trick used in concurrent GC. As Vladimir pointed out on Windows there are other ways to do it but they are more involved. --- Dmitry Olshansky
BTW, Rainer Schuetze has studied this in detail and has written down some of it here: http://rainers.github.io/visuald/druntime/concurrentgc.html
Jun 20
prev sibling next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 06/19/2017 03:35 PM, Dmitry Olshansky wrote:
 My take on D's GC problem, also spoiler - I'm going to build a new one
 soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 ---
 Dmitry Olshansky
Very informative, thanks. However, I can think of many reasons, like appreciation of the efforts of the original authors, to tone it down a little bit, like changing "mistake" to "optimization opportunity", "criticism" to "observation", etc. :)

Ali
Jun 19
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Monday, 19 June 2017 at 23:10:43 UTC, Ali Çehreli wrote:
 On 06/19/2017 03:35 PM, Dmitry Olshansky wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one
 soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 ---
 Dmitry Olshansky
Very informative, thanks. However, I can think of many reasons like appreciation the efforts of the original authors to tone it down a little bit like changing "mistake" to "optimization opportunity", "criticism" to "observation", etc. :)
I could call it a problem :) Still one reason I didn't go to D blog to post this is because it's a critique followed by a promise of action though.
 Ali
--- Dmitry Olshansky
Jun 20
prev sibling next sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Mon, Jun 19, 2017 at 10:35:42PM +0000, Dmitry Olshansky via Digitalmars-d
wrote:
 My take on D's GC problem, also spoiler - I'm going to build a new one
 soonish.
 
 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html
[...] Very interesting indeed! One question about killing the no interior pointer attribute: would this be problematic for 32-bit platforms? And if so, what do you plan to do about it? Keep the current GC as version(32bit) and your new version as version(64bit)?

One (potentially crazy) idea that occurred to me while reading your post is TLS allocations. I haven't thought through the details of how this would interact with the existing language yet, but would it make sense for some allocations that you know will never be shared across threads to be allocated in a thread-local pool instead of the global pool? I.e., in addition to the global set of memory pools you also have thread-local memory pools. Then you could potentially run collections per-thread rather than stop-the-world.

For example, if you have a bunch of threads that call a function that does a bunch of short-lived allocations that are not shared across threads, it seems too wasteful to have these allocations add to the global GC load. Why not have them go into a local pool that can be collected per-thread?

Of course, whether the current language can take advantage of this is another matter. Perhaps if the function is pure and returns scope, then you know any allocation it makes can't possibly be shared with other threads, or something like that...

On Mon, Jun 19, 2017 at 10:50:05PM +0000, Adam D. Ruppe via Digitalmars-d wrote:
 What is it about Windows that makes you call it a distant possibility?
 Is it just that you are unfamiliar with it or is there some specific
 OS level feature you plan on needing?
He mentioned the "fork trick", which I assume refers to how Linux's implementation of fork() uses copy-on-write rather than immediately duplicating the parent process' memory structures. There was a D1 GC some time ago that depended on this behaviour to speed up the collection cycle. AFAIK, Windows does not have equivalent functionality to this.

(Well, for that matter, I'm not sure Posix in general has this feature either, since AFAIK it's Linux-specific. But I surmise that modern-day *nix flavors probably have adopted this in one way or another, since otherwise the very common pattern of fork-and-exec would be inordinately expensive -- copying all the parent's pages only to replace them all pretty much immediately.)

T

--
Give me some fresh salted fish, please.
Jun 19
next sibling parent reply safety0ff <safety0ff.dev gmail.com> writes:
On Monday, 19 June 2017 at 23:39:54 UTC, H. S. Teoh wrote:
 On Mon, Jun 19, 2017 at 10:50:05PM +0000, Adam D. Ruppe via 
 Digitalmars-d wrote:
 What is it about Windows that makes you call it a distant 
 possibility? Is it just that you are unfamiliar with it or is 
 there some specific OS level feature you plan on needing?
AFAIK, Windows does not have equivalent functionality to this.
I've read that there is such a function on Windows but you need to use undocumented/unofficial API to access it: e.g. https://github.com/opencollab/scilab/blob/master/scilab/modules/parallel/src/c/forkWindows.c
Jun 19
parent reply ketmar <ketmar ketmar.no-ip.org> writes:
safety0ff wrote:

 On Monday, 19 June 2017 at 23:39:54 UTC, H. S. Teoh wrote:
 On Mon, Jun 19, 2017 at 10:50:05PM +0000, Adam D. Ruppe via 
 Digitalmars-d wrote:
 What is it about Windows that makes you call it a distant possibility? 
 Is it just that you are unfamiliar with it or is there some specific OS 
 level feature you plan on needing?
AFAIK, Windows does not have equivalent functionality to this.
I've read that there is such a function on Windows but you need to use undocumented/unofficial API to access it: e.g. https://github.com/opencollab/scilab/blob/master/scilab/modules/parallel/src/c/forkWindows.c
it highly depends on undocumented windows internals, and is not portable between windows versions. more-or-less working implementations of `fork()` have existed at least since the NT3 era, but nobody considered 'em more than a PoC, and even the next service pack can break everything.
Jun 19
parent reply Jacob Carlborg <doob me.com> writes:
On 2017-06-20 06:37, ketmar wrote:

 it is higly depends of undocumented windows internals, and not portable 
 between windows versions. more-or-less working implementations of 
 `fork()` were existed at least since NT3 era, but nobody considered 'em 
 as more than a PoC, and even next service pack can break everything.
I'm wondering what Windows 10 is using to implement "fork" for Windows Subsystem for Linux. If it's using these internal functions or something else. -- /Jacob Carlborg
Jun 20
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 20/06/2017 12:41 PM, Jacob Carlborg wrote:
 On 2017-06-20 06:37, ketmar wrote:
 
 it is higly depends of undocumented windows internals, and not 
 portable between windows versions. more-or-less working 
 implementations of `fork()` were existed at least since NT3 era, but 
 nobody considered 'em as more than a PoC, and even next service pack 
 can break everything.
I'm wondering what Windows 10 is using to implement "fork" for Windows Subsystem for Linux. If it's using these internal functions or something else.
It wouldn't surprise me to learn that it was a posix layer specific syscall, meaning we can't use it from a native Windows process.
Jun 20
parent reply Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 20 June 2017 at 11:44:41 UTC, rikki cattermole wrote:
 On 20/06/2017 12:41 PM, Jacob Carlborg wrote:
 On 2017-06-20 06:37, ketmar wrote:
 
 it is higly depends of undocumented windows internals, and 
 not portable between windows versions. more-or-less working 
 implementations of `fork()` were existed at least since NT3 
 era, but nobody considered 'em as more than a PoC, and even 
 next service pack can break everything.
I'm wondering what Windows 10 is using to implement "fork" for Windows Subsystem for Linux. If it's using these internal functions or something else.
It wouldn't surprise me to learn that it was a posix layer specific syscall, meaning we can't from a native Windows process.
The Windows Subsystem for Linux is built on a new form of processes called picoprocesses. There's a whole API built specifically to service WSL, that's not otherwise available (AFAIR) for security reasons to normal processes. I highly recommend watching this talk: https://www.youtube.com/watch?v=36Ykla27FIo and browsing through this repo: https://github.com/ionescu007/lxss which reveals many interesting details about that part of Windows.

I watched that talk a while ago and maybe I have misremembered something, but my understanding is that using the WSL infrastructure is off limits for normal Win32 processes and as such is not suitable for implementing CoW pages for D's GC. (I watched that talk specifically because I was interested in whether some of that could be used in druntime.)
Jun 20
parent Jacob Carlborg <doob me.com> writes:
On 2017-06-20 16:16, Petar Kirov [ZombineDev] wrote:

 I highly recommend watching this talk: 
 https://www.youtube.com/watch?v=36Ykla27FIo and browsing through this 
 repo: https://github.com/ionescu007/lxss which reveals many interesting 
 details about that part of Windows.
Looks interesting. -- /Jacob Carlborg
Jun 20
prev sibling next sibling parent reply ketmar <ketmar ketmar.no-ip.org> writes:
H. S. Teoh wrote:

 He mentioned the "fork trick", which I assume refers to how Linux's
 implementation of fork() uses copy-on-write rather than immediately
 duplicating the parent process' memory structures.  There was a D1 GC
 some time ago that depended on this behaviour to speed up the collection
 cycle.
and it was even ported to D2, and worked. sadly, using `fork()` has its own set of problems -- `fork()` itself is in no way a flawless experience. like you can fork while another thread is inside glibc's `malloc()`, and BOOM! a lot of glibc is locked forever, as the `malloc()` lock is never released in the child process. some other libraries may try to intercept `fork()` to do unnecessary "cleanup", and so on. so using a "forking GC" requires a lot of discipline in coding and library use, or it will be an endless source of heisenbugs.

new linux kernels got the userfaultfd API (so code can simply `select()` on an fd, and process protection violations from `mprotect()` without tricks with signals), but... to much of my joy and happiness, the proposed API was just fine to create a GC with mprotect barriers, and the final API that was included gladly omitted exactly the API call that is necessary to make it happen. great work, yeah. it may have changed since then, tho, i didn't recheck.
Jun 19
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Tuesday, 20 June 2017 at 04:35:27 UTC, ketmar wrote:
 H. S. Teoh wrote:

 He mentioned the "fork trick", which I assume refers to how 
 Linux's
 implementation of fork() uses copy-on-write rather than 
 immediately
 duplicating the parent process' memory structures.  There was 
 a D1 GC
 some time ago that depended on this behaviour to speed up the 
 collection
 cycle.
and it was even ported to D2, and worked. sadly, using `fork()` has it's own set of problems -- `fork()` itself is in no way a flawless expirience. like you can fork while other thread is inside glibc's `malloc()`, and BOOM! alot of glibc is locked forever, as `malloc()` lock is never released in child process. some other libraries may try to intercept `fork()` to do unnecessary "cleanup", and so on.
Since we are in control of what the child does, I see this as no issue. Just call mmap and do bump-the-pointer allocation.
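A minimal sketch of that idea, assuming a POSIX system: the forked child sidesteps `malloc()` by taking one anonymous mapping from `mmap` and handing out memory by advancing an offset. `BumpRegion` and its members are hypothetical names for illustration, not druntime API.

```d
import core.sys.posix.sys.mman : MAP_ANON, MAP_FAILED, MAP_PRIVATE,
    PROT_READ, PROT_WRITE, mmap, munmap;

/// Hypothetical bump allocator backed by a single anonymous mapping.
struct BumpRegion
{
    ubyte* base;     // start of the mapping, null if creation failed
    size_t capacity; // size of the mapping in bytes
    size_t used;     // offset of the first free byte

    /// Maps `capacity` bytes of zeroed memory; no malloc lock is involved.
    static BumpRegion create(size_t capacity)
    {
        void* p = mmap(null, capacity, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANON, -1, 0);
        if (p == MAP_FAILED)
            return BumpRegion.init;
        return BumpRegion(cast(ubyte*) p, capacity, 0);
    }

    /// Returns `size` bytes aligned to `alignment` (a power of two),
    /// or null when the region is exhausted. Nothing is ever freed
    /// individually; the whole region goes away at once.
    void* allocate(size_t size, size_t alignment = 16)
    {
        immutable start = (used + alignment - 1) & ~(alignment - 1);
        if (base is null || start > capacity || size > capacity - start)
            return null;
        used = start + size;
        return base + start;
    }

    /// Unmaps the region and resets the struct.
    void release()
    {
        if (base !is null)
            munmap(base, capacity);
        this = BumpRegion.init;
    }
}
```

Because the only system call is `mmap`, none of the glibc locks that were held by other threads at fork time are touched.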
Jun 20
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Monday, 19 June 2017 at 23:39:54 UTC, H. S. Teoh wrote:
 On Mon, Jun 19, 2017 at 10:35:42PM +0000, Dmitry Olshansky via 
 Digitalmars-d wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.
 
 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html
[...] Very interesting indeed! One question about killing the no interior pointer attribute: would this be problematic for 32-bit platforms? And if so, what do you plan to do about it? Keep the current GC as version(32bit) and your new version as version(64bit)?
Yeah if said 32-bit application makes use of no interior pointer attribute then using old gc is an option. I have no plans for this broken attribute.
 One (potentially crazy) idea that occurred to me while reading 
 your post is TLS allocations. I haven't thought through the 
 details of how this would interact with the existing language 
 yet, but would it make sense for some allocations that you know 
 will never be shared across threads to be allocated in a 
 thread-local pool instead of the global pool? I.e., in addition 
 to the global set of memory pools you also have thread-local 
 memory pools. Then you could potentially run collections 
 per-thread rather than stop-the-world.
This needs a spec update on the interaction between TLS and shared; in particular, the current trend of lock + cast away shared is problematic. Also the implicit cast to immutable of the result of a unique expression.
 On Mon, Jun 19, 2017 at 10:50:05PM +0000, Adam D. Ruppe via 
 Digitalmars-d wrote:
 What is it about Windows that makes you call it a distant 
 possibility? Is it just that you are unfamiliar with it or is 
 there some specific OS level feature you plan on needing?
He mentioned the "fork trick", which I assume refers to how Linux's implementation of fork() uses copy-on-write rather than immediately duplicating the parent process' memory structures. There was a D1 GC some time ago that depended on this behaviour to speed up the collection cycle. AFAIK, Windows does not have equivalent functionality to this.
To the best of my knowledge all of D's current target OSes support this save for Windows.
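For readers unfamiliar with the fork trick, here is a small sketch of the property a concurrent collector relies on, assuming a POSIX system: after `fork()` the child keeps seeing memory as it was at the moment of the fork, no matter what the parent writes afterwards. `valueSeenByChild` is a hypothetical helper written for this illustration; a real forking GC would mark the heap in the child instead of reporting one integer.

```d
import core.sys.posix.sys.types : pid_t;
import core.sys.posix.sys.wait : WEXITSTATUS, WIFEXITED, waitpid;
import core.sys.posix.unistd : _exit, fork;

/// Forks, overwrites `cell` in the parent, and returns the value the
/// child observed (its low 8 bits, passed back as the exit status).
/// Returns -1 if fork or waitpid fails.
int valueSeenByChild(ref int cell, int newValue)
{
    immutable pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        // Child: works on its own snapshot of the parent's memory.
        // Pages are typically shared copy-on-write until one side writes.
        _exit(cell & 0xFF);
    }
    // Parent: this write never becomes visible to the child.
    cell = newValue;
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The child calls only `_exit`, which keeps it clear of the `malloc()` lock problem mentioned earlier in the thread.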
 T
Jun 20
prev sibling next sibling parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.
Looks like I'm not the only one itching to have a go at D's GC :) This will very likely be my DConf 2018 project. However, I have slightly different plans:

- The GC should be usable as a library (mainly to facilitate testing).
- Support for all platforms D already supports from the start.
- Use design-by-introspection when applicable and design-by-contract elsewhere to split the design into modular components.
- Make the GC configurable (using policies) and swappable at runtime. (No need to get clever, just treat previous implementation's pools as opaque void[]).
- Support concurrency on Windows via anonymous memory-mapped files.
- Support generational collection using write barriers implemented through memory protection.
- Integrate existing GC work - don't reinvent the wheel.
- More, much more debugging facilities! Integrate Diamond and Valgrind interoperability.
- Gray-marking and compacting.
- Still need to look at immix.

I have some past work that I'd like to integrate (an experimental generational GC I wrote like 9 years ago for D1, Diamond, and Valgrind integration I have in a fork somewhere.)
Jun 19
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Monday, 19 June 2017 at 23:52:16 UTC, Vladimir Panteleev wrote:
 On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.
Looks like I'm not the only one itching to have a go at D's GC :) This will very likely be my DConf 2018 project. However, I have slightly different plans:
I see no problem in eventually uniting our efforts.
 - The GC should be usable as a library (mainly to facilitate 
 testing).
 - Support for all platforms D already supports from the start.
 - Use design-by-introspection when applicable and 
 design-by-contract elsewhere to split the design into modular 
 components.
Nice. A pool could have many different structures, the collector could then introspect on that. Sadly this almost doubles the effort so I will not go there.
 - Make the GC configurable (using policies) and swappable at 
 runtime. (No need to get clever, just treat previous 
 implementation's pools as opaque void[]).
 - Support concurrency on Windows via anonymous memory-mapped 
 files.
Yeah I recall Rainer and myself discussing this approach, it had some downside such as you need to remap each pool individually. Still doable.
 - Support generational collection using write barriers 
 implemented through memory protection.
Super slow sadly. That being said I believe D is just fine without a generational GC. The generational hypothesis just doesn't hold to the extent it holds in, say, Java. My hypothesis is that most performance-minded applications already allocate temporaries using a region allocator of sorts (or using the C heap).
 - Integrate existing GC work - don't reinvent the wheel.
 - More, much more debugging facilities! Integrate Diamond and 
 Valgrind interoperability.
I could use help on this one.
 - Gray-marking and compacting.
 - Still need to look at immix.

 I have some past work that I'd like to integrate (an 
 experimental generational GC I wrote like 9 years ago for D1, 
 Diamond, and Valgrind integration I have in a fork somewhere.)
--- Dmitry Olshansky
Jun 20
parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Tue, Jun 20, 2017 at 07:47:13AM +0000, Dmitry Olshansky via Digitalmars-d
wrote:
 On Monday, 19 June 2017 at 23:52:16 UTC, Vladimir Panteleev wrote:
[...]
 - Support generational collection using write barriers implemented
 through memory protection.
Super slow sadly. That being said I belive D is just fine without generational GC. The generational hypothesis just doesn't hold to the extent it holds in say Java. My hypothesis is that most performance minded applications already allocate temporaries using region allocator of sorts (or using C heap).
[...] FWIW, here's a data point to the contrary: One of my projects involves constructing a (very large) AA that grows over time, and entries are never deleted. The AA itself is persistent and lasts until the end of the program. Besides the AA, there are a couple of arrays that also grow (more slowly) but eventually become unreferenced.

Because of the sheer size of the AA, I've observed that GC collection cycles become slower and slower, yet most of this extra work is completely needless, because the only thing that might need collecting is the arrays, yet the GC has to mark the entire AA each time, only to discover it's still live. After some experimentation I discovered that I could get up to 40-50% performance improvement just by calling GC.disable and scheduling my own GC collection cycles via GC.collect at a slower rate than the current default setting.

From this, it would seem to me that a generational collector would have helped, since most of the AA will eventually migrate to older generations and most of the time the GC won't bother marking/scanning those parts.

Of course, this is only for this particular program, and I can't say that this is typical usage for D programs in general. But I think D would still benefit from a generational collector.

T

--
What did the alien say to Schubert? "Take me to your lieder."
Jun 20
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Tuesday, 20 June 2017 at 16:49:44 UTC, H. S. Teoh wrote:
 On Tue, Jun 20, 2017 at 07:47:13AM +0000, Dmitry Olshansky via 
 Digitalmars-d wrote:
 On Monday, 19 June 2017 at 23:52:16 UTC, Vladimir Panteleev 
 wrote:
[...] FWIW, here's a data point to the contrary: One of my projects involves constructing a (very large) AA that grows over time, and entries are never deleted. The AA itself is persistent and lasts until the end of the program. Besides the AA, there are a couple of arrays that also grow (more slowly) but eventually become unreferenced. Because of the sheer size of the AA, I've observed that GC collection cycles become slower and slower, yet most of this extra work is completely needless, because the only thing that might need collecting is the arrays, yet the GC has to mark the entire AA each time, only to discover it's still live. After some experimentation I discovered that I could get up to 40-50% performance improvement just by calling GC.disable and scheduling my own GC collection cycles via GC.collect at a slower rate than the current default setting.
From this, it would seem to me that a generational collector 
would have
helped, since most of the AA will eventually migrate to older generations and most of the time the GC won't bother marking/scanning those parts. Of course, this is only for this particular program, and I can't say that this is typical usage for D programs in general. But I think D would still benefit from a generational collector.
Interestingly the moment you "reallocate" to expand the AA it will be considered a new object. Overall I think your case is more about faulty collection heuristics, that is, collecting when there is only a slim chance of getting enough free space after collection.
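The workaround H. S. Teoh describes, turning off automatic collections and running them on the program's own schedule, could be sketched like this. `GC.disable`, `GC.enable` and `GC.collect` are the real `core.memory` calls; `ManualCollector` and the fixed tick interval are assumptions made for illustration, and the right interval would have to be found by measurement.

```d
import core.memory : GC;

/// Hypothetical helper: collect once every `interval` units of work
/// instead of whenever the allocator decides to.
struct ManualCollector
{
    size_t interval;    // units of work between collections, must be > 0
    size_t ticks;       // units of work seen so far
    size_t collections; // collections triggered by this helper

    /// Suppresses automatic collections. The runtime may still collect
    /// if it cannot otherwise satisfy an allocation.
    void start()
    {
        assert(interval > 0);
        GC.disable();
    }

    /// Call once per unit of work (e.g. per batch of AA insertions).
    void tick()
    {
        if (++ticks % interval == 0)
        {
            GC.collect();
            ++collections;
        }
    }

    /// Restores automatic collections.
    void stop()
    {
        GC.enable();
    }
}
```

Typical use would be `auto gc = ManualCollector(1000); gc.start();` before the main loop, `gc.tick();` inside it, and `gc.stop();` afterwards.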
 T
Jun 20
parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Tue, Jun 20, 2017 at 07:14:11PM +0000, Dmitry Olshansky via Digitalmars-d
wrote:
 On Tuesday, 20 June 2017 at 16:49:44 UTC, H. S. Teoh wrote:
[...]
 Interestingly the moment you "reallocate" to expand the AA it will be
 considered a new object.
[...] This is not entirely true. The *table* itself will of course get moved to a new object, but most of the size of the AA comes from its entries, and those are nodes that stay in-place. You'll still have to scan references to the table, of course, but that's a lot better than scanning all the entries as well. T -- The diminished 7th chord is the most flexible and fear-instilling chord. Use it often, use it unsparingly, to subdue your listeners into submission!
Jun 20
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2017-06-20 01:52, Vladimir Panteleev wrote:

 - More, much more debugging facilities! Integrate Diamond and Valgrind 
 interoperability.
Don't forget the Clang sanitizers, assuming they work using LDC. -- /Jacob Carlborg
Jun 20
prev sibling next sibling parent Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 ---
 Dmitry Olshansky
 if not a single pool is capable to service an allocation a new 
 pool is allocated
should probably be "if a single pool is not capable of servicing ..." Looove the figures! Looking forward to seeing the results.
Jun 19
prev sibling next sibling parent reply safety0ff <safety0ff.dev gmail.com> writes:
On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 ---
 Dmitry Olshansky
Good overview, however: the binary search pool lookup is used because it naturally supports variable sized pools. IMHO, simply concluding "A hash table could have saved quite a few cycles." glosses over the issue of handling variable sizes.
Jun 19
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Tuesday, 20 June 2017 at 02:23:48 UTC, safety0ff wrote:
 On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 ---
 Dmitry Olshansky
Good overview, however: the binary search pool lookup is used because it naturally supports variable sized pools. IMHO, simply concluding "A hash table could have saved quite a few cycles." glosses over the issue of handling variable sizes.
Pools are granular to 256 KB IIRC, so the trick is to keep them 256 KB aligned in memory. Then a map from 256 KB chunks to pools is easily created.

---
Dmitry Olshansky
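A rough sketch of the chunk map described above, assuming every pool starts on a 256 KB boundary and spans a whole number of 256 KB chunks. `Pool`, `ChunkMap`, `addPool` and `find` are hypothetical names, and a built-in associative array stands in for whatever table a real collector would use. A variable-sized pool simply registers one entry per chunk it covers.

```d
enum size_t chunkSize = 256 * 1024;

/// Hypothetical pool descriptor: a 256 KB aligned address range.
struct Pool
{
    size_t base; // start address, multiple of chunkSize
    size_t size; // length in bytes, multiple of chunkSize
}

/// Maps a chunk index (address / chunkSize) to the pool that owns it.
struct ChunkMap
{
    Pool*[size_t] chunkToPool;

    /// Registers every chunk covered by `pool`.
    void addPool(Pool* pool)
    {
        assert(pool.base % chunkSize == 0 && pool.size % chunkSize == 0);
        for (size_t a = pool.base; a < pool.base + pool.size; a += chunkSize)
            chunkToPool[a / chunkSize] = pool;
    }

    /// Returns the pool containing `address`, or null if none does.
    /// One hash lookup replaces the binary search over sorted pools.
    Pool* find(size_t address)
    {
        auto hit = (address / chunkSize) in chunkToPool;
        return hit ? *hit : null;
    }
}
```

The lookup cost no longer depends on the number of pools, at the price of one table entry per 256 KB of heap.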
Jun 20
prev sibling next sibling parent reply ketmar <ketmar ketmar.no-ip.org> writes:
Dmitry Olshansky wrote:

 My take on D's GC problem, also spoiler - I'm going to build a new one 
 soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html
"...the dubious optimization of no interior pointers..."

this is the ONLY (i emphasise it!) way i was able to make my e-mail and irc clients not leak memory and keep using GC. on 32-bit systems false pointers *are* a problem, and NO_INTERIOR really helps.

turning NO_INTERIOR into something dog-slow (or a noop) will make D unusable on 32-bit systems for anything more complex than helloworld and throwaway scripts. particularly, any app that should work for weeks or months without restart (yep, i want my mail client to Just Work, and i'm not rebooting my PC that often) will be *forced* to ditch GC.

while NO_INTERIOR requires some coding discipline, it is invaluable in IRL apps.
Jun 19
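For context, the attribute ketmar refers to is set per allocation through `core.memory`. The sketch below shows one way a large buffer might be allocated with it; `allocNoInterior` is a hypothetical helper, while `GC.malloc`, `GC.getAttr` and `GC.BlkAttr.NO_INTERIOR` are the existing druntime API.

```d
import core.memory : GC;

/// Allocates `size` bytes that the GC neither scans for pointers
/// (NO_SCAN) nor keeps alive through pointers into the middle of the
/// block (NO_INTERIOR). Only a pointer to the first byte counts.
ubyte[] allocNoInterior(size_t size)
{
    auto p = cast(ubyte*) GC.malloc(size,
        GC.BlkAttr.NO_SCAN | GC.BlkAttr.NO_INTERIOR);
    return p is null ? null : p[0 .. size];
}
```

The coding discipline mentioned in the post follows from the attribute's contract: the program must hold on to a reference to the start of the block for as long as it is in use. A slice such as `buf[16 .. $]` kept on its own does not protect the block from collection.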
parent reply Jacob Carlborg <doob me.com> writes:
On 2017-06-20 06:54, ketmar wrote:

 "...the dubious optimization of no interior pointers..."
 
 this is the ONLY (i emphasise it!) way i were able to make my e-mail and 
 irc clients to not leak memory, and keep using GC. on 32-bit systems 
 false pointers *is* a problem, and NO_INTERIOR really helps.
 
 turning NO_INTERIOR into something dog-slow (or noop) will make D 
 unusable on 32-bit systems for anything more complex than helloworld and 
 throwaway scripts. particularly, any app that should work for weeks or 
 monthes without restart (yep, i want my mail client to Just Work, and 
 i'm not rebooting my PC that often) will be *forced* to ditch GC.
 
 while NO_INTERIOR requires some coding discipline, it is invaluable in 
 IRL apps.
You need to move to 64bit. Apple is already deprecating support for 32bit apps and after the next version of macOS (High Sierra) they're going to remove the support for 32bit apps. -- /Jacob Carlborg
Jun 20
next sibling parent reply Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 20 June 2017 at 11:49:49 UTC, Jacob Carlborg wrote:
 On 2017-06-20 06:54, ketmar wrote:

 [...]
You need to move to 64bit. Apple is already deprecating support for 32bit apps and after the next version of macOS (High Sierra) they're going to remove the support for 32bit apps.
I highly doubt that ketmar would have any intention of touching macOS regardless ;) Besides, there are many domains where the x32 ABI is a more worthwhile upgrade from i686 than x86_64.
Jun 20
parent Jacob Carlborg <doob me.com> writes:
On 2017-06-20 16:03, Petar Kirov [ZombineDev] wrote:

 I highly doubt that ketmar would have any intention of touching macOS
 regardless ;)
I somehow mixed up ketmar and Guillaume Piolat (who used to go by the alias p0nce). My mistake. -- /Jacob Carlborg
Jun 20
prev sibling parent reply Adrian Matoga <dlang.spam matoga.info> writes:
On Tuesday, 20 June 2017 at 11:49:49 UTC, Jacob Carlborg wrote:
 You need to move to 64bit. Apple is already deprecating support 
 for 32bit apps and after the next version of macOS (High 
 Sierra) they're going to remove the support for 32bit apps.
There are other 32-bit platforms that are going to stay on the market for a while. 32-bit ARMs won't disappear anytime soon.
Jun 25
parent Jacob Carlborg <doob me.com> writes:
On 2017-06-25 17:47, Adrian Matoga wrote:

 There are other 32-bit platforms that are going to stay on the market
 for a while. 32-bit ARMs won't disappear anytime soon.
Sure, but as I mentioned I mixed up ketmar and Guillaume Piolat and Guillaume Piolat is using Apple platforms, as far as I understand. -- /Jacob Carlborg
Jun 26
prev sibling next sibling parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 ---
 Dmitry Olshansky
This was posted on reddit: https://www.reddit.com/r/programming/comments/6ic52d/inside_ds_gc/
Jun 20
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/20/2017 12:04 AM, Nicholas Wilson wrote:
 On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html
This was posted on reddit: https://www.reddit.com/r/programming/comments/6ic52d/inside_ds_gc/
Also on hacker news.
Jun 20
prev sibling next sibling parent reply Ecstatic Coder <ecstatic.coder gmail.com> writes:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 ---
 Dmitry Olshansky
Many thanks for your efforts Dmitry :)

May I ask you if you plan to make a soft real-time GC similar to the one implemented in the Nim language?

https://nim-lang.org/docs/gc.html
https://nim-lang.org/docs/intern.html#debugging-nim-s-memory-management

What is great about it is that we can call it regularly to collect memory a bit at a time, giving it a maximum delay for this operation.

Being able to manually specify the maximum GC delay is what makes Nim compatible with game development, as collections can be made iteratively, and on a per-thread basis. In the worst case, we know that just one of the application threads will be delayed for a few milliseconds between two frame renderings, which is generally acceptable for games and other similar applications.

Moreover this opens the opportunity to call the GC only in the main menu or the pause menu for instance, but not during actual gameplay, so that even these few lost milliseconds will always remain unnoticed.

This is probably why Nim's author was once paid to wrap an open source game engine (Urho3D), and improve the language's native compatibility with C++ libraries.

https://forum.nim-lang.org/t/870
Jun 20
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Tuesday, 20 June 2017 at 15:16:01 UTC, Ecstatic Coder wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 ---
 Dmitry Olshansky
Many thanks for your efforts Dmitry :) May I ask you if you plan to make a soft real-time GC similar to the one implemented in the Nim language ? https://nim-lang.org/docs/gc.html https://nim-lang.org/docs/intern.html#debugging-nim-s-memory-management What is great about it is that we can call it regularly to collect memory a bit at a time, giving it a maximum delay for this operation.
No incremental GC, sorry. It may grow thread-local collection one day, once the spec is precise about what is allowed and what is not.
Jun 20
prev sibling parent Kagamin <spam here.lot> writes:
On Tuesday, 20 June 2017 at 15:16:01 UTC, Ecstatic Coder wrote:
 This is probably why Nim's author was once paid to wrap an open 
 source game engine (Urho3D), and improve the language's native 
 compatibility with C++ libraries.

 https://forum.nim-lang.org/t/870
https://github.com/3dicc/Urhonimo/blob/master/Urho3D-1.32/Source/Engine/Container/Str.h
http://dbartolini.github.io/crown/doxygen/structcrown_1_1_dynamic_string.html

Is it always like this?
Jun 22
prev sibling next sibling parent safety0ff <safety0ff.dev gmail.com> writes:
On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 "But the main unanswered question is why? Why an extra pass?"
It's likely to pave over the many pitfalls of D finalizers.

E.g. finalizers corrupting data:

class A { size_t i; }
class B
{
    A a;
    this() { a = new A; }
    ~this() { a.i = 1; }
}
// modifying B.a.i is undefined behavior (e.g. it could corrupt the GC's freelist)

E.g. finalizers reading undefined data:

class A { bool check() { return true; } } // class methods are virtual by default in D
class B
{
    A a;
    this() { a = new A; }
    ~this() { a.check(); }
}
// B.a's object header is undefined (e.g. replaced with GC freelist pointer)

There are also invariants, which are prepended to the finalizers, so their code is subject to the same issues.

The best thing about the current implementation is that object resurrection has never been supported.
Jun 22
prev sibling parent reply Martin Nowak <code dawg.eu> writes:
On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 ---
 Dmitry Olshansky
FYI, we've tried to improve the binary pool search, but there aren't many pools and it's quite hard to beat. A hashtable for the pages in the address range is too big.

I'd like to replace all of those separate pool types with a single page heap, similar to what TCMalloc is using.

http://goog-perftools.sourceforge.net/doc/tcmalloc.html
http://jamesgolick.com/2013/5/19/how-tcmalloc-works.html

There was also https://github.com/dlang/druntime/pull/801, which got reverted.

One problem that you'll run into with a thread cache is synchronizing GC attributes. In the stalled work on a thread cache for the current GC, using single-reader single-writer queues would've been an option to reduce contention.

https://github.com/MartinNowak
https://github.com/dlang/druntime/compare/master...MartinNowak:gcCache#commitcomment-16202536
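For readers unfamiliar with the binary pool search mentioned above, here is a minimal sketch of the idea, not druntime's actual code. The Pool struct and findPool are hypothetical stand-ins, and the pools are assumed to be sorted by base address and non-overlapping.

```d
// Hypothetical stand-in for a GC pool covering the half-open range [base, top).
struct Pool
{
    size_t base;
    size_t top;
}

// Binary search over pools sorted by base address.
// Returns the pool containing addr, or null if no pool does.
const(Pool)* findPool(const(Pool)[] pools, size_t addr)
{
    size_t lo = 0, hi = pools.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (addr < pools[mid].base)
            hi = mid;               // addr lies below this pool
        else if (addr >= pools[mid].top)
            lo = mid + 1;           // addr lies above this pool
        else
            return &pools[mid];     // base <= addr < top
    }
    return null;
}

unittest
{
    const Pool[] pools = [Pool(0x1000, 0x2000), Pool(0x4000, 0x8000)];
    assert(findPool(pools, 0x1800) is &pools[0]);
    assert(findPool(pools, 0x4000) is &pools[1]);
    assert(findPool(pools, 0x3000) is null);   // gap between pools
    assert(findPool(pools, 0x8000) is null);   // top is exclusive
}
```

With only a handful of pools the loop finishes in a few well-predicted iterations over a small contiguous array, which fits the observation that it is hard to beat.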
Jun 24
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Saturday, 24 June 2017 at 15:31:21 UTC, Martin Nowak wrote:
 On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 ---
 Dmitry Olshansky
 FYI, we've tried to improve the binary pool search, but there 
 aren't many pools and it's quite hard to beat. A hashtable for 
 the pages in the address range is too big.
Doesn't have to be for pages. Pool granularity is 256k, so aligning the pools at this boundary is enough. On x64 pool granularity could be enlarged.
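A rough sketch of that lookup, assuming pools start and end on 256k boundaries so that the address shifted right by 18 bits identifies at most one pool. Pool and PoolTable are hypothetical names, and the built-in associative array is used only for brevity: it allocates from the GC itself, so a real collector would need a flat or manually managed table instead.

```d
enum size_t poolShift   = 18;                     // 256k = 1 << 18
enum size_t poolGranule = size_t(1) << poolShift;

// Hypothetical stand-in for a GC pool; base and size are multiples of 256k.
struct Pool
{
    size_t base;
    size_t size;
}

struct PoolTable
{
    Pool*[size_t] slots;   // key: address >> poolShift

    // Register every 256k granule the pool covers.
    void insert(Pool* p)
    {
        assert(p.base % poolGranule == 0 && p.size % poolGranule == 0);
        for (size_t a = p.base; a < p.base + p.size; a += poolGranule)
            slots[a >> poolShift] = p;
    }

    // One shift plus one table probe, independent of the number of pools.
    Pool* find(size_t addr)
    {
        auto hit = (addr >> poolShift) in slots;
        return hit ? *hit : null;
    }
}

unittest
{
    auto p = new Pool(4 * poolGranule, 2 * poolGranule);
    PoolTable table;
    table.insert(p);
    assert(table.find(4 * poolGranule + 123) is p);
    assert(table.find(6 * poolGranule - 1) is p);
    assert(table.find(6 * poolGranule) is null);
}
```

The table needs one entry per 256k of pool memory rather than one per page, which is what keeps it small compared to a per-page hashtable.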
 I'd like to replace all of those separate pools types with a 
 single page heap, similar to what TCMalloc is using.
 http://goog-perftools.sourceforge.net/doc/tcmalloc.html
 http://jamesgolick.com/2013/5/19/how-tcmalloc-works.html
I still think that separate pool types are better, see e.g. jemalloc.
Jun 24
parent Martin Nowak <code dawg.eu> writes:
On Saturday, 24 June 2017 at 18:12:43 UTC, Dmitry Olshansky wrote:
 I still think that separate pool types is better, see eg 
 jemalloc.
Right now this leads to some inflation of RSS, because previously used and now freed pages can only be reused when the whole pool (e.g. 4MB or 16MB) is free again. It doesn't seem sensible to reserve 16MB only for big (>PAGESIZE) allocations. In particular, once the pages are dirty and mapped, you'd rather want to make use of them.
Jun 25