
digitalmars.D - Std Phobos 2 and logging library?

reply Zz <Zz noaddress.com> writes:
Hi,

Are there any plans for a logging library in Std Phobos 2.0?

Zz
Apr 10 2009
next sibling parent reply BLS <windevguy hotmail.de> writes:
Zz wrote:
 Hi,
 
 Are there any plans for a logging library in Std Phobos 2.0?
 
 Zz

In case you want something special, ask the Tango folks. (Besides, logging has been available there for quite a while.) Björn
Apr 10 2009
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
BLS wrote:
 Zz wrote:
 Hi,

 Are there any plans for a logging library in Std Phobos 2.0?

 Zz


That's a rather random thing to say, particularly in the wake of the recent concerted efforts to improve Phobos and to port it to new OSs. Walter, Don, and I are actively working on Phobos. Sean is helping a lot with druntime, and only lack of time is preventing him from adding to Phobos. Andrei
Apr 10 2009
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from BLS (windevguy hotmail.de)'s article
 Zz wrote:
 Hi,

 Are there any plans for a logging library in Std Phobos 2.0?


Phobos has never been a one-man show. Either way, you're guaranteed not to get what you want if you don't even bother to ask for it.
Apr 10 2009
prev sibling next sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
BLS wrote:
 Zz wrote:
 Hi,

 Are there any plans for a logging library in Std Phobos 2.0?

 Zz

In case you want something special, ask the Tango folks. (Besides, logging has been available there for quite a while.) Björn

It's at least a five-person show: Andrei, Walter, Sean Kelly, braddr, and Don have committed to Phobos svn in the past two weeks. Random other people have donated code to it. Granted, Sean probably only concerns himself with druntime compatibility, and Don is probably mostly concerned with std.math and related modules.
Apr 10 2009
parent reply Leandro Lucarella <llucax gmail.com> writes:
Christopher Wright, on April 10 at 16:18 you wrote me:
 BLS wrote:
Zz wrote:
Hi,

Are there any plans for a logging library in Std Phobos 2.0?

Zz

In case that you want something special, ask the tango folks. ( beside, logging is avail. there for quite a while) Björn

It's at least a five-person show: Andrei, Walter, Sean Kelly, braddr, and Don have committed to Phobos svn in the past two weeks. Random other people have donated code to it. Granted, Sean probably only concerns himself with druntime compatibility, and Don is probably mostly concerned with std.math and related modules.

And Braddr just made a documentation fix, and Walter only commits portability stuff and an occasional bug fix now and then, so... Yes, it really looks like a five-person show =)

I think most work in Phobos is now done by Andrei. There are other *collaborators* (the four others you named plus people sending patches), but it looks like Andrei's show to me. This is not necessarily bad; it's definitely better than before, when it was Walter's show. Now at least he can dedicate his efforts to the compiler and language, and Phobos is getting a lot more attention.

--
Leandro Lucarella (luca) | Collective blog: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
Did you see the frightened ones? Did you hear the falling bombs? Did you ever wonder why we had to run for shelter when the promise of a brave new world unfurled beneath a clear blue sky?
Apr 10 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Leandro Lucarella wrote:
 Christopher Wright, on April 10 at 16:18 you wrote me:
 BLS wrote:
 Zz wrote:
 Hi,

 Are there any plans for a logging library in Std Phobos 2.0?

 Zz

In case you want something special, ask the Tango folks. (Besides, logging has been available there for quite a while.) Björn

and Don have committed to Phobos svn in the past two weeks. Random other people have donated code to it. Granted, Sean probably only concerns himself with druntime compatibility, and Don is probably mostly concerned with std.math and related modules.

And Braddr just made a documentation fix, and Walter only commits portability stuff and an occasional bug fix now and then, so... Yes, it really looks like a five-person show =) I think most work in Phobos now it's done by Andrei, there are other *collaborators* (the four other you named plus people sending patches), but it looks like Andrei's show to me. This is not necessarily bad, it's definitely better than before, when it was Walter's show, now at least he can dedicate his efforts in the compiler and language and Phobos is having a lot more attention.

We'll be very happy to integrate credited contributions from anyone, and to give dsource.org write access to serious participants. What I think right now stands in the way of large participation in Phobos is that we are all still learning the ropes of D2; the possibilities are dizzying and we haven't quite zeroed in on a particular style. Nonetheless, as has been noticed, I'm always summoning help from this group. So again, if you feel you want to contribute ideas and/or code, don't hesitate. Andrei
Apr 10 2009
next sibling parent reply Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, on April 10 at 16:49 you wrote me:
And Braddr just made a documentation fix, and Walter only commits
portability stuff and an occasional bug fix now and then, so...
Yes, it really looks like a five-person show =)
I think most work in Phobos now it's done by Andrei, there are other
*collaborators* (the four other you named plus people sending patches), but
it looks like Andrei's show to me. This is not necessarily bad, it's
definitely  better than before, when it was Walter's show, now at least he
can dedicate his efforts in the compiler and language and Phobos is having
a lot more attention.

We'll be very happy to integrate credited contributions from anyone, and to give dsource.org write access to serious participants. What I think right now stands in the way of large participation to Phobos is that we all still learn the ropes of D2; the possibilities are dizzying and we haven't quite zeroed in on a particular style. Nonetheless, as it's been noticed I'm always summoning help from this group. So again, if you feel you want to contribute with ideas and/or code, don't hesitate.

I hope I can come up with something useful with my thesis (improving D's GC) and that I can contribute it. Right now all my energies are focused on that, and I'm very close to the point where I can finally start playing with alternative implementations.

BTW, is there any real interest in giving more power to the GC implementor to allow some kind of moving or generational collector? Here are some good starting points on how to allow better GC support in D:
http://d.puremagic.com/issues/show_bug.cgi?id=679
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=35426

Anyway, if you are interested in my progress, I have a blog[1] where I write almost everything I do related to the subject. The blog is in Planet D, but Planet D seems to be broken =/

[1] http://proj.llucax.com.ar/blog/dgc/blog

--
Leandro Lucarella (luca) | Collective blog: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
- Tata Dios created you only to wake up the people and fertilize the hens.
- Another divine contradiction... They want me to go out partying with the females and then they want me to get up early.
-- Inodoro Pereyra and a rooster
Apr 10 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Leandro Lucarella wrote:
 BTW, is there any real interest in adding some more power to the GC
 implementator to allow some kind of moving or generational collector?

That would be awesome!
 Here are some good starting points on how to allow better GC support in D:
 http://d.puremagic.com/issues/show_bug.cgi?id=679
 http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=35426
 
 Anyway, if you are interested in my progress, I have a blog[1] where
 I write almost everything I do related to the subject. The blog it's in
 Planet D, but Planet D seems to be broken =/
 
 [1] http://proj.llucax.com.ar/blog/dgc/blog
 

Great, I'll follow. Speaking of readings, here's a paper that I found intriguing. Maybe it could provide inspiration: "GC Assertions: Using the Garbage Collector to Check Heap Properties" (I'd send a link but I can't open the browser; I'm ATM on a Windows machine that has a problem. Long story.) Andrei
Apr 10 2009
parent Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, on April 10 at 20:51 you wrote me:
 Leandro Lucarella wrote:
BTW, is there any real interest in adding some more power to the GC
implementator to allow some kind of moving or generational collector?

That would be awesome!
Here are some good starting points on how to allow better GC support in D:
http://d.puremagic.com/issues/show_bug.cgi?id=679
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=35426
Anyway, if you are interested in my progress, I have a blog[1] where
I write almost everything I do related to the subject. The blog it's in
Planet D, but Planet D seems to be broken =/
[1] http://proj.llucax.com.ar/blog/dgc/blog

Great, I'll follow. Speaking of readings, here's a paper that I found intriguing. Maybe it could provide inspiration: GC assertions: Using the Garbage Collector to Check Heap Properties (I'd send a link but I can't open the browser; I'm ATM on a Windows machine that has a problem. Long story.)

That looks interesting, but it's out of the scope of my thesis. Thanks anyway. BTW, here is the link to the paper:
http://www.eecs.tufts.edu/~eaftan/gcassertions-mspc-2008.pdf

--
Leandro Lucarella (luca) | Collective blog: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
careful to all animals (never washing spiders down the plughole), keep in contact with old friends (enjoy a drink now and then), will frequently check credit at (moral) bank (hole in the wall),
Apr 12 2009
prev sibling next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Leandro Lucarella (llucax gmail.com)'s article
 Andrei Alexandrescu, on April 10 at 16:49 you wrote me:
And Braddr just made a documentation fix, and Walter only commits
portability stuff and an occasional bug fix now and then, so...
Yes, it really looks like a five-person show =)
I think most work in Phobos now it's done by Andrei, there are other
*collaborators* (the four other you named plus people sending patches), but
it looks like Andrei's show to me. This is not necessarily bad, it's
definitely  better than before, when it was Walter's show, now at least he
can dedicate his efforts in the compiler and language and Phobos is having
a lot more attention.

We'll be very happy to integrate credited contributions from anyone, and to give dsource.org write access to serious participants. What I think right now stands in the way of large participation to Phobos is that we all still learn the ropes of D2; the possibilities are dizzying and we haven't quite zeroed in on a particular style. Nonetheless, as it's been noticed I'm always summoning help from this group. So again, if you feel you want to contribute with ideas and/or code, don't hesitate.

GC) and I can contribute that. Right now all my energies are focused on that, and I'm very close to the point to finally start playing with alternate implementations. BTW, is there any real interest in adding some more power to the GC implementator to allow some kind of moving or generational collector?

Absolutely. When writing parallel code to do large scale data mining in D, the lack of precision and multithreaded allocation are real killers. My interests are, in order of importance:

1. Being able to allocate at least small chunks of memory without locking.
2. Precise scanning of at least the heap.
3. Collection w/o stopping the world.
4. Moving GC so that allocations can be pointer bumps.
Apr 10 2009
next sibling parent Sean Kelly <sean invisibleduck.org> writes:
dsimcha wrote:
 Absolutely.  When writing parallel code to do large scale data mining in D, the
 lack of precision and multithreaded allocation are real killers.  My interests
 are, in order of importance:
 
 1.  Being able to allocate at least small chunks of memory without locking.

My next big project for Druntime will be to write a GC with per-thread heaps, but I don't know when that will be. I've been pretty busy lately.
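
For readers wondering what lock-free small allocations could look like, here is a minimal sketch of a per-thread free list. It is not Druntime code and not Sean's design; `FreeNode`, `chunkSize`, `chunksPerSlab`, `refill`, `allocChunk` and `freeChunk` are names made up for illustration. It relies on the fact that module-level variables in D2 are thread-local by default, so the list head needs no lock, and only the occasional slab refill goes to a shared allocator (plain C `malloc` here).

```d
import core.stdc.stdlib : malloc;

// Hypothetical free-list node, stored inside each unused chunk.
struct FreeNode { FreeNode* next; }

enum size_t chunkSize = 32;     // assumed small-object size class
enum size_t chunksPerSlab = 64; // assumed batch size per refill

// Module-level variables are thread-local by default in D2,
// so every thread gets its own list head and needs no lock.
FreeNode* freeList;

// Fetch one slab from the shared allocator and thread it onto the list.
void refill()
{
    auto slab = cast(ubyte*) malloc(chunkSize * chunksPerSlab);
    assert(slab !is null, "out of memory");
    foreach (i; 0 .. chunksPerSlab)
    {
        auto node = cast(FreeNode*)(slab + i * chunkSize);
        node.next = freeList;
        freeList = node;
    }
}

// Pop a chunk from this thread's list; no synchronization involved.
void* allocChunk()
{
    if (freeList is null)
        refill();
    auto node = freeList;
    freeList = node.next;
    return node;
}

// Push a chunk back onto this thread's list.
void freeChunk(void* p)
{
    auto node = cast(FreeNode*) p;
    node.next = freeList;
    freeList = node;
}

void main()
{
    auto a = allocChunk();
    auto b = allocChunk();
    assert(a !is b);
    freeChunk(a);
    assert(allocChunk() is a); // the most recently freed chunk is reused first
}
```

A real collector would also have to deal with chunks freed by a thread other than the one that allocated them, which this sketch ignores.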
Apr 11 2009
prev sibling parent reply Leandro Lucarella <llucax gmail.com> writes:
dsimcha, on April 11 at 05:21 you wrote me:
 == Quote from Leandro Lucarella (llucax gmail.com)'s article
 Andrei Alexandrescu, on April 10 at 16:49 you wrote me:
And Braddr just made a documentation fix, and Walter only commits
portability stuff and an occasional bug fix now and then, so...
Yes, it really looks like a five-person show =)
I think most work in Phobos now it's done by Andrei, there are other
*collaborators* (the four other you named plus people sending patches), but
it looks like Andrei's show to me. This is not necessarily bad, it's
definitely  better than before, when it was Walter's show, now at least he
can dedicate his efforts in the compiler and language and Phobos is having
a lot more attention.

We'll be very happy to integrate credited contributions from anyone, and to give dsource.org write access to serious participants. What I think right now stands in the way of large participation to Phobos is that we all still learn the ropes of D2; the possibilities are dizzying and we haven't quite zeroed in on a particular style. Nonetheless, as it's been noticed I'm always summoning help from this group. So again, if you feel you want to contribute with ideas and/or code, don't hesitate.

GC) and I can contribute that. Right now all my energies are focused on that, and I'm very close to the point to finally start playing with alternate implementations. BTW, is there any real interest in adding some more power to the GC implementator to allow some kind of moving or generational collector?

Absolutely. When writing parallel code to do large scale data mining in D, the lack of precision and multithreaded allocation are real killers. My interests are, in order of importance: 1. Being able to allocate at least small chunks of memory without locking. 2. Precise scanning of at least the heap. 3. Collection w/o stopping the world. 4. Moving GC so that allocations can be pointer bumps.

3. is my main goal right now. I think 1. can be done using thread-specific free lists/pools. 2. is possible too, but bigger changes are needed, especially on the compiler side (1. and 3. can be done entirely in the GC implementation). 4. is not 100% possible because we can never have a 100% precise GC, but we can get very close if 2. is fixed =)

Do you have an example program that I can use for a benchmark suite?

Thank you.

--
Leandro Lucarella (luca) | Collective blog: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
I have committed sins, I have done evil, I have been a victim of envy, selfishness, ambition, lies and frivolity, but I have always been an Argentine father who wants his son to succeed in life.
-- Ricardo Vaporeso
Apr 12 2009
parent reply Christopher Wright <dhasenan gmail.com> writes:
Leandro Lucarella wrote:
 dsimcha, on April 11 at 05:21 you wrote me:
 == Quote from Leandro Lucarella (llucax gmail.com)'s article
 Andrei Alexandrescu, on April 10 at 16:49 you wrote me:
 And Braddr just made a documentation fix, and Walter only commits
 portability stuff and an occasional bug fix now and then, so...
 Yes, it really looks like a five-person show =)
 I think most work in Phobos now it's done by Andrei, there are other
 *collaborators* (the four other you named plus people sending patches), but
 it looks like Andrei's show to me. This is not necessarily bad, it's
 definitely  better than before, when it was Walter's show, now at least he
 can dedicate his efforts in the compiler and language and Phobos is having
 a lot more attention.

to give dsource.org write access to serious participants. What I think right now stands in the way of large participation to Phobos is that we all still learn the ropes of D2; the possibilities are dizzying and we haven't quite zeroed in on a particular style. Nonetheless, as it's been noticed I'm always summoning help from this group. So again, if you feel you want to contribute with ideas and/or code, don't hesitate.

GC) and I can contribute that. Right now all my energies are focused on that, and I'm very close to the point to finally start playing with alternate implementations. BTW, is there any real interest in adding some more power to the GC implementator to allow some kind of moving or generational collector?

lack of precision and multithreaded allocation are real killers. My interests are, in order of importance: 1. Being able to allocate at least small chunks of memory without locking. 2. Precise scanning of at least the heap. 3. Collection w/o stopping the world. 4. Moving GC so that allocations can be pointer bumps.

3. is my main goal right now. I think 1. can be done using thread-specific free lists/pools. 2. Is possible too, but bigger changes are needed, specially in the compiler side (1. and 3. can be completely done in the GC implementation). 4. is not 100% possible because we can never have a 100% precise GC, but can be very close if 2. is fixed =)

You can create StackInfo similar to TypeInfo, I suppose, and thus get an entirely precise GC. Or you can pin anything that's referenced from the stack, and move anything that is only referenced from the heap.
Apr 12 2009
next sibling parent grauzone <none example.net> writes:
 You can create StackInfo similar to TypeInfo, I suppose, and thus get an 
 entirely precise GC.

What about the registers? It isn't that simple.
Apr 12 2009
prev sibling parent reply Leandro Lucarella <llucax gmail.com> writes:
Christopher Wright, on April 12 at 17:54 you wrote me:
Absolutely.  When writing parallel code to do large scale data mining in D, the
lack of precision and multithreaded allocation are real killers.  My interests
are, in order of importance:

1.  Being able to allocate at least small chunks of memory without locking.
2.  Precise scanning of at least the heap.
3.  Collection w/o stopping the world.
4.  Moving GC so that allocations can be pointer bumps.

free lists/pools. 2. Is possible too, but bigger changes are needed, specially in the compiler side (1. and 3. can be completely done in the GC implementation). 4. is not 100% possible because we can never have a 100% precise GC, but can be very close if 2. is fixed =)

You can create StackInfo similar to TypeInfo, I suppose, and thus get an entirely precise GC.

Sure. This is a big (compiler) change, and you probably have to drop C compatibility (what would you do with C function stack frames without StackInfo? How do you know if a stack frame is from an "untyped" C function or a "typed" D one? Where do you search for that StackInfo?). But it's definitely possible in theory.
 Or you can pin anything that's referenced from the stack, and move
 anything that is only referenced from the heap.

That's more likely to happen, but it requires a compiler change too (providing type information on allocation).

Maybe I wasn't too clear: I didn't mean to say that a moving collector is impossible; what is impossible is to make allocation a "pointer bump". What I mean is that you can be as precise as you want, but as long as unions and void[] are there, there will always be "might be a pointer" fields, and cells pointed to by that type of field should not be moved, ever. So, even after a fresh collection, your heap can still be fragmented. You have to store information about the "holes" and take care of them. This can be very light too (in comparison with the actual allocation algorithm), but it can never be as simple as a "pointer bump" (as requested by David =).

So technically, you'll always have to deal with memory fragmentation in D (I don't think anyone wants to drop unions and void[] =), and it's true that it can be minimized to almost nothing. But since it's technically possible, you can never get away from the extra complexity of managing those rare cases.

--
Leandro Lucarella (luca) | Collective blog: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
"SMURF ENRIQUE" ARRIVED AT THE DANCE HALL
-- Crónica TV
Apr 12 2009
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Leandro Lucarella wrote:
 Christopher Wright, on April 12 at 17:54 you wrote me:
 Absolutely.  When writing parallel code to do large scale data mining in D, the
 lack of precision and multithreaded allocation are real killers.  My interests
 are, in order of importance:

 1.  Being able to allocate at least small chunks of memory without locking.
 2.  Precise scanning of at least the heap.
 3.  Collection w/o stopping the world.
 4.  Moving GC so that allocations can be pointer bumps.

free lists/pools. 2. Is possible too, but bigger changes are needed, specially in the compiler side (1. and 3. can be completely done in the GC implementation). 4. is not 100% possible because we can never have a 100% precise GC, but can be very close if 2. is fixed =)

entirely precise GC.

Sure. This is a big (compiler) change, and you probably have to drop C compatibility (what would you do with C functions stacks frames without StackInfo? How do you know it a stack frame is from an "untyped" C function or a "typed" D one? Where do you search for that StackInfo?). But it's definitely possible in theory.

Actually, it's not possible in D as it stands. Consider:

    union U {
        size_t i;
        void* p;
    }

There's no way for the GC to know whether an instance of this type is storing a pointer or an integer that happens to look like a pointer. So unless we're dropping support for unions (and void[]s as they exist currently), any GC needs to support some things that may either be pointers or non-pointers, and (implicitly?) pin allocations accordingly.

So stack frames not described by a StackInfo instance can just be considered to consist of data that may or may not be pointers, just like the union above.
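
To make the ambiguity concrete, here is a small sketch of what a conservative scanner sees when it looks at such a union. The helper `looksLikePointerInto` is a hypothetical name invented for this example, not a runtime function; it only performs the range check a conservative collector would apply to an arbitrary machine word.

```d
import std.stdio;

union U
{
    size_t i;
    void* p;
}

// Hypothetical helper: true if `word`, read as an address, falls
// inside the block [base, base + len). A conservative scanner has
// nothing better than this kind of range check to go on.
bool looksLikePointerInto(size_t word, const(void)* base, size_t len)
{
    auto lo = cast(size_t) base;
    return word >= lo && word < lo + len;
}

void main()
{
    auto block = new ubyte[](64);

    U asPointer;
    asPointer.p = block.ptr;                   // really a pointer

    U asInteger;
    asInteger.i = cast(size_t) block.ptr + 8;  // an integer that merely looks like one

    // Both words pass the same test, so the block has to be treated
    // as reachable and as non-movable in both cases.
    assert(looksLikePointerInto(asPointer.i, block.ptr, block.length));
    assert(looksLikePointerInto(asInteger.i, block.ptr, block.length));
    writeln("both words are treated as possible pointers");
}
```

The scanner cannot update `asInteger.i` if the block moves without corrupting what might be plain data, which is why such blocks end up pinned.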
 Or you can pin anything that's referenced from the stack, and move
 anything that is only referenced from the heap.

That's more likely to happen, but it requires a compiler change too (provide type information on allocation). Maybe I wasn't too clear, I didn't mean to say that a moving collector is impossible, what is impossible is to make allocation a "pointer bump".

The compiler already passes a TypeInfo on allocations IIRC. And TypeInfo can produce an OffsetTypeInfo[] (via offTi()); it just happens that DMD and GDC don't fill it in for user-defined aggregates, and LDC needs a compile-time #define to enable it (because it breaks linking the Tango runtime, IIRC). (For other types, the fact that it returns null is a simple library issue.)
 What I mean is you can be as precise as you want, but as long as union and
 void[] is there, there always be "might be a pointer" fields, and cells

Oh, I hadn't read that part yet when I started typing this post :)
 pointed by that type of fields should not be moved, ever. So, even after
 a fresh collection, your heap can be still fragmented. You have to store
 information about the "holes" and take care of them. This can be very
 light too (in comparison with the actual allocation algorithm), but it can
 never be as simple as a "pointer bump" (as requested by David =).

Well, it may technically be possible to move a heap object right before assignment to a union/void[] or passing to C if the compiler calls a library function before doing something like that. Then pinned objects could be allocated on a separate part of the heap that never gets moved (unless no more references in untyped memory are live, maybe?) and allocations could still be a pointer bump in the movable part of the heap. I have no idea how efficient this would be, however. My guess would be not very.
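
For reference, this is roughly what "allocation as a pointer bump" means for the movable part of the heap. It is only a sketch under assumptions: `BumpRegion` and its fields are hypothetical names, the 16-byte alignment is an arbitrary choice, and the pinned, non-movable part of the heap (which would need a separate, more conventional allocator) is not modelled at all.

```d
// Hypothetical movable region: allocation just advances `top`.
struct BumpRegion
{
    ubyte[] memory; // backing storage for the region
    size_t top;     // offset of the first free byte

    void* allocate(size_t size)
    {
        enum size_t alignment = 16; // assumed alignment
        if (size > memory.length - top)
            return null; // region exhausted; a real GC would collect here
        size_t rounded = (size + alignment - 1) & ~(alignment - 1);
        if (rounded > memory.length - top)
            return null;
        void* result = memory.ptr + top;
        top += rounded; // the entire allocation cost is this bump
        return result;
    }
}

void main()
{
    auto region = BumpRegion(new ubyte[](1024));
    auto a = region.allocate(24); // rounded up to 32 bytes
    auto b = region.allocate(8);  // rounded up to 16 bytes
    assert(cast(ubyte*) b - cast(ubyte*) a == 32);
    assert(region.top == 48);
    assert(region.allocate(2000) is null); // does not fit
}
```

The scheme only works if a collection can compact the region and reset `top`, which is exactly what pinned objects inside the region would prevent.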
 So technically, you'll always have to deal with memory fragmentation in
 D (I don't think anyone wants to drop unions and void[] =), and it's true
 that it can be minimized to almost nothing. But since it's technically
 possible, you can never get away from the extra complexity for managing
 those rare cases.

Apr 13 2009
parent reply Leandro Lucarella <llucax gmail.com> writes:
Frits van Bommel, on April 13 at 13:30 you wrote me:
Or you can pin anything that's referenced from the stack, and move
anything that is only referenced from the heap.

(provide type information on allocation). Maybe I wasn't too clear, I didn't mean to say that a moving collector is impossible, what is impossible is to make allocation a "pointer bump".

The compiler already passes a TypeInfo on allocations IIRC. And TypeInfo can produce a TypeInfo[], it just happens that DMD and GDC don't fill it in for user-defined aggregates, and LDC needs a compile-time #define to enable it (because it breaks linking the Tango runtime, IIRC). (For other types, this fact it returns null is a simple library issue)

Well, this is nice to know (even when it's not used yet, it's better than nothing). And how can the GC obtain this kind of information?
What I mean is you can be as precise as you want, but as long as union and
void[] is there, there always be "might be a pointer" fields, and cells

Oh, I hadn't read that part yet when I started typing this post :)

=)
pointed by that type of fields should not be moved, ever. So, even after
a fresh collection, your heap can be still fragmented. You have to store
information about the "holes" and take care of them. This can be very
light too (in comparison with the actual allocation algorithm), but it can
never be as simple as a "pointer bump" (as requested by David =).

Well, it may technically be possible to move a heap object right before assignment to a union/void[] or passing to C if the compiler calls a library function before doing something like that.

Yes, I guess it's technically possible, but again, it needs (AFAIK) non-trivial compiler changes.
 Then pinned objects could be allocated on a separate part of the heap
 that never gets moved (unless no more references in untyped memory are
 live, maybe?) and allocations could still be a pointer bump in the
 movable part of the heap.

Sure. And what about the non-movable part of the heap ;) You still have to manage that, you can't simply ignore it. That's what I meant with this:
So technically, you'll always have to deal with memory fragmentation in
D (I don't think anyone wants to drop unions and void[] =), and it's true
that it can be minimized to almost nothing. But since it's technically
possible, you can never get away from the extra complexity for managing
those rare cases.


[...]
 I have no idea how efficient this would be, however. My guess would be
 not very.

I'm not concerned about efficiency; I'm more concerned about non-trivial compiler changes.

Anyway, I think the important thing here is to at least get a precise heap (it would be nice if one could provide type information for the root set too, I guess).

For me there's almost no difference between having a non-precise stack plus unions/void[] and having just non-precise unions/void[]. You have to support non-movable objects anyway, and I guess the stack is small enough to be a non-problem in practice. I think the cost/benefit ratio of a precise stack isn't worth the trouble.

--
Leandro Lucarella (luca) | Collective blog: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
Apr 13 2009
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Leandro Lucarella wrote:
 Frits van Bommel, on April 13 at 13:30 you wrote me:
 Or you can pin anything that's referenced from the stack, and move
 anything that is only referenced from the heap.

(provide type information on allocation). Maybe I wasn't too clear, I didn't mean to say that a moving collector is impossible, what is impossible is to make allocation a "pointer bump".

produce a TypeInfo[], it just happens that DMD and GDC don't fill it in for user-defined aggregates, and LDC needs a compile-time #define to enable it (because it breaks linking the Tango runtime, IIRC). (For other types, this fact it returns null is a simple library issue)

Well, this is nice to know (even when it's not used yet, it's better than nothing). And how can the GC obtain this kind of information?

Well, since the allocation routines should all get a TypeInfo reference from the compiler, the GC can store the typeinfo for each memory block somewhere, and later use it. It can then call ti.offTi(), which should return an array of OffsetTypeInfo structs (see object.d[i]). The only caveat is that those array return values should be statically allocated; the GC probably won't like an allocation happening during collections...
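
As an illustration of how such offset information could drive precise heap scanning, here is a sketch that uses a hand-written, statically allocated offset table in place of what offTi() would return. `Node`, `nodePointerOffsets` and `scanBlock` are hypothetical names for this example; no current compiler is assumed to generate the table.

```d
// Example aggregate: only `next` is a pointer field.
struct Node
{
    size_t count;
    Node* next;
    double weight;
}

// Hand-written stand-in for compiler-provided offset info. It is
// statically allocated, so reading it never allocates.
immutable size_t[] nodePointerOffsets = [Node.next.offsetof];

// Visit only the words the table declares to be pointers; every
// other word in the block is ignored, whatever its bit pattern.
void scanBlock(const(void)* block, const size_t[] pointerOffsets,
               scope void delegate(size_t) visit)
{
    auto base = cast(const(ubyte)*) block;
    foreach (offset; pointerOffsets)
    {
        auto word = *cast(const(size_t)*)(base + offset);
        if (word != 0)
            visit(word);
    }
}

void main()
{
    auto tail = new Node(3, null, 1.5);
    // `count` deliberately holds an integer equal to tail's address.
    auto head = new Node(cast(size_t) tail, tail, 2.5);

    size_t[] refs;
    scanBlock(head, nodePointerOffsets, (size_t w) { refs ~= w; });

    // Only the real pointer field is reported, not the look-alike integer.
    assert(refs.length == 1);
    assert(refs[0] == cast(size_t) tail);
}
```

The callback in `main` appends to an array for convenience; inside a real collector the visitor would mark or queue the target instead, since allocating during a collection is the very thing to avoid.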
 pointed by that type of fields should not be moved, ever. So, even after
 a fresh collection, your heap can be still fragmented. You have to store
 information about the "holes" and take care of them. This can be very
 light too (in comparison with the actual allocation algorithm), but it can
 never be as simple as a "pointer bump" (as requested by David =).

assignment to a union/void[] or passing to C if the compiler calls a library function before doing something like that.

Yes, I guess it's technically possible, but again, it needs (AFAIK) non-trivial compiler changes.

Well, the change to the compiler might not be that big. Detection of unions, void[]s and C calls should be pretty simple. The lib routine might be "a bit" more complicated... Though this all assumes the compiler first provides enough information about the stack & registers for a moving collector to be feasible -- which would probably be a much bigger task.
 Then pinned objects could be allocated on a separate part of the heap
 that never gets moved (unless no more references in untyped memory are
 live, maybe?) and allocations could still be a pointer bump in the
 movable part of the heap.

Sure. And what about the non-movable part of the heap ;) You still have to manage that, you can't simply ignore it. That's what I meant with this:
 So technically, you'll always have to deal with memory fragmentation in
 D (I don't think anyone wants to drop unions and void[] =), and it's true
 that it can be minimized to almost nothing. But since it's technically
 possible, you can never get away from the extra complexity for managing
 those rare cases.



Well yeah, you'll still have a non-movable part. Hopefully it'll be much smaller than the movable part though. And like I said, allocations can still be pointer bumps -- it's the assignments to unions, void[]s and C calls that suffer...
 [...]
 
 I have no idea how efficient this would be, however. My guess would be
 not very.

I'm not concerned about efficiency; I'm more concerned about non-trivial compiler changes.

Well, efficiency is important too. This has the potential to trigger what is effectively a marking of the entire heap (to find all references to an object that needs to be moved *now*) much more often than would otherwise happen. Like I said, my guess would be this isn't very efficient.
 Anyway I think the important thing here is to at least get a precise heap
 (it would be nice if one could provide type information for the root set
 too, I guess).
 
 For me there's almost no difference between having a non-precise stack and
 unions/void[]s or having just non-precise unions/void[]s. You have to
 support non-movable objects anyway, and I guess the stack is small enough
 to be a non-problem in practice. I think the cost/benefit ratio of having
 a precise stack isn't worth the trouble.

A precise heap would certainly be a nice starting point, but adding a precise stack and registers might be a nice improvement over it, especially for things like Tango allocating large stack buffers to avoid heap allocs. They're pointerless, but the GC doesn't know that... IIRC there have been some talks on the LLVM mailing list about how to emit stack and register maps, so at some point in the future LDC might actually support all that...
Apr 13 2009
parent reply Leandro Lucarella <llucax gmail.com> writes:
Frits van Bommel, on April 13 at 19:36 you wrote to me:
 Leandro Lucarella wrote:
Frits van Bommel, on April 13 at 13:30 you wrote to me:
Or you can pin anything that's referenced from the stack, and move
anything that is only referenced from the heap.

(provide type information on allocation). Maybe I wasn't too clear, I didn't mean to say that a moving collector is impossible, what is impossible is to make allocation a "pointer bump".

produce a TypeInfo[], it just happens that DMD and GDC don't fill it in for user-defined aggregates, and LDC needs a compile-time #define to enable it (because it breaks linking the Tango runtime, IIRC). (For other types, this fact it returns null is a simple library issue)

nothing). And how can the GC obtain this kind of information?

Well, since the allocation routines should all get a TypeInfo reference from the compiler, the GC can store the typeinfo for each memory block somewhere, and later use it. It can then call ti->offTi() which should return an array of OffsetTypeInfo structs (see object.d[i]). The only caveat is that those array return values should be statically allocated; the GC probably won't like an allocation happening during collections...

But right now gc_malloc() doesn't take any TypeInfo argument. I can't see where I can get the TypeInfo in the first place =/
I have no idea how efficient this would be, however. My guess would be
not very.

compiler changes.

Well, efficiency is important too.

Sure, and it's really hard to guess how efficient it could be (you lose some efficiency in some cases but you probably gain a lot in other cases if most allocations are a pointer bump). What I meant is that I can test efficiency, to see if this is really viable or not, but it's very hard for me to change the compiler (and it's even harder to get those changes accepted "upstream", and one of my thesis goals is to make something useful that can be easily adopted, not just an academic curiosity =).
Anyway I think the important thing here is to at least get a precise heap
(it would be nice if one could provide type information for the root set
too, I guess).
For me there's almost no difference between having a non-precise stack and
unions/void[]s or having just non-precise unions/void[]s. You have to
support non-movable objects anyway, and I guess the stack is small enough
to be a non-problem in practice. I think the cost/benefit ratio of having
a precise stack isn't worth the trouble.

A precise heap would certainly be a nice starting point, but adding precise stack and registers might be a nice improvement over it. Especially for things like the Tango allocating large stack buffers to avoid heap allocs. They're pointerless, but the GC doesn't know that...

A call to gc_addRange() can be done to inform the GC, but of course it would be really nice if that's not necessary =)
 IIRC there have been some talks on the LLVM mailing list about how to
 emit stack and register maps, so at some point in the future LDC might
 actually support all that...

That's nice. But for now I prefer to target a more general solution (even when I'm using LDC for the project). -- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/ ---------------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ----------------------------------------------------------------------------
Apr 13 2009
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Leandro Lucarella wrote:
 Frits van Bommel, on April 13 at 19:36 you wrote to me:
 Leandro Lucarella wrote:
 Frits van Bommel, on April 13 at 13:30 you wrote to me:
 Or you can pin anything that's referenced from the stack, and move
 anything that is only referenced from the heap.

(provide type information on allocation). Maybe I wasn't too clear, I didn't mean to say that a moving collector is impossible, what is impossible is to make allocation a "pointer bump".

produce a TypeInfo[], it just happens that DMD and GDC don't fill it in for user-defined aggregates, and LDC needs a compile-time #define to enable it (because it breaks linking the Tango runtime, IIRC). (For other types, this fact it returns null is a simple library issue)

nothing). And how can the GC obtain this kind of information?

from the compiler, the GC can store the typeinfo for each memory block somewhere, and later use it. It can then call ti->offTi() which should return an array of OffsetTypeInfo structs (see object.d[i]). The only caveat is that those array return values should be statically allocated; the GC probably won't like an allocation happening during collections...

But right now gc_malloc() doesn't take any TypeInfo argument. I can't see where I can get the TypeInfo in the first place =/

The call would have to be modified. Right now the best you can do is pass BlkAttr.NO_SCAN. And storing a pointer per block could add a good bit of bookkeeping overhead for small objects, of course. Perhaps the TypeInfo array could be converted to a bitmap or some such.
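For reference, this is roughly what "pass BlkAttr.NO_SCAN" looks like through the `core.memory` wrapper in druntime. It is only a sketch: `allocPointerFree` is a hypothetical helper name, and the 2009 Tango entry point discussed here is the lower-level `gc_malloc()`, whose exact signature may differ from the wrapper used below.

```d
import core.memory : GC;

/// Hypothetical helper: allocate a buffer that the collector will not
/// scan for pointers, which is all the type information the GC gets here.
ubyte[] allocPointerFree(size_t n)
{
    auto p = cast(ubyte*) GC.malloc(n, GC.BlkAttr.NO_SCAN);
    return p[0 .. n];
}
```

So the collector learns one bit per block (scan or don't scan), which is exactly why a per-block TypeInfo or bitmap would be a step up.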
Apr 13 2009
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Sean Kelly wrote:
 Leandro Lucarella wrote:
 But right now gc_malloc() doesn't take any TypeInfo argument. I can't see
 where I can get the TypeInfo in the first place =/

The call would have to be modified. Right now the best you can do is pass BlkAttr.NO_SCAN. And storing a pointer per block could add a good bit of bookkeeping overhead for small objects, of course. Perhaps the TypeInfo array could be converted to a bitmap or some such.

Let's see, you'd need 2 bits per pointer-sized block of bytes, to encode these possibilities:

a) Yeah, this is a pointer
b) Nope, not a pointer
c) Maybe a pointer (union, void[])
c2) (optional) A (somehow) explicitly pinned pointer (treated identically to (c) for GC purposes; needs to be followed during marking, but data pointed to can't be moved)
d) (optional, since we have a value left) This is a weak pointer

I'd split these up as such: one bit to indicate that it can be read as a pointer (and should thus be followed when marking, for instance) and one to indicate it can be written as a pointer (so it can be moved for (a) or nulled for (d)). That gives us these values for the two-bit field:

    enum PtrBits {
        // Actual values
        JustData      = 0b00,
        MaybePointer  = 0b01,
        PinnedPointer = 0b01,
        WeakPointer   = 0b10,
        Pointer       = 0b11,

        // For '&' tests
        ReadableFlag  = 0b01,
        WritableFlag  = 0b10,
    }

Like I said, this would cost 2 bits per pointer-sized chunk, so 1/16th of the memory block size for 32-bit systems and 1/32nd for 64-bit systems. It'd have to be rounded up to a whole number of bytes of course, and possibly to T.alignof if stored at the start of the block. (Storing it at the end of the block would avoid that.)

This could be bounded to one pointer worth of memory per block if the GC treats blocks > 16*4 = 64 bytes (on 32-bit systems) or > 32*8 = 256 bytes (on 64-bit systems) specially, by just storing the raw TypeInfo reference instead of the bitfield for the memory block. (Implementer's choice on what to do for (size_t.sizeof-1)*4*size_t.sizeof to size_t.sizeof^2 * 4 bytes, where the bit-encoded data takes up the same number of bytes as a pointer would.)
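A small sketch of the packing and of the overhead arithmetic above, in case it helps. The helper names (`mapBytes`, `setBits`, `getBits`) are hypothetical, and the word size is passed as a parameter so the 32-bit and 64-bit figures can both be checked on any machine.

```d
/// Flag values for '&' tests, as in the two-bit encoding described above.
enum ubyte ReadableFlag = 0b01;
enum ubyte WritableFlag = 0b10;

/// Bytes needed for a 2-bits-per-word map covering `blockSize` bytes,
/// rounded up to a whole number of bytes.
size_t mapBytes(size_t blockSize, size_t wordSize = size_t.sizeof)
{
    immutable words = (blockSize + wordSize - 1) / wordSize;
    return (words * 2 + 7) / 8;
}

/// Store the 2-bit code for word number `index` (four codes per byte).
void setBits(ubyte[] map, size_t index, ubyte code)
{
    immutable shift = cast(uint)((index % 4) * 2);
    immutable cleared = map[index / 4] & ~(0b11 << shift);
    map[index / 4] = cast(ubyte)(cleared | ((code & 0b11) << shift));
}

/// Read back the 2-bit code for word number `index`.
ubyte getBits(const(ubyte)[] map, size_t index)
{
    immutable shift = cast(uint)((index % 4) * 2);
    return cast(ubyte)((map[index / 4] >> shift) & 0b11);
}
```

With these, `mapBytes(64, 4)` is 4 and `mapBytes(256, 8)` is 8, i.e. the map reaches the size of one pointer exactly at the block sizes mentioned above, which is where falling back to a raw TypeInfo reference starts to pay off.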
Apr 13 2009
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Robert Jacques wrote:
 On Mon, 13 Apr 2009 14:54:57 -0400, Frits van Bommel 
 <fvbommel remwovexcapss.nl> wrote:

 
 An alternative to this is to encode the information in ClassInfo and use 

It's already there. That's where TypeInfo for classes gets it from :).
 it instead. (You'd have to create a fake ClassInfo for structs and 
 arrays.) Then the GC only has to track the start of each object (i.e. 
 the beginning of a block in the current GC). The advantage is that this 
 has 0 storage requirements for objects and on average < 4 bytes for 
 structs and arrays (thanks to the coarse block sizes of the current GC).

(that'd be < 8 for a 64-bit machine?)

An interesting idea. Indeed, since vtables for objects start with a ClassInfo reference, putting a ClassInfo* in front of non-object memory blocks should work, if ClassInfo could be generalized to support structs, unions, ints, floats, etc...

Using D2 structs with a moving GC would need some extra bookkeeping data anyway, to work out things like their postblit call. This could be put in the ClassInfo or in the second slot of the fake vtable. (Without the fake classinfo, using a TypeInfo reference instead of the bitfield and putting it in there would work too.)

Arrays, by the way, would also need some special handling, since you can't return a variable-sized OffsetTypeInfo[] without allocating during collections. (As long as they fit in the limits for the bitfield, that could be repeated though -- as long as it's not an array of structs with postblits...) So maybe a .sizeof should somehow be included, and the offsets assumed to repeat after that? (as long as enough bytes are left for at least one more item) If we go the fake ClassInfo approach, the ClassInfo.init.length field could be used to store this size. Note that this would likely mean initializing unused parts of memory blocks to null[1], since the GC doesn't know how much of them is used and might get false pointers otherwise.

All in all, maybe it'd be easier to just go the TypeInfo approach. The extra information needed to support non-class types is already conveniently available there (type sizes, postblits for D2) and they're already available for all types... Or maybe the ClassInfo in the vtable could be changed into a TypeInfo? :)

[1]: At least for array blocks. Other blocks likely wouldn't have enough padding for extra elements -- unless the extra pointer for non-objects puts the size over the limit and the block size is doubled.
Apr 14 2009
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Robert Jacques wrote:
 On Tue, 14 Apr 2009 06:04:01 -0400, Frits van Bommel 
 <fvbommel remwovexcapss.nl> wrote:
 Robert Jacques wrote:
 it instead. (You'd have to create a fake ClassInfo for structs and 
 arrays.) Then the GC only has to track the start of each object (i.e. 
 the beginning of a block in the current GC). The advantage is that 
 this has 0 storage requirements for objects and on average < 4 bytes 
 for structs and arrays (thanks to the coarse block sizes of the 
 current GC).

(that'd be < 8 for a 64-bit machine?)

Yes. The key point is that it's a per-item cost which decreases with item size, as opposed to a fixed 6.25% overhead when using a dense bitmask.

I already mentioned the bitmask overhead could be bounded to pointer-size by falling back to a TypeInfo-based solution for memory blocks where that overhead would otherwise exceed (or match) the size of a pointer.
 Using D2 structs with a moving GC would need some extra bookkeeping 
 data anyway, to work out things like their postblit call.

Postblit is only called when generating an actual copy. For example, it is not called on assignment if the source is no longer used. So I don't see any reason why postblit should run, or would be expected to run, when a struct is moved by the GC.

Oh, I didn't know that. (I haven't done much of anything with D2, I mostly stick to D1) I just presumed they were like C++ copy constructors. As an aside: I can certainly think of some places where it would be useful to have them get called whenever the address changes... (Though "move constructors" would be even better for most of those cases)
 Arrays, by the way, would also need some special handling, since you 
 can't return a variable-sized OffsetTypeInfo[] without allocating 
 during collections.
 (As long as they fit in the limits for the bitfield, that could be 
 repeated though -- as long as it's not an array of structs with 
 postblits...)
 So maybe a .sizeof should somehow be included, and the offsets assumed 
 to repeat after that? (as long as enough bytes are left for at least 
 one more item)

Actually, I'd assume there'd be an isArray flag in the Class/Type Info, which would cause the bitmask to be repeated until the end of the block.

You'd still need to know the size of the bitmask, to know after how many bits to repeat it. But like I said, a ClassInfo would encode the size of the type (as does TypeInfo), so any solution based on either of those should do the trick here.
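The "repeat after .sizeof" idea can be sketched like this. It assumes the collector has somehow obtained the element size and the pointer offsets within one element (from TypeInfo, ClassInfo or a bitmask); `eachPointerSlot` and its parameters are hypothetical names, and a real collector would not use a delegate or a GC-allocated array here.

```d
/// Calls `visit` with the byte offset of every pointer slot in a block
/// that holds consecutive elements of one type.
void eachPointerSlot(size_t blockSize, size_t elemSize,
                     const(size_t)[] ptrOffsets,
                     scope void delegate(size_t offset) visit)
{
    assert(elemSize > 0);
    // Repeat the element layout only while a whole element still fits;
    // leftover padding at the end of the block is never visited.
    for (size_t start = 0; start + elemSize <= blockSize; start += elemSize)
        foreach (off; ptrOffsets)
            visit(start + off);
}
```

For a 40-byte block of 16-byte elements with pointers at offsets 0 and 8, this visits offsets 0, 8, 16 and 24 and ignores the trailing 8 bytes. Without the element size there would be no way to tell where the layout starts over, which is the point made above.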
Apr 14 2009
parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Robert Jacques wrote:
 On Tue, 14 Apr 2009 09:27:09 -0400, Frits van Bommel 
 <fvbommel remwovexcapss.nl> wrote:
 Robert Jacques wrote:
 On Tue, 14 Apr 2009 06:04:01 -0400, Frits van Bommel 
 <fvbommel remwovexcapss.nl> wrote:
 Using D2 structs with a moving GC would need some extra bookkeeping 
 data anyway, to work out things like their postblit call.

it is not called on assignment if the source is no longer used. So I don't see any reason why postblit should run, or would be expected to run, when a struct is moved by the GC.

Oh, I didn't know that. (I haven't done much of anything with D2, I mostly stick to D1) I just presumed they were like C++ copy constructors. As an aside: I can certainly think of some places where it would be useful to have them get called whenever the address changes... (Though "move constructors" would be even better for most of those cases)

Could you document this use case? (i.e. give some examples as I can't think of any)

Any situation in which structs register themselves somewhere for one reason or another.

For example, I read that C++'s shared_ptr<> could be implemented by having the instances keep a doubly-linked list of themselves instead of using an extra heap allocation for the reference count. Such an implementation would need to update the pointers in neighboring nodes when moved, or insert itself before or after the original when copied.

Note that shared_ptr<> is not only useful for memory resources, it could also be used to e.g. keep a file handle or socket open until all users are done with it (and not longer, as you might get with a GC'ed file class).

Of course, in this case the more traditional approach with a heap-allocated reference (or even storing it in the Monitor structure each object has a pointer to) would be just as viable. But you could also implement weak references in a similar way, to let the GC find them in a linked list and allow them to be nulled when referred-to objects get collected.

There are probably other use cases...
Apr 14 2009
prev sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Leandro Lucarella wrote:
 Frits van Bommel, on April 13 at 19:36 you wrote to me:
 Leandro Lucarella wrote:
 Frits van Bommel, on April 13 at 13:30 you wrote to me:
 Or you can pin anything that's referenced from the stack, and move
 anything that is only referenced from the heap.

(provide type information on allocation). Maybe I wasn't too clear, I didn't mean to say that a moving collector is impossible, what is impossible is to make allocation a "pointer bump".

produce a TypeInfo[], it just happens that DMD and GDC don't fill it in for user-defined aggregates, and LDC needs a compile-time #define to enable it (because it breaks linking the Tango runtime, IIRC). (For other types, this fact it returns null is a simple library issue)

nothing). And how can the GC obtain this kind of information?

from the compiler, the GC can store the typeinfo for each memory block somewhere, and later use it. It can then call ti->offTi() which should return an array of OffsetTypeInfo structs (see object.d[i]). The only caveat is that those array return values should be statically allocated; the GC probably won't like an allocation happening during collections...

But right now gc_malloc() doesn't take any TypeInfo argument. I can't see where I can get the TypeInfo in the first place =/

Ah, you're right. But if you'll look at your nearest lifetime.d[1] you'll see that all the allocation routines called by the compiler *do* provide a TypeInfo, so apparently it's just not propagated to gc_*. So I guess the first thing to do would be to either (a) change the signature of gc_{malloc,calloc,extend}() or (b) add something like gc_settype(void*, TypeInfo)...

[1]: Tango name, and presumably druntime as well; I think it's spread all over the place for Phobos 1.
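Option (b) could be prototyped without touching the gc_* signatures at all. The sketch below is purely illustrative: `TypeTable`, `setType`, `typeOf` and `allocTyped` are hypothetical names, and the associative array is itself GC-allocated (and keeps every registered block reachable), so a real collector would need its own non-GC table or a slot in the block header instead.

```d
/// Hypothetical side table from block start to the type stored there.
struct TypeTable
{
    private TypeInfo[void*] types;

    /// Stand-in for the proposed gc_settype(void*, TypeInfo).
    void setType(void* block, TypeInfo ti) { types[block] = ti; }

    /// Returns null for blocks that were allocated without type information.
    TypeInfo typeOf(void* block)
    {
        auto p = block in types;
        return p ? *p : null;
    }
}

/// Example payload type with one pointer field and one data field.
struct Node
{
    Node* next;
    int value;
}

/// Typed allocation wrapper, roughly the place where the lifetime.d
/// routines already have the TypeInfo in hand.
T* allocTyped(T)(ref TypeTable table)
{
    auto p = new T;
    table.setType(p, typeid(T));
    return p;
}
```

A collector could then look up the recorded type during marking and treat untyped blocks (a null result) conservatively, as it does today.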
 I have no idea how efficient this would be, however. My guess would be
 not very.

compiler changes.


Sure, and it's really hard to guess how efficient it could be (you lose some efficiency in some cases but you probably gain a lot in other cases if most allocations are a pointer bump). What I meant is that I can test efficiency, to see if this is really viable or not, but it's very hard for me to change the compiler (and it's even harder to get those changes accepted "upstream", and one of my thesis goals is to make something useful that can be easily adopted, not just an academic curiosity =).

Well, if it turns out to be a win, I'm sure we could put it into LDC. DMD would be up to Walter.
Apr 13 2009
next sibling parent Leandro Lucarella <llucax gmail.com> writes:
Frits van Bommel, on April 13 at 20:33 you wrote to me:
But right now gc_malloc() doesn't take any TypeInfo argument. I can't see
where I can get the TypeInfo in the first place =/

Ah, you're right. But if you'll look at your nearest lifetime.d[1] you'll see that all the allocation routines called by the compiler *do* provide a TypeInfo, so apparently it's just not propagated to gc_*. So I guess the first thing to do would be to either (a) change the signature of gc_{malloc,calloc,extend}() or (b) add something like gc_settype(void*, TypeInfo)...

OK, this is great news! I will certainly experiment with this change to achieve a more precise heap!
 [1]: Tango name, and presumably druntime as well; I think it's spread
 all over the place for Phobos 1.

Great, I will stick to Tango for now because:
a) It is more likely to accept changes (because of Walter's "stable D1" policy).
b) I'm using LDC, and there is no Phobos support for LDC right now (I guess you know that ;)
c) It's more likely to be (forward) compatible with druntime, and thus, D2.
Apr 13 2009
prev sibling parent reply Fawzi Mohamed <fmohamed mac.com> writes:
On 2009-04-13 20:33:53 +0200, Frits van Bommel 
<fvbommel REMwOVExCAPSs.nl> said:

 Leandro Lucarella wrote:
 Frits van Bommel, on April 13 at 19:36 you wrote to me:
 Leandro Lucarella wrote:
 Frits van Bommel, on April 13 at 13:30 you wrote to me:
 Or you can pin anything that's referenced from the stack, and move
 anything that is only referenced from the heap.

(provide type information on allocation). Maybe I wasn't too clear, I didn't mean to say that a moving collector is impossible, what is impossible is to make allocation a "pointer bump".

TypeInfo can produce a TypeInfo[], it just happens that DMD and GDC don't fill it in for user-defined aggregates, and LDC needs a compile-time #define to enable it (because it breaks linking the Tango runtime, IIRC). (For other types, this fact it returns null is a simple library issue)

nothing). And how can the GC obtain this kind of information?

from the compiler, the GC can store the typeinfo for each memory block somewhere, and later use it. It can then call ti->offTi() which should return an array of OffsetTypeInfo structs (see object.d[i]). The only caveat is that those array return values should be statically allocated;



 
 But right now gc_malloc() doesn't take any TypeInfo argument. I can't see
 where I can get the TypeInfo in the first place =/

Ah, you're right. But if you'll look at your nearest lifetime.d[1] you'll see that all the allocation routines called by the compiler *do* provide a TypeInfo, so apparently it's just not propagated to gc_*. So I guess the first thing to do would be to either (a) change the signature of gc_{malloc,calloc,extend}() or (b) add something like gc_settype(void*, TypeInfo)... [1]: Tango name, and presumably druntime as well; I think it's spread all over the place for Phobos 1.
 I have no idea how efficient this would be, however. My guess would be
 not very.

compiler changes.


Sure, and it's really hard to guess how efficient it could be (you lose some efficiency in some cases but you probably gain a lot in other cases if most allocations are a pointer bump). What I meant is that I can test efficiency, to see if this is really viable or not, but it's very hard for me to change the compiler (and it's even harder to get those changes accepted "upstream", and one of my thesis goals is to make something useful that can be easily adopted, not just an academic curiosity =).

Well, if it turns out to be a win, I'm sure we could put it into LDC. DMD would be up to Walter.

and tango will also for sure welcome a new gc implementation.

Most of the issues, and how to modify to get the that were already discussed.

Personally I like a blocked approach (i.e. flag+size) more than a full bitmap; in the future one can think of the compiler clustering pointer types,... together to reduce the number of blocks. Subclassing means that you will always have some blocks, but it is still probably better than the bitmap. I don't like that at the moment typeinfo takes up so much space (at least the size of the type).

To get all the info, offTi aside (which is correct only on LDC as far as I know), tango.core.RuntimeTraits could be useful.

Add support for weak pointers (that at the moment are normally stored as non-pointers); fvbommel had a place for them in his enum values.

At the moment the values in the registers are dumped, but not read back. Either you change that, or all those values should be pinned (just like all union/maybe pointers).

Tango io uses void[] arrays to take advantage of the auto cast, but these are not pointers (and the gc knows this, because at the moment the flags used for an array are the ones used to allocate it the first time).

During the collection you need to stop the threads (at least in the moving gc algorithms, and in the current mark and sweep). While the threads are stopped you have very stringent constraints, basically the same constraints as for a signal handler. You cannot call any non-signal-safe function, not even acquire posix locks. So try to do the least possible in that phase, and be very careful.
Apr 15 2009
parent Leandro Lucarella <llucax gmail.com> writes:
Fawzi Mohamed, on April 15 at 14:57 you wrote to me:
Well, if it turns out to be a win, I'm sure we could put it into LDC.
DMD would be up to Walter.

and tango will also for sure welcome a new gc implementation.

Well, right now I'm working on a minimal, naive, fully documented GC implementation, as an exercise mostly, but I think it can be great for educational / "documentational" purposes. I plan to submit it to Tango/druntime when it's done.
 Most of the issues, and how to modify to get the that were already discussed. 
 Personally I like a blocked approach (i.e. flag+size), more than a full
bitmap, 
 in the future one can think of compiler clustering pointer types,... together
to 
 reduce the number of blocks. Subclassing means that you will always have some 
 blocks, but it is still probably better than the bitmap, I don't like that at 
 the moment typeinfo takes up so much space (at least the size of the type).
 To get all the info offTi aside (which are correct only on LDC as far as I
know) 
 tango.core.RuntimeTraits could be useful.
 
 
 add support for weak pointers (that at the moment are normally stored as
 non pointers), fvbommel had a place for them in its enum values
 
 at the moment the values in the registers are dumped, but not read back,
 either you change that, or all those values should be pinned (just as
 all union/maibe pointer)
 
 tango io uses void[] arrays to take advantage of the auto cast, but
 these are not pointers (and the gc knows this because at the moment the
 flag used for an array are the one used to allocate it the first time.
 
 during the collection you need to stop the threads (at least in the
 moving gc algorithms, and in the current mark an sweep).  While the
 threads are stopped you have very stringent constraints, basically the
 same constraints as for a signal handler.
 You cannot call any non signal safe function, not even acquire posix locks.
 So try to do the least possible in that phase, and be very careful.

Thanks for all the suggestions, they are very useful.
Apr 15 2009
prev sibling next sibling parent reply "Rioshin an'Harthen" <rharth75 hotmail.com> writes:
"Leandro Lucarella" <llucax gmail.com> wrote in message 
news:20090411030416.GA22762 homero.springfield.home...
 BTW, is there any real interest in adding some more power to the GC
 implementator to allow some kind of moving or generational collector?

What I mostly want/need from the GC would be determinism. I want to be able to call delete on a subobject in the destructor of the object being deleted. How many times have I stumbled on this already?
Apr 10 2009
parent reply grauzone <none example.net> writes:
Rioshin an'Harthen wrote:
 "Leandro Lucarella" <llucax gmail.com> wrote in message 
 news:20090411030416.GA22762 homero.springfield.home...
 BTW, is there any real interest in adding some more power to the GC
 implementator to allow some kind of moving or generational collector?

What I mostly want/need from the GC would be determinism. I want to be able to call delete on a subobject in the destructor of the object being deleted. How many times have I stumbled on this already?

Actually, this isn't needed:
- if you want to manually free an object, you can add an extra destroy() method
- when the object is garbage collected, there's no point in deleting referenced objects, because these are either still alive, or get collected as well
Apr 11 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from grauzone (none example.net)'s article
 Rioshin an'Harthen wrote:
 "Leandro Lucarella" <llucax gmail.com> wrote in message
 news:20090411030416.GA22762 homero.springfield.home...
 BTW, is there any real interest in adding some more power to the GC
 implementator to allow some kind of moving or generational collector?

What I mostly want/need from the GC would be determinism. I want to be able to call delete on a subobject in the destructor of the object being deleted. How many times have I stumbled on this already?

- if you want to manually free an object, you can add an extra destroy() method - when the object is garbage collected, there's no point in deleting referenced objects, because these are either still alive, or get collected as well

In theory true, but in practice false. If you have a huge array owned by a small class, the huge array can be retained due to false pointers. Before I realized that it's illegal, I used to put delete statements in destructors in these kinds of situations and it seemed to work in practice even though it's illegal according to the spec, although I never tested it rigorously or really thought about how it could break.
Apr 11 2009
parent grauzone <none example.net> writes:
dsimcha wrote:
 == Quote from grauzone (none example.net)'s article
 Rioshin an'Harthen wrote:
 "Leandro Lucarella" <llucax gmail.com> wrote in message
 news:20090411030416.GA22762 homero.springfield.home...
 BTW, is there any real interest in adding some more power to the GC
 implementator to allow some kind of moving or generational collector?

able to call delete on a subobject in the destructor of the object being deleted. How many times have I stumbled on this already?

- if you want to manually free an object, you can add an extra destroy() method - when the object is garbage collected, there's no point in deleting referenced objects, because these are either still alive, or get collected as well

In theory true, but in practice false. If you have a huge array owned by a small class, the huge array can be retained due to false pointers. Before I realized that it's illegal, I used to put delete statements in destructors in these kinds of situations and it seemed to work in practice even though it's illegal according to the spec, although I never tested it rigorously or really thought about how it could break.

Then you should simply use malloc.
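Something along these lines, as a rough sketch: the small owner object stays GC-managed, while the huge array lives on the C heap where the collector neither scans nor retains it. `BigBuffer` and `dispose()` are hypothetical names (the explicit method plays the role of the "extra destroy() method" suggested earlier in the thread).

```d
import core.stdc.stdlib : malloc, free;

/// Small GC-managed owner of a large buffer that lives on the C heap.
final class BigBuffer
{
    private double* data;
    private size_t count;

    this(size_t n)
    {
        data = cast(double*) malloc(n * double.sizeof);
        assert(data !is null || n == 0, "out of memory");
        count = n;
    }

    /// Fine in a destructor: `data` is not GC-managed, so this does not
    /// touch any object the collector may already have finalized.
    ~this() { dispose(); }

    /// Deterministic release; calling it more than once is harmless.
    void dispose()
    {
        free(data);
        data = null;
        count = 0;
    }

    /// View of the buffer; empty after dispose().
    double[] slice() { return data[0 .. count]; }
}
```

Calling `dispose()` when you're done frees the big block immediately, and the destructor is only a safety net for the case where the owner does get collected. The catch is that the buffer must not hold the only reference to GC-allocated data, since the collector will not look inside it.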
Apr 11 2009
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
Robert Jacques wrote:
 
 On that note, support for per thread GCs in general is another major 
 change. i.e.:
 GC.malloc!(T)();
 to
 Thread.getThis.gc.malloc!(T)(); // Alternatively use thread local storage.

It can remain as GC.malloc(). Exposing a GC handle via thread would allow one thread to use another thread's GC.
Apr 11 2009
prev sibling parent reply Leandro Lucarella <llucax gmail.com> writes:
Robert Jacques, on April 11 at 01:05 you wrote to me:
 On Fri, 10 Apr 2009 23:04:16 -0400, Leandro Lucarella <llucax gmail.com> wrote:
I hope I can come up with something useful with my thesis (improving D's
GC) and I can contribute that. Right now all my energies are focused on
that, and I'm very close to the point to finally start playing with
alternate implementations.

BTW, is there any real interest in adding some more power to the GC
implementator to allow some kind of moving or generational collector?

Yes.
Here are some good starting points on how to allow better GC support in D:
http://d.puremagic.com/issues/show_bug.cgi?id=679

I think this should be less a spec issue and more a library issue and core.memory seems to already have a BlkAttr.NO_MOVE, which covers memory pinning.

This is just a flag. You need extra information to know when to actually set that flag, and for that you need some type information. A cell can be moved when you know everything pointing to it is an actual pointer, so you can safely overwrite it with the new location.
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=35426

Well, making the GC type aware/semi-precise (i.e. providing support for moving/copying collectors) seems like the most important change,

Exactly.
 The change to support concurrent GCs affects both performance and code
 gen significantly. Also, if D's thread model supports thread-local
 heaps, the need for a concurrent GC is vastly reduced (it's only
 a benefit to the shared heaps (mutable and immutable), while most
 objects would be on the thread-local heaps).

I think I'll target D1 for now. The reasons are:
* Stability
* Free compilers availability (you know what kind of free I'm talking about =)
* Programs availability (I'm trying to gather programs to make a benchmark suite, without much success unfortunately; only Leonardo Maffi answered my request for examples[1], and what I need the most are *real* programs)

So for now, I'm not considering any of that. The only thing I'm vaguely considering is thread-specific heaps, to allow lock-free allocation. This has some disadvantages too, so it's low priority for me right now.

[1] http://proj.llucax.com.ar/blog/dgc/blog/post/-1382f6a3

--
Leandro Lucarella (luca) | Group blog: http://www.mazziblog.com.ar/blog/
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
325 DAYS LEFT UNTIL SPRING -- Crónica TV
Apr 12 2009
parent Leandro Lucarella <llucax gmail.com> writes:
Denis Koroskin, on April 12 at 21:26, you wrote to me:
I think I'll target D1 for now. The reasons are:
* Stability
* Free compilers availability (you know what kind of free I'm talking
  about =)
* Programs availability (I'm trying to gather programs to make a benchmark
  suite, without much success unfortunately, only Leonardo Maffi answered
  my request for examples[1], and what I need the most are *real* programs)

So for now, I'm not considering anything of that. The only thing I'm
vaguely considering is thread-specific heaps, to allow lock-free
allocation. This has some disadvantages too, so it's low priority for me
right now.

[1] http://proj.llucax.com.ar/blog/dgc/blog/post/-1382f6a3

With "thread-local by default" policy, D2 may be *much* more suitable for your research, so think twice.

I thought about it more than twice ;-) That's how I came up with the reasons stated above.

--
Leandro Lucarella (luca) | Group blog: http://www.mazziblog.com.ar/blog/
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
Y2K
<Aztech_> hmm, nothing major has happend, what an anticlimax
<CaPS> yeah
<CaPS> really sucks
<CaPS> I expected for Australia to sink into the sea or something
<CaPS> but nnoooooooo
Apr 12 2009
prev sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Leandro Lucarella wrote:
 Christopher Wright, on April 10 at 16:18, you wrote to me:
 BLS wrote:
 Zz wrote:
 Hi,

 Are there any plans for a logging library in Std Phobos 2.0?

 Zz

In case you want something special, ask the Tango folks. (Besides, logging has been available there for quite a while.)
Björn

It's at least a five-person show: Andrei, Walter, Sean Kelly, braddr, and Don have committed to Phobos svn in the past two weeks. Random other people have donated code to it. Granted, Sean probably only concerns himself with druntime compatibility, and Don is probably mostly concerned with std.math and related modules.

And Braddr just made a documentation fix, and Walter only commits portability stuff and an occasional bug fix now and then, so... Yes, it really looks like a five-person show =) I think most work in Phobos now is done by Andrei; there are other *collaborators* (the four others you named plus people sending patches), but it looks like Andrei's show to me. This is not necessarily bad; it's definitely better than before, when it was Walter's show. Now at least he can dedicate his efforts to the compiler and language, and Phobos is getting a lot more attention.

to give dsource.org write access to serious participants. What I think right now stands in the way of large participation to Phobos is that we all still learn the ropes of D2; the possibilities are dizzying and we haven't quite zeroed in on a particular style. Nonetheless, as it's been noticed I'm always summoning help from this group. So again, if you feel you want to contribute with ideas and/or code, don't hesitate. Andrei

I think part of the problem (this is not a criticism, just a statement of fact, as I believe it to have overall been a good thing) is that you've evolved Phobos so fast lately that no one else can keep up with what the heck is going on. While you appear to have done a great job on the new Phobos and things appear to be settling down now, in the interim trying to figure out what was and wasn't going to be completely turned upside down by ranges and Phobos 2 made contributing small improvements and new features rather difficult.

I've definitely worked on projects like this before, where I was the lead person and they were evolving faster than I could keep other people up to date, etc. This gap can be frustrating, but sometimes it's necessary to allow a project to evolve freely.
Apr 10 2009
parent Don <nospam nospam.com> writes:
dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Leandro Lucarella wrote:
 Christopher Wright, on April 10 at 16:18, you wrote to me:
 BLS wrote:
 Zz wrote:
 Hi,

 Are there any plans for a logging library in Std Phobos 2.0?

 Zz

In case you want something special, ask the Tango folks. (Besides, logging has been available there for quite a while.)
Björn

It's at least a five-person show: Andrei, Walter, Sean Kelly, braddr, and Don have committed to Phobos svn in the past two weeks. Random other people have donated code to it. Granted, Sean probably only concerns himself with druntime compatibility, and Don is probably mostly concerned with std.math and related modules.

And Braddr just made a documentation fix, and Walter only commits portability stuff and an occasional bug fix now and then, so... Yes, it really looks like a five-person show =) I think most work in Phobos now is done by Andrei; there are other *collaborators* (the four others you named plus people sending patches), but it looks like Andrei's show to me. This is not necessarily bad; it's definitely better than before, when it was Walter's show. Now at least he can dedicate his efforts to the compiler and language, and Phobos is getting a lot more attention.

to give dsource.org write access to serious participants. What I think right now stands in the way of large participation to Phobos is that we all still learn the ropes of D2; the possibilities are dizzying and we haven't quite zeroed in on a particular style. Nonetheless, as it's been noticed I'm always summoning help from this group. So again, if you feel you want to contribute with ideas and/or code, don't hesitate. Andrei

I think part of the problem (this is not a criticism, just a statement of fact, as I believe it to have overall been a good thing) is that you've evolved Phobos so fast lately that no one else can keep up with what the heck is going on. While you appear to have done a great job on the new Phobos and things appear to be settling down now, in the interim trying to figure out what was and wasn't going to be completely turned upside down by ranges and Phobos 2 made contributing small improvements and new features rather difficult.

I think you're completely right here. There were a couple of compiler bugs which were preventing Andrei from checking his stuff in; that made it impossible for anyone else to do much. It's also worth noting that Janice Caron was a very active contributor to Phobos, before she suddenly disappeared.
 I've definitely worked on projects like this before, where I was the lead person
 and they were evolving faster than I could keep other people up to date, etc.
 This gap can be frustrating, but sometimes it's necessary to allow a project to
 evolve freely.

Apr 10 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 13 Apr 2009 14:54:57 -0400, Frits van Bommel  
<fvbommel remwovexcapss.nl> wrote:

 Sean Kelly wrote:
 Leandro Lucarella wrote:
 But right now gc_malloc() doesn't take any TypeInfo argument. I can't  
 see
 where I can get the TypeInfo in the first place =/

pass BlkAttr.NO_SCAN. And storing a pointer per block could add a good bit of bookkeeping overhead for small objects, of course. Perhaps the TypeInfo array could be converted to a bitmap or some such.

Let's see, you'd need 2 bits per pointer-sized block of bytes, to encode these possibilities:
a) Yeah, this is a pointer
b) Nope, not a pointer
c) Maybe a pointer (union, void[])
c2) (optional) A (somehow) explicitly pinned pointer (treated identically to (c) for GC purposes; needs to be followed during marking, but the data pointed to can't be moved)
d) (optional, since we have a value left) This is a weak pointer

I'd split these up as such: one bit to indicate that it can be read as a pointer (and should thus be followed when marking, for instance) and one to indicate it can be written as a pointer (so it can be moved for (a) or nulled for (d)). That gives us these values for the two-bit field:

    enum PtrBits {
        // Actual values
        JustData      = 0b00,
        MaybePointer  = 0b01,
        PinnedPointer = 0b01,
        WeakPointer   = 0b10,
        Pointer       = 0b11,

        // For '&' tests
        ReadableFlag  = 0b01,
        WritableFlag  = 0b10,
    }

Like I said, this would cost 2 bits per pointer-sized chunk, so 1/16 of the memory block size for 32-bit systems and 1/32 for 64-bit systems. It'd have to be rounded up to a whole number of bytes of course, and possibly to T.alignof if stored at the start of the block. (Storing it at the end of the block would avoid that.)

This could be bounded to one pointer's worth of memory per block if the GC treats blocks > 16*4 = 64 bytes (on 32-bit systems) or > 32*8 = 256 bytes (on 64-bit systems) specially by just storing the raw TypeInfo reference instead of the bitfield for the memory block. (Implementer's choice on what to do for blocks from (size_t.sizeof-1)*4*size_t.sizeof to size_t.sizeof^2 * 4 bytes, where the bit-encoded data takes up the same number of bytes as a pointer would.)
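The two-bit encoding described in this post can be sketched in D roughly as follows. This is only an illustration under stated assumptions: `PtrMap`, `bitmapBytes`, `get` and `set` are hypothetical names, not part of druntime or of any existing collector, and the word size is passed explicitly so that both the 32-bit and the 64-bit figures quoted in the post can be checked.

```d
// Hypothetical sketch, not druntime code: a dense map holding two bits
// per pointer-sized word of a memory block.
enum PtrBits : ubyte
{
    JustData     = 0b00,
    MaybePointer = 0b01, // also used for explicitly pinned pointers
    WeakPointer  = 0b10,
    Pointer      = 0b11,
}

enum ubyte ReadableFlag = 0b01; // must be followed while marking
enum ubyte WritableFlag = 0b10; // may be overwritten (moved or nulled)

// Bytes of bitmap needed for a block, rounded up to whole bytes.
size_t bitmapBytes(size_t blockSize, size_t wordSize = size_t.sizeof)
{
    immutable words = (blockSize + wordSize - 1) / wordSize;
    return (words * 2 + 7) / 8;
}

struct PtrMap
{
    ubyte[] bits; // four two-bit entries per byte

    PtrBits get(size_t word) const
    {
        immutable uint shift = cast(uint)(word % 4) * 2;
        return cast(PtrBits)((bits[word / 4] >> shift) & 0b11);
    }

    void set(size_t word, PtrBits value)
    {
        immutable uint shift = cast(uint)(word % 4) * 2;
        immutable cleared = bits[word / 4] & ~(0b11 << shift);
        bits[word / 4] = cast(ubyte)(cleared | (value << shift));
    }
}

unittest
{
    // A 64-byte block on a 32-bit system: 16 words, 4 bytes of bitmap.
    auto map = PtrMap(new ubyte[bitmapBytes(64, 4)]);
    map.set(5, PtrBits.Pointer);
    map.set(6, PtrBits.WeakPointer);
    assert(map.get(5) == PtrBits.Pointer);
    assert((map.get(6) & ReadableFlag) == 0); // weak: not followed when marking
    assert(map.get(0) == PtrBits.JustData);
}
```

With these definitions, a 64-byte block with 4-byte words and a 256-byte block with 8-byte words both need exactly one pointer's worth of bitmap, which is the break-even point mentioned in the post.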

An alternative to this is to encode the information in ClassInfo and use it instead. (You'd have to create a fake ClassInfo for structs and arrays.) Then the GC only has to track the start of each object (i.e. the beginning of a block in the current GC). The advantage is that this has 0 storage requirements for objects and on average < 4 bytes for structs and arrays (thanks to the coarse block sizes of the current GC).
Apr 13 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 14 Apr 2009 06:04:01 -0400, Frits van Bommel  
<fvbommel remwovexcapss.nl> wrote:
 Robert Jacques wrote:
 On Mon, 13 Apr 2009 14:54:57 -0400, Frits van Bommel  
 <fvbommel remwovexcapss.nl> wrote:

  An alternative to this is to encode the information in ClassInfo and  
 use

It's already there. That's where TypeInfo for classes gets it from :).
 it instead. (You'd have to create a fake ClassInfo for structs and  
 arrays.) Then the GC only has to track the start of each object (i.e.  
 the beginning of a block in the current GC). The advantage is that this  
 has 0 storage requirements for objects and on average < 4 bytes for  
 structs and arrays (thanks to the coarse block sizes of the current GC).

(that'd be < 8 for a 64-bit machine?)

Yes. The key point is that it's a per-item cost which decreases with item size, as opposed to a fixed 6.25% overhead when using a dense bitmask.
 An interesting idea. Indeed, since vtables for objects start with a  
 ClassInfo reference, putting a ClassInfo* in front of non-object memory  
 blocks should work, if ClassInfo could be generalized to support  
 structs, unions, ints, floats, etc...

 Using D2 structs with a moving GC would need some extra bookkeeping data  
 anyway, to work out things like their postblit call.

Postblit is only called when generating an actual copy. For example, it is not called on assignment if the source is no longer used. So I don't see any reason why postblit should run, or would be expected to run, when a struct is moved by the GC.
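The distinction between a copy and a move can be observed with a small counter. The sketch below is an illustration only: `Tracked`, `copies` and `make` are made-up names, and it relies on the postblit form `this(this)` discussed in the thread (newer D releases also offer copy constructors, which are not shown here).

```d
// Minimal sketch: count postblit calls to see when a copy really happens.
struct Tracked
{
    static int copies; // hypothetical counter, for illustration only
    int payload;

    this(this) { ++copies; } // runs after the bitwise copy of a real copy
}

Tracked make(int value)
{
    return Tracked(value); // returning an rvalue: moved, not copied
}

unittest
{
    Tracked.copies = 0;
    Tracked a = make(1); // initialized from an rvalue: no postblit
    assert(Tracked.copies == 0);

    Tracked b = a;       // `a` stays usable, so this is a real copy
    assert(Tracked.copies == 1);
    assert(b.payload == 1);
}
```

A moving collector that relocates a struct bitwise would fall in the first category, which is the point being made above.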
 This could be put in the ClassInfo or in the second slot of the fake  
 vtable.
 (Without the fake classinfo, using a TypeInfo reference instead of the  
 bitfield and putting it in there would work too)


 Arrays, by the way, would also need some special handling, since you  
 can't return a variable-sized OffsetTypeInfo[] without allocating during  
 collections.
 (As long as they fit in the limits for the bitfield, that could be  
 repeated though -- as long as it's not an array of structs with  
 postblits...)
 So maybe a .sizeof should somehow be included, and the offsets assumed  
 to repeat after that? (as long as enough bytes are left for at least one  
 more item)

Actually, I'd assume there'd be an isArray flag in the Class/Type Info, which would cause the bitmask to be repeated until the end of the block.
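The repetition of the bitmask for arrays could look roughly like this. It is a sketch under assumptions: `isPointerWord` and `elementMask` are hypothetical names, one `bool` per pointer-sized word stands in for whatever bitmask the type info would really carry, and the element size is assumed to be a whole number of words.

```d
// Hypothetical sketch: a per-element pointer mask that is treated as
// repeating until the end of an array block.
bool isPointerWord(const bool[] elementMask, size_t wordIndex)
{
    // Word i of the block corresponds to word (i mod element length)
    // of the element it falls in.
    return elementMask[wordIndex % elementMask.length];
}

unittest
{
    // Element layout: { size_t length; void* ptr; } -> data, pointer
    immutable bool[] mask = [false, true];
    assert(!isPointerWord(mask, 0));
    assert(isPointerWord(mask, 1));
    assert(isPointerWord(mask, 7)); // second word of the fourth element
}
```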
Apr 14 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 14 Apr 2009 09:27:09 -0400, Frits van Bommel  
<fvbommel remwovexcapss.nl> wrote:
 Robert Jacques wrote:
 On Tue, 14 Apr 2009 06:04:01 -0400, Frits van Bommel  
 <fvbommel remwovexcapss.nl> wrote:
 Robert Jacques wrote:
 it instead. (You'd have to create a fake ClassInfo for structs and  
 arrays.) Then the GC only has to track the start of each object (i.e.  
 the beginning of a block in the current GC). The advantage is that  
 this has 0 storage requirements for objects and on average < 4 bytes  
 for structs and arrays (thanks to the coarse block sizes of the  
 current GC).

(that'd be < 8 for a 64-bit machine?)

size, as opposed to a fixed 6.25% overhead when using a dense bitmask.

I already mentioned the bitmask overhead could be bounded to pointer-size by falling back to a TypeInfo-based solution for memory blocks where that overhead would otherwise exceed (or match) the size of a pointer.

Sorry, I've been looking at non-free-list-based GCs, where that optimization is not available. Also, the limitations associated with a variable-length page header might be an issue (i.e. a free page with 512B blocks can't be re-purposed as a page with 256B blocks).
 Using D2 structs with a moving GC would need some extra bookkeeping  
 data anyway, to work out things like their postblit call.

is not called on assignment if the source is no longer used. So I don't see any reason why it should, or it would be expected that postblit would run when a struct was moved using the GC.

Oh, I didn't know that. (I haven't done much of anything with D2, I mostly stick to D1) I just presumed they were like C++ copy constructors. As an aside: I can certainly think of some places where it would be useful to have them get called whenever the address changes... (Though "move constructors" would be even better for most of those cases)

Could you document this use case? (i.e. give some examples as I can't think of any)
Apr 14 2009
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 14 Apr 2009 11:34:05 -0400, Frits van Bommel  
<fvbommel remwovexcapss.nl> wrote:
 Robert Jacques wrote:
 On Tue, 14 Apr 2009 09:27:09 -0400, Frits van Bommel  
 <fvbommel remwovexcapss.nl> wrote:
 Robert Jacques wrote:
 On Tue, 14 Apr 2009 06:04:01 -0400, Frits van Bommel  
 <fvbommel remwovexcapss.nl> wrote:
 Using D2 structs with a moving GC would need some extra bookkeeping  
 data anyway, to work out things like their postblit call.

it is not called on assignment if the source is no longer used. So I don't see any reason why it should, or it would be expected that postblit would run when a struct was moved using the GC.

Oh, I didn't know that. (I haven't done much of anything with D2, I mostly stick to D1) I just presumed they were like C++ copy constructors. As an aside: I can certainly think of some places where it would be useful to have them get called whenever the address changes... (Though "move constructors" would be even better for most of those cases)

think of any)

Any situation in which structs register themselves somewhere for one reason or another. For example, I read that C++'s shared_ptr<> could be implemented by having the instances keep a doubly-linked list of themselves instead of using an extra heap allocation for the reference count. Such an implementation would need to update the pointers in neighboring nodes when moved, or insert itself before or after the original when copied.

Umm... aren't stack values not guaranteed to be stable? (i.e. isn't this like playing Russian roulette with your optimizer?)
 Note that shared_ptr<> is not only useful for memory resources, it could  
 also be used to e.g. keep a file handle or socket open until all users  
 are done with it (and not longer, as you might get with a GC'ed file  
 class).
 Of course, in this case the more traditional approach with a  
 heap-allocated reference (or even storing it in the Monitor structure  
 each object has a pointer to) would be just as viable.
 But you could also implement weak references in a similar way, to let  
 the GC find them in a linked list and allow them to be nulled when  
 referred-to objects get collected.

Not all moving GCs (or copying GCs, for that matter) move every single object all the time, so this hack doesn't work. On the other hand, GC-user cache interaction would be nice (i.e. telling the GC about user free-lists, etc.).
 There are probably other use cases...

Well, these are use cases for a language-level move operator, as it would allow for slightly better performance than a copy and dtor pair in some cases. (Sorry, I was thinking only about moving GCs when I posted, for which these aren't use cases.)
Apr 14 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 10 Apr 2009 23:04:16 -0400, Leandro Lucarella <llucax gmail.com>  
wrote:
 I hope I can come up with something useful with my thesis (improving D's
 GC) and I can contribute that. Right now all my energies are focused on
 that, and I'm very close to the point to finally start playing with
 alternate implementations.

 BTW, is there any real interest in adding some more power to the GC
 implementer to allow some kind of moving or generational collector?

Yes.
 Here are some good starting points on how to allow better GC support in  
 D:
 http://d.puremagic.com/issues/show_bug.cgi?id=679

I think this should be less a spec issue and more a library issue and core.memory seems to already have a BlkAttr.NO_MOVE, which covers memory pinning.
 http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=35426

Well, making the GC type aware/semi-precise (i.e. providing support for moving/copying collectors) seems like the most important change, i.e.
    static void* malloc(size_t sz, uint ba = 0);
to
    static void* malloc(T)(uint ba = 0);
    static void* malloc(T:T[])(uint ba = 0);

The change to support concurrent GCs affects both performance and code gen significantly. Also, if D's thread model supports thread-local heaps, the need for a concurrent GC is vastly reduced (it's only a benefit to the shared heaps (mutable and immutable), while most objects would be on the thread-local heaps).

On that note, support for per-thread GCs in general is another major change, i.e.:
    GC.malloc!(T)();
to
    Thread.getThis.gc.malloc!(T)(); // Alternatively use thread local storage.

Even without a locality guarantee, this allows for concurrent allocation and (I think) better D DLL behaviour since you don't end up with two separate heaps which don't know about each other.
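As an illustration of what a type-aware allocation entry point buys, here is a minimal sketch layered on top of the existing untyped `GC.malloc`. `typedMalloc`, `Plain` and `Linked` are hypothetical names and this is not the proposed druntime signature; it only shows that knowing `T` at the call site lets the allocator derive block attributes such as `NO_SCAN` on its own. The returned memory is not initialized.

```d
import core.memory : GC;
import std.traits : hasIndirections;

// Hypothetical typed wrapper; `typedMalloc` is not a druntime function.
T* typedMalloc(T)(uint extraAttrs = 0)
{
    uint attrs = extraAttrs;
    // Knowing T lets the wrapper decide whether the block needs scanning.
    static if (!hasIndirections!T)
        attrs |= GC.BlkAttr.NO_SCAN;
    return cast(T*) GC.malloc(T.sizeof, attrs);
}

struct Plain  { int x; double y; }     // no pointers inside
struct Linked { int x; Linked* next; } // contains a pointer

unittest
{
    auto p = typedMalloc!Plain();
    auto l = typedMalloc!Linked();
    assert((GC.getAttr(cast(void*) p) & GC.BlkAttr.NO_SCAN) != 0);
    assert((GC.getAttr(cast(void*) l) & GC.BlkAttr.NO_SCAN) == 0);
}
```

A precise or moving collector would need more than this one bit per block (for example a per-word pointer map), but the typed signature is what makes that information available in the first place.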
 Anyway, if you are interested in my progress, I have a blog[1] where
 I write almost everything I do related to the subject. The blog is on
 Planet D, but Planet D seems to be broken =/

 [1] http://proj.llucax.com.ar/blog/dgc/blog

P.S. Thanks for the blog. (I have been following it for a while now)
Apr 10 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 11 Apr 2009 01:21:04 -0400, dsimcha <dsimcha yahoo.com> wrote:
 == Quote from Leandro Lucarella (llucax gmail.com)'s article
 Andrei Alexandrescu, on April 10 at 16:49, you wrote to me:
And Braddr just made a documentation fix, and Walter only commits
portability stuff and an occasional bug fix now and then, so...
Yes, it really looks like a five-person show =)
I think most work in Phobos now is done by Andrei; there are other
*collaborators* (the four others you named plus people sending patches), but
it looks like Andrei's show to me. This is not necessarily bad; it's
definitely better than before, when it was Walter's show. Now at least he
can dedicate his efforts to the compiler and language, and Phobos is getting
a lot more attention.

We'll be very happy to integrate credited contributions from anyone,

 to give dsource.org write access to serious participants. What I think
 right now stands in the way of large participation to Phobos is that we
 all still learn the ropes of D2; the possibilities are dizzying and we
 haven't quite zeroed in on a particular style. Nonetheless, as it's been
 noticed I'm always summoning help from this group. So again, if you feel
 you want to contribute with ideas and/or code, don't hesitate.

I hope I can come up with something useful with my thesis (improving D's GC) and I can contribute that. Right now all my energies are focused on that, and I'm very close to the point to finally start playing with alternate implementations. BTW, is there any real interest in adding some more power to the GC implementer to allow some kind of moving or generational collector?

Absolutely. When writing parallel code to do large scale data mining in D, the lack of precision and multithreaded allocation are real killers. My interests are, in order of importance: 1. Being able to allocate at least small chunks of memory without locking.

After reading Leandro's blog about the current GC, converting the free-lists to a lock-free data structure would be a simple (i.e. library-only) way to provide this. Another is to provide per-thread heaps, which I realized this morning can also be done without changing the compiler.
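The per-thread variant can be sketched in a few lines of D2, where module-level variables are thread-local by default. Everything here is an assumption made for illustration: `allocChunk`, `freeChunk`, `FreeNode` and the fixed `chunkSize` are made-up names, and C `malloc` merely stands in for a request to the shared, locked heap. One known caveat of this style is that a chunk freed by a different thread than the one that allocated it ends up on the other thread's list.

```d
import core.stdc.stdlib : malloc;

// Hypothetical sketch of a per-thread free list for fixed-size chunks.
enum chunkSize = 64;

struct FreeNode { FreeNode* next; }

// Module-level variables are thread-local by default in D2, so each
// thread gets its own list head and no lock is required.
FreeNode* freeList;

void* allocChunk()
{
    if (freeList !is null)
    {
        auto node = freeList;  // pop the most recently freed chunk
        freeList = node.next;
        return node;
    }
    return malloc(chunkSize);  // stand-in for asking the shared heap
}

void freeChunk(void* p)
{
    auto node = cast(FreeNode*) p;
    node.next = freeList;      // push onto this thread's list
    freeList = node;
}

unittest
{
    auto a = allocChunk();
    assert(a !is null);
    freeChunk(a);
    assert(allocChunk() is a); // reused without touching the shared heap
}
```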
 2.  Precise scanning of at least the heap.
 3.  Collection w/o stopping the world.

*Sigh*. A concurrent GC (which is what is generally meant by collection w/o stopping the world) is actually the wrong choice for you. In data mining you're generally concerned with throughput. A concurrent collector is used solely for gaining latency back, and does so by sacrificing throughput, i.e. the total time your program spends collecting is increased. A parallel collector is probably what you're looking for, since it decreases the total collection time (i.e. increases your throughput). (It also reduces the latency on multi-core systems, which is why you often see synergistic parallel-concurrent collectors.) And if you really want to have your cake (low latency) and eat it too (high throughput), there are thread-local heaps.
 4.  Moving GC so that allocations can be pointer bumps.

Apr 11 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 11 Apr 2009 12:12:07 -0400, Sean Kelly <sean invisibleduck.org>  
wrote:

 Robert Jacques wrote:
  On that note, support for per thread GCs in general is another major  
 change. i.e.:
 GC.malloc!(T)();
 to
 Thread.getThis.gc.malloc!(T)(); // Alternatively use thread local  
 storage.

It can remain as GC.malloc(). Exposing a GC handle via thread would allow one thread to use another thread's GC.

Yeah, after a night's sleep I realized that GC.malloc() could just wrap the underlying implementation. P.S. The post was meant to illustrate the under-the-hood changes (i.e. not public), but didn't come out that way.
Apr 11 2009
prev sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Sun, 12 Apr 2009 21:13:09 +0400, Leandro Lucarella <llucax gmail.com> wrote:

 Robert Jacques, on April 11 at 01:05, you wrote to me:
 On Fri, 10 Apr 2009 23:04:16 -0400, Leandro Lucarella  
 <llucax gmail.com> wrote:
I hope I can come up with something useful with my thesis (improving D's
GC) and I can contribute that. Right now all my energies are focused on
that, and I'm very close to the point to finally start playing with
alternate implementations.

BTW, is there any real interest in adding some more power to the GC
implementer to allow some kind of moving or generational collector?

Yes.
Here are some good starting points on how to allow better GC support in D:
http://d.puremagic.com/issues/show_bug.cgi?id=679

I think this should be less a spec issue and more a library issue and core.memory seems to already have a BlkAttr.NO_MOVE, which covers memory pinning.

This is just a flag. You need extra information for knowing actually when to set that flag. And for that, you need some type information. A cell can be moved when you know everything pointing to it is an actual pointer, so you can safely overwrite it with the new location.
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=35426

Well, making the GC type aware/semi-precise (i.e. providing support for moving/copying collectors) seems like the most important change,

Exactly.
 The change to support concurrent GCs affects both performance and code
 gen significantly. Also, if D's thread model supports thread-local
 heaps, the need for a concurrent GC is vastly reduced (it's only
 a benefit to the shared heaps (mutable and immutable), while most
 objects would be on the thread-local heaps).

I think I'll target D1 for now. The reasons are:
* Stability
* Free compilers availability (you know what kind of free I'm talking about =)
* Programs availability (I'm trying to gather programs to make a benchmark suite, without much success unfortunately; only Leonardo Maffi answered my request for examples[1], and what I need the most are *real* programs)

So for now, I'm not considering any of that. The only thing I'm vaguely considering is thread-specific heaps, to allow lock-free allocation. This has some disadvantages too, so it's low priority for me right now.

[1] http://proj.llucax.com.ar/blog/dgc/blog/post/-1382f6a3

With "thread-local by default" policy, D2 may be *much* more suitable for your research, so think twice.
Apr 12 2009
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Zz wrote:
 Hi,
 
 Are there any plans for a logging library in Std Phobos 2.0?
 
 Zz

I wanted to add logging support for a while now but am undecided about the API to use. Log4J is quite popular but quite complicated. There are a number of simpler APIs out there but I couldn't figure out which is the best. If anyone has ideas and/or code to contribute, that would be great. Andrei
Apr 10 2009
next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Fri, Apr 10, 2009 at 09:20:46AM -0700, Andrei Alexandrescu wrote:
 If anyone has ideas and/or code to contribute, that would be great.

I never understood why they should be complicated. Couldn't you just do something like (pseudocodeish):

======
import std.stdio;
import std.datetime : Clock;

enum LogLevel { Verbose, Warning, Error }

File logStream;        // thread-local, like every module-level variable in D2
LogLevel currentLevel; // messages below this level are dropped

// We need to open the log file ahead of time; this might be from command
// line args in a real program.
static this()
{
    logStream = stderr; // or File("log", "w"); or whatever
    currentLevel = LogLevel.Verbose;
}

static ~this()
{
    logStream.detach(); // closes the file only if this was the last reference
}

void log(Args...)(LogLevel level, string fmt, Args args)
{
    // Log when the message is at least as severe as the current level.
    if (level >= currentLevel)
    {
        logStream.writef("%s: ", Clock.currTime());
        logStream.writefln(fmt, args);
    }
}

void fun(bool crap)
{
    log(LogLevel.Verbose, "Entering function %s", __FUNCTION__);
    if (crap)
        log(LogLevel.Error, "Crap happened!");
}
=======

Does it really need to be much more complex than that?
 
 Andrei

-- Adam D. Ruppe http://arsdnet.net
Apr 10 2009
prev sibling next sibling parent Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, on April 10 at 09:20, you wrote to me:
 Zz wrote:
Hi,
Are there any plans for a logging library in Std Phobos 2.0?
Zz

I wanted to add logging support for a while now but am undecided about the API to use. Log4J is quite popular but quite complicated. There are a number of simpler APIs out there but I couldn't figure out which is the best. If anyone has ideas and/or code to contribute, that would be great.

I find Python's API very convenient and flexible (I think it's inspired by another library, but I have only used Python's).

--
Leandro Lucarella (luca) | Group blog: http://www.mazziblog.com.ar/blog/
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
Hope is a friend who lends us illusion.
Apr 10 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 10 Apr 2009 12:20:46 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Zz wrote:
 Hi,
  Are there any plans for a logging library in Std Phobos 2.0?
  Zz

I wanted to add logging support for a while now but am undecided about the API to use. Log4J is quite popular but quite complicated. There are a number of simpler APIs out there but I couldn't figure out which is the best. If anyone has ideas and/or code to contribute, that would be great.

Having experience with Tango's logger, here are the things I like about it:

1. Lazy evaluation. This is key, because it removes the whole requirement in log4* which requires you to check if the logger is active before doing some expensive calculation. With lazy evaluation, you move the check into the log function. BTW, this is a *HUGE* potential win for macros (if they are ever implemented), since you can get rid of the lazy eval. See my post: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=12431
2. No heap activity. This is to keep the logger from bogging down the program with memory allocations.
3. Thread safe.

Other than that, Tango's is pretty similar to log4* varieties. I think the general design of log4* libs is pretty well tested and solid, but using some nifty features of D that can't be had in other languages makes it even more useful. So I'd start with that design and see what can be improved. Similar to how you approached algorithms (start with STL, see what D features can be applied to it).

-Steve
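The lazy-evaluation point can be shown with D's `lazy` parameter storage class. This is a minimal sketch and not Tango's actual API: `logLazy`, `Level`, `threshold`, `expensiveDump` and `dumpCalls` are made-up names used only to make the effect observable.

```d
import std.stdio : stderr;

enum Level { trace, info, error }

Level threshold = Level.info; // hypothetical global setting
int dumpCalls;                // counts how often the costly work runs

string expensiveDump()
{
    ++dumpCalls;
    return "state dump (pretend this was costly to build)";
}

// `lazy` wraps the argument expression in a delegate, so it is only
// evaluated if the body actually reads `message`.
void logLazy(Level level, lazy string message)
{
    if (level >= threshold)
        stderr.writeln(message);
}

unittest
{
    dumpCalls = 0;
    logLazy(Level.trace, expensiveDump()); // filtered out: never evaluated
    assert(dumpCalls == 0);
    logLazy(Level.error, expensiveDump()); // passes the check: evaluated once
    assert(dumpCalls == 1);
}
```

The caller writes a plain function call, yet the "is the logger active?" check lives inside the log function, which is exactly the boilerplate that log4*-style APIs push onto the call site.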
Apr 10 2009
prev sibling parent reply Frank Benoit <keinfarbton googlemail.com> writes:
Andrei Alexandrescu wrote:
 Zz wrote:
 Hi,

 Are there any plans for a logging library in Std Phobos 2.0?

 Zz

I wanted to add logging support for a while now but am undecided about the API to use. Log4J is quite popular but quite complicated. There are a number of simpler APIs out there but I couldn't figure out which is the best. If anyone has ideas and/or code to contribute, that would be great. Andrei

Why not start with the one from Tango? Why does everything have to be different? If it really is not important, why do you have to make it different from Tango's? All code that uses both Tango and Phobos, or wants to support both, has to reimplement an intermediate abstraction layer.
Apr 10 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Frank Benoit wrote:
 Andrei Alexandrescu wrote:
 Zz wrote:
 Hi,

 Are there any plans for a logging library in Std Phobos 2.0?

 Zz

the API to use. Log4J is quite popular but quite complicated. There are a number of simpler APIs out there but I couldn't figure out which is the best. If anyone has ideas and/or code to contribute, that would be great. Andrei

Why not start with the one from tango?

Because it's not my code and every discussion on licensing ends up confused. What we can do in Phobos is follow, e.g., the Log4J API, which as far as I understand Tango implements or at least draws inspiration from. But then, browsing this group a while ago, I gathered that Tango added a Trace module because some people deemed the logging API too complicated.
 Why has everything to be
 different?

Nobody said it has to be different.
 If it really is not important, why do you have to make it
 different than tango? Every code that uses tango and phobos, or wants to
 support both has to reimplement an intermediate abstraction layer.

Again, I don't *have* to make it different, but I can't *copy* it either. There are two other things to consider: (a) Phobos' logging can take advantage of D2 features; (b) Phobos' logging should be well integrated with the rest of Phobos, e.g. it may be odd to have one way to format things in stdio and an entirely different way in the log, or to have the logging infrastructure incompatible with the stream infrastructure. That all being said, I don't see much point in making Phobos' logging 100% identical to Tango's. Phobos2 and Tango2 will be usable together, so there's no point in the duplication: if you want Tango's logging mechanism, you just use it. So there will be no point in "supporting both" because both can coexist. Andrei
Apr 11 2009
next sibling parent reply grauzone <none example.net> writes:
You make no sense. You can look at the Log4J API, but not at Tango's, 
because Phobos should take advantage of D2.0 features?

Yeah, I know, that's not exactly what you said, but come on.
Apr 11 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
grauzone wrote:
 You make no sense. You can look at the Log4J API, but not at Tango's, 
 because Phobos should take advantage of D2.0 features?
 
 Yeah, I know, that's not exactly what you said, but come on.

"Put out that cigarette!" "I'm not smoking and am not a smoker." "Yeah, I know, but come on." Andrei
Apr 11 2009
prev sibling parent reply Zz <Zz nowhere.com> writes:
Andrei Alexandrescu Wrote:

 Frank Benoit wrote:
 Andrei Alexandrescu wrote:
 Zz wrote:
 Hi,

 Are there any plans for a logging library in Std Phobos 2.0?

 Zz

I wanted to add logging support for a while now but am undecided about the API to use. Log4J is quite popular but quite complicated. There are a number of simpler APIs out there but I couldn't figure out which is the best. If anyone has ideas and/or code to contribute, that would be great. Andrei

Why not start with the one from tango?

Because it's not my code and every discussion on licensing ends up confused. What we can do in Phobos is follow, e.g., the Log4J API, which as far as I understand Tango implements or at least draws inspiration from. But then, browsing this group a while ago, I gathered that Tango added a Trace module because some people deemed the logging API too complicated.
 Why has everything to be
 different?

Nobody said it has to be different.
 If it really is not important, why do you have to make it
 different than tango? Every code that uses tango and phobos, or wants to
 support both has to reimplement an intermediate abstraction layer.

Again, I don't *have* to make it different, but I can't *copy* it either. There are two other things to consider: (a) Phobos' logging can take advantage of D2 features; (b) Phobos' logging should be well integrated with the rest of Phobos, e.g. it may be odd to have one way to format things in stdio and an entirely different way in the log, or to have the logging infrastructure incompatible with the stream infrastructure. That all being said, I don't see much point in making Phobos' logging 100% identical to Tango's. Phobos2 and Tango2 will be usable together, so there's no point in the duplication: if you want Tango's logging mechanism, you just use it. So there will be no point in "supporting both" because both can coexist. Andrei

It would be good to have one that makes use of D2's features. Have you looked at "simple-log"? I'm not a Java programmer, but I do know some people who seem to like it more than Log4J. Here is the link: https://simple-log.dev.java.net/ Anyway, whatever the API looks like, one would be welcome. Zz
Apr 11 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Zz wrote:
 Andrei Alexandrescu Wrote:
 
 Frank Benoit wrote:
 Andrei Alexandrescu wrote:
 Zz wrote:
 Hi,

 Are there any plans for a logging library in Std Phobos 2.0?

 Zz

I wanted to add logging support for a while now but am undecided about the API to use. Log4J is quite popular but quite complicated. There are a number of simpler APIs out there but I couldn't figure out which is the best. If anyone has ideas and/or code to contribute, that would be great. Andrei


Because it's not my code and every discussion on licensing ends up confused. What we can do in Phobos is follow, e.g., the Log4J API, which as far as I understand Tango implements or at least draws inspiration from. But then, browsing this group a while ago, I gathered that Tango added a Trace module because some people deemed the logging API too complicated.
 Why has everything to be
 different?

 If it really is not important, why do you have to make it
 different than tango? Every code that uses tango and phobos, or wants to
 support both has to reimplement an intermediate abstraction layer.

Again, I don't *have* to make it different, but I can't *copy* it either. There are two other things to consider: (a) Phobos' logging can take advantage of D2 features; (b) Phobos' logging should be well integrated with the rest of Phobos, e.g. it may be odd to have one way to format things in stdio and an entirely different way in the log, or to have the logging infrastructure incompatible with the stream infrastructure. That all being said, I don't see much point in making Phobos' logging 100% identical to Tango's. Phobos2 and Tango2 will be usable together, so there's no point in the duplication: if you want Tango's logging mechanism, you just use it. So there will be no point in "supporting both" because both can coexist. Andrei

It would be good to have one that makes use of D2's features. Have you looked at "simple-log"? I'm not a Java programmer, but I do know some people who seem to like it more than Log4J. Here is the link: https://simple-log.dev.java.net/ Anyway, whatever the API looks like, one would be welcome. Zz

That looks interesting, thanks for the pointer. Andrei
Apr 11 2009
prev sibling next sibling parent Brad Roberts <braddr puremagic.com> writes:
grauzone wrote:
 You can create StackInfo similar to TypeInfo, I suppose, and thus get
 an entirely precise GC.

What about the registers? It isn't that simple.

Not to mention the fun of stack slot reuse, register reuse, etc. The stack layout isn't a fixed entity for the entire lifetime of a function but can shift as execution flows through it. Later, Brad
Apr 12 2009
prev sibling parent Steve Teale <steve.teale britseyeview.com> writes:

Zz Wrote:

 Hi,
 
 Are there any plans for a logging library in Std Phobos 2.0?
 
 Zz

I've done something simple like this - log4d. To build it you have to use a tweaked version of date.d and dateparse.d. I have zipped them up and they are attached.
Apr 13 2009