digitalmars.D - compiler support added for precise GC

reply Walter Bright <newshound2 digitalmars.com> writes:
Just checked it in. Of course, it doesn't actually do precise GC, it is just 
thrown over the wall for the library devs who are itching to get started on it.

I added a getGCInfo() method to TypeInfo that returns an immutable(void)*. This 
pointer can be anything - a pointer to data, to code, whatever, that implements 
whatever the GC might need to do precise collections. The value is generated by 
the template GCInfo(T) in object.d.

Some observations:

1. if there are no pointers in the allocated data, the GCInfo(T) should be null.
This enables a fast static check with no indirection for this most common case.

2. closure memory is allocated by calling _d_allocmemory. For now, it should 
just use the old conservative mark/sweep. Later, I can add a GCInfo(T) for it.

3. Many types will follow similar patterns:

    ptr .. int .. ptr .. int

    ptr .. ptr

    int .. ptr

I suggest that specializations exist for these to avoid generating innumerable 
identical data structures or functions. In fact, if they are named with names
like:
    scanpipi()
    scanpp()
    scanip()

then the linker will automatically remove duplicates.

4. Stack scanning remains imprecise, and should use the usual conservative
method.

5. The "has pointers" bit array can, of course, be eliminated.

6. I suggest the GCInfo pointer be stored at the end of the allocated block, as 
then it won't affect the alignment of the allocated data.

Release the hounds!
Apr 15 2012
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/15/2012 7:24 PM, Walter Bright wrote:
 3. Many types will follow similar patterns:

 ptr .. int .. ptr .. int

 ptr .. ptr

 int .. ptr

 I suggest that specializations exist for these to avoid generating innumerable
 identical data structures or functions. In fact, if they are named with names
like:
 scanpipi()
 scanpp()
 scanip()

 then the linker will automatically remove duplicates.

I realized after I posted this that the patterns:

    ptr .. int .. ptr .. int
    ptr .. int .. ptr
    ptr .. int .. ptr .. int .. int

are the same as far as marking goes, and so should all generate the same function/data.
Apr 15 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/15/2012 7:29 PM, Walter Bright wrote:
 I realized after I posted this that the patterns:

 ptr .. int .. ptr .. int
 ptr .. int .. ptr
 ptr .. int .. ptr .. int .. int

 are the same as far as marking goes, and so should all generate the same
 function/data.

Another possibility is to just emit a bit mask:

    0        no pointers
    101      ptr .. int .. ptr
    1011     ptr .. ptr .. int .. ptr
    >0xFFFF  a pointer to a function that does the marking

This has the advantage of almost never needing to resort to the indirect function call.
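
A minimal sketch of how the bit mask idea could look on the library side. This is not the actual GCInfo(T) from object.d; the names `pointerMask`, `scanBlock` and `Node` are hypothetical, and the sketch assumes bit 0 describes the first pointer-sized word of the block (which matches reading "1011" as ptr .. ptr .. int .. ptr). It is deliberately conservative: every word of a field that has indirections is flagged, and the ">0xFFFF means function pointer" case is not covered.

```d
import std.traits : hasIndirections;

/// Hypothetical helper: one bit per pointer-sized word of T, bit 0 = first word.
/// A set bit means "this word may hold a pointer". Evaluable at compile time.
size_t pointerMask(T)()
{
    enum W = size_t.sizeof;
    static assert(T.sizeof <= W * W * 8,
        "sketch only handles types whose words fit in one mask");

    size_t mask = 0;
    foreach (i, F; typeof(T.tupleof))
    {
        static if (hasIndirections!F)
        {
            // Flag every word the field overlaps (conservative for nested types).
            enum first = T.tupleof[i].offsetof / W;
            enum last = (T.tupleof[i].offsetof + F.sizeof - 1) / W;
            foreach (w; first .. last + 1)
                mask |= size_t(1) << w;
        }
    }
    return mask;
}

/// Calls `mark` with the contents of every word whose mask bit is set.
void scanBlock(const(void)* block, size_t mask,
               scope void delegate(const(void)*) mark)
{
    auto words = cast(const(void*)*) block;
    for (size_t i = 0; mask != 0; ++i, mask >>= 1)
        if (mask & 1)
            mark(words[i]);
}

struct Node
{
    Node* next;   // word 0: pointer
    size_t count; // word 1: integer
    Node* prev;   // word 2: pointer
}

struct Plain
{
    int a;
    int b;
}
```

With this layout `pointerMask!Node()` is binary 101 and `pointerMask!Plain()` is 0, so the "no pointers" case stays a plain comparison against zero. Trailing non-pointer words simply never set a bit, which is why ptr .. int .. ptr and ptr .. int .. ptr .. int end up with the same mask.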
Apr 16 2012
next sibling parent deadalnix <deadalnix gmail.com> writes:
Le 16/04/2012 18:32, Walter Bright a écrit :
 On 4/15/2012 7:29 PM, Walter Bright wrote:
 I realized after I posted this that the patterns:

 ptr .. int .. ptr .. int
 ptr .. int .. ptr
 ptr .. int .. ptr .. int .. int

 are the same as far as marking goes, and so should all generate the same
 function/data.

 Another possibility is to just emit a bit mask:

  0 no pointers
  101 ptr .. int .. ptr
  1011 ptr .. ptr .. int .. ptr
  >0xFFFF a pointer to a function that does the marking

 This has the advantage of almost never needing to resort to the indirect
 function call.

As long as it fits in a pointer, I see no reason why it can't be done.
Apr 16 2012
prev sibling parent Alex Rønne Petersen <xtzgzorex gmail.com> writes:
On 16-04-2012 18:32, Walter Bright wrote:
 On 4/15/2012 7:29 PM, Walter Bright wrote:
 I realized after I posted this that the patterns:

 ptr .. int .. ptr .. int
 ptr .. int .. ptr
 ptr .. int .. ptr .. int .. int

 are the same as far as marking goes, and so should all generate the same
 function/data.

Another possibility is to just emit a bit mask:

This is, in fact, what most production GCs do (e.g. Mono's SGen). It's widely considered a good technique.
 0 no pointers
 101 ptr .. int .. ptr
 1011 ptr .. ptr .. int .. ptr
  >0xFFFF a pointer to a function that does the marking

 This has the advantage of almost never needing to resort to the indirect
 function call.

-- - Alex
Apr 16 2012
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2012-04-16 04:24, Walter Bright wrote:
 Just checked it in. Of course, it doesn't actually do precise GC, it is
 just thrown over the wall for the library devs who are itching to get
 started on it.

 I added a getGCInfo() method to TypeInfo that returns an
 immutable(void)*. This pointer can be anything - a pointer to data, to
 code, whatever, that implements whatever the GC might need to do precise
 collections. The value is generated by the template GCInfo(T) in object.d.

Cool, but why was "getMembers" removed? -- /Jacob Carlborg
Apr 15 2012
next sibling parent Alex Rønne Petersen <xtzgzorex gmail.com> writes:
On 16-04-2012 08:30, Jacob Carlborg wrote:
 On 2012-04-16 04:24, Walter Bright wrote:
 Just checked it in. Of course, it doesn't actually do precise GC, it is
 just thrown over the wall for the library devs who are itching to get
 started on it.

 I added a getGCInfo() method to TypeInfo that returns an
 immutable(void)*. This pointer can be anything - a pointer to data, to
 code, whatever, that implements whatever the GC might need to do precise
 collections. The value is generated by the template GCInfo(T) in
 object.d.

Cool, but why was "getMembers" removed?

I was wondering too. That seems rather arbitrary and unrelated to precise GC. -- - Alex
Apr 15 2012
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/15/2012 11:30 PM, Jacob Carlborg wrote:
 Cool, but why was "getMembers" removed?

1. It was intended to be used for precise gc.
2. It was never implemented (always returned null).
3. Nobody used it or asked for it.
4. GCInfo seems to be a better design.
Apr 16 2012
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2012-04-16 09:09, Walter Bright wrote:
 On 4/15/2012 11:30 PM, Jacob Carlborg wrote:
 Cool, but why was "getMembers" removed?

1. It was intended to be used for precise gc.

I thought it was a first step for runtime reflection.
 2. It was never implemented (always returned null).

I've noticed that.
 3. Nobody used it or asked for it.

I've heard many people ask for a way to get the members of a class using runtime reflection and I always pointed to getMembers. I always thought it was supposed to be getMembers but that it wasn't finished yet. There's a bug report about it: http://d.puremagic.com/issues/show_bug.cgi?id=2844
 4. GCInfo seems to be a better design.

I see. Makes sense, I just thought it was for a different use. -- /Jacob Carlborg
Apr 16 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/16/2012 1:20 AM, Jacob Carlborg wrote:
 I thought it was a first step for runtime reflection.

The thing about runtime reflection is you only need it for a few classes, while the compiler is doomed to generate the info for all of them. Andrei suggested a better design, which was to use compile time reflection to generate runtime information, as a library routine, on an as-needed basis. In other words, it should be a library feature, not a language feature. I think that by making the precise GC information a library feature, not a compiler one, we may be reaching a tipping point where the language is powerful enough that we may not need to add ever more language features.
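
A minimal sketch of the "compile time reflection generates runtime information, as a library routine, on an as-needed basis" idea. The names `FieldInfo`, `fieldsOf` and `Point` are hypothetical and not part of druntime or Phobos; the point is only that the table exists for the types that instantiate the template, and for no others.

```d
/// Hypothetical runtime description of one field.
struct FieldInfo
{
    string name;   // field identifier
    string type;   // type as written by the compiler
    size_t offset; // byte offset inside the aggregate
}

/// Builds a runtime table from compile-time reflection.
/// Nothing is generated unless some code instantiates fieldsOf!T.
FieldInfo[] fieldsOf(T)()
{
    FieldInfo[] result;
    foreach (i, F; typeof(T.tupleof))
    {
        result ~= FieldInfo(__traits(identifier, T.tupleof[i]),
                            F.stringof,
                            T.tupleof[i].offsetof);
    }
    return result;
}

struct Point
{
    int x;
    int y;
}
```

Calling `fieldsOf!Point()` yields two entries, for `x` at offset 0 and `y` at offset 4. The trade-off discussed in the rest of the thread is visible here: the table has to be requested per static type, so it cannot describe a class the caller only sees through a base class reference.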
Apr 16 2012
parent reply Jacob Carlborg <doob me.com> writes:
On 2012-04-16 11:00, Walter Bright wrote:
 On 4/16/2012 1:20 AM, Jacob Carlborg wrote:
 I thought it was a first step for runtime reflection.

The thing about runtime reflection is you only need it for a few classes, while the compiler is doomed to generate the info for all of them. Andrei suggested a better design, which was to use compile time reflection to generate runtime information, as a library routine, on an as-needed basis.

If we can't rely on runtime reflection being there it's basically useless. It's like your idea that the GC shouldn't be optional. Then all library code needs to be written to work without the GC. It's the same thing with runtime reflection. -- /Jacob Carlborg
Apr 16 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 16/04/2012 13:24, Jacob Carlborg a écrit :
 On 2012-04-16 11:00, Walter Bright wrote:
 On 4/16/2012 1:20 AM, Jacob Carlborg wrote:
 I thought it was a first step for runtime reflection.

The thing about runtime reflection is you only need it for a few classes, while the compiler is doomed to generate the info for all of them. Andrei suggested a better design, which was to use compile time reflection to generate runtime information, as a library routine, on an as-needed basis.

If we can't relay on runtime reflection being there it's basically useless. It's like your idea that the GC shouldn't be optional. Then all library code needs to be written to work without the GC. It's the same thing with runtime reflection.

This is a lib issue. phobos should provide a standard way to do compiletime -> runtime reflection so each lib doesn't need to provide its own way every time.
Apr 16 2012
next sibling parent reply Alex Rønne Petersen <xtzgzorex gmail.com> writes:
On 16-04-2012 13:34, deadalnix wrote:
 Le 16/04/2012 13:24, Jacob Carlborg a écrit :
 On 2012-04-16 11:00, Walter Bright wrote:
 On 4/16/2012 1:20 AM, Jacob Carlborg wrote:
 I thought it was a first step for runtime reflection.

The thing about runtime reflection is you only need it for a few classes, while the compiler is doomed to generate the info for all of them. Andrei suggested a better design, which was to use compile time reflection to generate runtime information, as a library routine, on an as-needed basis.

If we can't relay on runtime reflection being there it's basically useless. It's like your idea that the GC shouldn't be optional. Then all library code needs to be written to work without the GC. It's the same thing with runtime reflection.

This is a lib issue. phobos should provide a standard way to do compiletime -> runtime reflection so each lib doesn't need to provide its own way every time.

I think you're misunderstanding. The point is that without built-in runtime reflection, reflection is only available for select classes that the programmer specifically asks to have RTTI for. This is useless. It doesn't enable discovery-based reflection at all, which is what makes runtime reflection in C#, Java, ... so useful. -- - Alex
Apr 16 2012
parent Jacob Carlborg <doob me.com> writes:
On 2012-04-16 18:37, Alex Rønne Petersen wrote:

 The point is that without built-in runtime reflection, reflection is
 only available for select classes that the programmer specifically asks
 to have RTTI for. This is useless. It doesn't enable discovery-based
 reflection at all, which is what makes runtime reflection in C#, Java,
 ... so useful.

What he said. -- /Jacob Carlborg
Apr 16 2012
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2012-04-16 13:34, deadalnix wrote:

 This is a lib issue. phobos should provide a standard way to do
 compiletime -> runtime reflection so each lib doesn't need to provide
 its own way every time.

Regardless of how the runtime reflection is generated, by a library or the compiler, it needs to be available to all types. -- /Jacob Carlborg
Apr 16 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/16/2012 9:40 AM, Jacob Carlborg wrote:
 Regardless of how the runtime reflection is generated, by a library or the
 compiler, it needs to be available to all types.

Why? (I can see the point for a dynamic language, but not a static one.)
Apr 16 2012
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2012-04-16 18:53, Walter Bright wrote:
 On 4/16/2012 9:40 AM, Jacob Carlborg wrote:
 Regardless of how the runtime reflection is generated, by a library or
 the
 compiler, it needs to be available to all types.

Why? (I can see the point for a dynamic language, but not a static one.)

The standard example that keeps coming back is serialization. That can be done without support for runtime reflection, but not as well as with it. As far as I know it's impossible to serialize through a base class reference without registering the subtype with the serializer, if runtime reflection isn't available. -- /Jacob Carlborg
Apr 16 2012
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 04/16/2012 08:42 PM, Jacob Carlborg wrote:
 On 2012-04-16 18:53, Walter Bright wrote:
 On 4/16/2012 9:40 AM, Jacob Carlborg wrote:
 Regardless of how the runtime reflection is generated, by a library or
 the
 compiler, it needs to be available to all types.

Why? (I can see the point for a dynamic language, but not a static one.)

The standard example that comes back is serialization. That can be done without support for runtime reflection but not as good as with the support. As far as I know it's impossible to serialize through a base class reference without registering the subtype with the serializer, if runtime reflection isn't available.

This could be fixed by introducing a simple language feature that allows a supertype to specify mixin templates that are 'inherited' by all subtypes. Eg:

    interface ISerializable {
        ubyte[] serialize();
        super mixin Serialize;
    }

    class A: ISerializable {
        int x;
        int y;
    }

    class B: A {
        A foo;
        nonserialized B cache;
    }

Would be lowered to:

    interface ISerializable {
        ubyte[] serialize();
        mixin Serialize;
    }

    class A: ISerializable {
        int x;
        int y;
        mixin Serialize;
    }

    class B: A {
        A foo;
        nonserialized B cache;
        mixin Serialize;
    }

This way, all subclasses automatically register themselves. I think this would be very useful in general. I often do this manually.
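
For comparison, a rough sketch of the manual version that works in today's D, without the proposed `super mixin`. Everything here (`serializers`, `Serialize`, `serializeAny`, and the classes) is hypothetical and only illustrates the registration pattern; the output format is made up, and each registered serializer only sees the fields declared in its own class, since `tupleof` does not include base class fields.

```d
import std.conv : to;

alias Serializer = string function(Object);

/// Hypothetical registry, keyed by the fully qualified runtime class name.
Serializer[string] serializers;

/// Mixed into each class by hand; registers a serializer for that exact class.
mixin template Serialize()
{
    static this()
    {
        alias Self = typeof(this);
        serializers[typeid(Self).name] = function string(Object o)
        {
            auto self = cast(Self) o;
            string s = Self.stringof ~ "(";
            foreach (i, field; self.tupleof)
                s ~= (i ? ", " : "") ~ __traits(identifier, Self.tupleof[i])
                    ~ "=" ~ to!string(field);
            return s ~ ")";
        };
    }
}

/// Looks up the serializer by the dynamic type, so a base reference is enough.
string serializeAny(Object o)
{
    if (auto fn = typeid(o).name in serializers)
        return (*fn)(o);
    return "<unregistered " ~ typeid(o).name ~ ">";
}

class A { int x = 1; int y = 2; mixin Serialize; }
class B : A { int z = 3; mixin Serialize; }
class C : A { } // the mixin was forgotten, so C is never registered
```

The class `C` shows the weakness of doing this by hand: nothing forces a subclass to repeat the mixin, which is the gap the proposed `super mixin` is meant to close.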
Apr 16 2012
parent reply Jacob Carlborg <doob me.com> writes:
On 2012-04-16 22:42, Timon Gehr wrote:

 This could be fixed by introducing a simple language feature that allows
 a supertype to specify mixin templates that are 'inherited' by all
 subtypes.

 Eg:

 interface ISerializable {
 ubyte[] serialize();

 super mixin Serialize;
 }

Then it won't be possible to serialize third party types if they don't implement ISerializable. In this case, this solution is no better than manually registering types. Actually it's worse, since I can manually register third party types. -- /Jacob Carlborg
Apr 16 2012
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/16/2012 11:26 PM, Jacob Carlborg wrote:
 Then it won't be possible to serialize third party types if they don't
 implements ISerializable. In this case, this solution is no better then
manually
 registering types. Actually it's worse, since I can manually register third
 party types.

I'm not so sure in D that you can serialize arbitrary types without them designed to be serializable. For example, what will you do with unions? Pointers to global data?
Apr 16 2012
parent reply Jacob Carlborg <doob me.com> writes:
On 2012-04-17 08:33, Walter Bright wrote:
 On 4/16/2012 11:26 PM, Jacob Carlborg wrote:
 Then it won't be possible to serialize third party types if they don't
 implements ISerializable. In this case, this solution is no better
 then manually
 registering types. Actually it's worse, since I can manually register
 third
 party types.

I'm not so sure in D that you can serialize arbitrary types without them designed to be serializable. For example, what will you do with unions? Pointers to global data?

Not all types are serializable of course. For those types you would have to register a function or similar. But most other types are possible to automatically serialize. I don't see a point in making them less serializable by requiring them to implement an interface. I also think mostly one would want to serialize objects or a hierarchy of objects. -- /Jacob Carlborg
Apr 16 2012
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/16/2012 11:50 PM, Jacob Carlborg wrote:
 On 2012-04-17 08:33, Walter Bright wrote:
 On 4/16/2012 11:26 PM, Jacob Carlborg wrote:
 Then it won't be possible to serialize third party types if they don't
 implements ISerializable. In this case, this solution is no better
 then manually
 registering types. Actually it's worse, since I can manually register
 third
 party types.

I'm not so sure in D that you can serialize arbitrary types without them designed to be serializable. For example, what will you do with unions? Pointers to global data?

Not all types are serializable of course.

How would you know if they are or aren't, when dynamically loading stuff?
 For those types you would have to
 register a function or similar. But most other types are possible to
 automatically serialize. I don't see a point in making them less serializable
by
 requiring to implement an interface. I also think mostly one would want to
 serialize objects or an hierarchy of objects.

Apr 17 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/17/2012 1:10 AM, Walter Bright wrote:
 On 4/16/2012 11:50 PM, Jacob Carlborg wrote:
 On 2012-04-17 08:33, Walter Bright wrote:
 On 4/16/2012 11:26 PM, Jacob Carlborg wrote:
 Then it won't be possible to serialize third party types if they don't
 implements ISerializable. In this case, this solution is no better
 then manually
 registering types. Actually it's worse, since I can manually register
 third
 party types.

I'm not so sure in D that you can serialize arbitrary types without them designed to be serializable. For example, what will you do with unions? Pointers to global data?

Not all types are serializable of course.

How would you know if they are or aren't, when dynamically loading stuff?

Essentially, I'm concerned with a vast amount of data being generated for every type and inserted into the executables (which are already large). Even worse, I'm concerned that such a feature will not "just work" when one wants to serialize a class that wasn't designed to be serialized, and it'll come off as a crappy half-assed buggy misfeature.
Apr 17 2012
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2012-04-17 10:13, Walter Bright wrote:
 On 4/17/2012 1:10 AM, Walter Bright wrote:
 How would you know if they are or aren't, when dynamically loading stuff?


I'm not entirely sure what you mean by "dynamically loading stuff". The only types you can create dynamically are objects:

    auto b = Object.factory("Bar");

D has a limited set of types and a couple of user definable types, like classes, structs and so on. So in my serialization library it's basically hard coded what's serializable or not. All primitive types, int, char, string and so on are serializable. All classes and structs are serializable as well.

It essentially works like this. You always need to start with a static type, like this:

    struct Foo { ... }
    auto a = deserialize!(Foo)(data);

From that I can figure out most of the things via compile time reflection. What's not possible is when the runtime type is different from the static type:

    class Base { ... }
    class Sub : Base { ... }

    Base b = new Sub;

In the above code, when inspecting "b" using compile time reflection all information about "Sub" is gone. It's just a regular "Base".
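
The static/runtime mismatch described above can be shown in a few lines. This is only an illustrative sketch; `staticFieldNames` and `dynamicClassName` are hypothetical helpers, not part of any serialization library.

```d
class Base { int a; }
class Sub : Base { int b; }

/// What a template sees: only the static type T of its argument.
string[] staticFieldNames(T)(T obj)
{
    string[] names;
    foreach (i, F; typeof(T.tupleof))
        names ~= __traits(identifier, T.tupleof[i]);
    return names;
}

/// What the runtime knows: the dynamic class name, but not its fields.
string dynamicClassName(Object obj)
{
    return typeid(obj).name;
}
```

Given `Base b = new Sub;`, the template is instantiated with `Base` and reports only the field `a`, while `typeid` still reports a name ending in `Sub`. The name is available at runtime; the field list of `Sub` is not.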
 Essentially, I'm concerned with a vast amount of data being generated
 for every type and inserted into the executables (which are already
 large).

Most of the data is already available in the symbol table anyway. Instance methods, static methods and static variables. What's missing is basically instance variables and an easy way to access them.
 Even worse, I'm concerned that such a feature will not "just
 work" when one wants to serialize a class that wasn't designed to be
 serialized, and it'll come off as a crappy half-assed buggy misfeature.

Sure, that can always happen. But as long as the data is there it's up to the serialization library how to use it. When serializing third party types not explicitly made for serializing it would be up to the user to know what he/she's doing. I mean, this is a systems programming language; you can just as well cast away shared and similar. Feel free to use my serialization library and see what it can handle: https://github.com/jacob-carlborg/orange -- /Jacob Carlborg
Apr 17 2012
parent reply "Nick Sabalausky" <SeeWebsiteToContactMe semitwist.com> writes:
"Jacob Carlborg" <doob me.com> wrote in message 
news:jmjb2v$1d2k$1 digitalmars.com...
 What's not possible is when the runtime type is different from the static 
 type:

 class Base { ... }
 class Sub : Base { ... }

 Base b = new Sub;

 In the above code, when inspecting "b" using compile time reflection all 
 information about "Sub" is gone. It's just a regular "Base".

Can't you just query compile-time information to see what classes inherit from Base? Then you could just try downcasting to them.
Apr 17 2012
next sibling parent reply Robert Clipsham <robert octarineparrot.com> writes:
On 18/04/2012 02:11, Nick Sabalausky wrote:
 Can't you just query compile-time information to see what classes inherit
 from Base? Then you could just try downcasting to them.

Last time I checked there was no way to do that (I doubt that will have changed). It's impossible to know all the subclasses until link-time, so you'll get different results depending on how you compile your project. The only way around this that I've found is to put a mixin in all subclasses to register them with the base class... That's a complete hack though.

Example:
----
// a.d
class Base { /* whatever magic to get derived classes */ }
// b.d
import a;
class Derived : Base {}
void main() {}
----
$ dmd -c a.d

dmd does not know about b.d when compiling a.d. The only way this could work is to allow link-time code generation.

-- Robert http://octarineparrot.com/
Apr 18 2012
parent Jacob Carlborg <doob me.com> writes:
On 2012-04-18 12:37, Michal Minich wrote:
 On Wednesday, 18 April 2012 at 10:07:59 UTC, Robert Clipsham wrote:
 // a.d
 class Base { /* whatever magic to get derived classes */ }
 // b.d
 import a;
 class Derived : Base {}
 void main() {}

to find derived classes at runtime: http://forum.dlang.org/thread/mailman.1052.1292505452.21107.digitalmars-d-learn puremagic.com#post-op.vns7nfjpvxi10f:40biotronic-pc.lan

But you can't get the fields of an object at runtime. Again, hence the need for proper runtime reflection. -- /Jacob Carlborg
Apr 18 2012
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2012-04-18 03:11, Nick Sabalausky wrote:
 Can't you just query compile-time information to see what classes inherit
 from Base? Then you could just try downcasting to them.

Is that possible? -- /Jacob Carlborg
Apr 18 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/18/2012 10:20 AM, Steven Schveighoffer wrote:
 On Wed, 18 Apr 2012 09:59:10 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2012-04-18 03:11, Nick Sabalausky wrote:
 Can't you just query compile-time information to see what classes inherit
 from Base? Then you could just try downcasting to them.

Is that possible?

No. Not from within a template that doesn't know about those classes (which presumably your serialization function is). -Steve

You can get a list of classes in the executable at runtime from:

    foreach (m; ModuleInfo)
    {
        if (m)
            //writefln("module %s, %d", m.name, m.localClasses.length);
            foreach (c; m.localClasses)
            {
                writefln("\tclass %s", c.name);
            }
    }
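
A self-contained variant of the loop above, collecting the names into an array instead of printing them. `allClassNames` and `Widget` are hypothetical names added for illustration; `ModuleInfo`, `localClasses` and `name` are the druntime members used in the snippet above.

```d
class Widget {}

/// Returns the fully qualified names of all classes the runtime knows about.
string[] allClassNames()
{
    string[] names;
    foreach (m; ModuleInfo)
    {
        if (m is null)
            continue;
        foreach (c; m.localClasses)
            names ~= c.name;
    }
    return names;
}
```

The result includes `Widget` under its module-qualified name. Note that this gives class names only: there is still no field list attached to each entry.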
Apr 18 2012
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/18/2012 11:53 AM, Steven Schveighoffer wrote:
 The question was, can you get this information at compile time.

You cannot get a list of all classes in the program at compile time, because the compiler doesn't have this information.
Apr 18 2012
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2012-04-18 20:37, Walter Bright wrote:

 You can get a list of classes in the executable at runtime from:

 foreach (m; ModuleInfo)
 {
 if (m)
 //writefln("module %s, %d", m.name, m.localClasses.length);
 foreach (c; m.localClasses)
 {
 writefln("\tclass %s", c.name);
 }
 }

Yeah, and then we're back at the original case: we need runtime reflection. We need to be able to iterate the fields of a class and get/set the values of the fields. -- /Jacob Carlborg
Apr 18 2012
prev sibling parent reply Alex Rønne Petersen <xtzgzorex gmail.com> writes:
On 17-04-2012 10:13, Walter Bright wrote:
 On 4/17/2012 1:10 AM, Walter Bright wrote:
 On 4/16/2012 11:50 PM, Jacob Carlborg wrote:
 On 2012-04-17 08:33, Walter Bright wrote:
 On 4/16/2012 11:26 PM, Jacob Carlborg wrote:
 Then it won't be possible to serialize third party types if they don't
 implements ISerializable. In this case, this solution is no better
 then manually
 registering types. Actually it's worse, since I can manually register
 third
 party types.

I'm not so sure in D that you can serialize arbitrary types without them designed to be serializable. For example, what will you do with unions? Pointers to global data?

Not all types are serializable of course.

How would you know if they are or aren't, when dynamically loading stuff?

Essentially, I'm concerned with a vast amount of data being generated for every type and inserted into the executables (which are already large). Even worse, I'm concerned that such a feature will not "just work" when one wants to serialize a class that wasn't designed to be serialized, and it'll come off as a crappy half-assed buggy misfeature.

Would turning on global reflection data generation with a compiler switch be reasonable? Then when libraries query for reflection information, they just have to check whether it's actually there (and if not, most likely error).

Anyway, I don't think reflection has the potential to be a misfeature; that's stretching it. Redundancy can certainly be incurred, but with a switch to enable reflection, it would only happen when the user actually wants it to.

As for the "just works" factor - if the serialization library is designed well enough (which I'm confident Jacob's library either is or can be made to be), it isn't really a problem. Serializing a graph of objects is pretty standard (see e.g. .NET binary serialization). Even better, the library could just error out (instead of producing garbage) when it discovers that it can't serialize a type in any sensible fashion. -- - Alex
Apr 17 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 17/04/2012 14:31, Alex Rønne Petersen a écrit :
 On 17-04-2012 10:13, Walter Bright wrote:
 On 4/17/2012 1:10 AM, Walter Bright wrote:
 On 4/16/2012 11:50 PM, Jacob Carlborg wrote:
 On 2012-04-17 08:33, Walter Bright wrote:
 On 4/16/2012 11:26 PM, Jacob Carlborg wrote:
 Then it won't be possible to serialize third party types if they
 don't
 implements ISerializable. In this case, this solution is no better
 then manually
 registering types. Actually it's worse, since I can manually register
 third
 party types.

I'm not so sure in D that you can serialize arbitrary types without them designed to be serializable. For example, what will you do with unions? Pointers to global data?

Not all types are serializable of course.

How would you know if they are or aren't, when dynamically loading stuff?

Essentially, I'm concerned with a vast amount of data being generated for every type and inserted into the executables (which are already large). Even worse, I'm concerned that such a feature will not "just work" when one wants to serialize a class that wasn't designed to be serialized, and it'll come off as a crappy half-assed buggy misfeature.

Would turning on global reflection data generation with a compiler switch be reasonable? Then when libraries query for reflection information, they just have to check whether it's actually there (and if not, most likely error). Anyway, I don't think reflection has the potential to be a misfeature; that's stretching it. Redundancy can certainly be incurred, but with a switch to enable reflection, it would only happen when the user actually wants it to. As for the just works factor - if the serialization library is designed well enough (which I'm confident Jacob's library either is can be made to be), it isn't really a problem. Serializing a graph of objects is pretty standard (see e.g. .NET binary serialization). Even better, the library could just error out (instead of producing garbage) when it discovers that it can't serialize a type in any sensible fashion.

I don't see any need for runtime reflection here. If a type is serialized, at some point it has to be given to the lib, which can then generate runtime information using compile time reflection. If the type is never serialized/deserialized, then runtime reflection is useless.
Apr 17 2012
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 04/17/2012 04:44 PM, deadalnix wrote:
...
 I don't see any need for runtime reflection here.

 If a type is serialized, at some point it have to be given to the lib,
 that can generate runtime information from compile time reflection
 capability. If the type is never serialized/deserialized, then runtime
 reflection is useless.

Assigning an object to a base class reference loses the relevant compile time information.

    Object o = new A;
    auto s = serialize(o); // library cannot know about 'A'
Apr 17 2012
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2012-04-17 16:44, deadalnix wrote:

 I don't see any need for runtime reflection here.

 If a type is serialized, at some point it have to be given to the lib

No it doesn't. Not if you serialize through a base class reference.

    class Base { ... }
    class Sub : Base { ... }

    Base b = new Sub;
    serialize(b);

The serializer will never know anything about "Sub". Of course it's possible to register "Sub" with the serializer but that's what I'm trying to avoid in the first place. -- /Jacob Carlborg
Apr 17 2012
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2012-04-18 13:06, Steven Schveighoffer wrote:
 On Tue, 17 Apr 2012 02:50:02 -0400, Jacob Carlborg <doob me.com> wrote:

 Can you explain why .NET types are not by default serializable? All of
 them have full runtime reflection AFAIK.

 I'm not using that as an argument, I'm genuinely interested.

I'm wondering that as well. -- /Jacob Carlborg
Apr 18 2012
parent reply Robert Clipsham <robert octarineparrot.com> writes:
On 18/04/2012 17:31, Jacob Carlborg wrote:
 On 2012-04-18 13:06, Steven Schveighoffer wrote:
 On Tue, 17 Apr 2012 02:50:02 -0400, Jacob Carlborg <doob me.com> wrote:

 Can you explain why .NET types are not by default serializable? All of
 them have full runtime reflection AFAIK.

 I'm not using that as an argument, I'm genuinely interested.

I'm wondering that as well.

From my quick google I couldn't find a definitive answer, but: http://stackoverflow.com/questions/4408909/why-classes-are-not-serializable-by-default-in-net http://en.wikipedia.org/wiki/Serialization (reasons listed under Java) -- Robert http://octarineparrot.com/
Apr 18 2012
parent Jacob Carlborg <doob me.com> writes:
On 2012-04-18 19:36, Steven Schveighoffer wrote:
 On Wed, 18 Apr 2012 13:18:20 -0400, Robert Clipsham

 From my quick google I couldn't find a definitive answer, but:

 http://stackoverflow.com/questions/4408909/why-classes-are-not-serializable-by-default-in-net


 http://en.wikipedia.org/wiki/Serialization (reasons listed under Java)

I think those answer the question quite well. In summary, just because you *can* serialize a type doesn't mean you *should*, and the risks of serializing something that shouldn't be serialized trumps the convenience of not having to mark it. The latter part is really a subjective statement, but I would agree with it.

I suspected something like that.
 I bet part of the confusion comes from the fact that such attributes are
 named "Serializable", whereas pretty much anything is serializable. It
 should be something more along the lines of "AllowSerialization"

 -Steve

-- /Jacob Carlborg
Apr 18 2012
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
Le 17/04/2012 08:26, Jacob Carlborg a écrit :
 On 2012-04-16 22:42, Timon Gehr wrote:

 This could be fixed by introducing a simple language feature that allows
 a supertype to specify mixin templates that are 'inherited' by all
 subtypes.

 Eg:

 interface ISerializable {
 ubyte[] serialize();

 super mixin Serialize;
 }

Then it won't be possible to serialize third party types if they don't implement ISerializable. In this case, this solution is no better than manually registering types. Actually it's worse, since I can manually register third party types.

Any type that mixes in Serialize would be serializable, which is pretty nice. Another way would be to inherit attributes. Then runtime reflection code can retrieve all declarations with a given attribute.
Apr 17 2012
parent Jacob Carlborg <doob me.com> writes:
On 2012-04-17 10:50, deadalnix wrote:

 Then it won't be possible to serialize third party types if they don't
 implements ISerializable. In this case, this solution is no better then
 manually registering types. Actually it's worse, since I can manually
 register third party types.

Any type that mixin Serialize would be serializable, which is pretty nice.

It's even nicer if that's not needed. Again, what I said above.
 Another way would be to inherit attribute. Then a runtime reflection
 script can retrieve all declaration with a given attribute.

-- /Jacob Carlborg
Apr 17 2012
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/16/2012 11:42 AM, Jacob Carlborg wrote:
 On 2012-04-16 18:53, Walter Bright wrote:
 On 4/16/2012 9:40 AM, Jacob Carlborg wrote:
 Regardless of how the runtime reflection is generated, by a library or
 the
 compiler, it needs to be available to all types.

Why? (I can see the point for a dynamic language, but not a static one.)

The standard example that comes back is serialization. That can be done without support for runtime reflection but not as good as with the support. As far as I know it's impossible to serialize through a base class reference without registering the subtype with the serializer, if runtime reflection isn't available.

Serialization does NOT need to deal with all types, only the types that are to be serialized.
Apr 16 2012
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 04/16/2012 10:51 PM, Walter Bright wrote:
 On 4/16/2012 11:42 AM, Jacob Carlborg wrote:
 On 2012-04-16 18:53, Walter Bright wrote:
 On 4/16/2012 9:40 AM, Jacob Carlborg wrote:
 Regardless of how the runtime reflection is generated, by a library or
 the
 compiler, it needs to be available to all types.

Why? (I can see the point for a dynamic language, but not a static one.)

The standard example that comes back is serialization. That can be done without support for runtime reflection but not as good as with the support. As far as I know it's impossible to serialize through a base class reference without registering the subtype with the serializer, if runtime reflection isn't available.

Serialization does NOT need to deal with all types, only the types that are to be serialized.

That is certainly true. Full runtime reflection is a sufficient, but not a necessary solution. The issue to be solved is that it is inconvenient and possibly error prone to manually annotate all types that need specialized runtime type info. Eg. If a class subclasses a serializable class, then by default the subclass should be fully serializable without further action.
Apr 16 2012
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2012-04-16 22:51, Walter Bright wrote:

 Serialization does NOT need to deal with all types, only the types that
 are to be serialized.

Again, if a third party forgets to mark a type for serialization or runtime reflection, I'm out of luck. In that case what I already have works better. -- /Jacob Carlborg
Apr 16 2012
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/16/2012 10:08 AM, H. S. Teoh wrote:
 On Mon, Apr 16, 2012 at 09:53:34AM -0700, Walter Bright wrote:
 On 4/16/2012 9:40 AM, Jacob Carlborg wrote:
 Regardless of how the runtime reflection is generated, by a library or the
 compiler, it needs to be available to all types.

Why? (I can see the point for a dynamic language, but not a static one.)

Perhaps for dynamic loading of class objects?

Yes, and then the designer of those class objects will know which ones are to be discoverable.
Apr 16 2012
prev sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
Because you cannot control third party code.

Maybe I am mistaken and with std.variant together with compile-time 
reflection,
it is possible to cover most use cases that make sense, but I am not 100% 
sure of it.

--
Paulo

"Walter Bright"  wrote in message news:jmhir1$16nn$1 digitalmars.com...

On 4/16/2012 9:40 AM, Jacob Carlborg wrote:
 Regardless of how the runtime reflection is generated, by a library or the
 compiler, it needs to be available to all types.

Why? (I can see the point for a dynamic language, but not a static one.)
Apr 16 2012
prev sibling next sibling parent reply deadalnix <deadalnix gmail.com> writes:
Le 16/04/2012 09:09, Walter Bright a écrit :
 4. GCInfo seems to be a better design.

This is unrelated. In fact, GCInfo could use getMembers. Precise GC and reflection are brothers.
Apr 16 2012
parent deadalnix <deadalnix gmail.com> writes:
Le 16/04/2012 11:08, deadalnix a écrit :
 Le 16/04/2012 09:09, Walter Bright a écrit :
 4. GCInfo seems to be a better design.

This is unrelated. In fact, GCInfo could use getMembers . Precise GC and reflection are brother.

Never mind, I thought you were talking about __traits, but you are talking about Runtime version of it. We do agree.
Apr 16 2012
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 18 Apr 2012 14:37:11 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 4/18/2012 10:20 AM, Steven Schveighoffer wrote:
 On Wed, 18 Apr 2012 09:59:10 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2012-04-18 03:11, Nick Sabalausky wrote:
 Can't you just query compile-time information to see what classes  
 inherit
 from Base? Then you could just try downcasting to them.

Is that possible?

No. Not from within a template that doesn't know about those classes (which presumably your serialization function is). -Steve

You can get a list of classes in the executable at runtime from:

    foreach (m; ModuleInfo)
    {
        if (m)
            //writefln("module %s, %d", m.name, m.localClasses.length);
            foreach (c; m.localClasses)
            {
                writefln("\tclass %s", c.name);
            }
    }

The question was, can you get this information at compile time. -Steve
Apr 18 2012
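As a rough illustration of the runtime side of this exchange, the ModuleInfo loop quoted above can be turned into a helper that finds the classes deriving from a given base. This is a sketch under assumptions: `derivedFrom` and the three example classes are invented here, and only classes that the runtime lists in `localClasses` will be found. It does nothing at compile time, which is the limitation Steve points out.

```d
class Base {}
class Derived : Base {}
class Unrelated {}

// Returns the ClassInfo of every listed class that derives from root.
TypeInfo_Class[] derivedFrom(TypeInfo_Class root)
{
    TypeInfo_Class[] found;
    foreach (m; ModuleInfo)          // m is a ModuleInfo*
    {
        if (m is null)
            continue;
        foreach (c; m.localClasses)
        {
            // Walk the base-class chain of c looking for root.
            for (auto b = c.base; b !is null; b = b.base)
            {
                if (b is root)
                {
                    found ~= c;
                    break;
                }
            }
        }
    }
    return found;
}
```

A serializer could call `derivedFrom(typeid(Base))` at startup to discover subclasses without explicit registration, at the cost of scanning every module once.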
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Apr 16, 2012 at 09:53:34AM -0700, Walter Bright wrote:
 On 4/16/2012 9:40 AM, Jacob Carlborg wrote:
Regardless of how the runtime reflection is generated, by a library or the
compiler, it needs to be available to all types.

Why? (I can see the point for a dynamic language, but not a static one.)

Perhaps for dynamic loading of class objects? T -- Not all rumours are as misleading as this one.
Apr 16 2012
prev sibling next sibling parent "Kapps" <opantm2+spam gmail.com> writes:
On Monday, 16 April 2012 at 16:53:53 UTC, Walter Bright wrote:
 On 4/16/2012 9:40 AM, Jacob Carlborg wrote:
 Regardless of how the runtime reflection is generated, by a 
 library or the
 compiler, it needs to be available to all types.

Why? (I can see the point for a dynamic language, but not a static one.)

Because you can't take into consideration every possible use for your code. You have to be liberal, because even ONE class forgetting to enable reflection will completely break someone's entire system. For example, you can't serialize just because ONE random collection forgot to enable reflection. As it is right now, the vast majority of the standard library would have to have reflection enabled because the use case is unknown. This is how it would be for most libraries as well. You don't know where your code will be used when writing a library, so you have to play it safe. Opt out is MUCH better than opt in. People who want reflection understand the costs associated with it. People who want reflection build programs where these costs are acceptable.
Apr 16 2012
prev sibling next sibling parent "Francois Chabot" <francois.chabot.dev gmail.com> writes:
On Tuesday, 17 April 2012 at 05:34:17 UTC, Kapps wrote:
 On Monday, 16 April 2012 at 16:53:53 UTC, Walter Bright wrote:
 On 4/16/2012 9:40 AM, Jacob Carlborg wrote:
 Regardless of how the runtime reflection is generated, by a 
 library or the
 compiler, it needs to be available to all types.

Why? (I can see the point for a dynamic language, but not a static one.)

Because you can't take into consideration every possible use for your code. You have to be liberal, because even ONE class forgetting to enable reflection will completely break someone's entire system. For example, you can't serialize just because ONE random collection forgot to enable reflection. As it is right now, the vast majority of the standard library would have to have reflection enabled because the use case is unknown. This is how it would be for most libraries as well. You don't know where your code will be used when writing a library, so you have to play it safe. Opt out is MUCH better than opt in. People who want reflection understand the costs associated with it. People who want reflection build programs where these costs are acceptable.

That's going WAY overboard. An argument can be made about subclass discovery, but it's pretty darn easy to make every aggregated type of a given reflected class be opted-in automatically and recursively.
Apr 16 2012
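The "opt in automatically and recursively" idea from the post above could look roughly like the following. This is a sketch, not an existing reflection API: `optIn` and the example types are hypothetical, and only struct, class and pointer fields are followed (arrays and other containers are ignored for brevity).

```d
import std.traits : isPointer, PointerTarget;

struct Inner { int x; }
struct Outer { Inner inner; Outer* next; double weight; }
class Reflected { Outer payload; }

// Records T and, recursively, every struct or class reachable through its fields.
void optIn(T)(ref bool[string] seen)
{
    static if (isPointer!T)
    {
        // Follow the pointer to the type it points at.
        optIn!(PointerTarget!T)(seen);
    }
    else static if (is(T == struct) || is(T == class))
    {
        if (T.stringof in seen)
            return;                 // already visited; this also breaks cycles
        seen[T.stringof] = true;
        foreach (F; typeof(T.tupleof))
            optIn!F(seen);
    }
}
```

Marking only `Reflected` is enough here: `Outer` and `Inner` are picked up through the field types. Subclass discovery is not covered, which matches the caveat in the post.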
prev sibling next sibling parent "Max Samukha" <maxsamukha gmail.com> writes:
On Monday, 16 April 2012 at 20:42:19 UTC, Timon Gehr wrote:

 This could be fixed by introducing a simple language feature 
 that allows a supertype to specify mixin templates that are 
 'inherited' by all subtypes.

Similar proposals have been made in the past (http://www.digitalmars.com/d/archives/digitalmars/D/Defining_some_stuff_for_each_class in_turn_97203.html) but failed to get through. BTW, 'alias this' complicates things since most features that are valid for subclassing are now expected to be valid for 'alias this' subtyping as well.
Apr 17 2012
prev sibling next sibling parent "Michal Minich" <michal.minich gmail.com> writes:
On Wednesday, 18 April 2012 at 10:07:59 UTC, Robert Clipsham 
wrote:
 // a.d
 class Base { /* whatever magic to get derived classes */ }
 // b.d
 import a;
 class Derived : Base {}
 void main() {}

to find derived classes at runtime: http://forum.dlang.org/thread/mailman.1052.1292505452.21107.digitalmars-d-learn puremagic.com#post-op.vns7nfjpvxi10f:40biotronic-pc.lan
Apr 18 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 17 Apr 2012 02:50:02 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2012-04-17 08:33, Walter Bright wrote:
 On 4/16/2012 11:26 PM, Jacob Carlborg wrote:
 Then it won't be possible to serialize third party types if they don't
 implements ISerializable. In this case, this solution is no better
 then manually
 registering types. Actually it's worse, since I can manually register
 third
 party types.

I'm not so sure in D that you can serialize arbitrary types without them designed to be serializable. For example, what will you do with unions? Pointers to global data?

Not all types are serializable of course. For those types you would have to register a function or similar. But most other types are possible to automatically serialize. I don't see a point in making them less serializable by requiring them to implement an interface. I also think mostly one would want to serialize objects or a hierarchy of objects.

Can you explain why .NET types are not by default serializable? All of them have full runtime reflection AFAIK. I'm not using that as an argument, I'm genuinely interested. -Steve
Apr 18 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 18 Apr 2012 09:59:10 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2012-04-18 03:11, Nick Sabalausky wrote:
 Can't you just query compile-time information to see what classes  
 inherit
 from Base? Then you could just try downcasting to them.

Is that possible?

No. Not from within a template that doesn't know about those classes (which presumably your serialization function is). -Steve
Apr 18 2012
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 18 Apr 2012 13:18:20 -0400, Robert Clipsham  
<robert octarineparrot.com> wrote:

 On 18/04/2012 17:31, Jacob Carlborg wrote:
 On 2012-04-18 13:06, Steven Schveighoffer wrote:
 On Tue, 17 Apr 2012 02:50:02 -0400, Jacob Carlborg <doob me.com> wrote:

 Can you explain why .NET types are not by default serializable? All of
 them have full runtime reflection AFAIK.

 I'm not using that as an argument, I'm genuinely interested.

I'm wondering that as well.

From my quick google I couldn't find a definitive answer, but: http://stackoverflow.com/questions/4408909/why-classes-are-not-serializable-by-default-in-net http://en.wikipedia.org/wiki/Serialization (reasons listed under Java)

I think those answer the question quite well. In summary, just because you *can* serialize a type doesn't mean you *should*, and the risks of serializing something that shouldn't be serialized trump the convenience of not having to mark it. The latter part is really a subjective statement, but I would agree with it. I bet part of the confusion comes from the fact that such attributes are named "Serializable", whereas pretty much anything is serializable. It should be something more along the lines of "AllowSerialization" -Steve
Apr 18 2012
prev sibling next sibling parent reply deadalnix <deadalnix gmail.com> writes:
Le 16/04/2012 04:24, Walter Bright a écrit :
 Just checked it in. Of course, it doesn't actually do precise GC, it is
 just thrown over the wall for the library devs who are itching to get
 started on it.

 I added a getGCInfo() method to TypeInfo that returns an
 immutable(void)*. This pointer can be anything - a pointer to data, to
 code, whatever, that implements whatever the GC might need to do precise
 collections. The value is generated by the template GCInfo(T) in object.d.

Having this template in object.d seems problematic to me. It is now quite hard to provide any custom GC implementation without messing with Druntime. Providing a user created GC should be as easy as possible.
 Some observations:

 1. if there are no pointers in the allocated data, the GCInfo(T) should
 be null. This enables a fast static check with no indirection for this
 most common case.

 2. closure memory is allocated by calling _d_allocmemory. For now, it
 should just use the old conservative mark/sweep. Later, I can add a
 GCInfo(T) for it.

 3. Many types will follow similar patterns:

 ptr .. int .. ptr .. int

 ptr .. ptr

 int .. ptr

 I suggest that specializations exist for these to avoid generating
 innumerable identical data structures or functions. In fact, if they are
 named with names like:
 scanpipi()
 scanpp()
 scanip()

 then the linker will automatically remove duplicates.

I think this is again solving an implementation issue by a language design decision. Ultimately the useless code bloat must be handled by the toolchain anyway.
 4. Stack scanning remains imprecise, and should use the usual
 conservative method.

 5. The "has pointers" bit array can, of course, be eliminated.

 6. I suggest the GCInfo pointer be stored at the end of the allocated
 block, as then it won't affect the alignment of the allocated data.

This is very swap unfriendly. Many pages will have to be swapped in and back out during the marking process, even though that is 100% useless for data that doesn't contain pointers.
Apr 16 2012
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/16/2012 2:05 AM, deadalnix wrote:
 Having this template into object.d seems problematic to me. It is now quite
hard
 to provide any custom GC implementation without messing with Druntime.

 Providing a user created GC should be as easy as possible.

It's never going to be easy for anyone to just write their own GC, especially one that works better than one a lot of people have spent a lot of time on.
 I think this is again solving an implementation issue by a language design
 decision. Ultimately the useless code bloat must be handled by the toolchain
 anyway.

We gotta work with what we have.
 6. I suggest the GCInfo pointer be stored at the end of the allocated
 block, as then it won't affect the alignment of the allocated data.

This very swap unfriendly. Many pages will have to be unswapped/swapped back in the marking process, even if it is 100% useless for data that doesn't contains pointers.

I think there's a misunderstanding. The GC allocates by powers of 2. So a 22 byte struct will be allocated in a 32 byte block, and the GCInfo pointer can go at the end of that. That will not cause swapping. As for data that has no pointers, something has to indicate that. Of course, another strategy is to allocate such data in separate pools. In fact, that might be an excellent idea, as such pools would never have to be read (i.e. swapped in, loaded into cache) during the mark/sweep process.
Apr 16 2012
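To make the arithmetic in the post above concrete, here is a small sketch of the "GCInfo pointer at the end of a power-of-two block" layout. The minimum block size of 16 and the function names are assumptions for illustration; the real allocator's bin sizes and bookkeeping may differ.

```d
enum size_t minBlock = 16;   // assumed smallest bin size

// Smallest power-of-two block that holds the payload plus one trailing pointer.
size_t blockSize(size_t payload) pure nothrow @nogc @safe
{
    size_t need = payload + (void*).sizeof;
    size_t size = minBlock;
    while (size < need)
        size <<= 1;
    return size;
}

// Byte offset of the trailing GCInfo slot inside the block.
size_t gcInfoOffset(size_t payload) pure nothrow @nogc @safe
{
    return blockSize(payload) - (void*).sizeof;
}
```

For the 22 byte struct in the post, the payload and the trailing pointer both fit in a 32 byte block, so the slot sits in the padding that would have been wasted anyway. A payload that already fills its block exactly would spill into the next size up under this scheme.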
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
Le 16/04/2012 11:25, Walter Bright a écrit :
 On 4/16/2012 2:05 AM, deadalnix wrote:
 Having this template into object.d seems problematic to me. It is now
 quite hard
 to provide any custom GC implementation without messing with Druntime.

 Providing a user created GC should be as easy as possible.

It's never going to be easy for anyone to just write their own GC, especially one that works better than one a lot of people have spent a lot of time on.

I don't think this is easy. But different GCs have different impacts on the program. For instance, Oracle's JVM provides 4 different GCs that you can choose with different configuration parameters. Some are made to achieve maximum throughput, others to achieve low pause times. None is better than the others; it depends on the application. For a user interface, you'll choose the low-pause GC, but for a backend processing application, you may prefer the maximum throughput.
 This very swap unfriendly. Many pages will have to be
 unswapped/swapped back in
 the marking process, even if it is 100% useless for data that doesn't
 contains
 pointers.

I think there's a misunderstanding. The GC allocates by powers of 2. So a 22 byte struct will be allocated in a 32 byte block, and the GCInfo pointer can go at the end of that. That will not cause swapping. As for data that has no pointers, something has to indicate that. Of course, another strategy is to allocate such data in separate pools. In fact, that might be an excellent idea, as such pools would never have to be read (i.e. swapped in, loaded into cache) during the mark/sweep process.

That is exactly what I meant. Metadata about the block shouldn't be stored anywhere near the block, because it will behave horribly wrong when swap comes into play. Metadata must be read and written when the GC does its job, but the block itself doesn't require it.
Apr 16 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/16/2012 4:32 AM, deadalnix wrote:
 Le 16/04/2012 11:25, Walter Bright a écrit :
 On 4/16/2012 2:05 AM, deadalnix wrote:
 Having this template into object.d seems problematic to me. It is now
 quite hard
 to provide any custom GC implementation without messing with Druntime.

 Providing a user created GC should be as easy as possible.

It's never going to be easy for anyone to just write their own GC, especially one that works better than one a lot of people have spent a lot of time on.

I don't think this is easy. But Different GC have different impact on the program. For instance, oracle's JVM provide you 4 different GC, that you can choose with different configuration parameters.

Those are not user created GCs.
 That is exactly what I meant. Metadata about the block shouldn't be stored
 anywhere near the block, because it will behave horribly wrong when swap come
 into play. Metadata must be read and written when GC does its job, but the
block
 itself doesn't require it.

I think the point is that it is not up to the compiler how this is done, but to the GC implementer.
Apr 16 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 16/04/2012 18:37, Walter Bright a écrit :
 On 4/16/2012 4:32 AM, deadalnix wrote:
 Le 16/04/2012 11:25, Walter Bright a écrit :
 On 4/16/2012 2:05 AM, deadalnix wrote:
 Having this template into object.d seems problematic to me. It is now
 quite hard
 to provide any custom GC implementation without messing with Druntime.

 Providing a user created GC should be as easy as possible.

It's never going to be easy for anyone to just write their own GC, especially one that works better than one a lot of people have spent a lot of time on.

I don't think this is easy. But Different GC have different impact on the program. For instance, oracle's JVM provide you 4 different GC, that you can choose with different configuration parameters.

Those are not user created GCs.
 That is exactly what I meant. Metadata about the block shouldn't be
 stored
 anywhere near the block, because it will behave horribly wrong when
 swap come
 into play. Metadata must be read and written when GC does its job, but
 the block
 itself doesn't require it.

I think the point is that it is not up to the compiler how this is done, but to the GC implementer.

The point was that putting this into object.d isn't, IMO, the best option to provide such a mechanism.
Apr 16 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/16/2012 10:00 AM, deadalnix wrote:
 The point was that putting this into object.d isn't, IMO, the best option to
 provide such a mecanism.

I don't know of a better way to do it, nor do I think it matters if it is in object.d or someothername.d.
Apr 16 2012
prev sibling next sibling parent reply Alex Rønne Petersen <xtzgzorex gmail.com> writes:
On 16-04-2012 20:02, Sean Kelly wrote:
 On Apr 16, 2012, at 2:25 AM, Walter Bright wrote:

 On 4/16/2012 2:05 AM, deadalnix wrote:
 Having this template into object.d seems problematic to me. It is now quite
hard
 to provide any custom GC implementation without messing with Druntime.

 Providing a user created GC should be as easy as possible.

It's never going to be easy for anyone to just write their own GC, especially one that works better than one a lot of people have spent a lot of time on.

What I've been going for is to have all functionality that requires knowledge of code generation, (most) platform specifics, etc, live in the compiler runtime portion of Druntime (i.e. in the "rt" package). This is all stuff that the compiler writer knows by necessity, and the GC writer shouldn't be required to know it as well. As for pointer maps, I think it's reasonable to establish a format that these will be made available to the GC, and for them to come from elsewhere in the runtime. I realize that different GC implementations may prefer different formats, but hopefully we can settle on one that's pretty generally usable and efficient. I'd really rather avoid expecting GC writers to know how to meta-process D types to statically generate this themselves. Moving this into the GC would also eliminate the possibility of having the GC chosen at link-time, which is something that's currently still an option.

I think we can safely settle on bitmaps for this. Most real world GCs use this kind of strategy.
 I think this is again solving an implementation issue by a language design
 decision. Ultimately the useless code bloat must be handled by the toolchain
 anyway.

We gotta work with what we have.
 6. I suggest the GCInfo pointer be stored at the end of the allocated
 block, as then it won't affect the alignment of the allocated data.

This very swap unfriendly. Many pages will have to be unswapped/swapped back in the marking process, even if it is 100% useless for data that doesn't contains pointers.

I think there's a misunderstanding. The GC allocates by powers of 2. So a 22 byte struct will be allocated in a 32 byte block, and the GCInfo pointer can go at the end of that. That will not cause swapping. As for data that has no pointers, something has to indicate that. Of course, another strategy is to allocate such data in separate pools. In fact, that might be an excellent idea, as such pools would never have to be read (i.e. swapped in, loaded into cache) during the mark/sweep process.

This is obviously all for the current GC anyway. Another implementation may be better off storing things elsewhere.

-- - Alex
Apr 16 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/16/12 1:31 PM, Alex Rønne Petersen wrote:
 I think we can safely settle on bitmaps for this. Most real world GCs
 use this kind of strategy.

This is a valid argument, but I should say it's very much worth exploring techniques enabled by D specifically. Many unique and superior D idioms would not have existed if we played it safe and did things like everybody else did. Andrei
Apr 16 2012
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/16/2012 11:02 AM, Sean Kelly wrote:
 On Apr 16, 2012, at 2:25 AM, Walter Bright wrote:
 It's never going to be easy for anyone to just write their own GC,
 especially one that works better than one a lot of people have spent a lot
 of time on.

What I've been going for is to have all functionality that requires knowledge of code generation, (most) platform specifics, etc, live in the compiler runtime portion of Druntime (i.e. in the "rt" package). This is all stuff that the compiler writer knows by necessity, and the GC writer shouldn't be required to know it as well. As for pointer maps, I think it's reasonable to establish a format that these will be made available to the GC, and for them to come from elsewhere in the runtime. I realize that different GC implementations may prefer different formats, but hopefully we can settle on one that's pretty generally usable and efficient. I'd really rather avoid expecting GC writers to know how to meta-process D types to statically generate this themselves. Moving this into the GC would also eliminate the possibility of having the GC chosen at link-time, which is something that's currently still an option.

Either the compiler has to generate the marking stuff, meaning that no user designed GC is very practical, or it has to be generated at compile time with a template, where a user designed GC can experiment with a much wider range of possibilities without needing compiler modifications. Also, budding GC implementers will have an existing meta-processing code example which will go a long way towards helping get up to speed. Building a GC is a fairly advanced programming project. I don't think it's unreasonable to expect them to be pretty familiar with how D works. Switching GCs at link time will only be possible if the GCInfo data is identical between them.
Apr 16 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/16/12 3:49 PM, Walter Bright wrote:
 Either the compiler has to generate the marking stuff, meaning that no
 user designed GC is very practical, or it has to be generated at compile
 time with a template, where a user designed GC can experiment with a
 much wider range of possibilities without needing compiler modifications.

 Also, budding GC implementers will have an existing meta-processing code
 example which will go a long way towards helping get up to speed.

 Building a GC is a fairly advanced programming project. I don't think
 it's unreasonable to expect them to be pretty familiar with how D works.

Zat's da spirit! Andrei
Apr 16 2012
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/16/12 1:02 PM, Sean Kelly wrote:
 As for pointer maps, I think it's reasonable to establish a format
 that these will be made available to the GC, and for them to come
 from elsewhere in the runtime.  I realize that different GC
 implementations may prefer different formats, but hopefully we can
 settle on one that's pretty generally usable and efficient.  I'd
 really rather avoid expecting GC writers to know how to meta-process
 D types to statically generate this themselves.  Moving this into the
 GC would also eliminate the possibility of having the GC chosen at
 link-time, which is something that's currently still an option.

I know you didn't mean it that way, but this gets close enough to a dogma to warrant a protest. "We don't need no steenkin' templates in <sacred area X>" is, I think, an attitude we need to just rid ourselves of. There's the same harm in using templates too much or too little. The scheme Walter proposed has a lot of flexibility - it plants one pointer to function per type. This is very flexible because that pointer could point to the same function and use a bitmap-based scheme, or (as Walter proposed) point to different instances of a template that does scanning in a type-specific manner. Andrei
Apr 16 2012
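A rough sketch of the "one pointer to function per type" variant described above, where each type gets its own instance of a scanning template. This is not the GCInfo(T) template from object.d: `scanFields`, `scannerFor`, `Mark` and the example structs are hypothetical, and only direct pointer fields are handled (nested structs, arrays and class references are left out).

```d
import std.traits : hasIndirections, isPointer;

struct Node  { int value; Node* next; }
struct Plain { int a; double b; }

alias Mark   = void delegate(void*);
alias ScanFn = void function(void* block, scope Mark mark);

// Type-specific scanner: reports every non-null direct pointer field of T.
void scanFields(T)(void* block, scope Mark mark)
{
    auto obj = cast(T*) block;
    foreach (ref field; (*obj).tupleof)
    {
        static if (isPointer!(typeof(field)))
        {
            if (field !is null)
                mark(cast(void*) field);
        }
    }
}

// One value per type: null when T holds no pointers, else the scanner for T.
ScanFn scannerFor(T)()
{
    static if (hasIndirections!T)
        return &scanFields!T;
    else
        return null;
}
```

The null result for pointer-free types gives the cheap "nothing to scan" check mentioned at the start of the thread, while types with pointers each get a scanner that knows their layout.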
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/16/2012 7:20 PM, Andrei Alexandrescu wrote:
 The scheme Walter proposed has a lot of flexibility - it plants one pointer to
 function per type. This is very flexible because that pointer could point to
the
 same function and use a bitmap-based scheme, or (as Walter proposed) point to
 different instances of a template that does scanning in a type-specific manner.

It could also be a pointer to data. It's entirely up to the template what it's a pointer to.
Apr 16 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 17/04/2012 05:22, Walter Bright a écrit :
 On 4/16/2012 7:20 PM, Andrei Alexandrescu wrote:
 The scheme Walter proposed has a lot of flexibility - it plants one
 pointer to
 function per type. This is very flexible because that pointer could
 point to the
 same function and use a bitmap-based scheme, or (as Walter proposed)
 point to
 different instances of a template that does scanning in a
 type-specific manner.

It could also be a pointer to data. It's entirely up to the template what it's a pointer to.

Data can eventually be stored IN the pointer if needed. Does the system allow that ? If it doesn't, is it possible to enable it ?
Apr 17 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/17/2012 1:47 AM, deadalnix wrote:
 Le 17/04/2012 05:22, Walter Bright a écrit :
 On 4/16/2012 7:20 PM, Andrei Alexandrescu wrote:
 The scheme Walter proposed has a lot of flexibility - it plants one
 pointer to
 function per type. This is very flexible because that pointer could
 point to the
 same function and use a bitmap-based scheme, or (as Walter proposed)
 point to
 different instances of a template that does scanning in a
 type-specific manner.

It could also be a pointer to data. It's entirely up to the template what it's a pointer to.

Data can eventually be stored IN the pointer if needed. Does the system allow that ? If it doesn't, is it possible to enable it ?

I don't know what you mean.
Apr 17 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 17/04/2012 11:42, Walter Bright a écrit :
 On 4/17/2012 1:47 AM, deadalnix wrote:
 Le 17/04/2012 05:22, Walter Bright a écrit :
 On 4/16/2012 7:20 PM, Andrei Alexandrescu wrote:
 The scheme Walter proposed has a lot of flexibility - it plants one
 pointer to
 function per type. This is very flexible because that pointer could
 point to the
 same function and use a bitmap-based scheme, or (as Walter proposed)
 point to
 different instances of a template that does scanning in a
 type-specific manner.

It could also be a pointer to data. It's entirely up to the template what it's a pointer to.

Data can eventually be stored IN the pointer if needed. Does the system allow that? If it doesn't, is it possible to enable it?

I don't know what you mean.

If the data fits into the pointer (i.e. 32 bits or 64 bits), you don't need to point to data. You can just decide that the pointer IS the data. In other terms:

    void* ptr;
    size_t data = cast(size_t) ptr;

Such a size is sufficient for many types.
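
A minimal sketch of that idea in D, under stated assumptions: pointerBitmap, encodeInfo and decodeInfo are hypothetical helper names, one bit is used per machine word of the type, and only raw pointer fields of a struct are detected. A type without pointers encodes to null, which matches the "null means no pointers" convention from the first post.

```d
// Bit i is set when word i of struct T holds a raw pointer.
// Illustration only: arrays, class references and nested structs are ignored.
size_t pointerBitmap(T)()
{
    size_t bits = 0;
    foreach (i, F; typeof(T.tupleof))
    {
        static if (is(F == U*, U))
            bits |= (cast(size_t) 1) << (T.tupleof[i].offsetof / size_t.sizeof);
    }
    return bits;
}

// Hypothetical encoding: the returned "pointer" is the bitmap itself.
immutable(void)* encodeInfo(T)()
{
    static assert(T.sizeof / size_t.sizeof <= size_t.sizeof * 8,
                  "type too large for a one-word bitmap");
    enum bits = pointerBitmap!T();      // evaluated at compile time
    return cast(immutable(void)*) bits;
}

// The collector side reads the bitmap back without any indirection.
size_t decodeInfo(immutable(void)* info)
{
    return cast(size_t) info;
}

struct Node  { Node* next; size_t value; int* data; }
struct Plain { size_t a; size_t b; }

unittest
{
    assert(decodeInfo(encodeInfo!Node()) == 0b101); // words 0 and 2 are pointers
    assert(encodeInfo!Plain() is null);             // no pointers: null info
}
```

Larger types would not fit in one word and would still need a real pointer to data or code, so a real scheme would have to tell the two cases apart somehow.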
Apr 17 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/17/2012 3:31 AM, deadalnix wrote:
 If the data fit into the pointer (ie 32bits or 64bits), you don't need to point
 to data. You can just decide that the pointer IS the data. In other terms :

 void* ptr;
 size_t data = cast(size_t) ptr;

 Such a size is sufficient for many types.

Yes, that's true.
Apr 17 2012
prev sibling next sibling parent "CTFE-4-the-win" <CTFE 4the.win> writes:
On Monday, 16 April 2012 at 11:30:52 UTC, deadalnix wrote:
 Le 16/04/2012 11:25, Walter Bright a écrit :
 As for data that has no pointers, something has to indicate 
 that. Of
 course, another strategy is to allocate such data in separate 
 pools. In
 fact, that might be an excellent idea, as such pools would 
 never have to
 be read (i.e. swapped in, loaded into cache) during the 
 mark/sweep process.

That is exactly what I meant. Metadata about the block shouldn't be stored anywhere near the block, because it will behave horribly wrong when swap comes into play. Metadata must be read and written when the GC does its job, but the block itself doesn't require it.

+1 Agree, the metadata should be stored in an allocator-private pool for cache reasons.
Apr 16 2012
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Apr 16, 2012, at 2:25 AM, Walter Bright wrote:

 On 4/16/2012 2:05 AM, deadalnix wrote:
 Having this template into object.d seems problematic to me. It is now [...]
 to provide any custom GC implementation without messing with [...]

 Providing a user created GC should be as easy as possible.

It's never going to be easy for anyone to just write their own GC, [...] lot of time on. What I've been going for is to have all functionality that requires knowledge of code generation, (most) platform specifics, etc, live in the compiler runtime portion of Druntime (i.e. in the "rt" package). This is all stuff that the compiler writer knows by necessity, and the GC writer shouldn't be required to know it as well.

As for pointer maps, I think it's reasonable to establish a format that these will be made available to the GC, and for them to come from elsewhere in the runtime. I realize that different GC implementations may prefer different formats, but hopefully we can settle on one that's pretty generally usable and efficient. I'd really rather avoid expecting GC writers to know how to meta-process D types to statically generate this themselves. Moving this into the GC would also eliminate the possibility of having the GC chosen at link-time, which is something that's currently still an option.

 I think this is again solving an implementation issue by a language [...]
 decision. Ultimately the useless code bloat must be handled by the [...]
 anyway.

We gotta work with what we have.

 6. I suggest the GCInfo pointer be stored at the end of the allocated
 block, as then it won't affect the alignment of the allocated data.

 This very swap unfriendly. Many pages will have to be [...]
 the marking process, even if it is 100% useless for data that doesn't [...]
 pointers.

I think there's a misunderstanding. The GC allocates by powers of 2. [...] pointer can go at the end of that. That will not cause swapping.

As for data that has no pointers, something has to indicate that. Of course, another strategy is to allocate such data in separate pools. In fact, that might be an excellent idea, as such pools would never have to be read (i.e. swapped in, loaded into cache) during the mark/sweep process.

This is obviously all for the current GC anyway. Another implementation may be better off storing things elsewhere.
Apr 16 2012
prev sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Apr 16, 2012, at 7:20 PM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 On 4/16/12 1:02 PM, Sean Kelly wrote:
 As for pointer maps, I think it's reasonable to establish a format
 that these will be made available to the GC, and for them to come
 from elsewhere in the runtime.  I realize that different GC
 implementations may prefer different formats, but hopefully we can
 settle on one that's pretty generally usable and efficient.  I'd
 really rather avoid expecting GC writers to know how to meta-process
 D types to statically generate this themselves.  Moving this into the
 GC would also eliminate the possibility of having the GC chosen at
 link-time, which is something that's currently still an option.

I know you didn't mean it that way, but this gets close enough to a dogma t[...] " is, I think, an attitude we need to just rid ourselves of. There's the sa[...]

The scheme Walter proposed has a lot of flexibility - it plants one pointer to function per type. This is very flexible because that pointer could point to the same function and use a bitmap-based scheme, or (as Walter proposed) point to different instances of a template that does scanning in a type-specific manner.

Fair enough. How about core.memory? That's the visible GC interface, and would be where template-based GC methods are defined. It still limits all GC implementations to a single pointer map representation though, without some theatrics.
Apr 16 2012
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 15 Apr 2012 22:24:56 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 6. I suggest the GCInfo pointer be stored at the end of the allocated  
 block, as then it won't affect the alignment of the allocated data.

This conflicts with the array runtime's use of the end of the block to store the block's 'used' length. But it may not be an issue.

Note that the 16-byte block is going to get mighty small (only 12 bytes, possibly 11 if it's an appendable block).

I also suggest that you look into changing the way structs are allocated if you haven't already. Right now, they are allocated by creating a new array of size 1. While this is convenient in terms of avoiding a new function, it means all struct allocations are arrays, and as such will be typed as arrays, with appendable semantics and with GCInfo of an array.

-Steve
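
The block-size arithmetic above can be spelled out as a small D sketch. The helper name usableBytes is hypothetical, and the one-byte 'used' length for small appendable blocks is an assumption taken from the figures in this post (16 - 4 = 12, and 11 when appendable), not from the runtime source.

```d
// Bytes left for user data when the info pointer sits at the end of the block.
// Assumption: a small appendable block also gives up one trailing byte to
// record its 'used' length.
size_t usableBytes(size_t blockSize, size_t infoPtrSize, bool appendable)
{
    immutable overhead = infoPtrSize + (appendable ? 1 : 0);
    assert(overhead <= blockSize, "block too small for its metadata");
    return blockSize - overhead;
}

unittest
{
    assert(usableBytes(16, 4, false) == 12); // 32-bit pointer
    assert(usableBytes(16, 4, true)  == 11); // 32-bit pointer, appendable
    assert(usableBytes(16, 8, true)  == 7);  // same block with a 64-bit pointer
}
```

With 64-bit pointers, half of the smallest block would go to the trailing pointer under these assumptions.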
Apr 16 2012
parent bearophile <bearophileHUGS lycos.com> writes:
Steven Schveighoffer:

 I also suggest that you look into changing the way structs are allocated  
 if you haven't already.  Right now, they are allocated by creating a new  
 array of size 1.  While this is convenient in terms of avoiding a new  
 function, it means all struct allocations are arrays, and as such will be  
 typed as arrays, with appendable semantics and with GCInfo of an array.

+2

Bye,
bearophile
Apr 16 2012
prev sibling next sibling parent dsimcha <dsimcha yahoo.com> writes:
On 4/15/2012 10:24 PM, Walter Bright wrote:
 Just checked it in. Of course, it doesn't actually do precise GC, it is
 just thrown over the wall for the library devs who are itching to get
 started on it.

Excellent!! Maybe I'll get started on this soon.
Apr 16 2012
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 15 Apr 2012 22:24:56 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 Just checked it in. Of course, it doesn't actually do precise GC, it is  
 just thrown over the wall for the library devs who are itching to get  
 started on it.

 I added a getGCInfo() method to TypeInfo that returns an  
 immutable(void)*. This pointer can be anything - a pointer to data, to  
 code, whatever, that implements whatever the GC might need to do precise  
 collections. The value is generated by the template GCInfo(T) in  
 object.d.

I feel this actually can be more generic. Can we change this to a more generic term? For example getRTInfo (short for get runtime info) and RTInfo(T)?

Here is my thought: the GC is not the only entity that might be interested in permanently storing compile-time info for use at runtime. Yes, the GC could use this (perhaps exclusively), but this actually works as a fairly good hook to generate RTTI necessary to do reflection.

Previously, one had to either parse the object file or make multiple passes through the compiler to generate the info and then include it. With this template solution, the compiler generates the info using the template and then stores it in TypeInfo.

But we need to change the name early on to avoid conflicts. I don't think a more generic name would be inappropriate, even if the GC is the only client at first.

-Steve
Apr 16 2012
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2012-04-16 22:52, Steven Schveighoffer wrote:

 I feel this actually can be more generic. Can we change this to a more
 generic term? For example getRTInfo (short for get runtime info) and
 RTInfo(T)?

 Here is my thought: the GC is not the only entity that might be
 interested in permanently storing compile-time info for use at runtime.
 Yes, the GC could use this (perhaps exclusively), but this actually
 works as a fairly good hook to generate RTTI necessary to do reflection.

 Previously, one had to either parse the object file or make multiple
 passes through the compiler to generate the info and then include it.
 With this template solution, the compiler generates the info using the
 template and then stores it in TypeInfo.

 But we need to change the name early on to avoid conflicts. I don't
 think a more generic name would be inappropriate, even if the GC is the
 only client at first.

 -Steve

If we get custom attributes, perhaps this can be used to store them.

BTW, why not make it a property and drop the "get" prefix?

-- 
/Jacob Carlborg
Apr 16 2012
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/16/2012 1:54 PM, Steven Schveighoffer wrote:
 On Mon, 16 Apr 2012 16:52:54 -0400, Steven Schveighoffer <schveiguy yahoo.com>
 wrote:

 But we need to change the name early on to avoid conflicts. I don't think a
 more generic name would be inappropriate, even if the GC is the only client at
 first.

Should have said "But we need to change the name early on to avoid confusion later", i.e. "why is this GCInfo template generating reflection info that the GC doesn't use?"

Not a bad idea.
Apr 18 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/18/2012 1:19 AM, Walter Bright wrote:
 On 4/16/2012 1:54 PM, Steven Schveighoffer wrote:
 On Mon, 16 Apr 2012 16:52:54 -0400, Steven Schveighoffer <schveiguy yahoo.com>
 wrote:

 But we need to change the name early on to avoid conflicts. I don't think a
 more generic name would be inappropriate, even if the GC is the only client at
 first.

Should have said "But we need to change the name early on to avoid confusion later", i.e. "why is this GCInfo template generating reflection info that the GC doesn't use?"

Not a bad idea.

Done. I think it is a worthwhile idea because it leaves the door open for all kinds of information to be generated for types, without needing to modify the compiler at all. Also incorporated Jacob's suggestion to make it a property.
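
As an illustration of "all kinds of information", here is a minimal D sketch of a template that builds a per-type field table at compile time and exposes it through an opaque immutable(void)*. It does not touch the real RTInfo(T) in object.di; FieldInfo, TypeDesc, TypeDescFor and descPointer are hypothetical names, and only struct types are handled.

```d
struct FieldInfo { string name; size_t offset; }
struct TypeDesc  { string name; immutable(FieldInfo)[] fields; }

// Runs at compile time (CTFE) to collect the name and offset of each field.
immutable(FieldInfo)[] fieldsOf(T)()
{
    FieldInfo[] result;
    foreach (i, F; typeof(T.tupleof))
        result ~= FieldInfo(__traits(identifier, T.tupleof[i]),
                            T.tupleof[i].offsetof);
    return result.idup;
}

// One immutable descriptor per type, generated purely by the template.
template TypeDescFor(T)
{
    immutable TypeDesc TypeDescFor = TypeDesc(T.stringof, fieldsOf!T());
}

// Type-erased access, in the style of a pointer stored in TypeInfo.
immutable(void)* descPointer(T)()
{
    return &TypeDescFor!T;
}

struct Point { int x; int y; }

unittest
{
    auto desc = cast(immutable(TypeDesc)*) descPointer!Point();
    assert(desc.name == "Point");
    assert(desc.fields.length == 2);
    assert(desc.fields[1].name == "y");
    assert(desc.fields[1].offset == int.sizeof);
}
```

Whoever consumes the pointer has to agree with the template on what it points to; the compiler only stores it.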
Apr 18 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/18/2012 1:32 PM, Steven Schveighoffer wrote:
 One interesting question -- which template is instantiated by the compiler, the
 one in object.di or the one in object_.d? It seems they are not identical...

object.di. The compiler does not know about object_.d. The template should be identical.
 Another interesting question, in what context does the RTInfo(T) template get
 instantiated? In other words, if you needed more helper templates for GC
 purposes for instance, what file would have to include that, object.di,
 object_.d, or some other module?

object.di
Apr 18 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 16 Apr 2012 16:52:54 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 But we need to change the name early on to avoid conflicts.  I don't  
 think a more generic name would be inappropriate, even if the GC is the  
 only client at first.

Should have said "But we need to change the name early on to avoid confusion later", i.e. "why is this GCInfo template generating reflection info that the GC doesn't use?"

-Steve
Apr 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 18 Apr 2012 14:41:30 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 4/18/2012 1:19 AM, Walter Bright wrote:
 On 4/16/2012 1:54 PM, Steven Schveighoffer wrote:
 On Mon, 16 Apr 2012 16:52:54 -0400, Steven Schveighoffer  
 <schveiguy yahoo.com>
 wrote:

 But we need to change the name early on to avoid conflicts. I don't  
 think a
 more generic name would be inappropriate, even if the GC is the only  
 client at
 first.

Should have said "But we need to change the name early on to avoid confusion later", i.e. "why is this GCInfo template generating reflection info that the GC doesn't use?"

Not a bad idea.

Done. I think it is a worthwhile idea because it leaves the door open for all kinds of information to be generated for types, without needing to modify the compiler at all. Also incorporated Jacob's suggestion to make it a property.

I see, looks good!

One interesting question -- which template is instantiated by the compiler, the one in object.di or the one in object_.d? It seems they are not identical...

Another interesting question, in what context does the RTInfo(T) template get instantiated? In other words, if you needed more helper templates for GC purposes for instance, what file would have to include that, object.di, object_.d, or some other module?

-Steve
Apr 18 2012
prev sibling parent deadalnix <deadalnix gmail.com> writes:
Le 16/04/2012 04:24, Walter Bright a écrit :
 Just checked it in. Of course, it doesn't actually do precise GC, it is
 just thrown over the wall for the library devs who are itching to get
 started on it.

 I added a getGCInfo() method to TypeInfo that returns an
 immutable(void)*. This pointer can be anything - a pointer to data, to
 code, whatever, that implements whatever the GC might need to do precise
 collections. The value is generated by the template GCInfo(T) in object.d.

 Some observations:

 1. if there are no pointers in the allocated data, the GCInfo(T) should
 be null. This enables a fast static check with no indirection for this
 most common case.

 2. closure memory is allocated by calling _d_allocmemory. For now, it
 should just use the old conservative mark/sweep. Later, I can add a
 GCInfo(T) for it.

 3. Many types will follow similar patterns:

 ptr .. int .. ptr .. int

 ptr .. ptr

 int .. ptr

 I suggest that specializations exist for these to avoid generating
 innumerable identical data structures or functions. In fact, if they are
 named with names like:
 scanpipi()
 scanpp()
 scanip()

 then the linker will automatically remove duplicates.

 4. Stack scanning remains imprecise, and should use the usual
 conservative method.

 5. The "has pointers" bit array can, of course, be eliminated.

 6. I suggest the GCInfo pointer be stored at the end of the allocated
 block, as then it won't affect the alignment of the allocated data.

 Release the hounds!

BTW, what about the capability to extend the behavior for a given type? I'm thinking about XOR linked lists, for instance.
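
To show why such a type needs special handling, here is a minimal D sketch of an XOR linked list. XorNode, addr and next are hypothetical names. The link field is a plain size_t holding the XOR of two addresses, so a scanner driven only by the declared field types would see no pointer there at all. The nodes live in a fixed-size array here, so this example does not itself depend on the GC.

```d
struct XorNode
{
    int value;
    size_t link;   // address of previous node XOR address of next node
}

size_t addr(const XorNode* p) { return cast(size_t) p; }

// Recover the neighbour on the far side of cur, given the one we came from.
XorNode* next(XorNode* prev, XorNode* cur)
{
    return cast(XorNode*) (cur.link ^ addr(prev));
}

unittest
{
    XorNode[3] nodes;
    foreach (i, ref n; nodes)
        n.value = cast(int) i + 1;

    nodes[0].link = addr(null)      ^ addr(&nodes[1]);
    nodes[1].link = addr(&nodes[0]) ^ addr(&nodes[2]);
    nodes[2].link = addr(&nodes[1]) ^ addr(null);

    // Walk the list front to back and sum the values.
    int sum = 0;
    XorNode* prev = null;
    XorNode* cur = &nodes[0];
    while (cur !is null)
    {
        sum += cur.value;
        auto following = next(prev, cur);
        prev = cur;
        cur = following;
    }
    assert(sum == 6);
}
```

A per-type hook would have to know how to decode link before it could mark anything, which is the kind of customization the question is about.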
Apr 19 2012