digitalmars.D - Idea: "Explicit" Data Types

"Craig Black" <craigblack2 cox.net> writes:
Before I get into my proposal, I want to vote for stack maps to be added to 
D.  IMO, stack maps are the next logical step to making the GC faster.  They 
don't require a fundamental shift in the library like a moving GC would. 
Once stack maps are added, then perhaps the following proposal should be 
considered to glean additional GC performance.

I'm not stuck on terminology here, so if you don't like the term "explicit" 
because it's too overloaded, that's fine with me.  Pick another term.  The 
concept is what's important.  This proposal is about getting GC and explicit 
memory management to play well together.  The idea is to give the compiler 
information that allows the GC to scan less data, and hence perform better. 
Let's start with a class that uses explicit memory management.

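import std.c.stdlib;

class Foo
{
public:
    // Class allocator: take the object's memory from the C heap.
    new(size_t sz) { return std.c.stdlib.malloc(sz); }
    // Class deallocator: return the memory to the C heap.
    delete(void* p) { std.c.stdlib.free(p); }
}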

This works fine, but doesn't tell the compiler whether data referenced by 
Foo is allocated on the GC heap or not.  If we preceded the class with some 
kind of qualifier, like "explicit", this would indicate to the compiler that 
data referenced by Foo is not allocated on the heap.  Note: this constraint 
can't be enforced by the compiler, but could be enforced via run-time debug 
assertions.

explicit class Foo
{
public:
    new(size_t sz) { return std.c.stdlib.malloc(sz); }
    delete(void* p) { std.c.stdlib.free(p); }
}

A problem here arises because even though Foo is allocated on the malloc 
heap, it could contain references, pointers, or arrays that touch the GC 
heap.  Thus, making Foo "explicit" also denotes that any reference, pointer 
or array contained by Foo is also explicit, and therefore does not refer to 
data on the GC heap.  Interestingly, this means that "explicit" would have 
to be transitive, like D's const.

Thus, for the explicit qualifier to be useful, it must be able to be applied 
to a struct, class, pointer, reference, or array type.  However, it doesn't 
make sense to apply it to primitive or POD types.  If you follow my logic 
you understand what explicit types can do.  They inform the compiler that no 
GC heap data will be referenced, so that the compiler can exclude explicit 
types from GC scanning.  Further, the use of explicit can be enforced via 
run-time debug assertions.  Note that there are a few implementation details 
that I'm ignoring now for simplicity's sake.
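
The run-time debug assertion mentioned above could look roughly like the following. This is only a sketch, not part of the proposal: it assumes present-day D with druntime's `core.memory` (which postdates this thread), and the helper name `isOutsideGCHeap` is hypothetical. It relies on `GC.addrOf`, which returns null when a pointer does not point into GC-managed memory.

```d
import core.memory : GC;
import core.stdc.stdlib : malloc, free;

// Hypothetical helper: the kind of check a debug build might run
// whenever a pointer is stored into an "explicit" field.
bool isOutsideGCHeap(const(void)* p)
{
    // GC.addrOf yields the base address of the GC block containing p,
    // or null if p does not point into GC-managed memory.
    return p is null || GC.addrOf(cast(void*) p) is null;
}

void main()
{
    auto manual = cast(int*) malloc(int.sizeof);
    scope (exit) free(manual);
    auto managed = new int;   // allocated on the GC heap

    assert(isOutsideGCHeap(manual));    // malloc'd memory passes the check
    assert(!isOutsideGCHeap(managed));  // GC memory would trip the assertion
}
```

A check like this costs a lookup per store, which is why it would only make sense in debug builds.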

-Craig
Apr 01 2008
"Craig Black" <craigblack2 cox.net> writes:
 data referenced by Foo is not allocated on the heap.
Should read: data referenced by Foo is not allocated on the GC heap.
Apr 01 2008
janderson <askme me.com> writes:
I like this idea. ++vote
Apr 01 2008
"Craig Black" <craigblack2 cox.net> writes:
"janderson" <askme me.com> wrote in message 
news:fsundp$17pd$1 digitalmars.com...
I like this idea. ++vote
I'm waiting for at least three votes before I delve more into the details of the implementation. Seems like everybody's preoccupied with const right now though. -Craig
Apr 02 2008
Bill Baxter <dnewsgroup billbaxter.com> writes:
Craig Black wrote:
 
 "janderson" <askme me.com> wrote in message 
 news:fsundp$17pd$1 digitalmars.com...
I like this idea. ++vote
I'm waiting for at least three votes before I delve more into the details of the implementation. Seems like everybody's preoccupied with const right now though. -Craig
I'm not voting because it sounds like it solves a problem that I don't have. Or else I just haven't understood. I don't know what stack maps are, so you kinda lost me on the first sentence. --bb
Apr 02 2008
Christopher Wright <dhasenan gmail.com> writes:
Bill Baxter wrote:
 I'm not voting because it sounds like it solves a problem that I don't 
 have.  Or else I just haven't understood.  I don't know what stack maps 
 are, so you kinda lost me on the first sentence.
 
 --bb
A stack map is just a data structure (a bitvector, possibly) that records what on the stack is a pointer (and possibly what type of pointer it is). Instead of considering every word-size chunk as a pointer, you can be a lot more precise in garbage collection. And possibly a bit slower, but on the other hand, you might not have to go through as much memory on some collections. So you'll take a small, continual hit for occasional gains in speed and probably frequent gains in memory usage.
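
To make the bitvector idea concrete, here is a minimal sketch. It is an illustration under assumptions, not how any D compiler or collector actually lays out its metadata: the `StackMap` struct, the 64-slot limit, and the function names are all hypothetical. One bit per word-sized slot records whether that slot holds a pointer, so a precise scan visits only the flagged slots while a conservative scan must consider every word.

```d
import std.stdio : writeln;

// Hypothetical layout: one bit per word-sized stack slot, 1 = slot holds a pointer.
struct StackMap
{
    ulong bits;    // this sketch supports frames of up to 64 slots
    size_t slots;  // number of word-sized slots in the frame

    bool isPointer(size_t slot) const
    {
        assert(slot < slots);
        return ((bits >> slot) & 1) != 0;
    }
}

// Conservative scan: every slot is a potential pointer.
size_t conservativeCandidates(const size_t[] frame)
{
    return frame.length;
}

// Precise scan: only slots flagged in the map are handed to the collector.
size_t[] preciseRoots(const size_t[] frame, StackMap map)
{
    size_t[] roots;
    foreach (i, word; frame)
        if (map.isPointer(i))
            roots ~= word;
    return roots;
}

void main()
{
    // Slots 1 and 3 hold pointers; slots 0 and 2 are plain integers.
    size_t[] frame = [42, 0x1000, 7, 0x2000];
    auto map = StackMap(0b1010, frame.length);

    writeln(conservativeCandidates(frame)); // number of words a conservative scan checks
    writeln(preciseRoots(frame, map));      // only the values from the flagged slots
}
```

The map has to be produced and looked up for each frame, which is the small continual cost mentioned above; the payoff is that integers that merely look like addresses no longer keep garbage alive.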
Apr 02 2008
"Craig Black" <craigblack2 cox.net> writes:
"Christopher Wright" <dhasenan gmail.com> wrote in message 
news:ft1etp$eaj$1 digitalmars.com...
 Bill Baxter wrote:
 I'm not voting because it sounds like it solves a problem that I don't 
 have.  Or else I just haven't understood.  I don't know what stack maps 
 are, so you kinda lost me on the first sentence.

 --bb
A stack map is just a data structure (a bitvector, possibly) that records what on the stack is a pointer (and possibly what type of pointer it is). Instead of considering every word-size chunk as a pointer, you can be a lot more precise in garbage collection. And possibly a bit slower, but on the other hand, you might not have to go through as much memory on some collections. So you'll take a small, continual hit for occasional gains in speed and probably frequent gains in memory usage.
I admit I may know less about stack maps than you, but in the few cases I've read about them, they always speak of them as having a positive impact on performance. For example, if the GC runs in the middle of a recursive function that doesn't use pointers, there would be a big benefit to this. -Craig
Apr 02 2008
Christopher Wright <dhasenan gmail.com> writes:
Craig Black wrote:
 I admit I may know less about stack maps than you, but in the few cases 
 I've read about them, they always speak of them as having a positive 
 impact on performance.  For example, if the GC runs in the middle of a 
 recursive function that doesn't use pointers, there would be a big 
 benefit to this.
 
 -Craig
True. There are use cases where stack maps would hurt performance, though these would be relatively rare and minor.
Apr 02 2008
"Craig Black" <craigblack2 cox.net> writes:
 I'm not voting because it sounds like it solves a problem that I don't 
 have.  Or else I just haven't understood.  I don't know what stack maps 
 are, so you kinda lost me on the first sentence.
If you never use explicit memory management, and always use GC, then it probably doesn't affect you. If you use explicit memory management, then it will improve GC performance. This is about making the GC even more precise. Stack maps also make the GC more precise, so I thought I would put my vote in for them as well. Most modern GCs use stack maps.

-Craig
Apr 02 2008
Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Craig Black wrote:
 A problem here arises because even though Foo is allocated on the malloc 
 heap, it could contain references, pointers, or arrays that touch the GC 
 heap.  Thus, making Foo "explicit" also denotes that any reference, 
 pointer or array contained by Foo is also explicit, and therefore does 
 not refer to data on the GC heap.  Interestingly, this means that 
 "explicit" would have to be transitive, like D's const.
 
That seems an idea with limited to no usefulness. What if you want to have a class which contains references to both GC-managed data and manually-managed data (which would certainly be a most common case)?

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Apr 10 2008
"Craig Black" <cblack ara.com> writes:
 A problem here arises because even though Foo is allocated on the malloc 
 heap, it could contain references, pointers, or arrays that touch the GC 
 heap.  Thus, making Foo "explicit" also denotes that any reference, 
 pointer or array contained by Foo is also explicit, and therefore does 
 not refer to data on the GC heap.  Interestingly, this means that 
 "explicit" would have to be transitive, like D's const.
That seems an idea with limited to no usefulness. What if you want to have a class which contains references to both GC-managed data and manually-managed data (which would certainly be a most common case)?
I strongly disagree that this is useless. I am thinking of porting C++ code to D, and this would be very useful for that, since my C++ code uses no GC at all. Further, GC objects could contain both explicit and non-explicit references.

BTW, I'm not stuck on this particular idea. Another strategy would be to make "explicit" non-transitive. This would allow for more control, but would require the programmer to label more things "explicit".

Either way, the basic concept is what is important. When you have GC and explicit memory management in the same application, it is beneficial for performance to tell the compiler which pointers and references are definitely not on the GC heap. Otherwise the GC is doing unnecessary work.

-Craig
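
For comparison, the mixed case raised above (manually managed objects that do or do not hold GC references) can be sketched with the library calls available in present-day D. This is not the proposed `explicit` qualifier, and it assumes today's druntime rather than the 2008 runtime; the `Node` and `Holder` types and the helper names are hypothetical. In current druntime, memory from C `malloc` is not scanned unless it is registered with `GC.addRange`, so the programmer states by hand which manual blocks may refer to the GC heap.

```d
import core.memory : GC;
import core.stdc.stdlib : malloc, free;

// Hypothetical types for illustration only.
struct Node   { int value; Node* next; }  // next only ever points at malloc'd nodes
struct Holder { Object payload; }         // payload lives on the GC heap

// Holds no GC references, so it is never registered and never scanned.
Node* makeNode(int value)
{
    auto n = cast(Node*) malloc(Node.sizeof);
    assert(n !is null);
    *n = Node(value, null);
    return n;
}

// Holds a GC reference, so the block is registered for scanning.
Holder* makeHolder()
{
    auto h = cast(Holder*) malloc(Holder.sizeof);
    assert(h !is null);
    *h = Holder(null);
    GC.addRange(h, Holder.sizeof);
    h.payload = new Object;
    return h;
}

void freeHolder(Holder* h)
{
    GC.removeRange(h);  // unregister before the memory goes away
    free(h);
}

void main()
{
    auto n = makeNode(1);
    auto h = makeHolder();
    GC.collect();       // h's block is scanned; n's block is skipped entirely
    freeHolder(h);
    free(n);
}
```

The difference from the proposal is that nothing here is checked by the type system: forgetting `GC.addRange` on a block that does hold GC references is a silent bug, which is the kind of mistake a qualifier plus debug assertions would aim to catch.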
Apr 11 2008