digitalmars.D - Idea: "Explicit" Data Types

"Craig Black" <craigblack2 cox.net> writes:

Before I get into my proposal, I want to vote for stack maps to be added to 
D.  IMO, stack maps are the next logical step to making the GC faster.  They 
don't require a fundamental shift in the library like a moving GC would. 
Once stack maps are added, then perhaps the following proposal should be 
considered to glean additional GC performance.

I'm not stuck on terminology here, so if you don't like the term "explicit" 
because it's too overloaded, that's fine with me.  Pick another term.  The 
concept is what's important.  This proposal is about getting GC and explicit 
memory management to play well together.  The idea is to give the compiler 
information that allows the GC to scan less data, and hence perform better. 
Let's start with a class that uses explicit memory management.

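import std.c.stdlib;   // C malloc/free

class Foo
{
public:
    // Class allocator: instances come from the C heap, not the GC heap.
    new(size_t sz) { return std.c.stdlib.malloc(sz); }
    // Class deallocator: hand the memory back to the C heap.
    delete(void* p) { std.c.stdlib.free(p); }
}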

This works fine, but doesn't tell the compiler whether data referenced by 
Foo is allocated on the GC heap or not.  If we preceded the class with some 
kind of qualifier, like "explicit", this would indicate to the compiler that 
data referenced by Foo is not allocated on the heap.  Note: this constraint 
can't be enforced by the compiler, but could be enforced via run-time debug 
assertions.

explicit class Foo
{
public:
    new(size_t sz) { return std.c.stdlib.malloc(sz); }
    delete(void* p) { std.c.stdlib.free(p); }
}

A problem here arises because even though Foo is allocated on the malloc 
heap, it could contain references, pointers, or arrays that touch the GC 
heap.  Thus, making Foo "explicit" also denotes that any reference, pointer 
or array contained by Foo is also explicit, and therefore does not refer to 
data on the GC heap.  Interestingly, this means that "explicit" would have 
to be transitive, like D's const.

Thus, for the explicit qualifier to be useful, it must be able to be applied 
to a struct, class, pointer, reference, or array type.  However, it doesn't 
make sense to apply it to primitive or POD types.  If you follow my logic 
you understand what explicit types can do.  They inform the compiler that no 
GC heap data will be referenced, so that the compiler can exclude explicit 
types from GC scanning.  Further, the use of explicit can be enforced via 
run-time debug assertions.  Note that there are a few implementation details 
that I'm ignoring now for simplicity's sake.
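
As a rough sketch of the run-time debug assertion mentioned above: the code below assumes a current D2 druntime (core.memory) rather than the D1 Phobos used in the class examples, and it is not part of the proposal. The names isOutsideGcHeap, ExplicitNode and setNext are hypothetical. GC.addrOf returns null for an address that is not inside a GC-managed block, which is the property a debug check could rely on.

```d
import core.memory : GC;
import core.stdc.stdlib : malloc, free;

// Hypothetical helper: true when p does not point into a GC-managed block.
bool isOutsideGcHeap(void* p)
{
    return p is null || GC.addrOf(p) is null;
}

// Stand-in for an "explicit" type: by convention, its pointers never
// refer to GC heap data. No such qualifier exists in the language.
struct ExplicitNode
{
    ExplicitNode* next;

    void setNext(ExplicitNode* n)
    {
        // Checked only when compiled with -debug, since the compiler
        // cannot prove the constraint statically.
        debug assert(isOutsideGcHeap(n), "explicit type given a GC pointer");
        next = n;
    }
}

void main()
{
    auto a = cast(ExplicitNode*) malloc(ExplicitNode.sizeof);
    auto b = cast(ExplicitNode*) malloc(ExplicitNode.sizeof);
    assert(a !is null && b !is null);
    *a = ExplicitNode.init;
    *b = ExplicitNode.init;

    a.setNext(b);                        // fine: b lives on the C heap
    assert(isOutsideGcHeap(b));
    assert(!isOutsideGcHeap(new int));   // a GC allocation is detected

    free(b);
    free(a);
}
```

The check costs one heap lookup per store, which is why it would only make sense in debug builds.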

-Craig

Apr 01 2008

"Craig Black" <craigblack2 cox.net> writes:

 data referenced by Foo is not allocated on the heap.

Should read: data referenced by Foo is not allocated on the GC heap.

Apr 01 2008

janderson <askme me.com> writes:

I like this idea.

++vote

Apr 01 2008

"Craig Black" <craigblack2 cox.net> writes:

"janderson" <askme me.com> wrote in message 
news:fsundp$17pd$1 digitalmars.com...
 I like this idea.

 ++vote

I'm waiting for at least three votes before I delve more into the details of 
the implementation. Seems like everybody's preoccupied with const right now 
though.

-Craig

Apr 02 2008

Bill Baxter <dnewsgroup billbaxter.com> writes:

Craig Black wrote:
 
 I'm waiting for at least three votes before I delve more into the 
 details of the implementation. Seems like everybody's preoccupied with 
 const right now though.
 
 -Craig

I'm not voting because it sounds like it solves a problem that I don't 
have.  Or else I just haven't understood.  I don't know what stack maps 
are, so you kinda lost me on the first sentence.

--bb

Apr 02 2008

Christopher Wright <dhasenan gmail.com> writes:

Bill Baxter wrote:
 I'm not voting because it sounds like it solves a problem that I don't 
 have.  Or else I just haven't understood.  I don't know what stack maps 
 are, so you kinda lost me on the first sentence.
 
 --bb

A stack map is just a data structure (a bitvector, possibly) that 
records what on the stack is a pointer (and possibly what type of 
pointer it is). Instead of considering every word-size chunk as a 
pointer, you can be a lot more precise in garbage collection. And 
possibly a bit slower, but on the other hand, you might not have to go 
through as much memory on some collections. So you'll take a small, 
continual hit for occasional gains in speed and probably frequent gains 
in memory usage.
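
To make the bitvector idea concrete, here is a minimal sketch in D. It assumes a made-up frame layout with one bit per word-sized stack slot and at most 64 slots per frame; StackMap and slotsToTrace are hypothetical names, not anything a D compiler or runtime actually emits.

```d
import std.stdio : writeln;

// Hypothetical stack map for one frame: bit i set means slot i holds a pointer.
struct StackMap
{
    ulong pointerBits;   // one bit per word-sized slot
    size_t slotCount;    // number of slots in the frame (assumed <= 64)

    bool isPointer(size_t slot) const
    {
        return slot < slotCount && ((pointerBits >> slot) & 1) != 0;
    }
}

// Number of slots a precise collector would have to trace for this frame.
size_t slotsToTrace(const StackMap map)
{
    size_t n = 0;
    foreach (i; 0 .. map.slotCount)
        if (map.isPointer(i))
            ++n;
    return n;
}

void main()
{
    // A frame with 6 slots where only slots 1 and 4 hold pointers.
    auto frame = StackMap(0b010010, 6);
    writeln("conservative scan visits ", frame.slotCount, " slots");
    writeln("scan with stack map visits ", slotsToTrace(frame), " slots");
}
```

A conservative collector has to treat all six slots as possible pointers; with the map it can skip the four that are known to hold plain data.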

Apr 02 2008

"Craig Black" <craigblack2 cox.net> writes:

"Christopher Wright" <dhasenan gmail.com> wrote in message 
news:ft1etp$eaj$1 digitalmars.com...
 Bill Baxter wrote:
 I'm not voting because it sounds like it solves a problem that I don't 
 have.  Or else I just haven't understood.  I don't know what stack maps 
 are, so you kinda lost me on the first sentence.

 --bb

 A stack map is just a data structure (a bitvector, possibly) that records 
 what on the stack is a pointer (and possibly what type of pointer it is). 
 Instead of considering every word-size chunk as a pointer, you can be a 
 lot more precise in garbage collection. And possibly a bit slower, but on 
 the other hand, you might not have to go through as much memory on some 
 collections. So you'll take a small, continual hit for occasional gains in 
 speed and probably frequent gains in memory usage.

I admit I may know less about stack maps than you, but in the few articles 
I've read about them, they are always described as having a positive impact 
on performance.  For example, if the GC runs in the middle of a deeply 
recursive function that doesn't use pointers, the collector could skip those 
stack frames entirely, which would be a big benefit.

-Craig

Apr 02 2008

Christopher Wright <dhasenan gmail.com> writes:

Craig Black wrote:
 I admit I may know less about stack maps than you, but in the few cases 
 I've read about them, they always speak of them as having a positive 
 impact on performance.  For example, if the GC runs in the middle of a 
 recursive function that doesn't use pointers, there would be a big 
 benefit to this.
 
 -Craig

True. There are use cases where stack maps would hurt performance, 
though these would be relatively rare and minor.

Apr 02 2008

"Craig Black" <craigblack2 cox.net> writes:

 I'm not voting because it sounds like it solves a problem that I don't 
 have.  Or else I just haven't understood.  I don't know what stack maps 
 are, so you kinda lost me on the first sentence.

If you never use explicit memory management, and always use GC, then it 
probably doesn't affect you.  If you use explicit memory management, then it 
will improve GC performance.  This is about making the GC even more precise. 
Stack maps also make the GC more precise, so I thought I would put my vote 
in for them as well.  Most modern GCs use stack maps.

-Craig

Apr 02 2008

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Craig Black wrote:
 A problem here arises because even though Foo is allocated on the malloc 
 heap, it could contain references, pointers, or arrays that touch the GC 
 heap.  Thus, making Foo "explicit" also denotes that any reference, 
 pointer or array contained by Foo is also explicit, and therefore does 
 not refer to data on the GC heap.  Interestingly, this means that 
 "explicit" would have to be transitive, like D's const.
 

That seems an idea with limited to no usefulness.
What if you want to have a class which contains references to both 
GC-managed data and manually-managed data (which would certainly be a 
most common case)?


-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Apr 10 2008

"Craig Black" <cblack ara.com> writes:

 A problem here arises because even though Foo is allocated on the malloc 
 heap, it could contain references, pointers, or arrays that touch the GC 
 heap.  Thus, making Foo "explicit" also denotes that any reference, 
 pointer or array contained by Foo is also explicit, and therefore does 
 not refer to data on the GC heap.  Interestingly, this means that 
 "explicit" would have to be transitive, like D's const.

 That seems an idea with limited to no usefulness.
 What if you want to have a class which contains references to both 
 GC-managed data and manually-managed data (which would certainly be a most 
 common case)?

I strongly disagree that this is useless.  I am thinking of porting C++ code 
to D and this would be very useful for that, since my C++ code has 
absolutely no GC at all.  Further, GC objects could contain both explicit 
and non-explicit references.

BTW, I'm not stuck on this particular idea.  Another strategy would be to 
make "explicit" non-transitive.  This would allow for more control, but 
would require the programmer to label more things "explicit".

Either way, the basic concept is what is important.  When you have GC and 
explicit memory management in the same application, it is beneficial for 
performance to tell the compiler what pointers and references are definitely 
not on the GC heap.  Otherwise the GC is doing unnecessary work.
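
For comparison, here is a small sketch of the closest per-allocation hints I know of, assuming a current D2 druntime (core.memory). This is not the proposed qualifier, and Particle is a hypothetical type. Memory from malloc is not scanned unless it is registered with GC.addRange, and a GC block allocated with GC.BlkAttr.NO_SCAN is still collected but its contents are never traced.

```d
import core.memory : GC;
import core.stdc.stdlib : malloc, free;

struct Particle { float x, y, z; }   // hypothetical type with no pointers inside

void main()
{
    enum count = 1024;

    // C-heap memory: the GC does not scan it unless it is registered with
    // GC.addRange, so it must never hold the only reference to a GC object.
    auto manual = cast(Particle*) malloc(Particle.sizeof * count);
    scope (exit) free(manual);

    // GC-heap block flagged NO_SCAN: freed by the collector as usual,
    // but the collector does not look inside it for pointers.
    auto raw = cast(Particle*) GC.malloc(Particle.sizeof * count,
                                         GC.BlkAttr.NO_SCAN);

    assert(manual is null || GC.addrOf(manual) is null);   // not GC-managed
    assert((GC.getAttr(raw) & GC.BlkAttr.NO_SCAN) != 0);    // flagged as pointer-free
}
```

These hints apply to individual blocks at allocation time. The proposal above is about putting the same information in the type system, so that it follows pointers and references wherever they are stored.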

-Craig

Apr 11 2008
