digitalmars.D - My Language Feature Requests

Craig Black (45/45) Dec 22 2007 So now that the const thing seems like it might be settled, I would like...

Craig Black (2/4) Dec 22 2007 Another option would be to only allow fixed pointers to point to the hea...

Craig Black (3/7) Dec 22 2007 When I said "heap", I meant the GC heap of course.

Christopher Wright (46/88) Dec 22 2007 Don't see the point of this. You'd map a single old value to a single

Craig Black (43/128) Dec 22 2007 I'm not exactly sure what you are talking about, but you mention computa...

Craig Black (4/5) Dec 22 2007 Sorry ... I use C++ a lot at work. Should read:
Christopher Wright (9/123) Dec 23 2007 It requires you to store a struct by reference. Thus, performance hit.

Craig Black (6/14) Dec 23 2007 No it doesn't. Structs will be able to be allocated on the stack, witho...

Christopher Wright (30/50) Dec 23 2007 Slicing problem:

Craig Black (23/39) Dec 23 2007 Right, but that's not a problem if you disallow polymorphism for stack

Christopher Wright (42/91) Dec 23 2007 Ideally you'd determine whether your polymorphic struct has inheritors

Craig Black (22/86) Dec 23 2007 That's the approach for both C++ structs and classes. But this decision...

Frits van Bommel (11/18) Dec 23 2007 Since "fixedness" as proposed would be a compile-time property, and you

Christopher Wright (11/30) Dec 23 2007 Yes, I thought of that. Currently, however, offTi isn't populated. Just

Christopher Wright (15/39) Dec 23 2007 You are just making sure that the garbage collector is handling all

Craig Black (12/36) Dec 23 2007 It has nothing to do with the garbage collector run-time stuff. It is

Craig Black (4/8) Dec 23 2007 I hereby detract this statement. These run-time checks could be optiona...
Christopher Wright (20/64) Dec 23 2007 The point of overloading new and delete is to work around the garbage

Craig Black (7/23) Dec 23 2007 I was not proposing that anyone rely on "manually created objects withou...

"Craig Black" <craigblack2 cox.net> writes:

So now that the const thing seems like it might be settled, I would like to 
put my vote in as to what features should be included next.  I can only 
think of 2 features that would be high on my priority list.  My votes go for 
the following language features that would enhance the performance of D.

1)  Adding better support for structs including ctors/dtors, inheritance, 
copy semantics, etc.  This would allow for more efficient data structures 
that perform heap allocation without relying on the GC.  Making applications 
GC-lite is the fastest path to high-performance D applications.  The big reason 
I want this is that I would then be able to write an efficient array 
template that does not rely on GC.  Since I use arrays so much, this would 
provide a huge performance improvement for me.

2) Adding language features that would allow for a moving GC.  A modern, 
moving GC would also be a huge performance win.  I think we would have a 
safety problem if we currently implemented a moving GC.  Languages that have 
moving GC greatly restrict what can be done with pointers.  We need to 
provide a syntax that will allow pointers to be used when memory is 
explicitly managed, but disallow pointers for GC memory.
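
Point 1 above asks for an array template that does not rely on the GC. As a rough illustration in C++ (which already has the struct constructors and destructors being requested), such an array might look like the sketch below. MallocArray and its members are hypothetical names, and the sketch assumes trivially copyable element types and leaves copy semantics out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>

// Hypothetical GC-free growable array. Storage comes from malloc/realloc
// and is released deterministically in the destructor.
template <typename T>
struct MallocArray {
    static_assert(std::is_trivially_copyable<T>::value,
                  "sketch only supports trivially copyable elements");

    T* data = nullptr;
    std::size_t length = 0;
    std::size_t capacity = 0;

    MallocArray() = default;
    MallocArray(const MallocArray&) = delete;             // copying left out of the sketch
    MallocArray& operator=(const MallocArray&) = delete;
    ~MallocArray() { std::free(data); }                   // no GC involved

    // Appends a value, doubling the buffer when it is full.
    void push(const T& value) {
        if (length == capacity) {
            std::size_t grown = capacity ? capacity * 2 : 4;
            void* p = std::realloc(data, grown * sizeof(T));
            if (!p) throw std::bad_alloc();
            data = static_cast<T*>(p);
            capacity = grown;
        }
        data[length++] = value;
    }

    T& operator[](std::size_t i) { assert(i < length); return data[i]; }
};
```

The destructor is what makes this work without a collector: the buffer is freed when the array goes out of scope, which is the behaviour struct dtors would have to provide in D.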

So, here's one idea for making D more safe for moving GC.

a) Disallow overloading new and delete for classes, and make classes 
strictly for GC, perhaps with an exception for classes instantiated on the 
stack using scope.
b) Allow new and delete to work with structs, and allocate them on the 
malloc heap.  I would still want to be able to override new and delete for 
structs, specifically to be able to use nedmalloc.
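
For comparison, this is roughly how a per-type allocator override looks in C++. It is only a sketch: Particle and live_allocations are hypothetical names, and std::malloc stands in for a custom allocator such as nedmalloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Counts live heap instances so the effect of the overloads is observable.
static int live_allocations = 0;

struct Particle {
    float x, y, z;

    // Class-specific allocation functions: `new Particle` and
    // `delete p` use these instead of the global allocator.
    static void* operator new(std::size_t size) {
        void* p = std::malloc(size);   // a custom allocator would be called here
        if (!p) throw std::bad_alloc();
        ++live_allocations;
        return p;
    }
    static void operator delete(void* p) noexcept {
        if (p) {
            --live_allocations;
            std::free(p);
        }
    }
};
```

Instances declared as local variables never go through these functions, so the stack case keeps its zero allocation cost.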

Then the compiler could disallow taking the address of a class field, since 
we know the resulting pointer would point to the GC heap.  Note that this 
would be a compile-time check, and so would not degrade run-time 
performance.


A second idea is a fixed keyword.  In D, it could work like this:

a) Preceding a pointer declaration with fixed would allow that pointer to 
take the address in the GC heap.
b) Pointer arithmetic would be disallowed for fixed pointers.
c) A fixed pointer will mark the corresponding GC object as "pinned" so that 
the GC knows not to move the object.
d) When the fixed pointer is changed or deallocated, it will unpin the 
object, and pin any new object that it refers to.

The fixed pointer will have to know whether or not it points to GC memory so 
that it doesn't pin non-GC objects.  Using the first idea, we can determine 
at compile time whether a pointer points to the heap or not.
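
The pin and unpin rules in (c) and (d) can be modelled in C++ with a small RAII wrapper. This is a sketch under assumptions, not a description of any real collector: GcObject, FixedPtr and may_move are hypothetical names, and a per-object pin counter is just one possible way to record pinning.

```cpp
#include <cassert>

// Stand-in for a GC-managed object; pin_count > 0 means "do not move".
struct GcObject {
    int pin_count = 0;
    int payload = 0;
};

// Hypothetical model of a `fixed` pointer: it pins its target while bound.
class FixedPtr {
    GcObject* target_ = nullptr;

    // Rebinding pins the new target before unpinning the old one,
    // so self-assignment never drops the count to zero in between.
    void bind(GcObject* obj) {
        if (obj) ++obj->pin_count;
        if (target_) --target_->pin_count;
        target_ = obj;
    }

public:
    FixedPtr() = default;
    explicit FixedPtr(GcObject* obj) { bind(obj); }
    FixedPtr(const FixedPtr& other) { bind(other.target_); }
    FixedPtr& operator=(const FixedPtr& other) { bind(other.target_); return *this; }
    FixedPtr& operator=(GcObject* obj) { bind(obj); return *this; }
    ~FixedPtr() { bind(nullptr); }   // going out of scope unpins

    GcObject* get() const { return target_; }
};

// What a moving collector would ask before relocating an object.
inline bool may_move(const GcObject& obj) { return obj.pin_count == 0; }
```

A counter rather than a single flag is used because several fixed pointers may refer to the same object at once.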

Yes, this would be a big change, but not as big as const IMO.  I feel if any 
feature warrants breaking some code, it would be high-performance GC.  But 
maybe someone else can find a solution that doesn't break compatibility.

Thoughts?

-Craig

Dec 22 2007

"Craig Black" <craigblack2 cox.net> writes:

 Using the first idea, we can determine at compile time whether a pointer 
 points to the heap or not.

Another option would be to only allow fixed pointers to point to the heap. 
It might simplify the implementation.

Dec 22 2007

"Craig Black" <craigblack2 cox.net> writes:

"Craig Black" <craigblack2 cox.net> wrote in message 
news:fkk79p$226d$1 digitalmars.com...
 Using the first idea, we can determine at compile time whether a pointer 
 points to the heap or not.

 Another option would be to only allow fixed pointers to point to the heap. 
 It might simplify the implementation.

When I said "heap", I meant the GC heap of course.

Dec 22 2007

Christopher Wright <dhasenan gmail.com> writes:

Craig Black wrote:
 2) Adding language features that would allow for a moving GC.  A modern, 
 moving GC would also be a huge performance win.  I think we would have a 
 safety problem if we currently implemented a moving GC.  Languages that 
 have moving GC greatly restrict what can be done with pointers.  We need 
 to provide a syntax that will allow pointers to be used when memory is 
 explicitly managed, but disallow pointers for GC memory.
 
 So, here's one idea for making D more safe for moving GC.
 
 a) Disallow overloading new and delete for classes, and make classes 
 strictly for GC, perhaps with an exception for classes instantiated on 
 the stack using scope.

Don't see the point of this. You'd map a single old value to a single 
new value...or map an old range to a new one. You're changing one 
equality check and one assignment to two comparisons and an addition. 
And this is when you're looking through the entire address space of the 
program.
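
The "two comparisons and an addition" can be written out as follows. This is a C++ sketch with hypothetical names (MovedRange, forward), working on plain integer addresses rather than on a real heap.

```cpp
#include <cassert>
#include <cstdint>

// A block the collector moved: [old_begin, old_end) now starts at new_begin.
struct MovedRange {
    std::uintptr_t old_begin;
    std::uintptr_t old_end;
    std::uintptr_t new_begin;
};

// Returns the forwarded address if `addr` lies inside the moved block,
// otherwise returns it unchanged: two comparisons, one offset addition.
inline std::uintptr_t forward(std::uintptr_t addr, const MovedRange& r) {
    if (addr >= r.old_begin && addr < r.old_end)
        return r.new_begin + (addr - r.old_begin);
    return addr;
}
```

Interior pointers keep their offset into the block, which is why a range check is needed instead of a single equality test.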

 b) Allow new and delete to work with structs, and allocate them on the 
 malloc heap.  I would still want to be able to override new and delete 
 for structs, specifically to be able to use nedmalloc.

This can allow polymorphism for structs, actually, but it is a bit of a 
performance hit.

 Then the compiler could disallow taking the address of a class field, 
 since we know the resulting pointer would point to the GC heap.  Note 
 that this would be a compile-time check, and so would not degrade 
 run-time performance.

Ugly.

What do you do for taking the address of a class variable? Well, okay, 
you have to take the address of the reference; you can't take the 
address of the variable directly. The current method is ugly and 
undefined behavior:
*cast(void**)&obj;

And you can assume that all pointers that point to that region of memory 
have to be moved.

The problem is granularity.

class Foo {
    Foo next;
    size_t i, j, k, l, m, n, o, p;
}

Here, the current regime would mark *Foo as hasPointers. If i, j, k, l, 
m, n, o, or p just happened to look like a pointer, they'd be changed. 
You'd need to find where each object begins, then you'd need to go 
through the offset type info to see which elements are really pointers.

Since you're running the garbage collector, that's doable, if the offset 
type info is currently available (I think it wasn't, last I checked, but 
I don't really recall).
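
The granularity problem can be shown with a toy model in C++. Everything here is hypothetical (the names, and the idea of representing an object as a vector of machine words); it only contrasts rewriting every word that looks like a pointer with rewriting the words that type information says are pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct MovedRange { std::uintptr_t old_begin, old_end, new_begin; };

inline bool in_old_range(std::uintptr_t w, const MovedRange& r) {
    return w >= r.old_begin && w < r.old_end;
}

// Conservative fix-up: any word that looks like an address inside the
// moved block is rewritten, including integers that merely look like one.
inline void fixup_conservative(std::vector<std::uintptr_t>& words,
                               const MovedRange& r) {
    for (auto& w : words)
        if (in_old_range(w, r)) w = r.new_begin + (w - r.old_begin);
}

// Precise fix-up: only the word indices listed in pointer_offsets
// (the role offset type info would play) are considered.
inline void fixup_precise(std::vector<std::uintptr_t>& words,
                          const std::vector<std::size_t>& pointer_offsets,
                          const MovedRange& r) {
    for (std::size_t i : pointer_offsets)
        if (in_old_range(words[i], r))
            words[i] = r.new_begin + (words[i] - r.old_begin);
}
```

With word 0 standing for Foo.next and word 1 for one of the size_t fields, the conservative version changes both when the integer happens to fall in the moved range, while the precise version leaves the integer alone.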


 the fixed keyword.  In D, it could work like this:
 
 a) Preceding a pointer declaration with fixed would allow that pointer 
 to take the address in the GC heap.
 b) Pointer arithmetic would be disallowed for fixed pointers.

Why?

fixed float* four_floats = std.gc.malloc(4 * float.sizeof);
fixed float* float_one = four_floats;
fixed float* float_two = four_floats + 1;
fixed float* float_three = four_floats + 2;
fixed float* float_four = four_floats + 3;

Seems fine to me. You might go beyond the allocated space, but that's 
already undefined behavior.

 c) A fixed pointer will mark the corresponding GC object as "pinned" so 
 that the GC knows not to move the object.
 d) When the fixed pointer is changed or deallocated, it will unpin the 
 object, and pin any new object that it refers to.

While there is a fixed reference to the GC object, it is pinned. If that 
reference is rebound to another GC object, the original object is 
unpinned and the new one is pinned.

How to mark these is a difficult problem. On a 64-bit machine, I'd say 
you just use the most significant bit as a flag; you're not going to use 
petabytes of address space.
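
A sketch of that tagging scheme in C++, assuming 64-bit pointers whose top bit is unused for ordinary user-space addresses (true on common desktop platforms, but not something the language guarantees). kFixedBit, mark_fixed, is_fixed and strip_tag are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(void*) == 8, "sketch assumes 64-bit pointers");

// Assumption: bit 63 is never set in a valid user-space address.
constexpr std::uintptr_t kFixedBit = std::uintptr_t(1) << 63;

// Stores the "fixed" flag in the pointer's own bits.
inline std::uintptr_t mark_fixed(void* p) {
    return reinterpret_cast<std::uintptr_t>(p) | kFixedBit;
}

inline bool is_fixed(std::uintptr_t tagged) {
    return (tagged & kFixedBit) != 0;
}

// The tag must be removed before the value is used as a pointer again.
inline void* strip_tag(std::uintptr_t tagged) {
    return reinterpret_cast<void*>(tagged & ~kFixedBit);
}
```

The cost is that every dereference of a possibly tagged value has to mask the bit off first.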

 The fixed pointer will have to know whether or not it points to GC 
 memory so that it doesn't pin non-GC objects.  Using the first idea, we 
 can determine at compile time whether a pointer points to the heap or not.

The fixed pointer will just stand there shouting "I am a fixed pointer! 
Look on me and despair!" And the garbage collector will look where it's 
pointing; if it is pointing at GC memory, the garbage collector will 
indeed look on it and despair. Otherwise, it will ignore the fixedness.

 Yes, this would be a big change, but not as big as const IMO.  I feel if 
 any feature warrants breaking some code, it would be high-performance 
 GC.  But maybe someone else can find a solution that doesn't break 
 compatibility.
 
 Thoughts?
 
 -Craig

Dec 22 2007

"Craig Black" <craigblack2 cox.net> writes:

"Christopher Wright" <dhasenan gmail.com> wrote in message 
news:fkkm0i$2oa9$1 digitalmars.com...
 Craig Black wrote:
 2) Adding language features that would allow for a moving GC.  A modern, 
 moving GC would also be a huge performance win.  I think we would have a 
 safety problem if we currently implemented a moving GC.  Languages that 
 have moving GC greatly restrict what can be done with pointers.  We need 
 to provide a syntax that will allow pointers to be used when memory is 
 explicitly managed, but disallow pointers for GC memory.

 So, here's one idea for making D more safe for moving GC.

 a) Disallow overloading new and delete for classes, and make classes 
 strictly for GC, perhaps with an exception for classes instantiated on 
 the stack using scope.

 Don't see the point of this. You'd map a single old value to a single new 
 value...or map an old range to a new one. You're changing one equality 
 check and one assignment to two comparisons and an addition. And this is 
 when you're looking through the entire address space of the program.

I'm not exactly sure what you are talking about, but you mention computation 
performed at run-time.  The concept here is that it will be a compile-time 
restriction.

The reason to disallow new and delete is to ensure that all instances of a 
class not instantiated using "scope" will be GC objects.  This gives the 
compiler the information necessary to enforce pointer assignment 
restrictions at compile-time.

 b) Allow new and delete to work with structs, and allocate them on the 
 malloc heap.  I would still want to be able to override new and delete 
 for structs, specifically to be able to use nedmalloc.

 This can allow polymorphism for structs, actually, but it is a bit of a 
 performance hit.

Yes, polymorphism for structs could be allowed.  I don't know why you think 
that would be a performance hit.  C++ structs and classes allow 
polymorphism, but do not take any performance hit or memory overhead when 
polymorphism is not used.  If polymorphism is used, it doesn't affect the 
performance of non-polymorphic functions, and only requires a pointer to be 
stored in each object in order to reference the vtable.
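
The C++ behaviour described here can be checked directly. In this sketch the type names are hypothetical, and the size comparisons hold on mainstream compilers but are not guaranteed by the C++ standard.

```cpp
#include <cassert>
#include <type_traits>

// No virtual functions: no hidden vtable pointer, calls are direct.
struct Plain {
    int i, j;
    int sum() const { return i + j; }
};

// One virtual function makes the type polymorphic and adds a vtable
// pointer to every instance.
struct Poly {
    int i, j;
    virtual ~Poly() = default;
    virtual int sum() const { return i + j; }
};

struct PolyChild : Poly {
    int k = 0;
    int sum() const override { return i + j + k; }
};

static_assert(!std::is_polymorphic<Plain>::value, "Plain carries no vtable");
static_assert(std::is_polymorphic<Poly>::value, "Poly carries a vtable");
```

On typical implementations sizeof(Plain) is exactly two ints, while sizeof(Poly) is larger by the vtable pointer plus any padding.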

Maybe you think I am implying that ALL structs will be allocated on the 
malloc heap.  No, no, no.  I am suggesting that a struct could be allocated 
on the heap or on the stack.  How would the syntax look?  Structs allocated 
on the stack would retain the same syntax.  The ones allocated on the heap 
would be allocated with the new operator.  These could be referenced using 
pointers, or maybe some form of reference type.  But the reference types 
would need to be explicitly declared, like this:

struct A { A(int x) {} }

A a = A(1); // stack allocation
A *a = new A(1); // possible syntax for heap allocation
A &a = new A(1); // another possible syntax, I'm sure there are other ideas.

 Then the compiler could disallow taking the address of a class field, 
 since we know the resulting pointer would point to the GC heap.  Note 
 that this would be a compile-time check, and so would not degrade 
 run-time performance.

 Ugly.

 What do you do for taking the address of a class variable? Well, okay, you 
 have to take the address of the reference; you can't take the address of 
 the variable directly. The current method is ugly and undefined behavior:
 *cast(void**)&obj;

 And you can assume that all pointers that point to that region of memory 
 have to be moved.

 The problem is granularity.

 class Foo {
    Foo next;
    size_t i, j, k, l, m, n, o, p;
 }

 Here, the current regime would mark *Foo as hasPointers. If i, j, k, l, m, 
 n, o, or p just happened to look like a pointer, they'd be changed. You'd 
 need to find where each object begins, then you'd need to go through the 
 offset type info to see which elements are really pointers.

 Since you're running the garbage collector, that's doable, if the offset 
 type info is currently available (I think it wasn't, last I checked, but I 
 don't really recall).

I'm not sure you understand what I'm proposing.  What you are talking about 
is run-time information used by the garbage collector.  I'm talking about a 
compile-time restriction.  No checking anything at run-time, and so no 
performance hit.  Maybe the confusion stems from the fact that I didn't 
describe in detail how this would work.  That's because I haven't thought it 
through yet.  But I'm confident that there is a good way this restriction 
could be enforced at compile-time.


 the fixed keyword.  In D, it could work like this:

 a) Preceding a pointer declaration with fixed would allow that pointer to 
 take the address in the GC heap.
 b) Pointer arithmetic would be disallowed for fixed pointers.

 Why?

 fixed float* four_floats = std.gc.malloc(4 * float.sizeof);
 fixed float* float_one = four_floats;
 fixed float* float_two = four_floats + 1;
 fixed float* float_three = four_floats + 2;
 fixed float* float_four = four_floats + 3;

 Seems fine to me. You might go beyond the allocated space, but that's 
 already undefined behavior.

Ok, point taken.  Pointer arithmetic might be useful.  I'm just trying to 
make it as safe as possible, and maybe disallowing this is going too far. 
However, your above example could be implemented without pointer arithmetic 
using a static array.

 c) A fixed pointer will mark the corresponding GC object as "pinned" so 
 that the GC knows not to move the object.
 d) When the fixed pointer is changed or deallocated, it will unpin the 
 object, and pin any new object that it refers to.

 While there is a fixed reference to the GC object, it is pinned. If that 
 reference is rebound to another GC object, the original object is unpinned 
 and the new one is pinned.

Right.  Pointer is the wrong word.  Sorry.

 How to mark these is a difficult problem. On a 64-bit machine, I'd say you 
 just use the most significant bit as a flag; you're not going to use 
 petabytes of address space.

I'm not sure what the best way would be because I don't know a lot of 
details about D's GC.

 The fixed pointer will have to know whether or not it points to GC memory 
 so that it doesn't pin non-GC objects.  Using the first idea, we can 
 determine at compile time whether a pointer points to the heap or not.

 The fixed pointer will just stand there shouting "I am a fixed pointer! 
 Look on me and despair!" And the garbage collector will look where it's 
 pointing; if it is pointing at GC memory, the garbage collector will 
 indeed look on it and despair. Otherwise, it will ignore the fixedness.

Yes, that will work, but requires a run-time check (and a branch).  The 
run-time overhead for what you propose might end up being trivial, but I 
think it could be done at compile-time.

 Yes, this would be a big change, but not as big as const IMO.  I feel if 
 any feature warrants breaking some code, it would be high-performance GC. 
 But maybe someone else can find a solution that doesn't break 
 compatibility.

 Thoughts?

 -Craig

Dec 22 2007

"Craig Black" <craigblack2 cox.net> writes:

 struct A { A(int x) {} }

Sorry ... I use C++ a lot at work.  Should read:

struct A { this(int x) {} }

(Further, this code is based on the hypothesis that we may get struct 
ctors.)

Dec 22 2007

Christopher Wright <dhasenan gmail.com> writes:

Craig Black wrote:
 
 "Christopher Wright" <dhasenan gmail.com> wrote in message 
 news:fkkm0i$2oa9$1 digitalmars.com...
 Craig Black wrote:
 2) Adding language features that would allow for a moving GC.  A 
 modern, moving GC would also be a huge performance win.  I think we 
 would have a safety problem if we currently implemented a moving GC.  
 Languages that have moving GC greatly restrict what can be done with 
 pointers.  We need to provide a syntax that will allow pointers to be 
 used when memory is explicitly managed, but disallow pointers for GC 
 memory.

 So, here's one idea for making D more safe for moving GC.

 a) Disallow overloading new and delete for classes, and make classes 
 strictly for GC, perhaps with an exception for classes instantiated 
 on the stack using scope.

 Don't see the point of this. You'd map a single old value to a single 
 new value...or map an old range to a new one. You're changing one 
 equality check and one assignment to two comparisons and an addition. 
 And this is when you're looking through the entire address space of 
 the program.

 
 I'm not exactly sure what you are talking about, but you mention 
 computation performed at run-time.  The concept here is that it will be 
 a compile-time restriction.
 
 The reason to disallow new and delete is to ensure that all instances of 
 a class not instantiated using "scope" will be GC objects.  This gives 
 the compiler the information necessary to enforce pointer assignment 
 restrictions at compile-time.

I misplaced the text and am now feeling stupid.

 b) Allow new and delete to work with structs, and allocate them on 
 the malloc heap.  I would still want to be able to override new and 
 delete for structs, specifically to be able to use nedmalloc.

 This can allow polymorphism for structs, actually, but it is a bit of 
 a performance hit.

 
 Yes, polymorphism for structs could be allowed.  I don't know why you 
 think that would be a performance hit.  C++ structs and classes allow 
 polymorphism, but do not take any performance hit or memory overhead 
 when polymorphism is not used.  If polymorphism is used, it doesn't 
 affect the performance of non-polymorphic functions, and only requires a 
 pointer to be stored in each object in order to reference the vtable.

It requires you to store a struct by reference. Thus, performance hit.

 Maybe you think I am implying that ALL structs will be allocated on the 
 malloc heap.  No, no, no.  I am suggesting that a struct could be 
 allocated on the heap or on the stack.  How would the syntax look?  
 Structs allocated on the stack would retain the same syntax.  The ones 
 allocated on the heap would be allocated with the new operator.  These 
 could be referenced using pointers, or maybe some form of reference 
 type.  But the reference types would need to be explicitly declared, like this:
 
 struct A { A(int x) {} }
 
 A a = A(1); // stack allocation
 A *a = new A(1); // possible syntax for heap allocation
 A &a = new A(1); // another possible syntax, I'm sure there are other ideas.
 
 Then the compiler could disallow taking the address of a class field, 
 since we know the resulting pointer would point to the GC heap.  
 Note that this would be a compile-time check, and so would not 
 degrade run-time performance.

 Ugly.

 What do you do for taking the address of a class variable? Well, okay, 
 you have to take the address of the reference; you can't take the 
 address of the variable directly. The current method is ugly and 
 undefined behavior:
 *cast(void**)&obj;

 And you can assume that all pointers that point to that region of 
 memory have to be moved.

 The problem is granularity.

 class Foo {
    Foo next;
    size_t i, j, k, l, m, n, o, p;
 }

 Here, the current regime would mark *Foo as hasPointers. If i, j, k, 
 l, m, n, o, or p just happened to look like a pointer, they'd be 
 changed. You'd need to find where each object begins, then you'd need 
 to go through the offset type info to see which elements are really 
 pointers.

 Since you're running the garbage collector, that's doable, if the 
 offset type info is currently available (I think it wasn't, last I 
 checked, but I don't really recall).

 
 I'm not sure you understand what I'm proposing.  What you are talking 
 about is run-time information used by the garbage collector.  I'm 
 talking about a compile-time restriction.  No checking anything at 
 run-time, and so no performance hit.  Maybe the confusion stems from the 
 fact that I didn't describe in detail how this would work.  That's 
 because I haven't thought it through yet.  But I'm confident that there 
 is a good way this restriction could be enforced at compile-time.

Okay, I swapped that section of text with the previous one that was out 
of place.

 The fixed pointer will have to know whether or not it points to GC 
 memory so that it doesn't pin non-GC objects.  Using the first idea, 
 we can determine at compile time whether a pointer points to the heap 
 or not.

 The fixed pointer will just stand there shouting "I am a fixed 
 pointer! Look on me and despair!" And the garbage collector will look 
 where it's pointing; if it is pointing at GC memory, the garbage 
 collector will indeed look on it and despair. Otherwise, it will 
 ignore the fixedness.

 
 Yes, that will work, but requires a run-time check (and a branch).  The 
 run-time overhead for what you propose might end up being trivial, but I 
 think it could be done at compile-time.

I'm not so sure. You'd have to make it undefined behavior to assign a 
non-fixed address to a fixed pointer. The reverse is fine, of course.

Since class references are pointers, you'd have to have the fixed 
storage class apply to them as well. Any reference type, really.

Dec 23 2007

"Craig Black" <craigblack2 cox.net> writes:

 It requires you to store a struct by reference. Thus, performance hit.

No it doesn't.  Structs will be able to be allocated on the stack, without 
any referencing.  As an OPTION, you will be able to store a struct by 
reference.  C++ does this very same thing and it is very efficient.

 Yes, that will work, but requires a run-time check (and a branch).  The 
 run-time overhead for what you propose might end up being trivial, but I 
 think it could be done at compile-time.

 I'm not so sure. You'd have to make it undefined behavior to assign a 
 non-fixed address to a fixed pointer. The reverse is fine, of course.

 Since class references are pointers, you'd have to have the fixed storage 
 class apply to them as well. Any reference type, really.

Yes and all class fields would be fixed as well, unless the class object was 
instantiated using scope.  This means that when you take the address of 
them, it results in a fixed pointer.

Dec 23 2007

Christopher Wright <dhasenan gmail.com> writes:

Craig Black wrote:
 
 It requires you to store a struct by reference. Thus, performance hit.

 
 No it doesn't.  Structs will be able to be allocated on the stack, 
 without any referencing.  As an OPTION, you will be able to store a 
 struct by reference.  C++ does this very same thing and it is very 
 efficient.

Slicing problem:
struct A { int i, j; }
struct B : A { long k; }

A foo (A a) {
    static assert (a.sizeof == 8);
    assert (a.sizeof == 8);
    return a;
}

B b;
b.k = 14;
assert (b.sizeof == 16);
b = foo(b);
assert (b.k == 14); // FAIL


Polymorphic structs *have* to be reference types, unless you determine 
stack layout at runtime. And not only that, you have to modify stack 
layout after you've created a stack frame. The only saving grace is that 
you won't have to do that for a stack frame higher than the current one.
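
For reference, this is how the same situation plays out in C++ today. The sketch uses hypothetical helper names (by_value, refers_to_same_object). Passing a B where an A is expected by value compiles and silently slices, while assigning the returned A back to a B, as in `b = foo(b)`, does not compile in C++ because there is no implicit conversion from A to B.

```cpp
#include <cassert>

struct A { int i = 0, j = 0; };
struct B : A { long k = 0; };

// By value: the parameter is a complete A object of size sizeof(A).
// A B argument is copied into it and loses its k field (slicing).
inline A by_value(A a) { return a; }

// By reference: no copy is made, so the caller's B stays intact and
// the reference still designates the A subobject of the original.
inline bool refers_to_same_object(const A& a, const B& original) {
    return &a == static_cast<const A*>(&original);
}
```

This is the sense in which a polymorphic value needs reference semantics: only the by-reference call sees the whole derived object.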

 Yes, that will work, but requires a run-time check (and a branch).  
 The run-time overhead for what you propose might end up being 
 trivial, but I think it could be done at compile-time.

 I'm not so sure. You'd have to make it undefined behavior to assign a 
 non-fixed address to a fixed pointer. The reverse is fine, of course.

 Since class references are pointers, you'd have to have the fixed 
 storage class apply to them as well. Any reference type, really.

 
 Yes and all class fields would be fixed as well, unless the class object 
 was instantiated using scope.  This means that when you take the address 
 of them, it results in a fixed pointer.

You're saying:
class Foo {
    int i;
}

Foo f = new Foo();
int* i_ptr = &f.i;

That would be a compile error? f is not fixed; I don't care if the bits 
in i_ptr change, or the bits in the reference f. Why should I?

Just because I took the address of f.i and stored it in an unfixed 
pointer, the garbage collector, which has full authority to change the 
pointer I just got, can't move *f?

Why?

Dec 23 2007

"Craig Black" <craigblack2 cox.net> writes:

 Polymorphic structs *have* to be reference types, unless you determine 
 stack layout at runtime. And not only that, you have to modify stack 
 layout after you've created a stack frame. The only saving grace is that 
 you won't have to do that for a stack frame higher than the current one.

Right, but that's not a problem if you disallow polymorphism for stack 
objects.  This is what C++ does and it works very well.  Rather than 
generating a run-time assertion, your code would simply not compile.  If you 
want polymorphism, then you would have to instantiate the struct on the heap.

struct A { int i, j; }
struct B : A { long k; }

A foo (A a) {  return a; }
B b;
b = foo(b); // compile error: instance of struct B can't be implicitly
            // converted to an instance of struct A

Anyway, this is all moot, because I've thought of an easier solution. 
Pointers can be checked at run-time to determine if they address the GC 
heap.  This check could be removed when compiling in release mode, so there 
will be no performance degradation.
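
A debug-only check of this kind could look like the C++ sketch below. It assumes, purely for illustration, that the GC heap is a single contiguous address range; HeapBounds, in_gc_heap and store_unfixed are hypothetical names. Whether such a check can safely be dropped in release builds is disputed later in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of the GC heap as one contiguous range.
struct HeapBounds {
    std::uintptr_t begin;
    std::uintptr_t end;
};

inline bool in_gc_heap(const void* p, const HeapBounds& heap) {
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    return addr >= heap.begin && addr < heap.end;
}

// Stores p into an unfixed pointer slot. The assert is the run-time
// check; compiling with NDEBUG removes it entirely.
inline void store_unfixed(const void** slot, const void* p,
                          const HeapBounds& heap) {
    assert(!in_gc_heap(p, heap) &&
           "unfixed pointer must not address the GC heap");
    (void)heap;   // silences the unused-parameter warning under NDEBUG
    *slot = p;
}
```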

So there's no need to disallow new and delete for classes and we don't need 
struct polymorphism.

You're saying:
class Foo {
    int i;
}

Foo f = new Foo();
int* i_ptr = &f.i;

That would be a compile error? f is not fixed; I don't care if the bits in 
i_ptr change, or the bits in the reference f. Why should I?

Just because I took the address of f.i and stored it in an unfixed pointer, 
the garbage collector, which has full authority to change the pointer I 
just got, can't move *f?

Why?

I'm not really sure what you are asking.  If the GC relocates f, then i_ptr 
no longer points to the appropriate location.  Isn't that obvious?

Are you suggesting that the GC relocate i_ptr as well?  No GC I know of 
relocates raw pointers, so there's probably a good technical reason why they 
don't.  I'm not a GC expert though.

-Craig

Dec 23 2007

Christopher Wright <dhasenan gmail.com> writes:

Craig Black wrote:
 Polymorphic structs *have* to be reference types, unless you determine 
 stack layout at runtime. And not only that, you have to modify stack 
 layout after you've created a stack frame. The only saving grace is 
 that you won't have to do that for a stack frame higher than the 
 current one.

 
 Right, but that's not a problem if you disallow polymorphism for stack 
 objects.  This is what C++ does and it works very well.  Rather than 
 generating a run-time assertion, your code would simply not compile.  If 
 you want polymorphism, then you would have to instantiate the struct on 
 the heap.

Ideally you'd determine whether your polymorphic struct has inheritors 
or base classes and, if so, put it on the heap, else put it on the 
stack. This is why it'd be better to keep structs as they are, but have 
by-value classes.

 struct A { int i, j; }
 struct B : A { long k; }
 
 A foo (A a) {  return a; }
 B b;
 b = foo(b); // compile error: instance of struct B can't be implicitly
             // converted to an instance of struct A
 
 Anyway, this is all moot, because I've thought of an easier 
 solution. Pointers can be checked at run-time to determine if they 
 address the GC heap.  This check could be removed when compiling in 
 release mode, so there will be no performance degradation.

That's the current system, and it's basically what I've been saying all 
this time. I guess I was unclear. But you can't remove the check in 
release mode:

static import std.c.stdlib;
static import std.gc;

void main () {
    // gc memory
    auto o = new Object();
    o = null;
    // not gc memory
    void* ptr = std.c.stdlib.malloc(128);
    // dropping the only pointer leaks the malloc'd block
    ptr = null;
    // gc memory
    ptr = std.gc.malloc(512);
    // At this point, no reference to o, so it's deleted.
    // The first malloc'd memory still exists and can't ever be
    // collected.
    ptr = std.gc.malloc(8);
    // Now the gc collected the previous 512-byte buffer; of course,
    // the 128-byte buffer still exists.
}

 So there's no need to disallow new and delete for classes and we don't 
 need struct polymorphism.

Well, we don't need any kind of polymorphism, but quite separate from 
the rest of the requests, struct polymorphism would be useful. Though I 
wouldn't refer to them as structs if they're polymorphic, since you 
really can't put them on the stack, and so they have to be reference 
types with value semantics.

 You're saying:
 class Foo {
    int i;
 }

 Foo f = new Foo();
 int* i_ptr = &f.i;

 That would be a compile error? f is not fixed; I don't care if the 
 bits in i_ptr change, or the bits in the reference f. Why should I?

 Just because I took the address of f.i and stored it in an unfixed 
 pointer, the garbage collector, which has full authority to change the 
 pointer I just got, can't move *f?

 Why?

 
 I'm not really sure what you are asking.  If the GC relocates f, then 
 i_ptr no longer points to the appropriate location.  Isn't that 
 obvious?
 
 Are you suggesting that the GC relocate i_ptr as well?  No GC I know of 
 relocates raw pointers, so there's probably a good technical reason why 
 they don't.  I'm not a GC expert though.

Because no language besides D allows you to take the address of a class 
member and has native garbage collection, and D doesn't have a moving 
collector. There's the Boehm collector for C++, but that's not a moving 

those blocks you don't have a garbage collector running. And...that's 
it. Maybe we'll see something revolutionary with Objective-C 2.0, but 
probably not, and it's not here yet.

You're already moving raw pointers, so you may as well move all of them. 
Otherwise you're eliminating a decently general use case or causing 
random segfaults or losing a lot of efficiency in memory layout (which 
will make future collections quicker, too) for pretty much nothing.

Dec 23 2007

"Craig Black" <craigblack2 cox.net> writes:

"Christopher Wright" <dhasenan gmail.com> wrote in message 
news:fkmoj4$k0u$1 digitalmars.com...
 Craig Black wrote:
 Polymorphic structs *have* to be reference types, unless you determine 
 stack layout at runtime. And not only that, you have to modify stack 
 layout after you've created a stack frame. The only saving grace is that 
 you won't have to do that for a stack frame higher than the current one.

 Right, but that's not a problem if you disallow polymorphism for stack 
 objects.  This is what C++ does and it works very well.  Rather than 
 generating a run-time assertion, your code would simply not compile.  If 
 you want polymorphism, then you would have to instantiate the struct on 
 the heap.

 Ideally you'd determine whether your polymorphic struct has inheritors or 
 base classes and, if so, put it on the heap, else put it on the stack.

That's the approach for both C++ structs and classes.  But this decision is 
made by the programmer, not the compiler.  I don't see how the compiler 
could do this, since it is impossible to have knowledge of subclasses that 
exist in an external library.  The compiler would have to revert to a worst 
case, and put everything on the heap.

 This is why it'd be better to keep structs as they are, but have by-value 
 classes.

By-value classes are just another way to do the same thing, but inferior IMO.

 This is all moot anyway, because I've thought of an easier 
 solution. Pointers can be checked at run-time to determine if they 
 address the GC heap.  This check could be removed when compiling in 
 release mode, so there will be no performance degradation.

 That's the current system, and it's basically what I've been saying all 
 this time. I guess I was unclear. But you can't remove the check in 
 release mode:

Jeez.  It's like we've been speaking two different languages the whole time. 
I'm not talking about turning off the garbage collector in release mode. 
I'm talking about run-time checks that prohibit raw pointers from pointing 
to the GC heap.  The same thing you said should be "undefined behavior". 
Thus, it would be undefined behavior in release mode, but in debug mode 
there would be a check.  Like array bounds-checking.

 So there's no need to disallow new and delete for classes and we don't 
 need struct polymorphism.

 Well, we don't need any kind of polymorphism, but quite separate from the 
 rest of the requests, struct polymorphism would be useful. Though I 
 wouldn't refer to them as structs if they're polymorphic, since you really 
 can't put them on the stack, and so they have to be reference types with 
 value semantics.

Sorry, but I don't see the novelty of "reference types with value 
semantics".  What would it be useful for?  The reason I am pushing 
improvements to structs is that I know it will allow for more versatile 
aggregate types that aren't allocated on the heap.  It's important that they 
are not allocated on the heap because that is more efficient.  From my 
perspective, your proposal does nothing for performance, since there is 
still a heap allocation.

 You're saying:
 class Foo {
    int i;
 }

 Foo f = new Foo();
 int* i_ptr = &f.i;

 That would be a compile error? f is not fixed; I don't care if the bits 
 in i_ptr change, or the bits in the reference f. Why should I?

 Just because I took the address of f.i and stored it in an unfixed 
 pointer, the garbage collector, which has full authority to change the 
 pointer I just got, can't move *f?

 Why?

 I'm not really sure what you are asking.  If the GC relocates f, then 
 i_ptr no longer points to the appropriate location.  Isn't that 
 obvious?

 Are you suggesting that the GC relocate i_ptr as well?  No GC I know of 
 relocates raw pointers, so there's probably a good technical reason why 
 they don't.  I'm not a GC expert though.

 Because no language besides D allows you to take the address of a class 
 member and has native garbage collection, and D doesn't have a moving 
 collector. There's the Boehm collector for C++, but that's not a moving 

 those blocks you don't have a garbage collector running. And...that's it. 
 Maybe we'll see something revolutionary with Objective-C 2.0, but probably 
 not, and it's not here yet.

 You're already moving raw pointers, so you may as well move all of them. 
 Otherwise you're eliminating a decently general use case or causing random 
 segfaults or losing a lot of efficiency in memory layout (which will make 
 future collections quicker, too) for pretty much nothing.

Heck, you may be right about this.  Like I said I'm no GC expert.

Dec 23 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Christopher Wright wrote:
 While there is a fixed reference to the GC object, it is pinned. If that 
 reference is rebound to another GC object, the original object is 
 unpinned and the new one is pinned.
 
 How to mark these is a difficult problem. On a 64-bit machine, I'd say 
 you just use the most significant bit as a flag; you're not going to use 
 petabytes of address space.

Since "fixedness" as proposed would be a compile-time property, and you 
already need metadata to find pointers to implement a moving GC, such a 
flag could be in that metadata instead of in the pointer itself. (The 
OffsetTypeInfo could say "there's a pointer at offset 8, of type Object, 
and it's fixed")
If run-time pinning is used instead (where whether the GC cell pointed 
to by a pointer is pinned is not known at compile time), it could be a 
simple (synchronized) counter that starts out at 0 for each memory cell, 
that's incremented when pinned and decremented when unpinned. The GC is 
then only allowed to move cells whose counter is 0.
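
A minimal sketch of the run-time pinning counter, written in present-day D rather than the 2007 dialect. Cell, pin, unpin and movable are hypothetical names, not part of any D runtime, and the sketch assumes an atomic counter from core.atomic stands in for the "synchronized" counter.

```d
import core.atomic : atomicLoad, atomicOp;

/// Hypothetical per-cell header; a real collector would keep this
/// in its own metadata rather than in a user-visible struct.
struct Cell
{
    shared int pinCount; // starts at 0: the cell is movable
}

/// Called when a fixed reference starts pointing at the cell.
void pin(ref Cell c)   { atomicOp!"+="(c.pinCount, 1); }

/// Called when a fixed reference is rebound or goes away.
void unpin(ref Cell c) { atomicOp!"-="(c.pinCount, 1); }

/// The collector may relocate a cell only while nothing pins it.
bool movable(ref Cell c) { return atomicLoad(c.pinCount) == 0; }

void main()
{
    Cell c;
    assert(movable(c));
    pin(c);
    assert(!movable(c));
    unpin(c);
    assert(movable(c));
}
```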

Dec 23 2007

Christopher Wright <dhasenan gmail.com> writes:

Frits van Bommel wrote:
 Christopher Wright wrote:
 While there is a fixed reference to the GC object, it is pinned. If 
 that reference is rebound to another GC object, the original object is 
 unpinned and the new one is pinned.

 How to mark these is a difficult problem. On a 64-bit machine, I'd say 
 you just use the most significant bit as a flag; you're not going to 
 use petabytes of address space.

 
 Since "fixedness" as proposed would be a compile-time property, and you 
 already need metadata to find pointers to implement a moving GC, such a 
 flag could be in that metadata instead of in the pointer itself. (The 
 OffsetTypeInfo could say "there's a pointer at offset 8, of type Object, 
 and it's fixed")

Yes, I thought of that. Currently, however, offTi isn't populated. Just 
like the Interface* that's supposed to be the first element of each 
interface's vtbl pointer. It would be useful if it existed, but no cheese.

 If run-time pinning is used instead (where whether the GC cell pointed 
 to by a pointer is pinned is not known at compile time), it could be a 
 simple (synchronized) counter that starts out at 0 for each memory cell, 
 that's incremented when pinned and decremented when unpinned. The GC is 
 then only allowed to move cells whose counter is 0.

You would do both. During a collection, you mark each block to see if 
it's referenced, and mark it again if it's got a fixed reference. Then 
you collect every section that's not referenced and optionally move the 
sections that aren't marked as pinned to a more advantageous layout.
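
The per-block decision at the end of such a collection might look roughly like this. It is only a sketch: Block, Action and decide are hypothetical names, and the two flags are assumed to have been set during the mark phase.

```d
/// Hypothetical mark bits recorded for one block during a collection.
struct Block
{
    bool referenced; // some live pointer reaches the block
    bool pinned;     // at least one of those pointers is fixed
}

enum Action { reclaim, keepInPlace, mayMove }

/// Unreferenced blocks are collected; referenced blocks may be
/// compacted only if no fixed reference pins them.
Action decide(Block b)
{
    if (!b.referenced)
        return Action.reclaim;
    return b.pinned ? Action.keepInPlace : Action.mayMove;
}

void main()
{
    assert(decide(Block(false, false)) == Action.reclaim);
    assert(decide(Block(true, true)) == Action.keepInPlace);
    assert(decide(Block(true, false)) == Action.mayMove);
}
```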

If you are proposing a compile-time garbage collector, one that 
determines when to delete an object using static analysis, I will be 
quite impressed if you come up with an implementation.

Dec 23 2007

Christopher Wright <dhasenan gmail.com> writes:

This is to fix the stuff I botched with my other reply.

Craig Black wrote:
 a) Disallow overloading new and delete for classes, and make classes 
 strictly for GC, perhaps with an exception for classes instantiated on 
 the stack using scope.

You are just making sure that the garbage collector is handling all 
memory that is associated with objects. I don't see a point to this. The 
collector won't try to move memory that it doesn't control.

You could do bad things with overloading new/delete, but those are 
hardly unique situations.

 Then the compiler could disallow taking the address of a class field,
 since we know the resulting pointer would point to the GC heap.
 Note that this would be a compile-time check, and so would not degrade
 run-time performance.

That's not necessary, since you can map a source range to a destination 
range. It would be a simplifying assumption that improves performance, 
by changing two comparisons and an addition for each pointer (plus one 
subtraction per move) to one comparison and one assignment for each 
pointer. But you're going through a large amount of memory, so that's 
not a serious concern, I think.
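
A rough sketch of the range mapping, to make the operation count concrete. Move and forward are hypothetical names, and addresses are modelled as plain size_t values, so this is not how any actual collector is written.

```d
/// Hypothetical record of one block relocation by a moving collector.
struct Move
{
    size_t srcStart; // old address of the block
    size_t srcEnd;   // one past the block's old last byte
    ptrdiff_t delta; // dstStart - srcStart: the one subtraction per move

    this(size_t src, size_t len, size_t dst)
    {
        srcStart = src;
        srcEnd = src + len;
        delta = cast(ptrdiff_t) dst - cast(ptrdiff_t) src;
    }

    /// Two comparisons and one addition per pointer; interior
    /// pointers (such as &f.i) are forwarded like any other.
    size_t forward(size_t p) const
    {
        return (p >= srcStart && p < srcEnd) ? p + delta : p;
    }
}

void main()
{
    auto m = Move(0x1000, 0x40, 0x8000);
    assert(m.forward(0x1000) == 0x8000); // pointer to the block start
    assert(m.forward(0x1010) == 0x8010); // interior pointer
    assert(m.forward(0x2000) == 0x2000); // unrelated pointer is untouched
}
```

If interior pointers were disallowed, forward would reduce to a single equality test against srcStart followed by an assignment.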

 a) Preceding a pointer declaration with fixed would allow that pointer 
 to take the address in the GC heap.

It'd be undefined behavior to do otherwise. But safe as long as no 
collections happen before you use the pointer.

 
 The fixed pointer will have to know whether or not it points to GC 
 memory so that it doesn't pin non-GC objects.  Using the first idea, we 
 can determine at compile time whether a pointer points to the heap or not.
 
 Yes, this would be a big change, but not as big as const IMO.  I feel if 
 any feature warrants breaking some code, it would be high-performance 
 GC.  But maybe someone else can find a solution that doesn't break 
 compatibility.
 
 Thoughts?
 
 -Craig

Dec 23 2007

"Craig Black" <craigblack2 cox.net> writes:

"Christopher Wright" <dhasenan gmail.com> wrote in message 
news:fklr1g$1uat$1 digitalmars.com...
 This is to fix the stuff I botched with my other reply.

 Craig Black wrote:
 a) Disallow overloading new and delete for classes, and make classes 
 strictly for GC, perhaps with an exception for classes instantiated on 
 the stack using scope.

 You are just making sure that the garbage collector is handling all memory 
 that is associated with objects. I don't see a point to this. The 
 collector won't try to move memory that it doesn't control.

It has nothing to do with the garbage collector run-time stuff.  It is 
giving the compiler more information so that compile-time checks can be 
done.

 You could do bad things with overloading new/delete, but those are hardly 
 unique situations.

Granted.  There are so many ways to mess things up with pointers.  It's hard 
to make a systems language "safe".  I guess my approach would be to make it 
"safer".

 Then the compiler could disallow taking the address of a class field,
 since we know the resulting pointer would point to the GC heap.
 Note that this would be a compile-time check, and so would not degrade
 run-time performance.

 That's not necessary, since you can map a source range to a destination 
 range. It would be a simplifying assumption that improves performance, by 
 changing two comparisons and an addition for each pointer (plus one 
 subtraction per move) to one comparison and one assignment for each 
 pointer. But you're going through a large amount of memory, so that's not 
 a serious concern, I think.

 a) Preceding a pointer declaration with fixed would allow that pointer to 
 take the address in the GC heap.

 It'd be undefined behavior to do otherwise. But safe as long as no 
 collections happen before you use the pointer.

Unless I am missing something, this would require a run-time check for each 
pointer assignment or pointer arithmetic operation.  Personally, I would 
make every effort to avoid this overhead.  Pointers should be lightweight 
and fast.

Dec 23 2007

"Craig Black" <craigblack2 cox.net> writes:

 Unless I am missing something, this would require a run-time check for 
 each pointer assignment or pointer arithmetic operation.  Personally, I 
 would make every effort to avoid this overhead.  Pointers should be 
 lightweight and fast.

I hereby retract this statement.  These run-time checks could be optional. 
They could be like asserts or bounds-checking that is removed in release 
mode.  Then all that compile-time stuff I mentioned before would be 
unnecessary I think.
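
A sketch of what such an optional check could look like in present-day D. It assumes core.memory.GC.addrOf from current druntime (not the std.gc module of 2007), which returns null for addresses the collector does not own; pointsIntoGcHeap and storeRaw are hypothetical helpers. The check is a plain assert, so compiling with -release removes it.

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

/// Hypothetical helper: true if p lies inside a block owned by the GC.
bool pointsIntoGcHeap(const(void)* p)
{
    return GC.addrOf(p) !is null;
}

/// Hypothetical store through an unfixed raw pointer: checked in
/// ordinary builds, unchecked (undefined behavior) with -release.
void storeRaw(ref int* slot, int* p)
{
    assert(!pointsIntoGcHeap(p), "raw pointer into the GC heap");
    slot = p;
}

void main()
{
    int* slot;
    auto m = cast(int*) malloc(int.sizeof);
    scope (exit) free(m);

    storeRaw(slot, m);                 // fine: malloc'd memory is not GC memory
    assert(slot is m);
    assert(pointsIntoGcHeap(new int)); // a GC allocation is detected
}
```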

Dec 23 2007

Christopher Wright <dhasenan gmail.com> writes:

Craig Black wrote:
 
 "Christopher Wright" <dhasenan gmail.com> wrote in message 
 news:fklr1g$1uat$1 digitalmars.com...
 This is to fix the stuff I botched with my other reply.

 Craig Black wrote:
 a) Disallow overloading new and delete for classes, and make classes 
 strictly for GC, perhaps with an exception for classes instantiated 
 on the stack using scope.

 You are just making sure that the garbage collector is handling all 
 memory that is associated with objects. I don't see a point to this. 
 The collector won't try to move memory that it doesn't control.

 
 It has nothing to do with the garbage collector run-time stuff.  It is 
 giving the compiler more information so that compile-time checks can be 
 done.

The point of overloading new and delete is to work around the garbage 
collector. It's not smart enough, it doesn't have the knowledge about my 
specific problem, so I'm going to fix the problem myself.

The most common situation is, I want to manually allocate the memory for 
the variable, and I don't want the garbage collector to know about this 
object.

 You could do bad things with overloading new/delete, but those are 
 hardly unique situations.

 
 Granted.  There are so many ways to mess things up with pointers.  It's 
 hard to make a systems language "safe".  I guess my approach would be to 
 make it "safer".

I don't see that. I mean, if D didn't have arrays, you couldn't ever get 
an array bounds error; if it didn't have pointers, you would have 
trouble segfaulting; but those are too useful.

I've manually created objects without using the new operator or the 
constructor. It's ugly. It's error-prone. Overloading new is safer, when 
you just want to control how the memory is allocated. (I couldn't avoid 
it because I didn't want to use a constructor.)

 Then the compiler could disallow taking the address of a class field,
 since we know the resulting pointer would point to the GC heap.
 Note that this would be a compile-time check, and so would not degrade
 run-time performance.

 That's not necessary, since you can map a source range to a 
 destination range. It would be a simplifying assumption that improves 
 performance, by changing two comparisons and an addition for each 
 pointer (plus one subtraction per move) to one comparison and one 
 assignment for each pointer. But you're going through a large amount 
 of memory, so that's not a serious concern, I think.

 a) Preceding a pointer declaration with fixed would allow that 
 pointer to take the address in the GC heap.

 It'd be undefined behavior to do otherwise. But safe as long as no 
 collections happen before you use the pointer.

 
 Unless I am missing something, this would require a run-time check for 
 each pointer assignment or pointer arithmetic operation. 

Undefined behavior means there are no checks preventing it, but bad 
things can happen if you do it, so be careful, and it isn't Walter's 
fault if it explodes in your face.

The point is, it might be a useful thing, in which case you wouldn't 
want to disallow it. But either way, checking it is too expensive, so 
calling it undefined behavior should suffice.

Dec 23 2007

"Craig Black" <craigblack2 cox.net> writes:

 Granted.  There are so many ways to mess things up with pointers.  It's 
 hard to make a systems language "safe".  I guess my approach would be to 
 make it "safer".

 I don't see that. I mean, if D didn't have arrays, you couldn't ever get 
 an array bounds error; if it didn't have pointers, you would have trouble 
 segfaulting; but those are too useful.

 I've manually created objects without using the new operator or the 
 constructor. It's ugly. It's error-prone. Overloading new is safer, when 
 you just want to control how the memory is allocated. (I couldn't avoid it 
 because I didn't want to use a constructor.)

I was not proposing that anyone rely on "manually created objects without 
using the new operator or the constructor".  I was proposing that the 
capability to allocate an object on the malloc heap would be moved to 
structs, so that structs behaved like C++ aggregate types.

However, I realize now that this is no longer necessary, because a run-time 
check that enforces pointer restrictions could be removed in release mode.

 Undefined behavior means there are no checks preventing it, but bad things 
 can happen if you do it, so be careful, and it isn't Walter's fault if it 
 explodes in your face.

 The point is, it might be a useful thing, in which case you wouldn't want 
 to disallow it. But either way, checking it is too expensive, so calling 
 it undefined behavior should suffice.

Again the run-time check could be removed in release mode, so no harm done.

Dec 23 2007
