www.digitalmars.com         C & C++   DMDScript  

D - Should all Arrays be more like Objects

reply "Mike Wynn" <mike.wynn l8night.co.uk> writes:
As currently implemented Static arrays are on the stack, and you can without
warnings pass them to functions or constructors as dynamic arrays, which may
cause the dynamic array reference to remain live after the stack frame has
gone.
more alarming than this is my hypothosis that static arrays within objects
are not noticed by the GC and thus if I hold onto a reference to an array
within an object but not the object. The array will may be freed from under
my feet by the GC.

I understand the need for structures and static arrays, and the desire to
get down and dirty. but you can do that and remain safe if constrained a
little.
as point of reference C and C++ have the register keyword, and in C is means
behaves LIKE a register (not IS A register) but like one; so it has no
address
but in C++ it means optimise this value's use, the former is IMHO a much
nicer behaviour, as a programmer I can restrict what I can do with the
variable allowing the compiler to optimise if it feels in the mood to do so.

if the objectives of D are to reduce the pit falls when writing OO
programmes then
and classes as all dynamic ...
either all Arrays and Structures are dynamic too
or an automatic or stack attribute for method parameters
two rules (I think), references to stack items can not be stored in dynamic
items
and references to stack items of the current frame can not be put into any
items parameters refer to.
so a stack parameter can only be put into stack items of the current frame
or passed to methods that take stack parameters without manual intervention.
The programmer would have to create a dynamic copy of the static item. i.e.
dup the array.
because D is static, the compiler could do the analysis for you.

Mike.
Jan 22 2002
parent reply "Walter" <walter digitalmars.com> writes:
"Mike Wynn" <mike.wynn l8night.co.uk> wrote in message
news:a2k4gh$cc7$1 digitaldaemon.com...
 As currently implemented Static arrays are on the stack, and you can
without
 warnings pass them to functions or constructors as dynamic arrays, which
may
 cause the dynamic array reference to remain live after the stack frame has
 gone.
Yes, that is a general problem with anything on the stack you take a pointer to.
 more alarming than this is my hypothosis that static arrays within objects
 are not noticed by the GC and thus if I hold onto a reference to an array
 within an object but not the object. The array will may be freed from
under
 my feet by the GC.
No, the GC will recognize that case.
 I understand the need for structures and static arrays, and the desire to
 get down and dirty. but you can do that and remain safe if constrained a
 little.
 as point of reference C and C++ have the register keyword, and in C is
means
 behaves LIKE a register (not IS A register) but like one; so it has no
 address
 but in C++ it means optimise this value's use, the former is IMHO a much
 nicer behaviour, as a programmer I can restrict what I can do with the
 variable allowing the compiler to optimise if it feels in the mood to do
so.
 if the objectives of D are to reduce the pit falls when writing OO
 programmes then
 and classes as all dynamic ...
 either all Arrays and Structures are dynamic too
 or an automatic or stack attribute for method parameters
 two rules (I think), references to stack items can not be stored in
dynamic
 items
 and references to stack items of the current frame can not be put into any
 items parameters refer to.
 so a stack parameter can only be put into stack items of the current frame
 or passed to methods that take stack parameters without manual
intervention.
 The programmer would have to create a dynamic copy of the static item.
i.e.
 dup the array.
 because D is static, the compiler could do the analysis for you.
Those are good points, and I thought a long time about them in the design of D. I eventually came to the conclusion that needing to make copies of arrays and objects all the time was too much overhead.
Jan 22 2002
parent reply Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
The GC could check reference counts on stack objects as they go out of scope.
You could throw a "ReferencesRemainException" if somebody is still pointing to
it.

This is much like my idea that delete should throw an exception if references
remain....

--
The Villagers are Online! villagersonline.com

.[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
.[ (a version.of(English).(precise.more)) is(possible) ]
?[ you want.to(help(develop(it))) ]
Jan 23 2002
parent reply "Mike Wynn" <mike.wynn l8night.co.uk> writes:
As I read it the GC is a Mark and sweep, conservative collector.
so no ref counts to use, to add them would waste lots of time for all the
cases when not required and to only have them for stack arrays would require
the same analysis as for automatic promotion to heap arrays.

so the only way to see if it's live is to walk the whole heap every time you
exit a non leaf method that has passed an array as a parameter.
and to put more spanners in the works, a conservative collector can get it
wrong so it may not be live, you may just have an int with the same value.

Who gets the exception., the thread who's stack has been used or the
thread(s) with the array.

instead of going to the expence of trying to detect it, why no disallow it,
or make sure it never happens
if all array where objects except those within objects as the GC keeps the
parent alive
you never have a problem.
the compiler can optimise arrays in leaf methods, and with good memory and
one that does not coalesce too argessively then the right size block will be
available.

Mike.

"Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message
news:3C4EE3D3.30B44B51 deming-os.org...
 The GC could check reference counts on stack objects as they go out of
scope.
 You could throw a "ReferencesRemainException" if somebody is still
pointing to
 it.

 This is much like my idea that delete should throw an exception if
references
 remain....

 --
 The Villagers are Online! villagersonline.com

 .[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
 .[ (a version.of(English).(precise.more)) is(possible) ]
 ?[ you want.to(help(develop(it))) ]
Jan 23 2002
parent reply Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
Mike Wynn wrote:

 As I read it the GC is a Mark and sweep, conservative collector.
 so no ref counts to use, to add them would waste lots of time for all the
 cases when not required and to only have them for stack arrays would require
 the same analysis as for automatic promotion to heap arrays.
You're right, sorry. I forgot about it. I keep thinking about it as a reference-counting collector because I think that it should be one :) NOTE TO WALTER: David Bacon of IBM Research has implemented a reference counting collector that does loop detection without any other algorithms. That means no mark-and-sweep, stack traversal, or anything like that. It's called Recycler and is implemented on the Jalapeno JVM. http://www.research.ibm.com/people/d/dfb is his home page, and http://www.research.ibm.com/people/d/dfb/papers.html#Bacon01Java describes how he does it.
 instead of going to the expence of trying to detect it, why no disallow it,
 or make sure it never happens
 if all array where objects except those within objects as the GC keeps the
 parent alive
 you never have a problem.
 the compiler can optimise arrays in leaf methods, and with good memory and
 one that does not coalesce too argessively then the right size block will be
 available.
I thought about proposing a function pointer type modifier that would allow you to declare pointers (passed as parameters) that couldn't be saved after the function returns...syntax error if you did otherwise. However, that breaks this code: int StartJob(char command[]); // returns an id int WaitForJobToFinish(int ID); void MyFunc() { char command = "do stuff"; int id = StartJob(command); WaitForJobToFinish(id); }; You want to ensure that all pointers to 'command' are gone before MyFunc() exits. But we can't because the thing we're calling has to save a pointer to it between the calls to StartJob() and WaitForJobToFinish()...unless we force StartJob() to copy the buffer, but then we're getting unnecessary overhead again. I'm open to ideas here... -- The Villagers are Online! villagersonline.com .[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ] .[ (a version.of(English).(precise.more)) is(possible) ] ?[ you want.to(help(develop(it))) ]
Jan 23 2002
parent reply "Walter" <walter digitalmars.com> writes:
"Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message
news:3C4F034C.BD4D051A deming-os.org...
 NOTE TO WALTER: David Bacon of IBM Research has implemented a reference
counting
 collector that does loop detection without any other algorithms.  That
means no
 mark-and-sweep, stack traversal, or anything like that.  It's called
Recycler
 and is implemented on the Jalapeno JVM.

 http://www.research.ibm.com/people/d/dfb   is his home page, and
 http://www.research.ibm.com/people/d/dfb/papers.html#Bacon01Java describes
how
 he does it.
Thanks! But I don't think reference counting can work with D, since the only pointer to an object might be an interior pointer. I don't see how that can work with reference counts.
Jan 23 2002
next sibling parent "Juan Carlos Arevalo Baeza" <jcab roningames.com> writes:
"Walter" <walter digitalmars.com> wrote in message
news:a2nt3q$2v4f$2 digitaldaemon.com...

 Thanks! But I don't think reference counting can work with D, since the
only
 pointer to an object might be an interior pointer. I don't see how that
can
 work with reference counts.
Well... There's an option that sounds interesting to me: assuming that pointers are somewhat deprecated, as they should be in a higher-level language, they could be implemented more inefficiently as a pointer to an object, and an offet within that object. Then, reference counting would work. This could be even more interesting if you consider the possibility of implementing an extra level on indirection in object pointers (references), where the pointer is an index into an array of active "real object pointers". In that case, you can easily implement a self-compacting heap. Anyway, it's an idea. Oh, and you could still have pointers into real memory by making the object be at position zero. For OS programming (if you still want to have that as an option), you could also have a special type (call it "address") which would be just an untyped raw pointer (just like in assembler), and would be used with casts. Salutaciones, JCAB
Jan 23 2002
prev sibling parent reply Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
Walter wrote:

 Thanks! But I don't think reference counting can work with D, since the only
 pointer to an object might be an interior pointer. I don't see how that can
 work with reference counts.
I've been thinking about that. I see several possible solutions, none of which is optimal. Be sure to read (5), which I think has the most promise: SIMPLE IDEAS: 1) All pointers are actually double pointers; one element points to the actual thing being referenced, while one points to the garbage-collectable thing which it is inside of; 2) When setting (or unsetting) a pointer, do a lookup into a table of objects to find what the pointer is inside of. If this was a hash table, then you would have a reasonably good chance of getting an immediate hit as long as the pointer is NOT an interior pointer... MORE COMPLEX ONES: 4) Require that all reference-counted objects be on n-byte boundaries (perhaps 32 byte boundaries). Then, you can find the start of the object as follows: a) Put a magic value as the first 32 bits of every reference counted object that table. c) When looking for the "root" of an object, first go back to the 32-byte boundary and check for the magic value. If you don't find it, keep going back 32 bytes at a time until you do. d) When you find a magic value, guess that it's the root, and look at the "pointer into the memory table." Confirm that this pointer is to a valid address in the table and that the table at that entry point points back to this object. If so, then you have the object root. If not, return to (c) and continue. 5) Make all members of an object also be garbage collected. However, you set a flag (or a pointer or offset value) that indicates that the method of "garbage collecting" a member is simply to decrement the reference count of the parent object. a) When using a pointer to an object to get a pointer to a member of it, you check the count on the child first. If this count is 0, then you increment the count on the parent as well as the child, as you are making the child "refer" to the parent. b) Later, when you lose the pointer to the child, you decrement its count. If the child ends up "garbage collected", then you don't delete the memory...instead, you decrement the count on the parent. Then the parent might or might not get collected. c) Dynamic arrays will have to be implemented as 12 bytes instead of 8; each array has a pointer to a "root" (root of the entire array, not this slice of it), and a pointer to the "root of this slice", and a length field. Thus, you can always know where to find the reference count when an array reference is destroyed. d) You can't have any GC overhead inside structs (so while you can GC a struct, you can't GC its members). Also, it's not practical to have a GC record for every 'int' inside a class. Perhaps if you take a pointer to such elements, you have to uses a special non-GC pointer. This isn't actually so bad, if you think about it... e) If you absolutely demand GC-ing members of a class: For small members of classes, you could group them into n-byte chunks (each of which starts with a GC record) and require that all classes be allocated on a similar boundary. Thus, except for pointers into a struct (which have to be non-GC pointers), you can always find a GC record by taking the pointer and rounding down to the n-byte boundary. -- The Villagers are Online! villagersonline.com .[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ] .[ (a version.of(English).(precise.more)) is(possible) ] ?[ you want.to(help(develop(it))) ]
Jan 24 2002
parent "Walter" <walter digitalmars.com> writes:
The solutions you propose I think will work, but there'll be considerable
added overhead in both execution time and memory consumption. Also, some of
the interoperability with C objects will no longer be possible.

I've written some programs using conservative GC and benchmarked them
against fast reference counting schemes (written by others). My GC
outperformed them, so I'm not sold on the idea. The GC I used is the one
that comes with the Phobos library source, and it's a pretty basic one.
Upgrading it to a generational collector would produce a big increment in
performance.

-Walter

"Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message
news:3C5039CF.397B4A12 deming-os.org...
 Walter wrote:

 Thanks! But I don't think reference counting can work with D, since the
only
 pointer to an object might be an interior pointer. I don't see how that
can
 work with reference counts.
I've been thinking about that. I see several possible solutions, none of
which
 is optimal.  Be sure to read (5), which I think has the most promise:

 SIMPLE IDEAS:

 1) All pointers are actually double pointers; one element points to the
actual
 thing being referenced, while one points to the garbage-collectable thing
which
 it is inside of;

 2) When setting (or unsetting) a pointer, do a lookup into a table of
objects to
 find what the pointer is inside of.  If this was a hash table, then you
would
 have a reasonably good chance of getting an immediate hit as long as the
pointer
 is NOT an interior pointer...

 MORE COMPLEX ONES:

 4) Require that all reference-counted objects be on n-byte boundaries
(perhaps
 32 byte boundaries).  Then, you can find the start of the object as
follows:
     a) Put a magic value as the first 32 bits of every reference counted
object

to
 that table.
     c) When looking for the "root" of an object, first go back to the
32-byte
 boundary and check for the magic value.  If you don't find it, keep going
back
 32 bytes at a time until you do.
     d) When you find a magic value, guess that it's the root, and look at
the
 "pointer into the memory table."  Confirm that this pointer is to a valid
 address in the table and that the table at that entry point points back to
this
 object.  If so, then you have the object root.  If not, return to (c) and
 continue.

 5) Make all members of an object also be garbage collected.  However, you
set a
 flag (or a pointer or offset value) that indicates that the method of
"garbage
 collecting" a member is simply to decrement the reference count of the
parent
 object.
     a) When using a pointer to an object to get a pointer to a member of
it, you
 check the count on the child first.  If this count is 0, then you
increment the
 count on the parent as well as the child, as you are making the child
"refer" to
 the parent.
     b) Later, when you lose the pointer to the child, you decrement its
count.
 If the child ends up "garbage collected", then you don't delete the
 memory...instead, you decrement the count on the parent.  Then the parent
might
 or might not get collected.
     c) Dynamic arrays will have to be implemented as 12 bytes instead of
8; each
 array has a pointer to a "root" (root of the entire array, not this slice
of
 it), and a pointer to the "root of this slice", and a length field.  Thus,
you
 can always know where to find the reference count when an array reference
is
 destroyed.
     d) You can't have any GC overhead inside structs (so while you can GC
a
 struct, you can't GC its members).  Also, it's not practical to have a GC
record
 for every 'int' inside a class.  Perhaps if you take a pointer to such
elements,
 you have to uses a special non-GC pointer.  This isn't actually so bad, if
you
 think about it...
     e) If you absolutely demand GC-ing members of a class:   For small
members
 of classes, you could group them into n-byte chunks (each of which starts
with a
 GC record) and require that all classes be allocated on a similar
boundary.
 Thus, except for pointers into a struct (which have to be non-GC
pointers), you
 can always find a GC record by taking the pointer and rounding down to the
 n-byte boundary.

 --
 The Villagers are Online! villagersonline.com

 .[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
 .[ (a version.of(English).(precise.more)) is(possible) ]
 ?[ you want.to(help(develop(it))) ]
Jan 24 2002