D - Should all Arrays be more like Objects

Mike Wynn (32/32) Jan 22 2002 As currently implemented Static arrays are on the stack, and you can wit...

Walter (16/47) Jan 22 2002 without

Russ Lewis (10/10) Jan 23 2002 The GC could check reference counts on stack objects as they go out of s...

Mike Wynn (24/34) Jan 23 2002 As I read it the GC is a Mark and sweep, conservative collector.

Russ Lewis (33/45) Jan 23 2002 You're right, sorry. I forgot about it. I keep thinking about it as a

Walter (9/16) Jan 23 2002 counting

Juan Carlos Arevalo Baeza (21/24) Jan 23 2002 only
Russ Lewis (58/61) Jan 24 2002 I've been thinking about that. I see several possible solutions, none o...

Walter (48/109) Jan 24 2002 The solutions you propose I think will work, but there'll be considerabl...

"Mike Wynn" <mike.wynn l8night.co.uk> writes:

As currently implemented Static arrays are on the stack, and you can without
warnings pass them to functions or constructors as dynamic arrays, which may
cause the dynamic array reference to remain live after the stack frame has
gone.
more alarming than this is my hypothosis that static arrays within objects
are not noticed by the GC and thus if I hold onto a reference to an array
within an object but not the object. The array will may be freed from under
my feet by the GC.

I understand the need for structures and static arrays, and the desire to
get down and dirty. but you can do that and remain safe if constrained a
little.
as point of reference C and C++ have the register keyword, and in C is means
behaves LIKE a register (not IS A register) but like one; so it has no
address
but in C++ it means optimise this value's use, the former is IMHO a much
nicer behaviour, as a programmer I can restrict what I can do with the
variable allowing the compiler to optimise if it feels in the mood to do so.

if the objectives of D are to reduce the pit falls when writing OO
programmes then
and classes as all dynamic ...
either all Arrays and Structures are dynamic too
or an automatic or stack attribute for method parameters
two rules (I think), references to stack items can not be stored in dynamic
items
and references to stack items of the current frame can not be put into any
items parameters refer to.
so a stack parameter can only be put into stack items of the current frame
or passed to methods that take stack parameters without manual intervention.
The programmer would have to create a dynamic copy of the static item. i.e.
dup the array.
because D is static, the compiler could do the analysis for you.

Mike.

Jan 22 2002

"Walter" <walter digitalmars.com> writes:

"Mike Wynn" <mike.wynn l8night.co.uk> wrote in message
news:a2k4gh$cc7$1 digitaldaemon.com...
 As currently implemented Static arrays are on the stack, and you can

without
 warnings pass them to functions or constructors as dynamic arrays, which

may
 cause the dynamic array reference to remain live after the stack frame has
 gone.

Yes, that is a general problem with anything on the stack you take a pointer
to.


 more alarming than this is my hypothosis that static arrays within objects
 are not noticed by the GC and thus if I hold onto a reference to an array
 within an object but not the object. The array will may be freed from

under
 my feet by the GC.

No, the GC will recognize that case.


 I understand the need for structures and static arrays, and the desire to
 get down and dirty. but you can do that and remain safe if constrained a
 little.
 as point of reference C and C++ have the register keyword, and in C is

means
 behaves LIKE a register (not IS A register) but like one; so it has no
 address
 but in C++ it means optimise this value's use, the former is IMHO a much
 nicer behaviour, as a programmer I can restrict what I can do with the
 variable allowing the compiler to optimise if it feels in the mood to do

so.
 if the objectives of D are to reduce the pit falls when writing OO
 programmes then
 and classes as all dynamic ...
 either all Arrays and Structures are dynamic too
 or an automatic or stack attribute for method parameters
 two rules (I think), references to stack items can not be stored in

dynamic
 items
 and references to stack items of the current frame can not be put into any
 items parameters refer to.
 so a stack parameter can only be put into stack items of the current frame
 or passed to methods that take stack parameters without manual

intervention.
 The programmer would have to create a dynamic copy of the static item.

i.e.
 dup the array.
 because D is static, the compiler could do the analysis for you.

Those are good points, and I thought a long time about them in the design of
D. I eventually came to the conclusion that needing to make copies of arrays
and objects all the time was too much overhead.

Jan 22 2002

Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:

The GC could check reference counts on stack objects as they go out of scope.
You could throw a "ReferencesRemainException" if somebody is still pointing to
it.

This is much like my idea that delete should throw an exception if references
remain....

--
The Villagers are Online! villagersonline.com

.[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
.[ (a version.of(English).(precise.more)) is(possible) ]
?[ you want.to(help(develop(it))) ]

Jan 23 2002

"Mike Wynn" <mike.wynn l8night.co.uk> writes:

As I read it the GC is a Mark and sweep, conservative collector.
so no ref counts to use, to add them would waste lots of time for all the
cases when not required and to only have them for stack arrays would require
the same analysis as for automatic promotion to heap arrays.

so the only way to see if it's live is to walk the whole heap every time you
exit a non leaf method that has passed an array as a parameter.
and to put more spanners in the works, a conservative collector can get it
wrong so it may not be live, you may just have an int with the same value.

Who gets the exception., the thread who's stack has been used or the
thread(s) with the array.

instead of going to the expence of trying to detect it, why no disallow it,
or make sure it never happens
if all array where objects except those within objects as the GC keeps the
parent alive
you never have a problem.
the compiler can optimise arrays in leaf methods, and with good memory and
one that does not coalesce too argessively then the right size block will be
available.

Mike.

"Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message
news:3C4EE3D3.30B44B51 deming-os.org...
 The GC could check reference counts on stack objects as they go out of

scope.
 You could throw a "ReferencesRemainException" if somebody is still

pointing to
 it.

 This is much like my idea that delete should throw an exception if

references
 remain....

 --
 The Villagers are Online! villagersonline.com

 .[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
 .[ (a version.of(English).(precise.more)) is(possible) ]
 ?[ you want.to(help(develop(it))) ]

Jan 23 2002

Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:

Mike Wynn wrote:

 As I read it the GC is a Mark and sweep, conservative collector.
 so no ref counts to use, to add them would waste lots of time for all the
 cases when not required and to only have them for stack arrays would require
 the same analysis as for automatic promotion to heap arrays.

You're right, sorry.  I forgot about it.  I keep thinking about it as a
reference-counting collector because I think that it should be one :)

NOTE TO WALTER: David Bacon of IBM Research has implemented a reference counting
collector that does loop detection without any other algorithms.  That means no
mark-and-sweep, stack traversal, or anything like that.  It's called Recycler
and is implemented on the Jalapeno JVM.

http://www.research.ibm.com/people/d/dfb   is his home page, and
http://www.research.ibm.com/people/d/dfb/papers.html#Bacon01Java describes how
he does it.

 instead of going to the expence of trying to detect it, why no disallow it,
 or make sure it never happens
 if all array where objects except those within objects as the GC keeps the
 parent alive
 you never have a problem.
 the compiler can optimise arrays in leaf methods, and with good memory and
 one that does not coalesce too argessively then the right size block will be
 available.

I thought about proposing a function pointer type modifier that would allow you
to declare pointers (passed as parameters) that couldn't be saved after the
function returns...syntax error if you did otherwise.  However, that breaks this
code:

int StartJob(char command[]);   // returns an id
int WaitForJobToFinish(int ID);
void MyFunc()
{
    char command = "do stuff";
    int id = StartJob(command);
    WaitForJobToFinish(id);
};

You want to ensure that all pointers to 'command' are gone before MyFunc()
exits.  But we can't because the thing we're calling has to save a pointer to it
between the calls to StartJob() and WaitForJobToFinish()...unless we force
StartJob() to copy the buffer, but then we're getting unnecessary overhead
again.

I'm open to ideas here...

--
The Villagers are Online! villagersonline.com

.[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
.[ (a version.of(English).(precise.more)) is(possible) ]
?[ you want.to(help(develop(it))) ]

Jan 23 2002

"Walter" <walter digitalmars.com> writes:

"Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message
news:3C4F034C.BD4D051A deming-os.org...
 NOTE TO WALTER: David Bacon of IBM Research has implemented a reference

counting
 collector that does loop detection without any other algorithms.  That

means no
 mark-and-sweep, stack traversal, or anything like that.  It's called

Recycler
 and is implemented on the Jalapeno JVM.

 http://www.research.ibm.com/people/d/dfb   is his home page, and
 http://www.research.ibm.com/people/d/dfb/papers.html#Bacon01Java describes

how
 he does it.

Thanks! But I don't think reference counting can work with D, since the only
pointer to an object might be an interior pointer. I don't see how that can
work with reference counts.

Jan 23 2002

"Juan Carlos Arevalo Baeza" <jcab roningames.com> writes:

"Walter" <walter digitalmars.com> wrote in message
news:a2nt3q$2v4f$2 digitaldaemon.com...

 Thanks! But I don't think reference counting can work with D, since the

only
 pointer to an object might be an interior pointer. I don't see how that

can
 work with reference counts.

   Well... There's an option that sounds interesting to me: assuming that
pointers are somewhat deprecated, as they should be in a higher-level
language, they could be implemented more inefficiently as a pointer to an
object, and an offet within that object. Then, reference counting would
work.

   This could be even more interesting if you consider the possibility of
implementing an extra level on indirection in object pointers (references),
where the pointer is an index into an array of active "real object
pointers". In that case, you can easily implement a self-compacting heap.

   Anyway, it's an idea.

   Oh, and you could still have pointers into real memory by making the
object be at position zero. For OS programming (if you still want to have
that as an option), you could also have a special type (call it "address")
which would be just an untyped raw pointer (just like in assembler), and
would be used with casts.

Salutaciones,
                         JCAB

Jan 23 2002

Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:

Walter wrote:

 Thanks! But I don't think reference counting can work with D, since the only
 pointer to an object might be an interior pointer. I don't see how that can
 work with reference counts.

I've been thinking about that.  I see several possible solutions, none of which
is optimal.  Be sure to read (5), which I think has the most promise:

SIMPLE IDEAS:

1) All pointers are actually double pointers; one element points to the actual
thing being referenced, while one points to the garbage-collectable thing which
it is inside of;

2) When setting (or unsetting) a pointer, do a lookup into a table of objects to
find what the pointer is inside of.  If this was a hash table, then you would
have a reasonably good chance of getting an immediate hit as long as the pointer
is NOT an interior pointer...

MORE COMPLEX ONES:

4) Require that all reference-counted objects be on n-byte boundaries (perhaps
32 byte boundaries).  Then, you can find the start of the object as follows:
    a) Put a magic value as the first 32 bits of every reference counted object

that table.
    c) When looking for the "root" of an object, first go back to the 32-byte
boundary and check for the magic value.  If you don't find it, keep going back
32 bytes at a time until you do.
    d) When you find a magic value, guess that it's the root, and look at the
"pointer into the memory table."  Confirm that this pointer is to a valid
address in the table and that the table at that entry point points back to this
object.  If so, then you have the object root.  If not, return to (c) and
continue.

5) Make all members of an object also be garbage collected.  However, you set a
flag (or a pointer or offset value) that indicates that the method of "garbage
collecting" a member is simply to decrement the reference count of the parent
object.
    a) When using a pointer to an object to get a pointer to a member of it, you
check the count on the child first.  If this count is 0, then you increment the
count on the parent as well as the child, as you are making the child "refer" to
the parent.
    b) Later, when you lose the pointer to the child, you decrement its count.
If the child ends up "garbage collected", then you don't delete the
memory...instead, you decrement the count on the parent.  Then the parent might
or might not get collected.
    c) Dynamic arrays will have to be implemented as 12 bytes instead of 8; each
array has a pointer to a "root" (root of the entire array, not this slice of
it), and a pointer to the "root of this slice", and a length field.  Thus, you
can always know where to find the reference count when an array reference is
destroyed.
    d) You can't have any GC overhead inside structs (so while you can GC a
struct, you can't GC its members).  Also, it's not practical to have a GC record
for every 'int' inside a class.  Perhaps if you take a pointer to such elements,
you have to uses a special non-GC pointer.  This isn't actually so bad, if you
think about it...
    e) If you absolutely demand GC-ing members of a class:   For small members
of classes, you could group them into n-byte chunks (each of which starts with a
GC record) and require that all classes be allocated on a similar boundary.
Thus, except for pointers into a struct (which have to be non-GC pointers), you
can always find a GC record by taking the pointer and rounding down to the
n-byte boundary.

--
The Villagers are Online! villagersonline.com

.[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
.[ (a version.of(English).(precise.more)) is(possible) ]
?[ you want.to(help(develop(it))) ]

Jan 24 2002

"Walter" <walter digitalmars.com> writes:

The solutions you propose I think will work, but there'll be considerable
added overhead in both execution time and memory consumption. Also, some of
the interoperability with C objects will no longer be possible.

I've written some programs using conservative GC and benchmarked them
against fast reference counting schemes (written by others). My GC
outperformed them, so I'm not sold on the idea. The GC I used is the one
that comes with the Phobos library source, and it's a pretty basic one.
Upgrading it to a generational collector would produce a big increment in
performance.

-Walter

"Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message
news:3C5039CF.397B4A12 deming-os.org...
 Walter wrote:

 Thanks! But I don't think reference counting can work with D, since the


only
 pointer to an object might be an interior pointer. I don't see how that


can
 work with reference counts.

 I've been thinking about that.  I see several possible solutions, none of

which
 is optimal.  Be sure to read (5), which I think has the most promise:

 SIMPLE IDEAS:

 1) All pointers are actually double pointers; one element points to the

actual
 thing being referenced, while one points to the garbage-collectable thing

which
 it is inside of;

 2) When setting (or unsetting) a pointer, do a lookup into a table of

objects to
 find what the pointer is inside of.  If this was a hash table, then you

would
 have a reasonably good chance of getting an immediate hit as long as the

pointer
 is NOT an interior pointer...

 MORE COMPLEX ONES:

 4) Require that all reference-counted objects be on n-byte boundaries

(perhaps
 32 byte boundaries).  Then, you can find the start of the object as

follows:
     a) Put a magic value as the first 32 bits of every reference counted

object


to
 that table.
     c) When looking for the "root" of an object, first go back to the

32-byte
 boundary and check for the magic value.  If you don't find it, keep going

back
 32 bytes at a time until you do.
     d) When you find a magic value, guess that it's the root, and look at

the
 "pointer into the memory table."  Confirm that this pointer is to a valid
 address in the table and that the table at that entry point points back to

this
 object.  If so, then you have the object root.  If not, return to (c) and
 continue.

 5) Make all members of an object also be garbage collected.  However, you

set a
 flag (or a pointer or offset value) that indicates that the method of

"garbage
 collecting" a member is simply to decrement the reference count of the

parent
 object.
     a) When using a pointer to an object to get a pointer to a member of

it, you
 check the count on the child first.  If this count is 0, then you

increment the
 count on the parent as well as the child, as you are making the child

"refer" to
 the parent.
     b) Later, when you lose the pointer to the child, you decrement its

count.
 If the child ends up "garbage collected", then you don't delete the
 memory...instead, you decrement the count on the parent.  Then the parent

might
 or might not get collected.
     c) Dynamic arrays will have to be implemented as 12 bytes instead of

8; each
 array has a pointer to a "root" (root of the entire array, not this slice

of
 it), and a pointer to the "root of this slice", and a length field.  Thus,

you
 can always know where to find the reference count when an array reference

is
 destroyed.
     d) You can't have any GC overhead inside structs (so while you can GC

a
 struct, you can't GC its members).  Also, it's not practical to have a GC

record
 for every 'int' inside a class.  Perhaps if you take a pointer to such

elements,
 you have to uses a special non-GC pointer.  This isn't actually so bad, if

you
 think about it...
     e) If you absolutely demand GC-ing members of a class:   For small

members
 of classes, you could group them into n-byte chunks (each of which starts

with a
 GC record) and require that all classes be allocated on a similar

boundary.
 Thus, except for pointers into a struct (which have to be non-GC

pointers), you
 can always find a GC record by taking the pointer and rounding down to the
 n-byte boundary.

 --
 The Villagers are Online! villagersonline.com

 .[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
 .[ (a version.of(English).(precise.more)) is(possible) ]
 ?[ you want.to(help(develop(it))) ]

Jan 24 2002

D Programming

C/C++ Programming

Other

D - Should all Arrays be more like Objects