Is this something that actually exists? Or just an idea being thrown
around? Or just a marketing slogan for using D without pointers?
It's an idea for defining a language subset that is enforceable by the
compiler.
... in other words, an upcoming compiler switch?
I like that idea very much.
Mar 23 2008
Bill Baxter <dnewsgroup billbaxter.com> writes:
Walter Bright wrote:
Bill Baxter wrote:
Is this something that actually exists? Or just an idea being thrown
around? Or just a marketing slogan for using D without pointers?
It's an idea for defining a language subset that is enforceable by the
compiler.
Hmm. This sounds like a big can of worms to me. I'd rather have bug
fixes or further progress on the WalterAndrei.pdf agenda than a compiler
switch that forces me to use a restricted subset of D. I guess I'm just
not "enterprise" enough. :-)
--bb
Mar 25 2008
Sean Kelly <sean invisibleduck.org> writes:
== Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
Walter Bright wrote:
Bill Baxter wrote:
Is this something that actually exists? Or just an idea being thrown
around? Or just a marketing slogan for using D without pointers?
It's an idea for defining a language subset that is enforceable by the
compiler.
Hmm. This sounds like a big can of worms to me. I'd rather have bug
fixes or further progress on the WalterAndrei.pdf agenda than a compiler
switch that forces me to use a restricted subset of D. I guess I'm just
not "enterprise" enough. :-)
While you're at it, you may as well ask for all the features in the 1.0 spec
to actually be implemented. Why aim low?
Sean
Mar 25 2008
Lars Ivar Igesund <larsivar igesund.net> writes:
In light of this I find it rather incredible that printf still is exposed
through Object.
--
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
Mar 23 2008
Walter Bright <newshound1 digitalmars.com> writes:
Lars Ivar Igesund wrote:
In light of this I find it rather incredible that printf still is exposed
through Object.
printf would have to be removed from the safe D subset.
In light of this I find it rather incredible that printf still is
exposed through Object.
Yay!!! That's great news
Mar 23 2008
Lars Ivar Igesund <larsivar igesund.net> writes:
Walter Bright wrote:
Lars Ivar Igesund wrote:
In light of this I find it rather incredible that printf still is exposed
through Object.
printf would have to be removed from the safe D subset.
Point is that there is a useless _and_ unsafe global symbol/presence (only
in Phobos though).
--
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
Mar 23 2008
Charles D Hixson <charleshixsn earthlink.net> writes:
http://www.digitalmars.com/d/2.0/safed.html
A bit light on the syntax, but I completely agree with
avoiding pointers whenever possible. Unchecked casts and
unions are also clearly dangerous, if less so.
Unfortunately, unchecked casts seem to be necessary for some
purposes. It's not just efficiency, I literally don't see
any way around them. (I'm considering a union to be a kind of
unchecked cast.)
P.S.: What is a checked cast? If I want to consider a long
as an array of bytes:
byte[long.sizeof] val;
it ought to be safe to later consider that array of bytes as a
long. But what's a safe (checked) way to do it? (I'm not
talking about conversion. I don't want to end up with an
array of longs.)
PPS: When talking about casts or type conversions, please
make it explicit whether the same bit pattern is maintained.
I often read those descriptions, and realize that I can't
figure out exactly what is happening. With C I was always
certain that I was just telling the compiler to think about
the same piece of memory differently, and that nothing
actually changed. With more modern languages, a lot more
magic happens under the hood, and I'm no longer as certain
what's going on. I often wonder after reading the
documentation whether the same bit pattern is maintained, or
whether an equivalent value is produced. E.g., I've never
tried casting a float to a long. What would it produce? I
can't predict. I'd often prefer to deal with ulongs or ucents
rather than byte arrays, but then at other times I need to
address particular bytes out of that value. Because I don't
really understand a cast, I just use byte arrays (well,
ubyte). But it's "sloppier". Generally I'm dealing with a
unitary entity, and needing to think of it as an array all the
time is uncomfortable. (I'd even like a notation for dealing
with particular bits, though I haven't needed that recently.)
Note that this isn't a request for a change in how things act,
but rather in how they are documented.
I *suspect* that cast is presumed to be defined by C, and that
it means "Think about the type differently, but don't change
its bit pattern", but I'm never quite certain.
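One way to picture the byte-array question above, sketched in C since that is the language the replies use for their cast examples: copy the object representation with memcpy instead of casting a pointer. The function names are made up for illustration, and the sketch assumes the platform provides int64_t. The bits are never converted, only viewed through a different type, and the byte index is checked.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the object representation of a 64-bit integer into a byte array.
   The bits are unchanged; only the type used to look at them differs. */
void long_to_bytes(int64_t value, unsigned char out[sizeof(int64_t)])
{
    memcpy(out, &value, sizeof value);
}

/* The inverse: view the same eight bytes as a 64-bit integer again. */
int64_t bytes_to_long(const unsigned char in[sizeof(int64_t)])
{
    int64_t value;
    memcpy(&value, in, sizeof value);
    return value;
}

/* Read one byte of the representation. The index is checked, which is
   what a raw pointer cast would not give you. */
unsigned char byte_at(int64_t value, size_t index)
{
    unsigned char bytes[sizeof(int64_t)];
    assert(index < sizeof bytes);
    long_to_bytes(value, bytes);
    return bytes[index];
}
```

Passing the output of long_to_bytes to bytes_to_long returns the original value. Which byte holds the low-order bits depends on the machine's byte order, so byte_at(value, 0) is not a portable way to get "the low byte".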
Mar 23 2008
Bill Baxter <dnewsgroup billbaxter.com> writes:
Charles D Hixson wrote:
With C I was always certain that I was just telling the
compiler to think about the same piece of memory differently, and that
nothing actually changed.
Not so:
int x = (int)2.5;
Not the same bit pattern after as before (not even if it was 2.0 on the
right). But maybe you're only talking about casts with pointers in them?
--bb
Mar 23 2008
Walter Bright <newshound1 digitalmars.com> writes:
Charles D Hixson wrote:
PPS: When talking about casts or type conversions, please make it
explicit whether the same bit pattern is maintained. I often read those
descriptions, and realize that I can't figure out exactly what is
happening. With C I was always certain that I was just telling the
compiler to think about the same piece of memory differently, and that
nothing actually changed.
int i = 3;
double d = (double)i;
changes the bit pattern in C (as well as in D)
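The snippets in this subthread are value conversions. A rough C sketch of the difference between converting a value and reinterpreting its bits follows. The function names are hypothetical, and the bit pattern quoted in the usage note assumes the platform's double is IEEE 754 binary64, which is common but not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value conversion: the result represents the same number, but the bit
   pattern is generally different from the integer's. */
double convert_int_to_double(int i)
{
    return (double)i;
}

/* Floating-point to integer conversion in C discards the fractional part
   (truncation toward zero). */
long convert_float_to_long(float f)
{
    return (long)f;
}

/* Reinterpretation: the same bits, looked at as a different type.
   memcpy is used instead of a pointer cast to stay within C's aliasing rules. */
uint64_t bits_of_double(double d)
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

Under the IEEE 754 assumption, bits_of_double(convert_int_to_double(3)) is 0x4008000000000000, whereas the int 3 is stored as 0x00000003. The float-to-long case asked about earlier is a conversion too: convert_float_to_long(2.9f) yields 2, and the behaviour is undefined if the truncated value does not fit in a long.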
Mar 23 2008
Charles D Hixson <charleshixsn earthlink.net> writes:
Walter Bright wrote:
Charles D Hixson wrote:
PPS: When talking about casts or type conversions, please make it
explicit whether the same bit pattern is maintained. I often read
those descriptions, and realize that I can't figure out exactly what
is happening. With C I was always certain that I was just telling the
compiler to think about the same piece of memory differently, and that
nothing actually changed.
int i = 3;
double d = (double)i;
changes the bit pattern in C (as well as in D)
So I didn't understand them in C, either. I'd been
understanding them as a temporary substitute for a union for
doing unsafe conversions. (And avoiding them as much as
possible. Apparently just as well.)
Mar 23 2008
Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Charles D Hixson wrote:
PPS: When talking about casts or type conversions, please make it
explicit whether the same bit pattern is maintained. I often read those
descriptions, and realize that I can't figure out exactly what is
happening. With C I was always certain that I was just telling the
compiler to think about the same piece of memory differently, and that
nothing actually changed. With more modern languages, a lot more magic
happens under the hood, and I'm no longer as certain what's going on. I
often wonder after reading the documentation whether the same bit
pattern is maintained, or whether an equivalent value is produced.
E.g., I've never tried casting a float to a long. What would it
produce? I can't predict. I'd often prefer to deal with ulongs or
ucents rather than byte arrays, but then at other times I need to
address particular bytes out of that value. Because I don't really
understand a cast, I just use byte arrays (well, ubyte). But it's
"sloppier". Generally I'm dealing with a unitary entity, and needing to
think of it as an array all the time is uncomfortable. (I'd even like a
notation for dealing with particular bits, though I haven't needed that
recently.)
Note that this isn't a request for a change in how things act, but
rather in how they are documented.
I *suspect* that cast is presumed to be defined by C, and that it means
"Think about the type differently, but don't change its bit pattern",
but I'm never quite certain.
Yup, this is one of the C legacy behaviors/mentalities that I've found
increasingly irritating. I would prefer that the language syntax better
distinguish between opaque casts (no bit changes) and conversion casts
(bit changes, since a conversion is made).
--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Yup, this is one of the C legacy behaviors/mentalities that I've found
increasingly irritating. I would prefer that the language syntax better
distinguish between opaque casts (no bit changes) and conversion casts
(bit changes, since a conversion is made).
I'm with you. The worst example: Despite the name, C++'s reinterpret_cast<>
sometimes does _conversion casts_, involving bit changes.
How will one assert that a library function is certified for usage in
SafeD even if it uses unsafe constructs? New keywords?
There'll have to be some syntax for that.
I hope you mean that once such a library function is Certified, it gets
some kind of [at least compiler readable] property stating that it is
SafeD compliant?
As to the matter of certifying the function, in trivial cases the
compiler could do it.
But with some important special cases, I can see no other way than to
manually scrutinize the source code. Think of a complicated function
(say, some hairy tensor math operation, maybe an FFT function, or
whatever that's nontrivial) that internally needs to do "unsafe"
operations or even in-line asm, but that has been deemed safe by
Authoritative Professionals.
Mar 24 2008
Walter Bright <newshound1 digitalmars.com> writes:
Georg Wrede wrote:
Walter Bright wrote:
Julio César Carrascal Urquijo wrote:
How will one assert that a library function is certified for usage in
SafeD even if it uses unsafe constructs? New keywords?
There'll have to be some syntax for that.
I hope you mean that once such a library function is Certified, it gets
some kind of [at least compiler readable] property stating that it is
SafeD compliant?
Yes.
As to the matter of certifying the function, in trivial cases the
compiler could do it.
There's no reason to syntactically mark a function as safe if the
compiler can verify it.
But with some important special cases, I can see no other way than to
manually scrutinize the source code. Think of a complicated function
(say, some hairy tensor math operation, maybe an FFT function, or
whatever that's nontrivial) that internally needs to do "unsafe"
operations or even in-line asm, but that has been deemed safe by
Authoritative Professionals.
Yes, but the idea is to reduce the scope as much as possible of where
you have to manually look for unsafe code.
How will one assert that a library function is certified for usage
in SafeD even if it uses unsafe constructs? New keywords?
There'll have to be some syntax for that.
I hope you mean that once such a library function is Certified, it
gets some kind of [at least compiler readable] property stating that
it is SafeD compliant?
Yes.
As to the matter of certifying the function, in trivial cases the
compiler could do it.
There's no reason to syntactically mark a function as safe if the
compiler can verify it.
But with some important special cases, I can see no other way than to
manually scrutinize the source code. Think of a complicated function
(say, some hairy tensor math operation, maybe an FFT function, or
whatever that's nontrivial) that internally needs to do "unsafe"
operations or even in-line asm, but that has been deemed safe by
Authoritative Professionals.
Yes, but the idea is to reduce the scope as much as possible of where
you have to manually look for unsafe code.
I'm simply thrilled!
Mar 24 2008
Christopher Wright <dhasenan gmail.com> writes:
Walter Bright wrote:
There's no reason to syntactically mark a function as safe if the
compiler can verify it.
Well, if you have a mix of safe and unsafe code, there is -- you want to
tell the compiler to verify some stuff and ignore other stuff.
Mar 25 2008
Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
I must admit that the presentation went way over my head, but I wanted
to add that those 'code constraints' remind me of 'capabilities', which
are an interesting way to provide granular security.
renoX
Just to explain what I mean by this, here's a video talk about Joe-E, a
Java subset intended to enable capability-style programming:
http://uk.youtube.com/watch?v=EGX2I31OhBE
http://code.google.com/p/joe-e/
The goal is different, but it's still interesting: if I understood
correctly, SafeD's goal would be to offer Java-like safety, but even
Java-like safety isn't enough to provide fine-grained security, so
researchers made Joe-E, a Java subset, for this.
So maybe Joe-E's design would be interesting as an inspiration for SafeD
(and if it's too limiting, there could be several levels of 'safety').
Regards,
renoX
Would it be possible to use unsafe libraries with a safe D subset? I'm
thinking of something like being able to specify what libraries you can
link to that are unsafe somehow, perhaps through a dll or wrapper.
-Joel
What I'm getting at is that some libs may be safe for one project but
may not be considered safe in another. For instance, someone may want
to prevent writing to a file; in another project this may be perfectly
acceptable, though they may want security in other areas.
-Joel
I wish D used the concept of UNSAFE modules, similar to (or the same as)
that in the excellent Modula-3 language (which influenced all modern OO
languages, IMHO).
Modula-3 has a keyword "UNSAFE" which is used in a module or interface
declaration to indicate that it is _unsafe_. In other words it informs
us that the module/interface uses unsafe features of the language. If a
module or interface is not labeled "unsafe" (the default), it is
assumed to be safe.
This simple concept is amazing, as is the fact that Modula-3 (as a
language) had this (plus numerous other modern features) two decades ago.
Kind regards
PS. I am not advocating Modula-3 here. I do C++ (16 years), D, Java and
PHP programming, mostly.
Mar 25 2008
Walter Bright <newshound1 digitalmars.com> writes:
Dejan Lekic wrote:
This simple concept is amazing, as is the fact that Modula-3 (as a
language) had this (plus numerous other modern features) two decades ago.
Features from Modula-3 make me nervous, as Modula-3 was an abject
failure. I don't know why M3 failed, so I am suspicious of adopting
features without thoroughly understanding why M3 failed.
Mar 25 2008
Charles D Hixson <charleshixsn earthlink.net> writes:
Walter Bright wrote:
Dejan Lekic wrote:
This simple concept is amazing, as is the fact that Modula-3 (as a
language) had this (plus numerous other modern features) two decades ago.
Features from Modula-3 make me nervous, as Modula-3 was an abject
failure. I don't know why M3 failed, so I am suspicious of adopting
features without thoroughly understanding why M3 failed.
That particular feature, however, seems reasonable. But I
question the granularity.
I think that unsafe should be an attribute that could be set
statically for a class or a function. And read whenever you
felt like reading it, of course. But the question is, should
it be set by the compiler or by the programmer? And what
form should a check for safe vs. unsafe take?
One obvious form would be an assert check, but this presumes
that unsafe code is allowed by default, and is only rejected
if a test fails.
Another plausible approach would be to by default forbid the
calling of unsafe code, and to have a pragma that allows it.
This is a bit messy, as pragmas don't have nice boundary
conditions. (Rather like extern: you either enable it for a
statement/block, or for all succeeding statements/blocks.)
Still, a variation of pragma, the
pragma ( Identifier , ExpressionList )
form, could be nice. I'm thinking of it having the form:
pragma (AllowUnsafe, class1, class2, etc,
classk::unsafeFunction, etc.);
to allow the use of the specified unsafe code. The reason for
this form of the pragma is so that you don't accidentally
allow all unsafe functions throughout an entire module by
mistake. Naturally the normal rules would still apply: If
you declared a pragma within a block, it would only apply
within that block.
If you're adopting the pragma form, then the code could be
detected as unsafe by the compiler, and only its use
forbidden (unless specifically allowed).
I haven't been able to think of any reason for dynamically
allowing/forbidding the use of unsafe code, though I suppose
such is possible. In such a case it would probably be
appropriate to use whatever form of allowance is decided upon,
and then to forbid it with a standard if statement (or try/catch
blocks).
Mar 25 2008
Walter Bright <newshound1 digitalmars.com> writes:
I suspect that having a granular level of specifying safe/unsafe is the
wrong approach. Doing it at the module level is easy to understand, and
has the side effect of encouraging better modularization of safe/unsafe
code.
I suspect that having a granular level of specifying safe/unsafe is the
wrong approach. Doing it at the module level is easy to understand, and
has the side effect of encouraging better modularization of safe/unsafe
code.
That would mean that modules in Phobos would be either Safe or Unsafe.
Or[/and] that some modules would have to have two versions, one Safe and
the other Unsafe.
More practical would be (especially if the compiler has access to
info/hints to the safety of individual functions) to have it per function.
Then the compiler could discriminate, depending on if the user had used
the -Safe switch or not.
Personally, I'd advocate having Safety on App Level. Either an app is
SafeD compliant, or not. I have a hard time seeing anything in between.
On Tue, 25 Mar 2008 11:18:22 -0700, Walter Bright wrote:
Features from Modula-3 make me nervous, as Modula-3 was an abject
failure. I don't know why M3 failed, so I am suspicious of adopting
features without thoroughly understanding why M3 failed.
I think this calls for a compiler switch that forces bounds checking on,
whether in debug or release mode. You don't want to be shipping debug code.
Also, a pragma or similar would be helpful; if it could enable bounds-checking
from that point until the end of the scope, you could completely rely on bounds
checks in your code, like you can do in other modern languages.
Finally, would SafeD have to disallow destructors? If you're accessing garbage
collected memory in a destructor, you're asking for trouble. It's not always as
simple as directly disallowing access to these fields. Calling functions can
indirectly cause the memory to be accessed. However, if you're not accessing GC
memory in a destructor, you're probably using some lower-level functions, which
are generally untrustworthy.
--
Chris Miller <chris dprogramming.com>
Mar 25 2008
Chris Miller <lordSaurontheGreat gmail.com> writes:
Chris Miller Wrote:
On Sat, 22 Mar 2008 21:47:59 -0700
Walter Bright <newshound1 digitalmars.com> wrote:
I think this calls for a compiler switch that forces bounds checking on,
whether in debug or release mode. You don't want to be shipping debug code.
Also, a pragma or similar would be helpful; if it could enable bounds-checking
from that point until the end of the scope, you could completely rely on bounds
checks in your code, like you can do in other modern languages.
Finally, would SafeD have to disallow destructors? If you're accessing garbage
collected memory in a destructor, you're asking for trouble. It's not always as
simple as directly disallowing access to these fields. Calling functions can
indirectly cause the memory to be accessed. However, if you're not accessing GC
memory in a destructor, you're probably using some lower-level functions, which
are generally untrustworthy.
I thought the garbage collector only freed memory after the destructor had been
run.
DMD 1.00 spec document, page 104 says "The garbage collector calls the
destructor when the object is deleted."
Did this change? I haven't checked for an update to my copy of the spec
document in some time.
-- the "other" Chris Miller
Mar 25 2008
Chris Miller <lordSaurontheGreat gmail.com> writes:
Chris Miller Wrote:
http://www.digitalmars.com/d/2.0/class.html#Destructor
"When the garbage collector calls a destructor for an object of a class that
has members that are references to garbage collected objects, those references
are no longer valid. This means that destructors cannot reference sub objects.
This is because that the garbage collector does not collect objects in any
guaranteed order, so there is no guarantee that any pointers or references to
any other garbage collected objects exist when the garbage collector runs the
destructor for an object."
Oh, well then you should just move the code you wanted to run in the parent
object's destructor to the sub-object's destructor, though it does leave me
wondering why the garbage collector would have killed the sub-object first,
since if there is code in the parent object's destructor that uses the
sub-object, that should count as a reference, so the sub-object should still be
in scope - I think. Or does code in a destructor not count towards keeping
heap references in scope?
-- the "other" Chris Miller
Mar 25 2008
Chris Miller <lordSaurontheGreat gmail.com> writes:
Brad Roberts Wrote:
Chris Miller wrote:
Chris Miller Wrote:
http://www.digitalmars.com/d/2.0/class.html#Destructor
"When the garbage collector calls a destructor for an object of a class that
has members that are references to garbage collected objects, those references
are no longer valid. This means that destructors cannot reference sub objects.
This is because that the garbage collector does not collect objects in any
guaranteed order, so there is no guarantee that any pointers or references to
any other garbage collected objects exist when the garbage collector runs the
destructor for an object."
Oh, well then you should just move the code you wanted to run in the parent
object's destructor to the sub-object's destructor, though it does leave me
wondering why the garbage collector would have killed the sub-object first,
since if there is code in the parent object's destructor that uses the
sub-object, that should count as a reference, so the sub-object should still be
in scope - I think. Or does code in a destructor not count towards keeping
heap references in scope?
-- the "other" Chris Miller
'code' never holds references, strictly data. The question you have, I
believe, is when obj1 holds a reference to obj2 (be it a pointer or a
reference), what happens when the last reference to obj1 disappears.
Both objects become 'dead' and the order of cleanup is undefined. In
that case, you can question the choice, but consider any cycle of
references where together they are all referenced. When the last
external reference to the cycle disappears the entire set is collectable
with no 'right' order. Based on the fact that in some cases there's no
'right' order, the collector takes the approach of 'no order can be
assumed for any object being collected'. The result being that no
references to other collectable memory can be referenced from a
destructor. Because of the collector, you don't need to either.
Destructors only need to manage non-collectable entities.
Ah, so you're saying that reference counting is run according to scope relative
to the sequential process line as it propagates down from main(). That makes
sense, since those objects are no longer referenced by anything from the
running scope, they're all garbage collectible.
So yes, then the only solution would be to introduce a new rule to the garbage
collector stipulating that objects with the least number of references from
other objects in the collectable scope should be deleted first. This would
inherently slow down the garbage collector, thus presenting you with a tradeoff.
What about something like scope(exit) object.doThis(); ? Sort of like Tango's
FileConduit? Would that work to be able to manually force an ordered deletion?
It wouldn't be automatic by the garbage collector. In my experience a scope
statement is a lot easier than the C++ way of carefully discovering when
something is finally out of scope! Just a thought. I don't know, but it's a
very interesting problem to think about!
== Quote from Chris Miller (lordSaurontheGreat gmail.com)'s article
...
assumed for any object being collected'. The result being that no
references to other collectable memory can be referenced from a
destructor. Because of the collector, you don't need to either.
Destructors only need to manage non-collectable entities.
Ah, so you're saying that reference counting is run according to scope
relative to the sequential process line as it propagates down from
main(). That makes sense, since those objects are no longer referenced
by anything from the running scope, they're all garbage collectible.
So yes, then the only solution would be to introduce a new rule to the
garbage collector stipulating that objects with the least number of
references from other objects in the collectable scope should be deleted
first. This would inherently slow down the garbage collector, thus
presenting you with a tradeoff.
What about something like scope(exit) object.doThis(); ? Sort of like
Tango's FileConduit? Would that work to be able to manually force an
ordered deletion? It wouldn't be automatic by the garbage collector.
In my experience a scope statement is a lot easier than the C++ way of
carefully discovering when something is finally out of scope! Just a
thought. I don't know, but it's a very interesting problem to think
about!
Garbage collection can be done (to a degree) with reference counting, but
systems which use GC more often use an algorithm called "mark / sweep"
(or a variation of it.) This has various advantages over reference
counting. First, if you have circular pointers, such as blocks A and B
pointing to each other, they would never be collected in a reference
counting system; secondly, every time a reference counting system needs
to copy or overwrite a pointer, a count has to be adjusted somewhere,
and this means that a lot of algorithms run more slowly -- for example,
copying an array of pointers requires all the pointed-to blocks to be
"touched" in order to increase their refcounts.
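The circular-pointer problem can be reproduced in a few lines of C. This is only a sketch: the block structure and the assign/release helpers are invented for the illustration, and "freeing" is simulated with a flag so the outcome can be inspected.

```c
#include <assert.h>
#include <stddef.h>

/* A block with a reference count and at most one outgoing pointer. */
struct block {
    int refcount;
    int freed;              /* set instead of calling free(), so we can look */
    struct block *next;
};

void release(struct block *b);

/* Store target into *slot and adjust both counts. This is the bookkeeping
   that every pointer write pays for in a reference-counting scheme. */
void assign(struct block **slot, struct block *target)
{
    if (target)
        target->refcount++;
    struct block *old = *slot;
    *slot = target;
    if (old)
        release(old);
}

/* Drop one reference; reclaim the block when the count reaches zero. */
void release(struct block *b)
{
    assert(b->refcount > 0);
    if (--b->refcount == 0) {
        b->freed = 1;
        assign(&b->next, NULL);   /* drop whatever this block pointed to */
    }
}

/* Two blocks pointing at each other: returns how many get reclaimed once
   every external reference is gone. */
int demo_cycle(void)
{
    struct block a = {0, 0, NULL}, b = {0, 0, NULL};
    struct block *ext_a = NULL, *ext_b = NULL;
    assign(&ext_a, &a);
    assign(&ext_b, &b);
    assign(&a.next, &b);
    assign(&b.next, &a);      /* closes the cycle */
    assign(&ext_a, NULL);
    assign(&ext_b, NULL);
    return a.freed + b.freed;
}

/* Same setup without the back-pointer, for comparison. */
int demo_chain(void)
{
    struct block a = {0, 0, NULL}, b = {0, 0, NULL};
    struct block *ext_a = NULL, *ext_b = NULL;
    assign(&ext_a, &a);
    assign(&ext_b, &b);
    assign(&a.next, &b);
    assign(&ext_a, NULL);
    assign(&ext_b, NULL);
    return a.freed + b.freed;
}
```

demo_chain() reclaims both blocks, while demo_cycle() reclaims neither: each block in the cycle still holds one reference to the other, so both counts stay at 1.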
The way mark/sweep collection works is that the program stack is scanned,
along with all global variables, and the stacks of any threads and thread
local storage, plus the actual machine registers. This scan looks for
pointers or (in a "conservative" design) things which *might* be pointers.
Any of the blocks of (dynamically allocated) memory that is pointed to by
one of these (stack, global, or register) pointers is then considered to
also be "live". This block is then scanned as well for pointers, and so
on. Basically, if there is a series of pointers from a live area to a
specific block, that block is also live, recursively. Any other area is
considered "garbage", because if you don't have a pointer to it anywhere
in the live set, you can't still be using it.
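As a rough illustration of the marking described above, here is a toy mark/sweep pass in C. It is a sketch under simplifying assumptions, not a description of how D's collector is implemented: the heap is a fixed array, "pointers" are array indices, and the roots are passed in explicitly instead of being found by scanning stacks, globals and registers. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_BLOCKS 6
#define MAX_PTRS 2
#define NONE (-1)

/* A toy heap block that may point to up to MAX_PTRS other blocks by index. */
struct block {
    int ptrs[MAX_PTRS];
    int marked;
    int in_use;
};

struct block heap[HEAP_BLOCKS];

/* Allocate block i with the given outgoing pointers (NONE for "null"). */
void set_block(int i, int p0, int p1)
{
    assert(i >= 0 && i < HEAP_BLOCKS);
    heap[i].ptrs[0] = p0;
    heap[i].ptrs[1] = p1;
    heap[i].marked = 0;
    heap[i].in_use = 1;
}

/* Mark phase: everything reachable from block i is live. */
void mark(int i)
{
    if (i == NONE)
        return;
    assert(i >= 0 && i < HEAP_BLOCKS);
    if (!heap[i].in_use || heap[i].marked)
        return;                      /* already visited, so cycles terminate */
    heap[i].marked = 1;
    for (int k = 0; k < MAX_PTRS; k++)
        mark(heap[i].ptrs[k]);
}

/* Sweep phase: unmarked blocks are garbage. Returns how many were freed. */
int sweep(void)
{
    int freed = 0;
    for (int i = 0; i < HEAP_BLOCKS; i++) {
        if (heap[i].in_use && !heap[i].marked) {
            heap[i].in_use = 0;
            freed++;
        }
        heap[i].marked = 0;          /* reset for the next cycle */
    }
    return freed;
}

/* One collection cycle starting from the given root indices. */
int collect(const int *roots, size_t nroots)
{
    for (size_t r = 0; r < nroots; r++)
        mark(roots[r]);
    return sweep();
}
```

With block 0 pointing to block 1 and reachable from a root, and blocks 2 and 3 pointing only at each other, collect() frees 2 and 3 and keeps 0 and 1. The sweep in this toy visits blocks in index order, which has nothing to do with which block pointed to which.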
Mark/sweep doesn't have the problem of circular reference counts, but on
the other hand, there is no way to figure out which blocks were parents
of which others, so that destruction order is essentially random. There
is no way to fix this reliably, especially since the circular links can
mean that the objects are both parents of each other -- so what order do
they get destroyed in?
In languages like D and C++, the garbage collection is conservative, and
this means that any pointer-sized block will be considered a pointer if
it contains a value that is an address of any area that might still be
live.
This means that a few "garbage" blocks can be kept around because they
are in a memory area which is pointed to by some random integer. This
also means that in practice, even sets of memory blocks that are
not circular and might have an obvious destruction order could not be
guaranteed to be destroyed in the right order, because a random integer
in any of them might make them seem circular, so relying on any policy
that tried to detect circularity would be unreliable at best.
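The false-retention effect can be sketched in C as follows. The heap bounds, block size and function names are assumptions made up for the example; a real collector has its own bookkeeping. The point is only that the test looks at the value of a word, not at its type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Conservative test: a word is treated as a pointer if its value falls
   anywhere inside the collector's heap range. */
int looks_like_pointer(uintptr_t word, uintptr_t heap_start, size_t heap_size)
{
    return word >= heap_start && word - heap_start < heap_size;
}

/* Scan a root area (for example, the words of a stack frame) and record
   which fixed-size heap blocks would be retained. Returns the number of
   distinct blocks retained; retained[i] is set to 1 for each of them. */
size_t scan_roots(const uintptr_t *words, size_t nwords,
                  uintptr_t heap_start, size_t block_size,
                  unsigned char *retained, size_t nblocks)
{
    size_t count = 0;
    for (size_t i = 0; i < nwords; i++) {
        if (!looks_like_pointer(words[i], heap_start, block_size * nblocks))
            continue;
        size_t idx = (words[i] - heap_start) / block_size;
        assert(idx < nblocks);
        if (!retained[idx]) {
            retained[idx] = 1;
            count++;
        }
    }
    return count;
}
```

If the heap spans 0x1000 to 0x10FF in 0x40-byte blocks, a root area containing the words 0x1000, 0x1048, 7 and 0x2000 retains blocks 0 and 1. Whether 0x1048 was a real pointer or an integer that happened to have that value makes no difference to the scan.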
There are some ways to work around this though -- If object A needs
to call B.close(), then a reference to object B can be stored in a
global (or static) variable as well as in object A. After object A's
destructor calls B.close(), then it should remove B from the global
table, thus making B garbage (B will not actually get freed until the
next GC cycle.) (Make sure the table doesn't need to synchronize for
the "remove" step since that could cause deadlock, so an associative
array is probably a bad idea.)
Kevin
On Mon, 31 Mar 2008 05:13:50 +0200, Kevin Bealer <kevinbealer gmail.com>
wrote:
In languages like D and C++, the garbage collection is conservative, and
this means that any pointer-sized block will be considered a pointer if
it contains a value that is an address of any area that might still be
live.
This means that a few "garbage" blocks can be kept around because they
are in a memory area which is pointed to by some random integer. This
also means that in practice, even sets of memory blocks that are
not circular and might have an obvious destruction order could not be
guaranteed to be destroyed in the right order, because a random integer
in any of them might make them seem circular, so relying on any policy
that tried to detect circularity would be unreliable at best.
In D, an allocated block can be marked as containing no pointers, and thus
will not be scanned for things that look like pointers. I don't know how good the
GC/compiler is at understanding these things on its own, but at least a
programmer can make it more informed.
--Simen
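A sketch of the idea in C, with a hypothetical no_scan field standing in for the per-block attribute (in current D runtimes the corresponding attribute is GC.BlkAttr.NO_SCAN in core.memory). The heap range, the block layout and the function name are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_WORDS 4

/* A heap block carrying a flag that says whether its payload may hold
   pointers at all. */
struct block {
    int no_scan;                     /* 1: payload is plain data, never scanned */
    uintptr_t payload[BLOCK_WORDS];
};

/* Count the payload words that a conservative scan would treat as pointers
   into the range [heap_start, heap_start + heap_size). */
size_t count_apparent_pointers(const struct block *b,
                               uintptr_t heap_start, size_t heap_size)
{
    assert(b != NULL);
    if (b->no_scan)
        return 0;                    /* the block is skipped entirely */
    size_t hits = 0;
    for (size_t i = 0; i < BLOCK_WORDS; i++) {
        uintptr_t w = b->payload[i];
        if (w >= heap_start && w - heap_start < heap_size)
            hits++;
    }
    return hits;
}
```

A block of integers holding 0x1010, 5, 0x10F0 and 9 would yield two apparent pointers into a heap at 0x1000 to 0x10FF. With no_scan set, the same block yields none, so it cannot keep other blocks alive by accident.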
Yes, and I forgot to mention that at one point the D compiler got an upgrade
that made it more "precise" (the opposite of "conservative" in GC jargon). It
now knows that a dynamically allocated array of a primitive type (such as an
int[] array or one of the string types) is not a pointer.
So if you are dealing with lots of large matrices or lots of string data, you
should suffer much less from the effects of the accidental retention of blocks.
I don't think this applies to structures, classes, or stack data, though it
would probably be straightforward to do for structures that have no pointers.
Kevin
Mar 31 2008
↑ ↓ ← → Kevin Bealer <kevinbealer gmail.com> writes:
Kevin Bealer Wrote:
Yes, and I forgot to mention that at one point the D compiler got an upgrade
that made it more "precise" (the opposite of "conservative" in GC jargon). It
now knows that a dynamically allocated array of a primitive type (such as an
int[] array or one of the string types) is not a pointer.
Oops -- I meant the data in the array -- the array actually is a pointer/length.
So if you are dealing with lots of large matrices or lots of string data, you
should suffer much less from the effects of the accidental retention of blocks.
I don't think this applies to structures, classes, or stack data, though it
would probably be straightforward to do for structures that have no pointers.
Kevin
There are some ways to work around this though -- If object A needs
to call B.close(), then a reference to object B can be stored in a
global (or static) variable as well as in object A. After object A's
destructor calls B.close(), then it should remove B from the global
table, thus making B garbage (B will not actually get freed until the
next GC cycle.) (Make sure the table doesn't need to synchronize for
the "remove" step since that could cause deadlock, so an associative
array is probably a bad idea.)
What about this solution:
class Parent
{
    private Child mychild;
    this()
    {
        mychild = new Child();
        addRoot(mychild);
    }
    ~this()
    {
        mychild.close();
        removeRoot(mychild);
    }
}
That would not destroy mychild until the Parent object's destructor is
called. One could even add "delete mychild" at the end of ~this to make
it even more efficient, if we suppose that mychild is useless anyway
without its Parent (which is often the case; consider an I/O object, e.g.
a socket, attached to a "parent" that controls it).
The downside of this solution is that you're basically falling back to
manual memory management (sort of), but I don't think that is avoidable.
By the way, I never understood why the I/O objects in Phobos and Tango
(e.g. FileConduit) do not automatically close() themselves in their
destructors... that would solve the problem in the majority of cases.
Mar 31 2008
↑ ↓ ←→ Sean Kelly <sean invisibleduck.org> writes:
== Quote from Kevin Bealer (kevinbealer gmail.com)'s article
Mark/sweep doesn't have the problem of circular reference counts, but on
the other hand, there is no way to figure out which blocks were parents
of which others, so that destruction order is essentially random. There
is no way to fix this reliably, especially since the circular links can
mean that the objects are both parents of each other -- so what order do
they get destroyed in?
This should probably be qualified by saying that there is no efficient way to
find the parent object, and in the case of circular chains, there may not even
be a parent object. However, in the general case the GC could theoretically
maintain enough bookkeeping information to destroy objects hierarchically
when possible. But I suspect that doing so would greatly increase both the
memory needed for a scan and the time involved. Thus, guaranteeing in-
order destruction simply isn't practical, even when it's possible.
Sean
Mar 31 2008
↑ ↓ ← → Kevin Bealer <kevinbealer gmail.com> writes:
Yes -- especially for a precise collector in a language like Java, but I'd be
reluctant to make assumptions about the data in this way in a conservative
collector; I think the bookkeeping you mention would require the GC to
have nearly perfect type information, which would make it impossible to
do many things.
If object A has noisy data that happens to point into B's area, then you have
a relationship from A to B. If B also has a relationship to A, then you get a
cycle, and the GC has to fall back on unordered destruction. If B expects
the GC to make sure that A still exists, this could be a problem.
Unless the user specifically marks these relationships, it seems difficult to
do this reliably. This could be done of course, but as long as we need to
change B, why not make B handle cleanup in its own destructor?
For extra control over B's behavior, the B object could take a function
pointer (not a delegate) that told it how to clean itself up. I think we can
assume that function pointers won't point to anything collectable. Maybe
something like this:
class B {
    // D spells a function pointer type as "void function(B)"; B is already
    // a reference type, so no extra pointer is needed.
    void setDestructHandler(void function(B) handler);
}
Kevin
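To make the function-pointer idea a little more concrete, here is a minimal sketch in current D syntax. Everything in it other than setDestructHandler (the handle field, closeHandle, lastClosed) is hypothetical and only there for illustration; the int field stands in for whatever non-collectable resource B really owns.

```d
// Sketch only: apart from setDestructHandler, the names are hypothetical.
class B
{
    private int handle;                  // stand-in for a non-GC resource id
    private void function(B) onDestruct; // plain function pointer, not a delegate

    this(int handle) { this.handle = handle; }

    void setDestructHandler(void function(B) handler)
    {
        onDestruct = handler;
    }

    ~this()
    {
        // Only B's own fields and a function pointer are used here;
        // no other garbage collected object is dereferenced.
        if (onDestruct !is null)
            onDestruct(this);
    }
}

int lastClosed = -1; // records the most recently released handle

void closeHandle(B b) { lastClosed = b.handle; }

void main()
{
    auto b = new B(42);
    b.setDestructHandler(&closeHandle);
    destroy(b); // run the destructor now instead of waiting for a collection
    assert(lastClosed == 42);
}
```

A function pointer carries no context pointer, unlike a delegate, so storing it does not create a reference to another heap object that might already have been collected when the destructor runs.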
Mar 31 2008
↑ ↓ ← → Brad Roberts <braddr puremagic.com> writes:
Chris Miller wrote:
Chris Miller Wrote:
http://www.digitalmars.com/d/2.0/class.html#Destructor
"When the garbage collector calls a destructor for an object of a class that
has members that are references to garbage collected objects, those references
are no longer valid. This means that destructors cannot reference sub objects.
This is because that the garbage collector does not collect objects in any
guaranteed order, so there is no guarantee that any pointers or references to
any other garbage collected objects exist when the garbage collector runs the
destructor for an object."
Oh, well then you should just move the code you wanted to run in the parent
object's destructor to the sub-object's destructor, though it does leave me
wondering why the garbage collector would have killed the sub-object first,
since if there is code in the parent object's destructor that uses the
sub-object, that should count as a reference, so the sub-object should still be
in scope - I think. Or does code in a destructor not count towards keeping
heap references in scope?
-- the "other" Chris Miller
'Code' never holds references; only data does. The question you have, I
believe, is what happens when obj1 holds a reference to obj2 (be it a
pointer or a reference) and the last reference to obj1 disappears.
Both objects become 'dead' and the order of cleanup is undefined. You can
question that choice in this case, but consider any cycle of objects that
reference each other. When the last external reference to the cycle
disappears, the entire set is collectable with no 'right' order. Because
in some cases there's no 'right' order, the collector takes the approach
of 'no order can be assumed for any object being collected'. The result
is that a destructor must not touch any other collectable memory. Because
of the collector, it doesn't need to either: destructors only need to
manage non-collectable entities.
Later,
Brad
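A minimal sketch of a destructor that manages only a non-collectable entity, here a C FILE* obtained through core.stdc.stdio. The LogFile class and the file name are hypothetical, and this is one possible shape rather than how Phobos or Tango do it.

```d
import core.stdc.stdio : FILE, fclose, fopen, fputs;

// Hypothetical wrapper around a C FILE*, which is not GC memory.
final class LogFile
{
    private FILE* handle;

    this(const(char)* path)
    {
        handle = fopen(path, "w");
    }

    bool isOpen() const { return handle !is null; }

    void write(const(char)* text)
    {
        if (handle !is null)
            fputs(text, handle);
    }

    // Safe to call more than once; the second call does nothing.
    void close()
    {
        if (handle !is null)
        {
            fclose(handle);
            handle = null;
        }
    }

    ~this()
    {
        // Touches only this object's own field and a C function,
        // never another garbage collected object.
        close();
    }
}

void main()
{
    auto log = new LogFile("example.log");
    log.write("hello\n");
    log.close();
    log.close();
    assert(!log.isOpen);
}
```

Because the destructor never follows a reference into collectable memory, it does not matter in which order the collector finalizes this object relative to any other.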
↑↓← → Chris Miller <chris dprogramming.com> writes:
Finally, would SafeD have to disallow destructors? If you're accessing garbage
collected memory in a destructor, you're asking for trouble. It's not always as
simple as directly disallowing access to these fields. Calling functions can
indirectly cause the memory to be accessed. However, if you're not accessing GC
memory in a destructor, you're probably using some lower-level functions, which
are generally untrustworthy.
I thought the garbage collector only freed memory after the destructor had
been run.
DMD 1.00 spec document, page 104 says "The garbage collector calls the
destructor when the object is deleted."
Did this change? I haven't checked for an update to my copy of the spec
document in some time.
-- the "other" Chris Miller
http://www.digitalmars.com/d/2.0/class.html#Destructor
"When the garbage collector calls a destructor for an object of a class that
has members that are references to garbage collected objects, those references
are no longer valid. This means that destructors cannot reference sub objects.
This is because that the garbage collector does not collect objects in any
guaranteed order, so there is no guarantee that any pointers or references to
any other garbage collected objects exist when the garbage collector runs the
destructor for an object."
--
Chris Miller <chris dprogramming.com>
Mar 25 2008
↑↓← → Alexander Panek <alexander.panek brainsware.org> writes:
I think polishing D 1.0 is definitely more important.
I think the real paradigm shift will come from 'pure', not from 'safe D' (or
even const).
I'd love to be able to mark most of my functions as pure right now (even before
it's enforced by the compiler).
Mar 31 2008
↑ ↓ ← → Walter Bright <newshound1 digitalmars.com> writes:
Don Clugston wrote:
I think the real paradigm shift will come from 'pure', not from 'safe D'
(or even const).
I'd love to be able to mark most of my functions as pure right now (even
before it's enforced by the compiler).