digitalmars.D - Garbage Collector - One Last Question

reply Arcane Jill <Arcane_member pathlink.com> writes:
The garbage collector is free to move objects about in memory for
defragmentation purposes. May I request that when this happens, either

(a) The original location be securely wiped so that no trace of the original
data remains. (A simple memset() will do this, providing it's not optimized
away), OR

(b) A callback mechanism exist, so that the class which owns the data be
notified, so that it may perform the move itself.

Of course, you'll probably want to NOT do this for most data. Perhaps a special
attribute (might I suggest the keyword "sensitive") could enable this behavior
for only that data for which it matters.
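
Option (a) depends on the wipe surviving optimization. A minimal sketch of such a wipe in present-day D, assuming the druntime module `core.volatile` is available (it postdates this thread); `secureWipe` is a hypothetical name, not an existing library function:

```d
import core.volatile : volatileStore;

/// Overwrite every byte of buf with zero.
/// Volatile stores are used because the compiler is not allowed to
/// remove them, unlike a plain memset on memory that is never read again.
void secureWipe(ubyte[] buf)
{
    foreach (i; 0 .. buf.length)
        volatileStore(buf.ptr + i, cast(ubyte) 0);
}

unittest
{
    ubyte[4] key = [1, 2, 3, 4];
    secureWipe(key[]);
    foreach (b; key)
        assert(b == 0);
}
```

This only covers the copy you know about; it does nothing for copies a moving collector may have left elsewhere, which is the point of the request.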

Arcane Jill
Jun 09 2004
parent reply "Walter" <newshound digitalmars.com> writes:
May I suggest instead that secure data be allocated (using overloaded new
and delete) in a separate memory pool. User instantiated class objects would
just contain references to this secure pool, and not contain any sensitive
data directly. Having a secure pool has several advantages:

1) being a specific area of memory, it can be 'locked' without needing to
lock the whole gc heap
2) on program exit or failure, it can be securely wiped
3) you can completely control the semantics of it
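
The three points above can be illustrated with a small pool kept outside the GC heap. This is only a sketch under stated assumptions: `SecurePool`, `allocate`, and `wipe` are hypothetical names, the pool is backed by C `malloc` rather than the overloaded new and delete suggested above, `core.volatile` is assumed to be available, and page locking (for example POSIX `mlock`) is left out to keep the example portable.

```d
import core.stdc.stdlib : free, malloc;
import core.volatile : volatileStore;

/// Hypothetical fixed-size pool for sensitive bytes, outside the GC heap.
struct SecurePool
{
    private ubyte* base;
    private size_t capacity;
    private size_t used;

    @disable this(this); // no copies: exactly one owner wipes and frees

    this(size_t bytes)
    {
        base = cast(ubyte*) malloc(bytes);
        if (base is null)
            throw new Exception("SecurePool: malloc failed");
        capacity = bytes;
        // A real pool would lock these pages here (assumption, not shown).
    }

    /// Bump-allocate n bytes; returns null when the pool is exhausted.
    ubyte[] allocate(size_t n)
    {
        if (n > capacity - used)
            return null;
        auto slice = base[used .. used + n];
        used += n;
        return slice;
    }

    /// Overwrite everything handed out so far with zeros.
    void wipe()
    {
        foreach (i; 0 .. used)
            volatileStore(base + i, cast(ubyte) 0);
    }

    /// Wipe, then return the memory to the C heap.
    ~this()
    {
        if (base is null)
            return;
        wipe();
        free(base);
        base = null;
    }
}

unittest
{
    auto pool = SecurePool(64);
    ubyte[] secret = pool.allocate(16);
    assert(secret.length == 16);
    secret[] = 0xAA;
    pool.wipe();
    foreach (b; secret)
        assert(b == 0);
    assert(pool.allocate(64) is null); // only 48 bytes remain
}
```

Because the collector never scans or moves this memory, the owner decides when it is wiped, which is what point 3 amounts to.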

Jun 09 2004
next sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca7pj8$g0b$1 digitaldaemon.com>, Walter says...
 May I suggest instead that secure data be allocated (using overloaded new
 and delete) in a separate memory pool. User instantiated class objects would
 just contain references to this secure pool, and not contain any sensitive
 data directly.

Good idea ... except for one small problem. As we've just recently cleared up,
if you put data in an area not managed by the GC, you will never be notified
when it becomes no longer reachable, so there is no way to know when to delete
it. So you would be back to requiring an explicit delete(), which, in the
circumstances I have in mind, is not really an option. Basically I'd then have
to write a secondary GC in order to discover whether or not it was okay to
delete it.

There ARE ways around this problem though. I can think of at least two off the
top of my head. However, these ideas of mine would not be of practical use
unless it were possible to overload operator new in general (instead of on a
per-class basis). That is, I can't (currently) say:

    RegExp re = new(MyAllocator) RegExp(pattern, attributes);

because (currently) I can only add operator new overloads to my own classes,
not to existing classes in Phobos. In C++ you can make a custom new that works
for EVERYTHING. Of course, I might have got this wrong, in which case please
tell me.

It's not just classes either. I'd want my operator new to be able to allocate
arrays too. That is, I'd want to be able to replace:

    char[] r;
    r.length = 100;

with

    char[] password;
    password.length = new(MyAllocator) 100; // or some similar syntax

 Having a secure pool has several advantages:

Yes it does, but unless we can overload new GENERALLY, instead of only for
specific classes, that advantage is lost. Any chance we can have a GLOBAL
overload for new?

I don't think that my idea (in previous post) would slow down the gc though.
After all, if data were NOT marked as sensitive, the gc would behave exactly as
it does now. (Er, I mean, exactly as it would do if it did relocation). Only
data marked as sensitive would have to be wiped, and, in practice, that is
likely to be rare.

Jill
Jun 09 2004
next sibling parent reply "Walter" <newshound digitalmars.com> writes:
I just can't figure out why you'd need a global operator new. I've used such
in C++ now and then, and always wound up backing it out because it screws
things up (for example, it prevents linking in an existing library that
relies on the default new's semantics).

What doing the extra layer enables you to do is control the reference
counting to it, the user of the class doesn't need to, and you wouldn't need
to worry about user class references being copied about willy-nilly.
Increment the reference count on construction of the user object, and
decrement on destruction. When 0, secure delete the hidden security data.
When you reach a 'sync' point, invoke a 'reaper' that secure deletes any
remaining secure data items.

If you're doing a web server thing, you can do a secure delete on any items
that haven't been collected after a fixed time has elapsed (like 15
minutes).

The secure delete 'reaper' also needn't recycle the memory; any user objects
still live should check to see if the data is still 'live' before accessing
it, and fail reasonably if it has been reaped.

I actually kind of like the idea of a reaper thread that goes through and
expires any secure data that is more than X minutes old. It's a nice backup
against bugs where data is inadvertently held on to (can't guarantee this
doesn't happen in C++, just store a reference into static data. Voila, it
lives forever, copy constructor or not).
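
The reference counting and the time-based reaper described above could be sketched as follows. Everything here is an assumption made for illustration: `SecureStore`, `add`, `retain`, `release`, `access`, and `reap` are hypothetical names, the payloads come from C `malloc`, `core.volatile` is assumed to be available, and the clock is passed in explicitly so the expiry logic can be exercised without waiting. A user object would call `retain` in its constructor and `release` in its destructor.

```d
import core.stdc.stdlib : free, malloc;
import core.time : Duration, MonoTime, minutes;
import core.volatile : volatileStore;

/// Hypothetical registry of secure items; payloads live outside the GC heap.
struct SecureStore
{
    private static struct Item
    {
        ubyte* data;      // null once the item has been securely deleted
        size_t length;
        size_t refs;
        MonoTime created;
    }

    private Item[] items;

    /// Allocate n bytes; the caller holds the first reference.
    size_t add(size_t n, MonoTime now)
    {
        auto p = cast(ubyte*) malloc(n);
        if (p is null)
            throw new Exception("SecureStore: malloc failed");
        items ~= Item(p, n, 1, now);
        return items.length - 1;
    }

    void retain(size_t id) { ++items[id].refs; }

    /// Drop one reference; the last one triggers a secure delete.
    void release(size_t id)
    {
        assert(items[id].refs > 0, "release without matching retain");
        if (--items[id].refs == 0)
            secureDelete(items[id]);
    }

    /// Returns null if the item has already been deleted or reaped.
    ubyte[] access(size_t id)
    {
        auto it = &items[id];
        return it.data is null ? null : it.data[0 .. it.length];
    }

    /// Securely delete every live item older than maxAge; returns the count.
    size_t reap(Duration maxAge, MonoTime now)
    {
        size_t reaped;
        foreach (ref it; items)
        {
            if (it.data !is null && now - it.created > maxAge)
            {
                secureDelete(it);
                ++reaped;
            }
        }
        return reaped;
    }

    private static void secureDelete(ref Item it)
    {
        if (it.data is null)
            return;
        foreach (i; 0 .. it.length)
            volatileStore(it.data + i, cast(ubyte) 0);
        free(it.data);
        it.data = null;
        it.length = 0;
    }
}

unittest
{
    SecureStore store;
    auto t0 = MonoTime.currTime;
    auto id = store.add(8, t0);
    assert(store.reap(15.minutes, t0 + 1.minutes) == 0);
    assert(store.access(id) !is null);
    assert(store.reap(15.minutes, t0 + 16.minutes) == 1);
    assert(store.access(id) is null); // users must check before use
    store.release(id);                // harmless after reaping
}
```

The item record is kept after the payload is gone, so a stale user object sees a null slice and can fail reasonably instead of touching freed memory.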
Jun 09 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
Many, many thanks to everyone who put their ideas into this. I am happy to
report that I've completely solved the problem now, and I don't need to ask
Walter to implement _anything_ new (no pun intended).

All this has made me realize that, while I could think of one or two tweaks to
the language which might make it easier for systems programmers, it would be FAR
more appropriate for me to put a concrete proposal together, complete with
rationale for everything, and why it would help everyone (instead of just me).
So I'll do that in my own time, and I'll take my time over it so I don't waste
everyone's time with dumb ideas that I change my mind about a few hours later
because I've thought of something else.

In the meantime - all of my problems are solved, and I'm happy. (Now all I've
got to do is go and write the code). Thanks to everyone who joined in this
discussion - most especially Walter, whose patience is truly amazing.

Jill


I'll just reply to this...


In article <ca89ts$18th$1 digitaldaemon.com>, Walter says...
 I just can't figure out why you'd need a global operator new. I've used such
 in C++ now and then, and always wound up backing it out because it screws
 things up (for example, it prevents linking in an existing library that
 relies on the default new's semantics).

I'll spend some time putting a sensible proposal together. I'm sure it will all
make sense if I lay down all the arguments reasonably, like Norbert did with
multidimensional arrays. In the meantime, you can just forget it, as I don't
need it for now.

 What doing the extra layer enables you to do is control the reference
 counting to it, the user of the class doesn't need to, and you wouldn't need
 to worry about user class references being copied about willy-nilly.
 Increment the reference count on construction of the user object, and
 decrement on destruction. When 0, secure delete the hidden security data.
 When you reach a 'sync' point, invoke a 'reaper' that secure deletes any
 remaining secure data items.

Yes, you are correct.

 If you're doing a web server thing, you can do a secure delete on any items
 that haven't been collected after a fixed time has elapsed (like 15
 minutes).

Now THAT is a possibility I hadn't thought of. Cheers!

 The secure delete 'reaper' also needn't recycle the memory; any user objects
 still live should check to see if the data is still 'live' before accessing
 it, and fail reasonably if it has been reaped.

Nice!

 I actually kind of like the idea of a reaper thread that goes through and
 expires any secure data that is more than X minutes old. It's a nice backup
 against bugs where data is inadvertently held on to (can't guarantee this
 doesn't happen in C++, just store a reference into static data. Voila, it
 lives forever, copy constructor or not).

Walter, you're a genius. (Although I solved my problem another way and don't
need this). I will bear that in mind for the future.
Jun 09 2004
parent reply Mike Swieton <mike swieton.net> writes:
On Thu, 10 Jun 2004 06:25:23 +0000, Arcane Jill wrote:

 
 Many, many thanks to everyone who put their ideas into this. I am happy to
 report that I've completely solved the problem now, and I don't need to ask
 Walter to implement _anything_ new (no pun intended).

If it's not something terribly domain-specific, why don't you share your
solution?

Mike Swieton
__
In case you haven't realized it, building computer systems is hard.
 - Martin Fowler
Jun 10 2004
parent Arcane Jill <Arcane_member pathlink.com> writes:
In article <pan.2004.06.10.23.54.20.311894 swieton.net>, Mike Swieton says...
On Thu, 10 Jun 2004 06:25:23 +0000, Arcane Jill wrote:

 
 Many, many thanks to everyone who put their ideas into this. I am happy to
 report that I've completely solved the problem now, and I don't need to ask
 Walter to implement _anything_ new (no pun intended).
 If it's not something terribly domain-specific, why don't you share your
 solution?

Sure. Here's the simplified "proof of concept" explanation. What you do is you
declare a class like this:

    class A
    {
        ubyte* p;
    }

In the constructor, you call malloc() to get some memory and store its address
in p, throwing an exception if malloc() returns null. In the destructor, you
call free(p).

Now, because this is a class WITHOUT a custom new(), it will be managed by the
garbage collector. This means that its destructor will, eventually, be called.
Since that destructor frees the malloc'ed memory, you have no memory leak. And
all of the memory that you got through malloc() will never be touched by the
garbage collector (except indirectly when it calls the destructor).

A more serious class would allow for resizing, wiping, and so on, and of course
if you want, you can replace malloc() and free() with your own custom
allocation/deallocation pair.

And that's pretty much it really.

Arcane Jill
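
The pattern described above, spelled out a little further. This is a sketch rather than the code from the post: `SecureBuffer` and `bytes` are hypothetical names, the wipe before free() is the kind of addition left to "a more serious class", and the druntime module `core.volatile` is assumed to be available.

```d
import core.stdc.stdlib : free, malloc;
import core.volatile : volatileStore;

/// GC-managed wrapper whose payload lives on the C heap.
class SecureBuffer
{
    private ubyte* p;
    private size_t len;

    this(size_t n)
    {
        p = cast(ubyte*) malloc(n);
        if (p is null)
            throw new Exception("SecureBuffer: malloc failed");
        len = n;
    }

    /// Runs when the GC finalizes the object, or earlier via destroy().
    /// It touches only non-GC memory, as a finalizer should.
    ~this()
    {
        if (p is null)
            return;
        foreach (i; 0 .. len)
            volatileStore(p + i, cast(ubyte) 0); // wipe before releasing
        free(p);
        p = null;
    }

    /// View of the payload; the collector never moves or scans it.
    ubyte[] bytes()
    {
        return p is null ? null : p[0 .. len];
    }
}

unittest
{
    auto buf = new SecureBuffer(32);
    auto view = buf.bytes();
    view[] = 0xFF;
    assert(buf.bytes()[31] == 0xFF);
    destroy(buf); // deterministic wipe and free, without waiting for the GC
}
```

The collector makes no promise about when a finalizer runs, so code that needs the wipe at a known point can call destroy() as the unittest does.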
Jun 11 2004
prev sibling next sibling parent "Martin M. Pedersen" <martin moeller-pedersen.dk> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:ca7ra3$iji$1 digitaldaemon.com...
 Good idea ... except for one small problem. As we've just recently cleared
up,
 if you put data in an area not managed by the GC, you will never be
notified
 when it becomes no longer reachable, so there is no way to know when to
delete
 it.

How about having an object on the normal heap, which has a destructor that
wipes out the sensitive data? Keep a reference to the heap-object and the
sensitive data together, or reference the sensitive data through the
heap-object. Then the destructor and the GC will take care of it for you.

Regards,
Martin
Jun 09 2004
prev sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
Arcane Jill wrote:

<snip>
 Good idea ... except for one small problem. As we've just recently cleared up, 
 if you put data in an area not managed by the GC, you will never be notified
 when it becomes no longer reachable, so there is no way to know when to delete
 it.
<snip>

My impression was that you were going to have an object wrapper, which would
remain in the heap, around the secure data. After all, IINM your Int class is
already an object wrapper around a dynamic array. Then all you need to do is
have Int malloc/free the actual data content. This would also mean that the
explicit memory management remains internal to your class.

To make sure it's wiped on exit, all you then need is to make sure gc_term gets
called.

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the
unfortunate victim of intensive mail-bombing at the moment. Please keep
replies on the 'group where everyone may benefit.
Jun 11 2004
prev sibling parent "Matthew" <matthew.hat stlsoft.dot.org> writes:
I do like the idea of an object/class getting a callback when an instance is
about to be moved.

How about it being a classinfo attribute, accessible only to D implementation
code and to that class? If the classinfo callback is null, then the GC proceeds
as normal. If not, then it gives the callback a tinkle.

Whether or not you'd want to have the call be able to cancel the move, or split
it into pre-move and post-move, etc. is up for debate.

Jun 09 2004