
D - persistence

"Sean L. Palmer" <palmer.sean verizon.net> writes:
I've been thinking a lot about persistence lately, saw a few other languages
recently that offer this, and realized that a significant portion of my work
is bogged down in the details of persistence of object state.

One trick that we pull in games-programming land is to save off the entire
stack and heap of the program (and OS if necessary) to a big binary file and
compress it, then put it on the CD as a save game.  With creative use of
setjmp/longjmp or equivalent, this allows one to "suspend" a program
indefinitely and restore it at any time.  It's also blazingly fast compared
to fiddling around with individual structure members, dealing with
endianness and alignment, manipulating file pointers, size conversions, etc.,
etc., ad nauseam.
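A rough C sketch of the save-the-heap idea, assuming (hypothetically) that every persistent object is allocated from one fixed-size arena and refers to other objects by offset rather than by raw pointer. `Arena`, `arena_alloc`, `arena_ptr`, `arena_save` and `arena_load` are illustrative names, not an existing API. It covers only the heap side: the stack and register state that the setjmp/longjmp trick deals with are out of scope, and the file only reloads on a machine with the same struct layout and byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical arena: every persistent object is allocated from this block. */
#define ARENA_SIZE 4096

typedef struct {
    size_t used;                      /* bytes handed out so far */
    unsigned char bytes[ARENA_SIZE];  /* the objects themselves */
} Arena;

/* Bump allocator returning an offset into the arena, or 0 when full.
   Offsets are rounded up to a multiple of the pointer size, and offset 0
   is never handed out, so it can serve as the null reference. */
size_t arena_alloc(Arena *a, size_t n) {
    const size_t align = sizeof(void *);
    size_t start = (a->used + align - 1) / align * align;
    if (start == 0)
        start = align;
    if (start > ARENA_SIZE || n > ARENA_SIZE - start)
        return 0;
    a->used = start + n;
    return start;
}

/* Convert an offset into a pointer valid for wherever the arena lives now. */
void *arena_ptr(Arena *a, size_t off) {
    assert(off < ARENA_SIZE);
    return off ? a->bytes + off : NULL;
}

/* Save the whole arena, bookkeeping included, with a single fwrite. */
int arena_save(const Arena *a, const char *path) {
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(a, sizeof *a, 1, f);
    int closed = fclose(f);
    return (written == 1 && closed == 0) ? 0 : -1;
}

/* Restore it with a single fread; the Arena may sit at a different address. */
int arena_load(Arena *a, const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(a, sizeof *a, 1, f);
    fclose(f);
    return got == 1 ? 0 : -1;
}
```

Because references are offsets, the reloaded arena needs no pointer fixups; the price is an `arena_ptr` call on every dereference.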

Some other languages recently allow you to suspend the program to disk at
any time and resume it later.

This seems to me to be the ideal form of persistence.

All you need is a way for an object to transform its binary image in memory
in such a way that if stored to disk, it could rebuild the object from
memory accessible elsewhere in the program dataspace and the object's disk
image.

Most objects would be able to store enough information that they could be
rebuilt, and fit that into the same amount of bits.

The compiler could obviously help out here by keeping track of where the
objects are and what type they are, and providing enough hooks into its low
level typeinfo structures that the objects could recouple themselves with
other objects that had been saved.  Stuff like transforming object IDs into
references on load, and references into object IDs on save.
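The ID/reference translation can be sketched in C for the simplest case, where all objects of one type live in a single table and an object's ID is its index. `Node`, `NodeRecord`, `nodes_to_records` and `records_to_nodes` are hypothetical names; a compiler-assisted version would derive this mapping from type info instead of hand-written code.

```c
#include <assert.h>
#include <stddef.h>

/* In-memory object: `next` is a live reference. */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* On-disk form: the reference is replaced by an object ID (-1 means null). */
typedef struct {
    int value;
    long next_id;
} NodeRecord;

/* Save: turn each reference into the index of its target.
   Precondition: every non-null `next` points into the same `nodes` table. */
void nodes_to_records(const Node *nodes, size_t n, NodeRecord *out) {
    for (size_t i = 0; i < n; i++) {
        out[i].value = nodes[i].value;
        out[i].next_id = nodes[i].next ? (long)(nodes[i].next - nodes) : -1;
    }
}

/* Load: turn each ID back into a reference into the (possibly moved) table. */
void records_to_nodes(const NodeRecord *in, size_t n, Node *nodes) {
    for (size_t i = 0; i < n; i++) {
        assert(in[i].next_id < (long)n);   /* reject IDs outside the table */
        nodes[i].value = in[i].value;
        nodes[i].next = in[i].next_id < 0 ? NULL : &nodes[in[i].next_id];
    }
}
```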

Then adding a simple command keyword ("yield"? "suspend"?) or a library call
would allow us to suspend program execution at any time in a way that allows
the program, when run again on the same machine, to pick up exactly where it
left off (probably in a function call off the main program loop.)

This is extremely powerful as it allows one to write database-like
applications with nary a stream in sight.  No printfs, no fwrites, no
sizeofs, no byte order, zilch.

The language could even go so far as to support running the program out of
virtual memory.  It asks for memory; if there is none, it saves off some old
objects and reclaims their space.  When the old objects are needed again, it
loads them back (through some page-faulting mechanism).  On Win32 this
would be an ideal situation for memory-mapped files.  D programs then
wouldn't be bound by the amount of RAM on the target machine, but merely by
total backing store.  The OS could be paging stuff to network storage or who
knows where behind that:  potentially limitless storage, with little to no
work on the part of the application programmer.
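The memory-mapped-file idea can be tried with existing OS facilities. The sketch below uses POSIX `mmap` rather than the Win32 file-mapping API mentioned above, and the `State` layout and the `state_map`/`state_unmap` helpers are assumptions for illustration. Data written through the returned pointer lands in the file, so the next run that maps the same file sees it again. Raw pointers stored inside the mapping would still need the fixups discussed elsewhere in this thread, because the mapping may come back at a different address.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical state record that lives directly inside a mapped file. */
typedef struct {
    long counter;
    char name[32];
} State;

/* Open (or create) the file, size it to hold one State, and map it shared,
   so stores through the returned pointer are written back to the file.
   Returns NULL on any failure. */
State *state_map(const char *path) {
    assert(path != NULL);
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)sizeof(State)) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(State), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (State *)p;
}

/* Flush dirty pages to the file and release the mapping. */
int state_unmap(State *s) {
    int synced = msync(s, sizeof(State), MS_SYNC);
    int unmapped = munmap(s, sizeof(State));
    return (synced == 0 && unmapped == 0) ? 0 : -1;
}
```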

That's the tricky part.  We applications programmers are lazy as all hell.
We get off on finding ways to *not* write code.  I freely admit it.

Instead we think of ways to do more work with less code.  The less code the
better.

Similarly, individual objects could be presented to the OS as separate files.
You could override the extension and specify where to put them at time of
storage.  You could tell individual objects to go to storage or be summoned
from storage.  Imagine manipulating arrays of objects which may or may not be
loaded, and loading/unloading them from their corresponding files.

You could write code, run functions that generate objects, load files into
memory, get everything set up and then, when you're ready to ship, cause a
yield function call, zip up the directory, wrap an install program around it
and ship it onto a CD along with the language runtime package.

For all this to work, a couple of things need to happen:

You can't have objects changing memory layout without versioning problems.
So the compiler would have to keep track of generations of object classes.
One way to limit this is that each time a program is successfully
recompiled, all data from existing running copies gets converted to the new
format immediately.  The programmers have to provide these "conversion"
functions, but they only have to do it once and then the code can be tossed
if you wish, because then all the old objects are in the new format.
Obviously this could go wrong, so maybe the compiler could keep backups of
all the old versions of these conversion functions in case it runs into old
files or the programmer has to restore from backup.
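One way to picture that chain of conversion functions: each format version gets one hand-written step to the next version, and the loader applies steps until the record is current. This is a C sketch with made-up record layouts (`RecordV1` to `RecordV3`) and hypothetical names (`AnyRecord`, `upgrade`); keeping every old step in the table is what lets files from any earlier build still load.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up history of one record type: v2 added z, v3 added flags. */
typedef struct { int x, y; } RecordV1;
typedef struct { int x, y, z; } RecordV2;
typedef struct { int x, y, z; unsigned flags; } RecordV3;   /* current format */

/* Big enough to hold a record of any version while it is being upgraded. */
typedef union {
    RecordV1 v1;
    RecordV2 v2;
    RecordV3 v3;
} AnyRecord;

/* Hand-written steps, each written once when the layout changed. */
static void v1_to_v2(const AnyRecord *in, AnyRecord *out) {
    out->v2.x = in->v1.x;
    out->v2.y = in->v1.y;
    out->v2.z = 0;              /* new field gets a default */
}

static void v2_to_v3(const AnyRecord *in, AnyRecord *out) {
    out->v3.x = in->v2.x;
    out->v3.y = in->v2.y;
    out->v3.z = in->v2.z;
    out->v3.flags = 0;          /* new field gets a default */
}

typedef void (*Converter)(const AnyRecord *in, AnyRecord *out);

/* converters[v] upgrades version v to v + 1; slot 0 is unused. Old steps stay
   here so files written by any earlier build can still be loaded. */
static const Converter converters[] = { NULL, v1_to_v2, v2_to_v3 };
#define CURRENT_VERSION 3

/* Upgrade rec in place from `version` to CURRENT_VERSION; -1 if unknown. */
int upgrade(int version, AnyRecord *rec) {
    if (version < 1 || version > CURRENT_VERSION)
        return -1;
    while (version < CURRENT_VERSION) {
        AnyRecord next;
        assert(converters[version] != NULL);
        converters[version](rec, &next);
        *rec = next;
        version++;
    }
    return 0;
}
```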

Each time the memory structure of an object changes (through a successful
recompile) the compiler could analyze the change (was it just moving this
data from here to there, or was it something more complicated?) and perhaps
auto-generate these conversion functions.  It might only prompt for them
when necessary.  If you had a good IDE all this could be done as you drag
stuff around, with undo and everything.  Arguably this would be the
trickiest part of the whole process.  The physical act of keeping track of
the bits and moving them to/from disk is almost trivial by comparison to
what it would take to make such a system not inherently brittle.  Admittedly
I have not given this part of it much thought yet.
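For the easy case where data just moved or a field was added, a conversion could be derived mechanically from two layout descriptions. This C sketch assumes a hypothetical `Field` table (name, offset, size) of the kind a compiler could emit from its typeinfo, and `convert_by_name`, `PlayerOld` and `PlayerNew` are illustrative names. It matches fields by name and size only; anything it cannot match is zeroed and counted, which is the point where a hand-written conversion would still be needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout description of the kind a compiler could emit. */
typedef struct {
    const char *name;
    size_t offset;
    size_t size;
} Field;

/* Two versions of the same class: fields reordered and one added. */
typedef struct { int hp; float x, y; } PlayerOld;
typedef struct { float x, y; int hp; int lives; } PlayerNew;

static const Field player_old_fields[] = {
    {"hp", offsetof(PlayerOld, hp), sizeof(int)},
    {"x",  offsetof(PlayerOld, x),  sizeof(float)},
    {"y",  offsetof(PlayerOld, y),  sizeof(float)},
};

static const Field player_new_fields[] = {
    {"x",     offsetof(PlayerNew, x),     sizeof(float)},
    {"y",     offsetof(PlayerNew, y),     sizeof(float)},
    {"hp",    offsetof(PlayerNew, hp),    sizeof(int)},
    {"lives", offsetof(PlayerNew, lives), sizeof(int)},
};

/* Fill dst (new layout) from src (old layout) by matching field names and
   sizes. Unmatched new fields are zeroed; the return value counts them. */
size_t convert_by_name(const Field *from, size_t nfrom, const void *src,
                       const Field *to, size_t nto, void *dst) {
    size_t unmatched = 0;
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < nto; i++) {
        const Field *match = NULL;
        for (size_t j = 0; j < nfrom && match == NULL; j++)
            if (strcmp(from[j].name, to[i].name) == 0 && from[j].size == to[i].size)
                match = &from[j];
        unsigned char *out = (unsigned char *)dst + to[i].offset;
        if (match != NULL) {
            memcpy(out, (const unsigned char *)src + match->offset, to[i].size);
        } else {
            memset(out, 0, to[i].size);
            unmatched++;
        }
    }
    return unmatched;
}
```

Matching by name and size alone would silently reinterpret a field whose type changed but whose size did not (an `int` becoming a `float`, say), so a real tool would compare types as well.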

Anyway, there's also the issue of OS-level handles:  how to reconstruct
things such as open files, reload textures from disk, etc.  Then there is
the problem of figuring out which things are taking up way more space in
memory than they need to on disk (for instance, is the data *already* on the
disk somewhere, so it doesn't need to be resaved as a new copy just because it
was in RAM when the program suspended?).  For this we'd need some declaration
helpers, perhaps storage classes such as const, readonly_file, readwrite_file,
and ways to specify the filename extensions or directories when saving.
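For OS-level handles, one common approach is to persist only what is needed to reacquire the handle and to rebuild the handle itself on resume. The C sketch below does this for a file opened read-only; `FileHandle`, `handle_suspend` and `handle_resume` are hypothetical names, and it assumes the file is still present at the same path when the program resumes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper: path and position are the persistent part; the
   FILE * is only meaningful inside one run of the program. */
typedef struct {
    char path[256];
    long position;
    FILE *fp;
} FileHandle;

/* Before suspending: remember where we were and release the OS handle, so
   no stale FILE * ends up in the saved image. */
void handle_suspend(FileHandle *h) {
    if (h->fp != NULL) {
        h->position = ftell(h->fp);
        fclose(h->fp);
        h->fp = NULL;
    }
}

/* After the saved image is restored: reopen by path and seek back.
   Returns 0 on success, -1 if the file cannot be reopened or positioned. */
int handle_resume(FileHandle *h) {
    assert(h->fp == NULL);
    h->fp = fopen(h->path, "rb");
    if (h->fp == NULL)
        return -1;
    if (fseek(h->fp, h->position, SEEK_SET) != 0) {
        fclose(h->fp);
        h->fp = NULL;
        return -1;
    }
    return 0;
}
```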

This technology would be great for debugging too.  Imagine saving off
program states in various stages of execution and analyzing them to
determine the location of a bug.  Or to help reproduce errors (save right
before the crash, then do what it takes to get it to crash).

Does this sound like it would make your life easier?  If so, help me flesh
it out.

Sean
May 06 2003
"Walter" <walter digitalmars.com> writes:
"Sean L. Palmer" <palmer.sean verizon.net> wrote in message
news:b9a3jf$1eeh$1 digitaldaemon.com...
 One trick that we pull in games-programming land is to save off the entire
 stack and heap of the program (and OS if necessary) to a big binary file and
 compress it, then put it on the CD as a save game.  With creative use of
 setjmp/longjmp or equivalent, this allows one to "suspend" a program
 indefinitely and restore it at any time.  It's also blazingly fast compared
 to fiddling around with individual structure members, dealing with
 endianness and alignment, manipulating file pointers, size conversions, etc.,
 etc., ad nauseam.
That trick goes back at least to the 1970s! I used it myself in game programming. It's also how the DMC compiler does precompiled headers.
 Some other languages recently allow you to suspend the program to disk at
 any time and resume it later.
 This seems to me to be the ideal form of persistence.
 All you need is a way for an object to transform its binary image in memory
 in such a way that if stored to disk, it could rebuild the object from
 memory accessible elsewhere in the program dataspace and the object's disk
 image.
If it can be reloaded at a different address, then what you need is a way to find all the pointers and add an offset.
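A minimal C sketch of that fixup, assuming the saved block is copied to its new location unchanged and that a table of pointer-slot offsets is available (written by hand here; in practice it would have to come from the compiler's type info). `relocate` and `Link` are hypothetical names. Every non-null pointer into the old block is moved by the same distance the block itself moved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Example object kept in the saved block: it points at another object in
   the same block. */
typedef struct Link {
    int value;
    struct Link *next;
} Link;

/* After the saved bytes have been copied to new_base, rewrite each pointer
   slot listed in ptr_offsets: a pointer that held old_base + k now holds
   new_base + k, and NULL stays NULL. Assumes all object pointers share one
   representation, as they do on mainstream platforms. */
void relocate(unsigned char *new_base, uintptr_t old_base, size_t size,
              const size_t *ptr_offsets, size_t nptrs) {
    for (size_t i = 0; i < nptrs; i++) {
        unsigned char *slot = new_base + ptr_offsets[i];
        void *p;
        assert(ptr_offsets[i] + sizeof p <= size);
        memcpy(&p, slot, sizeof p);              /* the stale, pre-move address */
        if (p != NULL) {
            uintptr_t old = (uintptr_t)p;
            assert(old >= old_base && old - old_base < size);  /* must point into the block */
            p = new_base + (old - old_base);     /* same offset, new block */
            memcpy(slot, &p, sizeof p);
        }
    }
}
```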
 Does this sound like it would make your life easier?  If so, help me flesh
 it out.
It sounds complicated!
May 08 2003
"C. Sauls" <ibisbasenji yahoo.com> writes:
I'm a big fan of this, and as I'm getting into the game-programming world
myself I would love to see this feature for precisely that purpose, if not
for others.  A server project of mine, for example, could definitely benefit
from this as a potential method of "true-state" database backups (small dbs,
so the memory footprint is not an issue).

My only questions are how to implement it directly, and whether it should
necessarily be a part of the language core.

I could imagine a syntax likened unto:
    import persistance;
    ...
    File statefile = new File("mstate.dat");
    ...
    memstate_dump(statefile);
    ...
    memstate_load(statefile);
    ...

-- C. Sauls
May 08 2003
"Sean L. Palmer" <palmer.sean verizon.net> writes:
Minus all the registration and possible pointer fixups, that's essentially
it.

Oh well, like Walter said, it's pretty complicated for something intended
for the language core.  How do you support that on an embedded system for
instance?

A library would be the ideal place for it, but it'd need serious cooperation
from the GC, and the OS would probably have to help too.  It would also
definitely need an easy way to register conversion and update functions for
when you want to change the memory layout (frequently; that's why having the
IDE watch for it and generate conversion functions automatically would be great).

Maybe this is something you'd only enable toward the very end of your
project, when object structures aren't changing much.

Also you need to do some kind of setjmp/longjmp type of thing.  Ideally you
break the code up into 3 versions:  one to load raw data and build the
in-memory working set, one to deal with being started from that point (the
"real" app), and one that mainly serves to update files from old to new
formats.

The hard part is writing all the functions to save and restore OS state (you
will need to recreate hardware state as well, BTW, plus recreate all
OS-allocated objects).

It's not trivial.  I guess it was a stupid request.  But other languages
seem to have pulled off something similar;  maybe their way is less
complicated somehow.  I should check some of that out.

Sean

"C. Sauls" <ibisbasenji yahoo.com> wrote in message
news:b9f9i4$2973$1 digitaldaemon.com...
 I'm a big fan of this, and as I'm getting into the game-programming world
 myself I would love to see this feature for precisely that purpose, if not
 for others.  A server project of mine, for example, could definitely benefit
 from this as a potential method of "true-state" database backups (small dbs,
 so the memory footprint is not an issue).

 My only questions are how to implement it directly, and whether it should
 necessarily be a part of the language core.

 I could imagine a syntax likened unto:
     import persistance;
     ...
     File statefile = new File("mstate.dat");
     ...
     memstate_dump(statefile);
     ...
     memstate_load(statefile);
     ...

 -- C. Sauls
May 08 2003
Helmut Leitner <leitner hls.via.at> writes:
"C. Sauls" wrote:
 
 I'm a big fan of this, and as I'm getting into the game-programming world
 myself I would love to see this feature for precisely that purpose, if not
 for others.  A server project of mine, for example, could definitely benefit
 from this as a potential method of "true-state" database backups (small dbs,
 so the memory footprint is not an issue).
 
 My only questions are how to implement it directly, and whether it should
 necessarily be a part of the language core.
 
 I could imagine a syntax likened unto:
     import persistance;
     ...
     File statefile = new File("mstate.dat");
     ...
     memstate_dump(statefile);
     ...
     memstate_load(statefile);
I think

    FileSetMemstate("mstate.dat");
    FileGetMemstate("mstate.dat");

would be better.  If you want to handle states internally, though, then

    String s = MemstateRetString();
    MemstateSetString(s);

would be better, allowing the trivial

    void FileSetMemstate(String filename)
    {
        String ms = MemstateRetString();
        FileSetString(filename, ms);
    }

    void FileGetMemstate(String filename)
    {
        String ms = FileRetString(filename);
        MemstateSetString(ms);
    }

Sorry for throwing my LOP at you. :-)

--
Helmut Leitner    leitner hls.via.at
Graz, Austria   www.hls-software.com
May 09 2003
Mark Evans <Mark_member pathlink.com> writes:
There are limits to this proposal, but language introspection seems apropos.  A
metaclass facility could list all instances of a given class.  A metaclass
method could reveal an object's binary layout and serialize it to hex or base64
encoding, for example.

Personally I would use such encoding inside an XML format that is easy to
maintain across versions of the program.  I might even use actual string
translations instead of encoding, making the XML human-readable, at least for
some objects.  Binary formats are too version-brittle.  A language able to emit
objects in an XML format might be quite useful.
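A small C sketch of metadata-driven output: a hypothetical `ClassInfo`/`FieldInfo` table stands in for what a metaclass facility might expose, and one generic routine, `to_xml`, uses it to write any described object as XML. The table is written by hand here; the point of language introspection would be to have it generated. String values are not escaped, so this is only safe for text without `<` or `&`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-class metadata: field names, kinds, and positions. */
typedef enum { F_INT, F_DOUBLE, F_STRING } FieldKind;

typedef struct {
    const char *name;
    FieldKind kind;
    size_t offset;
} FieldInfo;

typedef struct {
    const char *class_name;
    const FieldInfo *fields;
    size_t nfields;
} ClassInfo;

/* Example class and its hand-written metadata table. */
typedef struct {
    int id;
    double score;
    const char *label;
} Item;

static const FieldInfo item_fields[] = {
    {"id",    F_INT,    offsetof(Item, id)},
    {"score", F_DOUBLE, offsetof(Item, score)},
    {"label", F_STRING, offsetof(Item, label)},
};
static const ClassInfo item_class = {"Item", item_fields, 3};

/* Render one object as XML into buf, truncating if cap is too small. */
void to_xml(const ClassInfo *c, const void *obj, char *buf, size_t cap) {
    const char *base = obj;
    size_t len;
    assert(cap > 0);
    snprintf(buf, cap, "<%s>", c->class_name);
    for (size_t i = 0; i < c->nfields; i++) {
        const FieldInfo *f = &c->fields[i];
        const void *p = base + f->offset;
        len = strlen(buf);   /* snprintf always NUL-terminates, so len < cap */
        switch (f->kind) {
        case F_INT:
            snprintf(buf + len, cap - len, "<%s>%d</%s>", f->name, *(const int *)p, f->name);
            break;
        case F_DOUBLE:
            snprintf(buf + len, cap - len, "<%s>%g</%s>", f->name, *(const double *)p, f->name);
            break;
        case F_STRING:
            snprintf(buf + len, cap - len, "<%s>%s</%s>", f->name, *(const char *const *)p, f->name);
            break;
        }
    }
    len = strlen(buf);
    snprintf(buf + len, cap - len, "</%s>", c->class_name);
}
```

For an `Item` with id 7, score 3.5 and label "sword", this produces `<Item><id>7</id><score>3.5</score><label>sword</label></Item>`.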

Ultimately no combination of language and library will produce hassle-free
persistence.  At minimum you always have OS resources to reconstruct.  Still, the
more support for it the better, and metaclasses plus XML seem promising to me.

Mark
May 09 2003
"anderson" <anderson badmama.com.au> writes:
"Sean L. Palmer" <palmer.sean verizon.net> wrote in message
news:b9a3jf$1eeh$1 digitaldaemon.com...
 Does this sound like it would make your life easier?  If so, help me flesh
 it out.
Yes, that'd cut my code down by 20%. It's such a common task, and serialization methods still require quite a bit of work. I think it may be possible mainly as a library, but that'd be even more complex than building it into the compiler.
 Sean
May 30 2003
Georg Wrede <Georg_member pathlink.com> writes:
 This technology would be great for debugging too.  Imagine saving off
 program states in various stages of execution and analyzing them to
 determine the location of a bug.  Or to help reproduce errors (save right
 before the crash, then do what it takes to get it to crash).

 Does this sound like it would make your life easier?  If so, help me flesh
 it out.
Yes, that'd cut my code down by 20%. It's such a common task, and serialization methods still require quite a bit of work. I think it may be possible mainly as a library, but that'd be even more complex than building it into the compiler.
This is standard in Unix. You can force your program to save its entire state to disk. Actually, even if you don't, Unix will save it for you if your program crashes hard. (That's why you find all those large files called "core".) Examining this file with standard Unix debugging tools lets you see exactly what was going on at the time it crashed.
May 30 2003
"anderson" <anderson badmama.com.au> writes:
"Sean L. Palmer" <palmer.sean verizon.net> wrote in message
news:b9a3jf$1eeh$1 digitaldaemon.com...
 I've been thinking a lot about persistence lately, saw a few other languages
 recently that offer this, and realized that a significant portion of my work
 is bogged down in the details of persistence of object state.
My $0.02. Just an idea to toss around, in case Walter ever changes his mind
on persistent objects in D.

I think that at least some of the conversion could be generated by the
compiler. If the conversion for a pointer to an object (or for an object)
can be generated, then the conversion for the parent object can be
generated too.

    // The serialization methods for these can be generated
    class A
    {
    private(archive):   // This is only a prototype syntax
        int x;
        int y;
    };

    class B
    {
    private(archive):
        B *b;
    };

With pointers (I guess this is kindergarten stuff to you guys), the first
time an object is referenced it would be saved to the archive, and
subsequently only the reference would be archived.

Things that are connections would need to be wrapped (in the standard lib).

    // Connection: file connection
    class FileC : File  // File would be treated more like a friend
    {
    private(archive):
        char *filename;

    private:
        // The serialize method would be like a constructor.
        // default(X) causes the default variables to be saved as normal
        // (it statically detects the members and builds that code).
        void serialize(archive X) : default(X)
        {
            // Do any other data conversions necessary
            switch (X.mode)
            {
                case SAVE:
                    // Stuff related to writing (filename is done by default(X))
                    break;
                case LOAD:
                    // Stuff related to reading (filename is done by default(X))
                    break;
                case OPEN:
                    // Stuff to do with opening connections
                    open(filename);
                    break;
                case CLOSE:
                    // Stuff to do with closing connections
                    close(filename);
                    break;
            }
        }
    };

    class D : E
    {
        FileC A;

        // Serialize this class.
        // default(X) causes the default variables to be saved as normal.
        void serialize(archive X)
        {
            default(X);
            // Do any other data conversions necessary
            switch (X.mode)
            {
                case SAVE:  /* stuff related to writing */            break;
                case LOAD:  /* stuff related to reading */            break;
                case OPEN:  /* stuff to do with opening connections */ break;
                case CLOSE: /* stuff to do with closing connections */ break;
            }
        }
    };

On creation, an object would first be created in memory, and then its
connections would be opened.

Static members should also be serializable (they wouldn't need a method).

Anyway, most of that appears to be something that could go in the standard
lib, except for:

    connection    // Indicates that the class is a special connection type
    default(...)  // Method that saves/loads/opens/closes the appropriate information

Another idea: default could be part of the standard lib if you could loop
through every variable in a class and determine its type information. The
problem here is that variables can be different sizes.

    // Has the foreach statement been determined for D yet?
    foreach (member; theObject)
    {
        ...
        if (Class)        // Pseudo
        {
            member.serialize(X);
        }
        else if (int)     // Pseudo
        {
            X.put(member);
        }
        else
        {
            // ... other types
        }
    }

Anyway, perhaps that'll spark some more ideas.

PS - I also brought this topic up a while ago.
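The "first reference saves the object, later ones save only a reference" rule can be sketched in C with a table of already-written objects. This shows only the saving half and uses a made-up text format (`obj <data>`, `ref <id>`, `null`); `B`, `Archive` and `save_b` are hypothetical names, and the linear lookup, fixed `MAX_OBJS` limit and recursion (one level per chained object) are simplifications. Registering an object before recursing into its references is what makes cycles terminate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Example class that can share references or form cycles. */
typedef struct B {
    int data;
    struct B *b;
} B;

#define MAX_OBJS 64

/* Hypothetical archive: remembers which objects were already written. */
typedef struct {
    const B *seen[MAX_OBJS];  /* the object with ID i is seen[i] */
    size_t count;             /* objects written in full */
    size_t refs;              /* back-references written */
    FILE *out;
} Archive;

/* ID of an object already in the archive, or -1 if it has not been written. */
static long find_id(const Archive *a, const B *p) {
    for (size_t i = 0; i < a->count; i++)
        if (a->seen[i] == p)
            return (long)i;
    return -1;
}

/* First encounter: write the object in full and give it the next ID.
   Later encounters: write only "ref <id>". */
void save_b(Archive *a, const B *p) {
    if (p == NULL) {
        fprintf(a->out, "null\n");
        return;
    }
    long id = find_id(a, p);
    if (id >= 0) {
        fprintf(a->out, "ref %ld\n", id);
        a->refs++;
        return;
    }
    assert(a->count < MAX_OBJS);
    a->seen[a->count++] = p;      /* register before recursing so cycles stop */
    fprintf(a->out, "obj %d\n", p->data);
    save_b(a, p->b);
}
```

For two objects x (data 1) and y (data 2) that point at each other, saving x writes `obj 1`, `obj 2`, then `ref 0` for y's pointer back to x.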
May 30 2003