www.digitalmars.com         C & C++   DMDScript  

D - Advanced features (for future)

Ilya Minkov <midiclub 8ung.at> writes:
Hello.

It would be very good to be able to save classes to disk in a safe 
manner, so that (maybe only public?) fields can be saved and then read 
in, even if a class has been subclassed or expanded (not too hard, with 
current memory model), or even if the underlying machine is different 
(hard). But even saving would probably become much harder if powerful 
data reordering for arrays of classes is implemented.

For this i think unions are a special problem. A smart union type has to 
be introduced (switch?), which would keep information on the active field 
and thus provide debugging capabilities. BTW, a parsing library and many 
other uses would profit from such a "switch", which would be shorter to 
write and easier to maintain than a union.
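
A minimal sketch of the kind of "switch" described above, written in C++ because the thread compares against it. The `Value` class, its `Kind` tag, and the accessor names are all hypothetical; the sketch only illustrates tracking the active field and is not proposed D syntax.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical "smart union": a plain union plus a tag that records
// which field is currently active.
class Value {
public:
    enum class Kind { Integer, Real };

    static Value fromInt(long v)    { Value x; x.kind_ = Kind::Integer; x.data_.i = v; return x; }
    static Value fromReal(double v) { Value x; x.kind_ = Kind::Real;    x.data_.r = v; return x; }

    Kind kind() const { return kind_; }

    // Accessors check the tag, so reading the inactive field is caught
    // at run time instead of silently reinterpreting the bits.
    long asInt() const {
        if (kind_ != Kind::Integer)
            throw std::logic_error("Value: Integer is not the active field");
        return data_.i;
    }
    double asReal() const {
        if (kind_ != Kind::Real)
            throw std::logic_error("Value: Real is not the active field");
        return data_.r;
    }

    // Because the active field is known, a debugger or a serializer
    // can describe the contents without guessing.
    std::string describe() const {
        switch (kind_) {
            case Kind::Integer: return "Integer " + std::to_string(data_.i);
            case Kind::Real:    return "Real " + std::to_string(data_.r);
        }
        return "unknown";
    }

private:
    Value() : kind_(Kind::Integer) { data_.i = 0; }

    Kind kind_;
    union { long i; double r; } data_;
};
```

Calling `Value::fromInt(42).asReal()` throws instead of returning reinterpreted bits, which is the debugging benefit the post asks for. A built-in language feature could do the same check without the hand-written accessors.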

Another useful thing is ML-style pattern matching, which i have already 
wished for. I was thinking about a possible implementation, but then i got 
busy with other things. Yesterday i stumbled over a document describing 
*exactly this* - a C++ extension for this feature. I have only looked 
briefly at the document. Maybe their syntax is overwrought, but it might be 
worth a look anyway.

http://citeseer.nj.nec.com/leung96cbased.html

-i.
Jan 29 2003
Bill Cox <bill viasic.com> writes:
Hi, Ilya.

Ilya Minkov wrote:
 Hello.
 
 It would be very good to be able to save classes to disk in a safe 
 manner, so that (maybe only public?) fields can be saved and then read 
 in, even if a class has been subclassed or expanded (not too hard, with 
 current memory model), or even if the underlying machine is different 
 (hard). But even saving would probably become much harder if powerful 
 data reordering for arrays of classes is implemented.
 
 For this i think unions are a special problem. A smart union type has to 
 be introduced (switch?), which would keep information on the active field 
 and thus provide debugging capabilities. BTW, a parsing library and many 
 other uses would profit from such a "switch", which would be shorter to 
 write and easier to maintain than a union.

Some of the code generators we use at work automatically create binary load and save functions. In the early 90's we used them at QuickLogic, but we ran into difficulties maintaining binary backwards compatibility with our simple binary dumps. We also found that a simple memory image of binary data structures typically takes up more space than a carefully designed ASCII format (which takes up more than a carefully designed binary format).

As a result, no one has used the binary load/save feature in a decade. It sounds cool. I even wrote code in one of the generators to do it. It just hasn't been as useful as I thought it would be.

Instead of building functions like binary load/save into the language, I'd recommend providing the hooks for users to do it with code generators. Even if there's no direct generation capability in the language, there are a few things that could make D work better than C++ does with code generators. In particular:

- Having a way to split up class definitions into multiple parts.

For example, an 'extend' keyword in front of a class could mean we're adding to an existing class. This isn't inheritance. We'd be modifying a class directly rather than creating a new one.

- Do the same thing for modules, functions, variables, and class methods.

It's kind of nice for a code generator to be able to put a few fields here, add a few statements there, and add a couple of functions to an existing module. For example, the auto-generated recursive destructors we use were hell to write for C. Every kind of class relationship supported had to be considered in big switch statements to generate all the different parts of the function. Really ugly. When targeting a language that supports these after-the-fact extensions, the complexity of the code generator was reduced tremendously. The same code that adds fields to the parent and child classes also adds a few statements to the recursive destructor. It's much nicer.

Extensions like these allow code generators like ClassWizard to simply add files to your project, and not need to modify your hand-written files. No more parsing the whole language to do a simple generator. No more ugly /* !!! Do not edit this !!! */ machine-generated crud in my files.

If you were to go the whole 9 yards, you might also allow a similar feature: not just extensions... replacement! You could use something like a replace keyword in front of your module or class or function or method or variable.

With this syntax, you can add little edit files to your projects that fix problems in a library you've been handed.

For example, if you run into a performance problem with a third party library (like that never happens ;-)) and track it down to the use of a singly linked list instead of doubly linked, you type a few lines of code in a patch file, and problem solved! For the next ten years that it takes your library vendor to get around to fixing the problem, you have a workaround that usually works with their new releases.

What do you think?

Bill Cox
Jan 29 2003
Ilya Minkov <midiclub 8ung.at> writes:
Hello. Sorry it took me that long to become aware of this post. :)

Comments embedded.

-i.

Bill Cox wrote:
 Hi, Ilya.
 
 Ilya Minkov wrote:
 
 Hello.

 It would be very good to be able to save classes to disk in a safe 
 manner, so that (maybe only public?) fields can be saved and then read 
 in, even if a class has been subclassed or expanded (not too hard, with 
 current memory model), or even if the underlying machine is different 
 (hard). But even saving would probably become much harder if powerful 
 data reordering for arrays of classes is implemented.

 For this i think unions are a special problem. A smart union type has 
 to be introduced (switch?), which would keep information on the active 
 field and thus provide debugging capabilities. BTW, a parsing library 
 and many other uses would profit from such a "switch", which would be 
 shorter to write and easier to maintain than a union.

 Some of the code generators we use at work automatically create binary load and save functions. In the early 90's we used them at QuickLogic, but we ran into difficulties maintaining binary backwards compatibility with our simple binary dumps. We also found that a simple memory image of binary data structures typically takes up more space than a carefully designed ASCII format (which takes up more than a carefully designed binary format).

Hm. You have mentioned dynamic properties a while ago. With them, you probably wouldn't have such difficulties.

There also has to be some framework which would allow extending the format, even if the serialisation code is written manually. Basic support for it would mean that a base class has a (stub) method for converting it into a stream of data (.Serialize?, analogous to the current ToHash and ToString). In the simplest case you would then implement this method with statements like "serstream ~ thisproperty.Serialize". This would also imply that .Serialize is implemented for the basic types. The same goes for reading.

Languages with only dynamic object methods seem to have one problem fewer here. However, an implicit serialisation sequence would also allow interpreting some data which cannot be represented in the object directly due to changes.

As to the framework, XML is one example of it. Though i consider it appropriate for such things, i would also prefer to have an equivalent binary format (with conversion utilities back and forth), since it would work faster and take up less space.

BTW, i could make such an XML-like framework... make a function like ToXMLData, which would be overloaded for basic types. A user can overload it for his own types. And for classes, it should take the corresponding method of a class. It should be doable with interfaces. Then a way to compose one XMLData out of many and to save it all in binary, or convert it into real XML.

And i have to consider the Pizza contest. Don't expect much though since i'm not the major brain here and i'm only 20, i just started to study CS. And since i *never* eat at Pizza Hut, but rather in Restaurant Italy, Asado Steak, and some others. I still have over 100 restaurants to explore. :)
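
The .Serialize hook described above could look roughly like the following C++ sketch. Everything here is an assumption made for illustration: the free `serialize`/`deserialize` overloads stand in for ".Serialize is implemented for the basic types", `Serializable` stands in for the base class with stub methods, and `Employee` is a made-up user class. The length-prefixed text format is also just one possible choice.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Overloads for the basic types. Each value is written on its own line;
// strings carry a length prefix so they may contain spaces.
inline void serialize(std::ostream& out, int v) { out << v << '\n'; }
inline void serialize(std::ostream& out, const std::string& s) {
    out << s.size() << ':' << s << '\n';
}

inline void deserialize(std::istream& in, int& v) {
    in >> v;
    in.ignore(1);                       // skip the trailing '\n'
}
inline void deserialize(std::istream& in, std::string& s) {
    std::size_t n = 0;
    in >> n;
    in.ignore(1);                       // skip ':'
    s.resize(n);
    if (n > 0) in.read(&s[0], static_cast<std::streamsize>(n));
    in.ignore(1);                       // skip the trailing '\n'
}

// Hypothetical base class: the stub hooks do nothing by default.
struct Serializable {
    virtual ~Serializable() {}
    virtual void serialize(std::ostream&) const {}
    virtual void deserialize(std::istream&) {}
};

// A user class implements the hooks by streaming each field in turn,
// the equivalent of "serstream ~ thisproperty.Serialize".
struct Employee : Serializable {
    std::string name;
    int age = 0;

    void serialize(std::ostream& out) const override {
        ::serialize(out, name);
        ::serialize(out, age);
    }
    void deserialize(std::istream& in) override {
        ::deserialize(in, name);
        ::deserialize(in, age);
    }
};
```

Writing an `Employee` to a `std::ostringstream` and reading it back through a `std::istringstream` restores both fields. Note that the fields are identified purely by their order here, so this sketch inherits the compatibility problem Bill describes as soon as a field is added or moved.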
 As a result, no one has used the binary load/save feature in a decade. 
 It sounds cool. I even wrote code in one of the generators to do it.  It 
 just hasn't been as useful as I thought it would be.

For static languages binary dumps are much less useful than for dynamic ones.
 Instead of building functions like binary load/save into the language, 
 I'd recommend providing the hooks for users to do it with code 
 generators.  Even if there's no direct generation capability in the 
 language, there are a few things that could make D work better than C++ 
 does with code generators.  In particular:
 
 - Having a way to split up class definitions into multiple parts.
 
 For example, an 'extend' keyword in front of a class could mean we're 
 adding to an existing class.  This isn't inheritance.  We'd be modifying 
 a class directly rather than creating a new one.
 
 - Do the same thing for modules, functions, variables, and class methods.
 
 It's kind of nice for a code generator to be able to put a few fields 
 here, add a few statements there, and add a couple of functions to an 
 existing module.  For example, the auto-generated recursive destructors 
 we use were hell to write for C.  Every kind of class relationship 
 supported had to be considered in big switch statements to generate all 
 the different parts of the function.  Really ugly.  When targeting a 
 language that supports these after-the-fact extensions, the complexity 
 of the code generator was reduced tremendously.  The same code that adds 
 fields to the parent and child classes also adds a few statements to the 
 recursive destructor.  It's much nicer.

These are all good ideas. Also consider, that one could possibly have very few classes in the application, but very many methods to add to them. Then it would make sense to split up the class across multiple files for easy navigation and editing. This means however, that all these units have to be compiled simultaneously. Dependencies can be awful to track.
 Extensions like these allow code generators like ClassWizard to simply 
 add files to your project, and not need to modify your hand-written 
 files.  No more parsing the whole language to do a simple generator.  No 
 more ugly /* !!! Do not edit this !!! */ machine-generated crud in my files.
 
 If you were to go the whole 9 yards, you might also allow a similar 
 feature:  not just extensions... replacement!  You could use something 
 like a replace keyword in front of your module or class or function or 
 method or variable.

Ouch.
 With this syntax, you can add little edit files to your projects that 
 fix problems in a library you've been handed.
 
 For example, if you run into a performance problem with a third party 
 library (like that never happens ;-)) and track it down to the use of a 
 singly linked list instead of doubly linked, you type a few lines of 
 code in a patch file, and problem solved!  For the next ten years that 
 it takes your library vendor to get around to fixing the problem, you 
 have a work around that usually works with their new releases.

Cool :)
 What do you think?
 
 Bill Cox
 

Mar 05 2003
Bill Cox <bill viasic.com> writes:
 And i have to consider the Pizza contest. Don't expect much though since 
 i'm not the major brain here and i'm only 20, i just started to study 
 CS. And since i *never* eat at Pizza Hut, but rather in Restaurant 
 Italy, Asado Steak, and some others. I still have over 100 restaurants 
 to explore. :)

You've got a lot of knowledge about computer languages for being only 20. Pretty impressive. I'm 39, just old enough to have actually had a job programming in Fortran on a PDP-11/45. -- Bill
Mar 06 2003
"Walter" <walter digitalmars.com> writes:
"Bill Cox" <bill viasic.com> wrote in message
news:3E6734FB.5060406 viasic.com...
 I'm 39, just old enough to have actually had a
 job programming in Fortran on a PDP-11/45.

Been there, done that <g>.
Mar 08 2003
Bill Cox <Bill_member pathlink.com> writes:
 We also found that a simple memory image 
 of binary data structures typically takes up more space than a carefully 
 designed ASCII format (which takes up more than a carefully designed 
 binary format).

 Hm. You have mentioned dynamic properties a while ago. With them, you 
 probably wouldn't have such difficulties.

I thought a simple example might illustrate the trouble I had with binary save formats. Suppose we're saving a directed graph to disk. Its classes look like:

    class Node {
        LinkedList<Edge> inEdges, outEdges;
        bool visited, marked;
        char *name;
    }

    class Edge {
        Node fromNode, toNode;
    }

Now, let's assume I have a graph that in a text file would be represented as:

    A B C
    B C E
    C A D
    D A B C
    E B D E

The first column is node names, and the remaining symbols are destinations of edges. This takes 34 bytes.

If we stream binary to the disk, I assume all Edges and Nodes wind up there. Assume the LinkedList class has a head pointer, a name, and two Booleans that I could pack into 1 byte. Each Node would take 7 bytes. Each Edge has two Node pointers and two next pointers. They would take 16 bytes. On disk, the simple binary dump takes 5*7 + 12*16 = 227 bytes. That's a whole lot worse than 34 bytes.

As for compatibility, suppose we later on convert our LinkedList relationships to DoublyLinkedList. First, the binary size gets worse, while the text file doesn't. Second, we now have to write converters to be able to load the old binary files.

We could gain some backward compatibility by using an even larger binary format that tags all the fields, but what's the point? Are we trying to be efficient, or just trying to avoid writing a parser?

File size isn't important for most apps. Look at how large MS Word files are. No one cares. I work with design files representing .13u chips. A small file for us might be 100 meg. Not only does the text version reduce the size, but our users demand text so they can hack our data structures with Perl scripts.

Bill
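
The size comparison above can be reproduced with a short C++ sketch. The adjacency-list `Graph` typedef and the function names are hypothetical and chosen only for this illustration; the 7 and 16 byte object sizes are the ones assumed in the post, not measured values.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<std::string> Names;
// Node name -> names of the nodes its out-edges point to.
typedef std::map<std::string, Names> Graph;

// The example graph from the post: 5 nodes, 12 edges.
inline Graph exampleGraph() {
    Graph g;
    g["A"] = Names{"B", "C"};
    g["B"] = Names{"C", "E"};
    g["C"] = Names{"A", "D"};
    g["D"] = Names{"A", "B", "C"};
    g["E"] = Names{"B", "D", "E"};
    return g;
}

// Text format: one line per node, node name first, then its destinations.
inline std::string saveText(const Graph& g) {
    std::ostringstream out;
    for (const auto& node : g) {
        out << node.first;
        for (const auto& dest : node.second) out << ' ' << dest;
        out << '\n';
    }
    return out.str();
}

// The matching parser is only a few lines long.
inline Graph loadText(const std::string& text) {
    Graph g;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream words(line);
        std::string name, dest;
        if (!(words >> name)) continue;     // skip blank lines
        Names& dests = g[name];             // a node may have no out-edges
        while (words >> dest) dests.push_back(dest);
    }
    return g;
}

inline std::size_t edgeCount(const Graph& g) {
    std::size_t n = 0;
    for (const auto& node : g) n += node.second.size();
    return n;
}

// Estimated size of the raw memory dump, using the per-object sizes
// assumed in the post: 7 bytes per Node and 16 bytes per Edge.
inline std::size_t binaryDumpEstimate(const Graph& g) {
    return g.size() * 7 + edgeCount(g) * 16;
}
```

For the example graph, `saveText` produces 17 symbols, each followed by one separator character, which gives the 34 bytes quoted above, while `binaryDumpEstimate` gives 5*7 + 12*16 = 227.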
Mar 06 2003
"Walter" <walter digitalmars.com> writes:
"Bill Cox" <Bill_member pathlink.com> wrote in message
news:b47fib$5io$1 digitaldaemon.com...
 File size isn't important for most apps.  Look at how large MS Word files
 are.  No one cares.  I work with design files representing .13u chips.  A
 small file for us might be 100 meg.  Not only does the text version reduce
 the size, but our users demand text so they can hack our data structures
 with Perl scripts.

You hit on a big advantage with text files - they can be checked visually for correctness, and can be edited with ordinary text editors. Binary files require a custom dumper/editor to be written. One reason I don't use .doc files is because I need a specific version of the word processor installed to read them. 20 years from now, who will have that? (Yes, I have 20 year old files I still use.) With ASCII text format, I'm covered.
Mar 08 2003
Burton Radons <loth users.sourceforge.net> writes:
Ilya Minkov wrote:
 It would be very good to be able to save classes to disk in a safe 
 manner, so that (maybe only public?) fields can be saved and then read 
 in, even if a class has been subclassed or expanded (not too hard, with 
 current memory model), or even if the underlying machine is different 
 (hard). But even saving would probably become much harder if powerful 
 data reordering for arrays of classes is implemented.

This is in DLI under the pickle.d module. It transfers a class field image, so new and reordered fields don't matter, and handles single transference of pointers, references, and arrays. The only non-portable part is a dependency on IEEE.
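
To illustrate why a field image makes new and reordered fields harmless, here is a minimal C++ sketch. It is not pickle.d's format or API, which the post does not show; `FieldImage`, `writeImage`, `readImage`, `loadInt`, and the two `Point` versions are hypothetical, and a text encoding is used only to keep the example short.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// A field image: every field is stored under its name, not its position.
typedef std::map<std::string, std::string> FieldImage;

inline std::string writeImage(const FieldImage& fields) {
    std::ostringstream out;
    for (const auto& f : fields) out << f.first << '=' << f.second << '\n';
    return out.str();
}

inline FieldImage readImage(const std::string& text) {
    FieldImage fields;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type eq = line.find('=');
        if (eq == std::string::npos) continue;   // ignore malformed lines
        fields[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return fields;
}

// Copy a named integer field if the image has it; otherwise keep the default.
inline void loadInt(const FieldImage& image, const std::string& name, int& field) {
    FieldImage::const_iterator it = image.find(name);
    if (it != image.end()) field = std::stoi(it->second);
}

// Version 1 of a hypothetical class, used to write the file.
struct PointV1 {
    int x = 0, y = 0;

    FieldImage save() const {
        FieldImage image;
        image["x"] = std::to_string(x);
        image["y"] = std::to_string(y);
        return image;
    }
};

// Version 2: the fields are reordered and a new one has been added.
struct PointV2 {
    int z = 7;      // new field, keeps its default when loading old data
    int y = 0;
    int x = 0;

    void load(const FieldImage& image) {
        loadInt(image, "x", x);
        loadInt(image, "y", y);
        loadInt(image, "z", z);
    }
};
```

Data saved by `PointV1` loads into `PointV2` with `x` and `y` intact and `z` left at its default, because the lookup is by name rather than by offset.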
 For this i think unions are a special problem. A smart union type has to 
 be introduced (switch?), which would keep information on the active field 
 and thus provide debugging capabilities. BTW, a parsing library and many 
 other uses would profit from such a "switch", which would be shorter to 
 write and easier to maintain than a union.

Unions don't get serialisation. If you want to save a union, save the active state.
Jan 29 2003
Ilya Minkov <midiclub 8ung.at> writes:
Burton Radons wrote:
 Ilya Minkov wrote:
 
 It would be very good to be able to save classes to disk in a safe 
 manner, so that (maybe only public?) fields can be saved and then read 
 in, even if a class has been subclassed or expanded (not too hard, with 
 current memory model), or even if the underlying machine is different 
 (hard). But even saving would probably become much harder if powerful 
 data reordering for arrays of classes is implemented.

 This is in DLI under the pickle.d module. It transfers a class field image, so new and reordered fields don't matter, and handles single transference of pointers, references, and arrays. The only non-portable part is a dependency on IEEE.

Cool. Thanks. So it handles endianness.
 
 Unions don't get serialisation.  If you want to save a union, save the 
 active state.
 

OK... But do you doubt the usefulness of a switching union? Thanks a lot. -i.
Jan 29 2003