D - Advanced features (for future)

Ilya Minkov <midiclub 8ung.at> writes:

Hello.

It would be very good to be able to save classes to disk in a safe 
manner, so that (maybe only public?) fields can be saved and then read 
in, even if a class has been subclassed or expanded (not too hard, with
current memory model), or even if the underlying machine is different 
(hard). But even saving would probably become much harder if powerful 
data reordering for arrays of classes is implemented.

For this I think unions are a special problem. A smart union type has to
be introduced (a "switch"?), which would keep information on the active
field and thus provide debugging capabilities. BTW, a parsing library and
many other uses would profit from such a "switch", since it is shorter to
write and easier to maintain than a union.
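
A minimal sketch of the "switch" idea, written in Python rather than D for brevity. The names `IntValue`, `TextValue`, `describe`, `save`, and `load` are hypothetical and not from the thread; the point is only that each value carries its own tag, so the active field is always known and survives a save/load round trip.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class IntValue:
    value: int


@dataclass
class TextValue:
    value: str


# The "smart union": a value is exactly one of these variants, and the
# variant's own class records which field is active.
Token = Union[IntValue, TextValue]


def describe(token: Token) -> str:
    """Dispatch on the active variant, as a parser might."""
    if isinstance(token, IntValue):
        return f"int:{token.value}"
    if isinstance(token, TextValue):
        return f"text:{token.value}"
    raise TypeError(f"unknown variant: {type(token).__name__}")


def save(token: Token) -> tuple:
    """Serialise as (tag, payload) so the active field is not lost."""
    return (type(token).__name__, token.value)


def load(record: tuple) -> Token:
    """Rebuild the variant named by the saved tag."""
    tag, payload = record
    variants = {"IntValue": IntValue, "TextValue": TextValue}
    return variants[tag](payload)
```

A plain union cannot be saved this way without extra bookkeeping, because nothing in the stored bytes says which member was last written.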

Another useful thing is ML-style pattern matching, which I have already
wished for. I was thinking about a possible implementation, but then I got
busy with other things. Yesterday I stumbled over a document describing
*exactly this*: a C++ extension for this feature. I have only looked
briefly at the document. Their syntax may be overwrought, but it might be
worth a look anyway.

http://citeseer.nj.nec.com/leung96cbased.html

-i.

Jan 29 2003

Bill Cox <bill viasic.com> writes:

Hi, Ilya.

Ilya Minkov wrote:
 Hello.
 
 It would be very good to be able to save classes to disk in a safe 
 manner, so that (maybe only public?) fields can be saved and then read 
 in, even if a class has been subclassed or expanded (not too hard, with
 current memory model), or even if the underlying machine is different 
 (hard). But even saving would probably become much harder if powerful 
 data reordering for arrays of classes is implemented.
 
 For this I think unions are a special problem. A smart union type has to
 be introduced (a "switch"?), which would keep information on the active
 field and thus provide debugging capabilities. BTW, a parsing library and
 many other uses would profit from such a "switch", since it is shorter to
 write and easier to maintain than a union.

Some of the code generators we use at work automatically create binary
load and save functions.  In the early 90's we used them at QuickLogic,
but we ran into difficulties maintaining binary backwards compatibility
with our simple binary dumps.  We also found that a simple memory image
of binary data structures typically takes up more space than a carefully
designed ASCII format (which takes up more than a carefully designed
binary format).

As a result, no one has used the binary load/save feature in a decade. 
It sounds cool. I even wrote code in one of the generators to do it.  It 
just hasn't been as useful as I thought it would be.

Instead of building functions like binary load/save into the language, 
I'd recommend providing the hooks for users to do it with code 
generators.  Even if there's no direct generation capability in the 
language, there are a few things that could make D work better than C++ 
does with code generators.  In particular:

- Having a way to split up class definitions into multiple parts.

For example, an 'extend' keyword in front of a class could mean we're 
adding to an existing class.  This isn't inheritance.  We'd be modifying 
a class directly rather than creating a new one.

- Do the same thing for modules, functions, variables, and class methods.

It's kind of nice for a code generator to be able to put a few fields
here, add a few statements there, and add a couple of functions to an
existing module.  For example, the auto-generated recursive destructors
we use were hell to write for C.  Every kind of class relationship
supported had to be considered in big switch statements to generate all
the different parts of the function.  Really ugly.  When targeting a
language that supports these after-the-fact extensions, the complexity
of the code generator was reduced tremendously.  The same code that adds
fields to the parent and child classes also adds a few statements to the
recursive destructor.  It's much nicer.

Extensions like these allow code generators like ClassWizard to simply 
add files to your project, and not need to modify your hand written 
files.  No more parsing the whole language to do a simple generator.  No 
more ugly /* !!! Do not edit this !!! */ machine generated crud in my files.

If you were to go the whole 9 yards, you might also allow a similar 
feature:  not just extensions... replacement!  You could use something 
like a replace keyword in front of your module or class or function or 
method or variable.

With this syntax, you can add little edit files to your projects that 
fix problems in a library you've been handed.

For example, if you run into a performance problem with a third party 
library (like that never happens ;-)) and track it down to the use of a 
singly linked list instead of doubly linked, you type a few lines of 
code in a patch file, and problem solved!  For the next ten years that 
it takes your library vendor to get around to fixing the problem, you 
have a workaround that usually works with their new releases.

What do you think?

Bill Cox

Jan 29 2003

Ilya Minkov <midiclub 8ung.at> writes:

Hello. Sorry it took me that long to become aware of this post. :)

Comments embedded.

-i.

Bill Cox wrote:
 Hi, Ilya.
 
 Ilya Minkov wrote:
 
 Hello.

 It would be very good to be able to save classes to disk in a safe 
 manner, so that (maybe only public?) fields can be saved and then read 
 in, even if a class has been subclassed or expanded (not too hard, with
 current memory model), or even if the underlying machine is different 
 (hard). But even saving would probably become much harder if powerful 
 data reordering for arrays of classes is implemented.

 For this I think unions are a special problem. A smart union type has
 to be introduced (a "switch"?), which would keep information on the active
 field and thus provide debugging capabilities. BTW, a parsing library
 and many other uses would profit from such a "switch", since it is
 shorter to write and easier to maintain than a union.

 
 
 Some of the code generators we use at work automatically create binary
 load and save functions.  In the early 90's we used them at QuickLogic,
 but we ran into difficulties maintaining binary backwards compatibility
 with our simple binary dumps.  We also found that a simple memory image
 of binary data structures typically takes up more space than a carefully
 designed ASCII format (which takes up more than a carefully designed
 binary format).

Hm. You mentioned dynamic properties a while ago. With them, you
probably wouldn't have such difficulties.
There also has to be some framework that allows extending the
format, even if the serialisation code is written manually. Basic
support for it would mean that the base class has a (stub) method for
converting an object into a stream of data (.Serialize?, analogous to the
current ToHash and ToString). In the simplest case you would then implement
this method with statements like "serstream ~
thisproperty.Serialize". This also implies that .Serialize is
implemented for the basic types. Reading would work analogously.
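
The scheme described above might look roughly like the following sketch, given in Python rather than D. Everything here is an assumption made for illustration: the `Serializable` base class, the `serialize_int` and `serialize_str` helpers, the `Point` example class, and the choice of fixed-width little-endian integers with length-prefixed strings. String concatenation of bytes stands in for the "serstream ~ ..." append.

```python
import struct


def serialize_int(value: int) -> bytes:
    # Fixed-width little-endian, so the bytes do not depend on the host.
    return struct.pack("<i", value)


def serialize_str(value: str) -> bytes:
    # Length prefix followed by the UTF-8 bytes.
    data = value.encode("utf-8")
    return struct.pack("<I", len(data)) + data


class Serializable:
    """Base class with a stub, analogous to a default ToString."""

    def serialize(self) -> bytes:
        return b""


class Point(Serializable):
    def __init__(self, x: int, y: int, label: str):
        self.x = x
        self.y = y
        self.label = label

    def serialize(self) -> bytes:
        # Hand-written sequence: append each field's own serialisation.
        return (
            serialize_int(self.x)
            + serialize_int(self.y)
            + serialize_str(self.label)
        )
```

Because the field order is spelled out by hand, the written format stays under the author's control even if the in-memory layout of the class changes.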

Languages whose object methods are purely dynamic seem to suffer less from
this problem. However, an implicit serialisation sequence would also make it
possible to interpret data that can no longer be represented in the object
directly because the class has changed.

As for the framework, XML is one example. Though I consider it
appropriate for such things, I would also prefer to have an equivalent
binary format (with conversion utilities back and forth), since it would
work faster and take up less space.

BTW, I could make such an XML-like framework: a function like
ToXMLData, which would be overloaded for the basic types. A user can
overload it for his own types. For classes, it should call the
corresponding method of the class. It should be doable with interfaces.
Then there would have to be a way to compose one XMLData out of many and
to save it all in binary, or convert it into real XML.
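
As a rough illustration of the ToXMLData idea, here is a Python sketch that uses `functools.singledispatch` to stand in for overloading. The element names (`<int>`, `<string>`, `<Node>`), the `to_xml_data` spelling, and the example `Node` class are all hypothetical; the thread does not specify a format.

```python
from functools import singledispatch
from xml.sax.saxutils import escape


@singledispatch
def to_xml_data(value) -> str:
    # Fallback for user classes: defer to the object's own method.
    return value.to_xml_data()


@to_xml_data.register(int)
def _(value) -> str:
    return f"<int>{value}</int>"


@to_xml_data.register(str)
def _(value) -> str:
    # Escape &, < and > so the text cannot break the markup.
    return f"<string>{escape(value)}</string>"


class Node:
    def __init__(self, name: str, weight: int):
        self.name = name
        self.weight = weight

    def to_xml_data(self) -> str:
        # Compose one XML fragment out of the fragments of the fields.
        return (
            "<Node>"
            + to_xml_data(self.name)
            + to_xml_data(self.weight)
            + "</Node>"
        )
```

For example, `to_xml_data(Node("A<", 3))` yields `<Node><string>A&lt;</string><int>3</int></Node>`. A binary encoder could walk the same structure instead of emitting text.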

And I have to consider the Pizza contest. Don't expect much, though, since
I'm not the major brain here and I'm only 20; I just started to study
CS. And I *never* eat at Pizza Hut, but rather in Restaurant
Italy, Asado Steak, and some others. I still have over 100 restaurants
to explore. :)

 As a result, no one has used the binary load/save feature in a decade. 
 It sounds cool. I even wrote code in one of the generators to do it.  It 
 just hasn't been as useful as I thought it would be.

For static languages, binary dumps are much less useful than for dynamic ones.

 Instead of building functions like binary load/save into the language, 
 I'd recommend providing the hooks for users to do it with code 
 generators.  Even if there's no direct generation capability in the 
 language, there are a few things that could make D work better than C++ 
 does with code generators.  In particular:
 
 - Having a way to split up class definitions into multiple parts.
 
 For example, an 'extend' keyword in front of a class could mean we're 
 adding to an existing class.  This isn't inheritance.  We'd be modifying 
 a class directly rather than creating a new one.
 
 - Do the same thing for modules, functions, variables, and class methods.
 
 It's kind of nice for a code generator to be able to put a few fields
 here, add a few statements there, and add a couple of functions to an
 existing module.  For example, the auto-generated recursive destructors
 we use were hell to write for C.  Every kind of class relationship
 supported had to be considered in big switch statements to generate all
 the different parts of the function.  Really ugly.  When targeting a
 language that supports these after-the-fact extensions, the complexity
 of the code generator was reduced tremendously.  The same code that adds
 fields to the parent and child classes also adds a few statements to the
 recursive destructor.  It's much nicer.

These are all good ideas. Also consider that an application might have
very few classes but very many methods to add to them. Then it would make
sense to split up a class across multiple files for easy navigation and
editing. This means, however, that all these units have to be compiled
simultaneously. Dependencies can be awful to track.

 Extensions like these allow code generators like ClassWizard to simply 
 add files to your project, and not need to modify your hand written 
 files.  No more parsing the whole language to do a simple generator.  No 
 more ugly /* !!! Do not edit this !!! */ machine generated crud in my files.
 
 If you were to go the whole 9 yards, you might also allow a similar 
 feature:  not just extensions... replacement!  You could use something 
 like a replace keyword in front of your module or class or function or 
 method or variable.

Ouch.

 With this syntax, you can add little edit files to your projects that 
 fix problems in a library you've been handed.
 
 For example, if you run into a performance problem with a third party 
 library (like that never happens ;-)) and track it down to the use of a 
 singly linked list instead of doubly linked, you type a few lines of 
 code in a patch file, and problem solved!  For the next ten years that 
 it takes your library vendor to get around to fixing the problem, you 
 have a workaround that usually works with their new releases.

Cool :)

 What do you think?
 
 Bill Cox

Mar 05 2003

Bill Cox <bill viasic.com> writes:

 And I have to consider the Pizza contest. Don't expect much, though, since
 I'm not the major brain here and I'm only 20; I just started to study
 CS. And I *never* eat at Pizza Hut, but rather in Restaurant
 Italy, Asado Steak, and some others. I still have over 100 restaurants
 to explore. :)

You've got a lot of knowledge about computer languages for being only 
20.  Pretty impressive.  I'm 39, just old enough to have actually had a 
job programming in Fortran on a PDP-11/45.

-- Bill

Mar 06 2003

"Walter" <walter digitalmars.com> writes:

"Bill Cox" <bill viasic.com> wrote in message
news:3E6734FB.5060406 viasic.com...
 I'm 39, just old enough to have actually had a
 job programming in Fortran on a PDP-11/45.

Been there, done that <g>.

Mar 08 2003

Bill Cox <Bill_member pathlink.com> writes:

 We also found that a simple memory image
 of binary data structures typically takes up more space than a carefully
 designed ASCII format (which takes up more than a carefully designed
 binary format).

 Hm. You have mentioned dynamic properties a while ago. With them, you
 probably wouldn't have such difficulties.

I thought a simple example might illustrate the trouble I had with binary
save formats.  Suppose we're saving a directed graph to disk.  Its classes
look like:

class Node {
    LinkedList<Edge> inEdges, outEdges;
    bool visited, marked;
    char *name;
}

class Edge {
    Node fromNode, toNode;
}

Now, let's assume I have a graph that in a text file would be represented 
as:

A B C
B C E
C A D
D A B C
E B D E

The first column is node names, and the remaining symbols are
destinations of edges.  This takes 34 bytes.

If we stream binary to the disk, I assume all Edges and Nodes wind up
there.  Assume each LinkedList is just a head pointer, and each Node also has
a name and two Booleans that I could pack into 1 byte.  Each Node would take
7 bytes.  Each Edge has two Node pointers and two next pointers.  They would
take 16 bytes.

On disk, the simple binary dump takes 5*7 + 12*16 = 227 bytes.  That's a 
whole lot worse than 34 bytes.
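
The arithmetic above can be checked with a short script, given here in Python. The per-object sizes (7 bytes per Node, 16 bytes per Edge) are the figures assumed in the post; the function names `parse_graph` and `binary_dump_size` are made up for this sketch.

```python
# The example graph exactly as it would appear in the text file.
GRAPH_TEXT = "A B C\nB C E\nC A D\nD A B C\nE B D E\n"

# Sizes assumed in the post for the naive binary dump.
NODE_BYTES = 7
EDGE_BYTES = 16


def parse_graph(text: str) -> dict:
    """Parse 'name dest dest ...' lines into {name: [destinations]}."""
    graph = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        name, *targets = line.split()
        graph[name] = targets
    return graph


def binary_dump_size(graph: dict) -> int:
    """Estimate the dump size: one record per Node plus one per Edge."""
    edge_count = sum(len(targets) for targets in graph.values())
    return len(graph) * NODE_BYTES + edge_count * EDGE_BYTES


graph = parse_graph(GRAPH_TEXT)
text_size = len(GRAPH_TEXT.encode("ascii"))
print(text_size, binary_dump_size(graph))
```

The parser is also a reminder of how little code the text format needs: five nodes and twelve edges are recovered with a single `split` per line.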

As for compatibility, suppose we later on convert our LinkedList 
relationships to DoublyLinkedList.  First, the binary size gets worse, while 
the text file doesn't.  Second, we now have to write converters to be able to 
load the old binary files.  We could gain some backward compatibility by 
using an even larger binary format that tags all the fields, but what's the 
point?  Are we trying to be efficient, or just trying to avoid writing a parser?

File size isn't important for most apps.  Look at how large MS Word files 
are.  No one cares.  I work with design files representing .13u chips.  A 
small file for us might be 100 meg.  Not only does the text version reduce
the size, but our users demand text so they can hack our data structures
with Perl scripts.

Bill

Mar 06 2003

"Walter" <walter digitalmars.com> writes:

"Bill Cox" <Bill_member pathlink.com> wrote in message
news:b47fib$5io$1 digitaldaemon.com...
 File size isn't important for most apps.  Look at how large MS Word files
 are.  No one cares.  I work with design files representing .13u chips.  A
 small file for us might be 100 meg.  Not only does the text version reduce
 the size, but our users demand text so they can hack our data structures
 with Perl scripts.

You hit on a big advantage with text files - they can be checked visually
for correctness, and can be edited with ordinary text editors. Binary files
require a custom dumper/editor to be written.

One reason I don't use .doc files is because I need a specific version of
the word processor installed to read them. 20 years from now, who will have
that? (Yes, I have 20-year-old files I still use.) With ASCII text format,
I'm covered.

Mar 08 2003

Burton Radons <loth users.sourceforge.net> writes:

Ilya Minkov wrote:
 It would be very good to be able to save classes to disk in a safe 
 manner, so that (maybe only public?) fields can be saved and then read 
 in, even if a class has been subclassed or expanded (not too hard, with
 current memory model), or even if the underlying machine is different 
 (hard). But even saving would probably become much harder if powerful 
 data reordering for arrays of classes is implemented.

This is in DLI under the pickle.d module.  It transfers a class field 
image, so new and reordered fields don't matter, and handles single 
transference of pointers, references, and arrays.  The only
non-portable part is a dependency on IEEE.
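
To illustrate why a field-by-field image tolerates new and reordered fields, here is a small Python sketch. It is not how pickle.d works internally, which the post does not describe; the JSON encoding, the `save_fields` and `load_fields` helpers, and the `PersonV1`/`PersonV2` classes are all assumptions made for the example.

```python
import json


class PersonV1:
    def __init__(self, name: str = "", age: int = 0):
        self.name = name
        self.age = age


class PersonV2:
    # A later revision: fields reordered and a new one added.
    def __init__(self, age: int = 0, email: str = "", name: str = ""):
        self.age = age
        self.email = email
        self.name = name


def save_fields(obj) -> str:
    """Store every field under its name instead of by position."""
    return json.dumps(vars(obj), sort_keys=True)


def load_fields(cls, text: str):
    """Fill in the fields the class still has; others keep their defaults."""
    obj = cls()
    for field, value in json.loads(text).items():
        if hasattr(obj, field):
            setattr(obj, field, value)
    return obj
```

Data written from a `PersonV1` loads into a `PersonV2` with `name` and `age` intact and `email` left at its default, because matching is done by field name rather than by offset.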

 For this I think unions are a special problem. A smart union type has to
 be introduced (a "switch"?), which would keep information on the active
 field and thus provide debugging capabilities. BTW, a parsing library and
 many other uses would profit from such a "switch", since it is shorter to
 write and easier to maintain than a union.

Unions don't get serialisation.  If you want to save a union, save the 
active state.

Jan 29 2003

Ilya Minkov <midiclub 8ung.at> writes:

Burton Radons wrote:
 Ilya Minkov wrote:
 
 It would be very good to be able to save classes to disk in a safe 
 manner, so that (maybe only public?) fields can be saved and then read 
 in, even if a class has been subclassed or expanded (not too hard, with
 current memory model), or even if the underlying machine is different 
 (hard). But even saving would probably become much harder if powerful 
 data reordering for arrays of classes is implemented.

 
 
 This is in DLI under the pickle.d module.  It transfers a class field 
 image, so new and reordered fields don't matter, and handles single 
 transference of pointers, references, and arrays.  The only
 non-portable part is a dependency on IEEE.
 

Cool. Thanks.
So it handles endianness.

 
 Unions don't get serialisation.  If you want to save a union, save the 
 active state.
 

OK...
But do you doubt the usefulness of a switching union?


Thanks a lot.

-i.

Jan 29 2003
