D - D symbol table extension proposal

reply whitis freelabs.com writes:
Hi, I was a user of Zortech C++ v1.x and v2.x; I just read the complete D
language specification and the language looks very interesting.

D has a weakness found in most other programming languages; the symbol
table is thrown away after compilation.

My suspicion is that keeping it around would add only minor complexity to
the compiler.  It is not the kind of language feature that causes
serious scaling issues, and there would be many benefits.

I have already written a library that tries to shoehorn symbol
tables into the C language, but there is considerable redundancy
in coding since you don't have access to the compiler's symbol tables.
Even so, it is of considerable benefit.  If it were built into the
language, it would be easier to use and would have additional
documentation benefits.
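
To illustrate the redundancy mentioned above, here is a minimal C sketch of a hand-maintained symbol table. The names st_entry_t, point_t, point_st and st_dump are made up for this illustration and are not the API of the library linked further down. Note how every member has to be declared twice, once in the struct and once in the table, although the compiler already knew all of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical types for illustration only; not the real library's API. */
typedef enum { ST_INT, ST_DOUBLE } st_type_t;

typedef struct {
    const char *name;        /* member name, retyped by hand */
    st_type_t   type;        /* member type, retyped by hand */
    size_t      offset;      /* where the member lives inside the struct */
    const char *description; /* the kind of detail extra_info would carry */
} st_entry_t;

typedef struct {
    double x;
    double y;
    int    id;
} point_t;

/* The redundancy: each member of point_t is spelled out a second time. */
const st_entry_t point_st[] = {
    { "x",  ST_DOUBLE, offsetof(point_t, x),  "X Coordinate" },
    { "y",  ST_DOUBLE, offsetof(point_t, y),  "Y Coordinate" },
    { "id", ST_INT,    offsetof(point_t, id), "Identifier"   },
};

/* Format any struct described by a table as "name=value" lines.
   Returns the number of characters written, or -1 if buf is too small. */
int st_dump(const st_entry_t *st, size_t count, const void *obj,
            char *buf, size_t size)
{
    size_t used = 0;
    assert(st != NULL && obj != NULL && buf != NULL);
    for (size_t i = 0; i < count; i++) {
        const char *p = (const char *)obj + st[i].offset;
        int n;
        if (st[i].type == ST_INT)
            n = snprintf(buf + used, size - used, "%s=%d\n",
                         st[i].name, *(const int *)p);
        else
            n = snprintf(buf + used, size - used, "%s=%g\n",
                         st[i].name, *(const double *)p);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

One generic st_dump() then serves every struct that has a table, which is where the benefit comes from even with the duplicated declarations.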

To add compiler support:
- Create a runtime class (even if it doesn't actually do anything yet)

- Add one pointer to each compiler symbol table entry which can
be used to add extra detail in another structure.

- Add the symtab_of() keyword.
When you use this keyword, the compiler symtab format is copied and
converted into the runtime symtab format, an object of that type is
put into static storage, and a pointer to that object is returned.

- If the compiler is sneaky, it will read the precompiled run time
class file and use that to copy members by name, allowing the class
to be completely reimplemented without changing the compiler.

- add the extra_info keyword.  When this is found, everything
enclosed in the following { } will be parsed as if it was an
initializer for a symtab_t class and the pointer to that class
will be stored in the extra pointer in the compiler symbol table.

At some point, the info in the compiler symbol table is copied
into the symtab_t class add-on before the symtab_t object
is made a part of the program.

- The compiler could even be extended to make the default .print
and .format methods for classes which do not specify them
automatically call the symbol table routines.

- Symbol table objects which have extra_info but are never used by
symtab_of() can be discarded by the linker.
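
As a rough picture of the steps above, here is a C sketch of what the runtime object left in static storage by symtab_of() might look like. The field layout is an assumption; the proposal does not specify the members of symtab_t, and extra_info_t and symtab_external_name() are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: details the programmer supplies via extra_info. */
typedef struct {
    const char *external_name; /* name used on the command line, in files */
    const char *description;   /* human-readable documentation */
} extra_info_t;

/* Hypothetical layout: what the compiler already knows, plus one pointer. */
typedef struct {
    const char         *name;      /* identifier as the compiler saw it */
    const char         *type_name; /* spelled-out type */
    size_t              size;      /* sizeof the object */
    void               *address;   /* storage of the variable */
    const extra_info_t *extra;     /* the one added pointer; may be NULL */
} symtab_t;

int debug_level = 0;
const char *save_options_filename = "";

const extra_info_t debug_level_extra = {
    "debug", "Debugging Level (0=none, 1=some, ... 9=lots)"
};
const extra_info_t save_options_filename_extra = {
    NULL, "If set, options will be written to file"
};

/* Roughly what symtab_of(debug_level) could emit into static storage. */
const symtab_t debug_level_st = {
    "debug_level", "int", sizeof debug_level,
    &debug_level, &debug_level_extra
};
const symtab_t save_options_filename_st = {
    "save_options_filename", "char[]", sizeof save_options_filename,
    &save_options_filename, &save_options_filename_extra
};

/* Prefer the external name from extra_info; fall back to the identifier. */
const char *symtab_external_name(const symtab_t *st)
{
    assert(st != NULL);
    if (st->extra != NULL && st->extra->external_name != NULL)
        return st->extra->external_name;
    return st->name;
}
```

Because the extra detail hangs off a single pointer, an entry without extra_info costs nothing beyond a NULL.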

One of the cool aspects of this is that a lot of documentation now becomes
structured as part of the symbol table extras instead of random comments,
and thus can be parsed by utilities such as code annotation programs
(doxygen, etc.) and debuggers.

Symbol tables often work a lot better than other object-oriented
paradigms, which are much harder to implement.


The C version is at
http://www.freelabs.com/~whitis/software/symbol/
If you have access to a copy of the first edition of _Linux Programming
Unleashed_, I wrote a chapter which gives a tutorial on using the
symbol table package.

Symbol tables can be used in a number of ways:
- Parsing command line parameters
- Reading configuration files
  (rfc822 "name: value", name=value, XML, Windows style)
- writing configuration files
- reading data files
- writing data files
- web forms
- GUI forms (preferences, etc).
- remote procedure call protocols, network transactions, etc.
- dumping data structures for debugging

- external utilities also could benefit from the additional info
about each object.
- code browsers: doxygen, kernel browser, etc.
- debuggers
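
As one small instance of the uses listed above, here is a C sketch that applies a "name=value" setting (from a command line or a config file line) by looking the name up in a table. All names here (st_var_t, st_set, the scale variable) are assumptions made for the example, not part of the proposal or of the C library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry: external name, type tag, address of variable. */
typedef enum { ST_INT, ST_DOUBLE } st_type_t;

typedef struct {
    const char *external_name;
    st_type_t   type;
    void       *address;
} st_var_t;

int    debug_level = 0;
double scale = 1.0;        /* illustrative variable, not from the post */

const st_var_t parameters_st[] = {
    { "debug", ST_INT,    &debug_level },
    { "scale", ST_DOUBLE, &scale },
};

/* Apply one "name=value" setting.  Returns 0 on success, -1 if the name
   is unknown or the value does not parse; the variable is only written
   after the whole value has parsed. */
int st_set(const st_var_t *st, size_t count, const char *setting)
{
    const char *eq;
    size_t name_len;

    assert(st != NULL && setting != NULL);
    eq = strchr(setting, '=');
    if (eq == NULL)
        return -1;
    name_len = (size_t)(eq - setting);

    for (size_t i = 0; i < count; i++) {
        char *end;
        if (strlen(st[i].external_name) != name_len ||
            strncmp(st[i].external_name, setting, name_len) != 0)
            continue;
        if (st[i].type == ST_INT) {
            long v = strtol(eq + 1, &end, 10);
            if (end == eq + 1 || *end != '\0')
                return -1;
            *(int *)st[i].address = (int)v;
        } else {
            double v = strtod(eq + 1, &end);
            if (end == eq + 1 || *end != '\0')
                return -1;
            *(double *)st[i].address = v;
        }
        return 0;
    }
    return -1; /* unknown name */
}
```

The same st_set() can be fed from argv or from lines of a file; only the code that splits the input differs, which is why one table can drive several of the uses in the list.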
----------------------begin sample code----------------------------
import symtab;

int debug_level extra_info {
    external_name: "debug";
    description: "Debugging Level (0=none, 1=some, ... 9=lots)";
};

char[] save_options_filename extra_info {
    description: "If set, options will be written to file";
} = "";

int show_options extra_info {
    external_name: "show";
};

struct foo_t {
    real x extra_info {
        description: "X Coordinate";
    };
    real y extra_info {
        description: "Y Coordinate";
    };
    real z extra_info {
        description: "Z Coordinate";
    };
} extra_info {
    // standard keywords
    external_name: "foo";
    description: "A 3D Coordinate";
    xml_is_tag: true;
    extra_pairs {
        {"html_css_class", "coordinate"}
    }
};

symtab_t foo_st = symtab_of(foo_t);
foo_t center;
foo_t viewpoint;

smart_pointer_t center_sp = smart_pointer(&foo_st, &center);

// this symbol table lists variables accessible on the command
// line and in the config file
symtab_t[] parameters_st = {
    symtab_of(show_options),
    symtab_of(save_options_filename),
    symtab_of(center),
    symtab_of(viewpoint),
    symtab_of(debug_level),
};

// This symbol table lists variables sent as part of a protocol
// remote procedure call message
symtab_t[] protocol_st = {
    symtab_of(center),
    symtab_of(viewpoint)
};

// this struct holds the variables received in response to a
// protocol remote procedure call message.

struct results_t {
    enum {
        STATUS_OK               extra_info { external_name: "OK" },
        STATUS_PERMANENT_ERROR  extra_info { external_name: "ERROR" },
        STATUS_TEMPORARY_ERROR  extra_info { external_name: "TRYAGAIN" },
        STATUS_WARNING          extra_info { external_name: "WARNING" }
    } status;
    int lineno;
    char[] error_line;
    char[] error_text;
}

symtab_t results_t_st = symtab_of(results_t);
results_t results;

// we don't define a symbol table for results (which would include the
// address of the struct) so we can illustrate how smart pointers allow the
// address and symbol table data to be combined later.  This would be more
// useful if we had many variables of type results_t.

int main(char[][] args)
{
    parse_cmd_options(parameters_st, args);
    // --viewpoint.x=1.0 --viewpoint.y=2.0 --viewpoint.z=3.0 --debug=2
    // --viewpoint={1.0,2.0,3.0} --debug=2
    // --viewpoint={x=1.0, y=2.0, z=3.0} --debug=2

    symtab_read_config_options_t rc_options =
        symtab_read_file_options_t_defaults;
    rc_options.format = READ_CONFIG_FORMAT_XML;
    symtab_write_config_options_t wc_options =
        symtab_write_file_options_t_defaults;
    wc_options.format = WRITE_CONFIG_FORMAT_XML;

    parameters_st.read_file("~/.myprog", rc_options);

    if (show_options) {
        parameters_st.write_file("-", wc_options);
    }

    if (save_options_filename.length > 0) {
        char[] junk;
        junk = save_options_filename;
        save_options_filename = "";

        parameters_st.write_file(junk, wc_options);
    }

    if (debug_level > 0) {
        protocol_st.write_file(stderr, wc_options);
    }

    protocol_send_message(
        smart_pointer(&protocol_st, null),
        smart_pointer(&results_t_st, &results)
    );

    if (debug_level > 0) {
        smart_pointer(&results_t_st, &results).write_file(stderr, wc_options);
    }

    return 0;
}

----------------------end sample code----------------------------

To make things a little cleaner, symtab_of() should probably return
a smart_pointer.
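
The smart pointer idea, a symbol table for a type paired with the address of one instance, can be sketched in C roughly as follows. The struct layouts and the sp_member() helper are assumptions for illustration (and every member is a double here to keep the sketch short); only the names symtab_t, smart_pointer_t, smart_pointer, foo_t and foo_st come from the sample code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layouts: a per-type table of member names and offsets. */
typedef struct {
    const char *name;
    size_t      offset;
} member_t;

typedef struct {
    const char     *type_name;
    const member_t *members;
    size_t          member_count;
} symtab_t;

/* A smart pointer combines the type's table with one instance's address. */
typedef struct {
    const symtab_t *st;
    void           *addr; /* may be NULL when no instance is attached */
} smart_pointer_t;

typedef struct { double x, y, z; } foo_t;

const member_t foo_members[] = {
    { "x", offsetof(foo_t, x) },
    { "y", offsetof(foo_t, y) },
    { "z", offsetof(foo_t, z) },
};
const symtab_t foo_st = { "foo_t", foo_members, 3 };

smart_pointer_t smart_pointer(const symtab_t *st, void *addr)
{
    smart_pointer_t sp = { st, addr };
    return sp;
}

/* Look up a member of the attached instance by name.
   Returns NULL if there is no instance or no such member. */
double *sp_member(smart_pointer_t sp, const char *name)
{
    assert(sp.st != NULL && name != NULL);
    if (sp.addr == NULL)
        return NULL;
    for (size_t i = 0; i < sp.st->member_count; i++)
        if (strcmp(sp.st->members[i].name, name) == 0)
            return (double *)((char *)sp.addr + sp.st->members[i].offset);
    return NULL;
}
```

One table per type is then enough for any number of variables of that type, which is the case the results_t comment in the sample code is getting at.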


When I wrote the original symbol table routines in C, I discovered that
the compiler would not let you initialize arbitrary byte streams
containing data types of mixed sizes.  I had to force everything to 4
bytes.  Being able to write such an initializer would be a useful
feature in the D language:

byte_stream[] symbol_table = {
    (token) ST_BEGIN,
    (token) ST_BEGIN,
    (token) ST_IDENTIFER,   (char[]) "x",
    (token) ST_TYPE,        (symtab_t *) &int32u_st,
    (token) ST_AT,          (far void*) &x,
    (token) ST_MIN,         (long) MIN_ULONG,
    (token) ST_MAX,         (long) MAX_ULONG,
    (token) ST_END,
    (token) ST_BEGIN,
    (token) ST_IDENTIFER,   (char[]) "y",
    (token) ST_TYPE,        (symtab_t *) &int32u_st,
    (token) ST_AT,          (far void*) &y,
    (token) ST_MIN,         (long) MIN_ULONG,
    (token) ST_MAX,         (long) MAX_ULONG,
    (token) ST_END,
    (token) ST_END,
}
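
For comparison, here is a C99 sketch of the fixed-width workaround, using a union with designated initializers so that tokens, strings, addresses and numbers can share one statically initialized array. This is an illustration under assumptions, not the code of the original library: cell_t and st_operand() are invented names, and the token spelling ST_IDENTIFER simply follows the listing above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef enum {
    ST_BEGIN, ST_IDENTIFER, ST_AT, ST_MIN, ST_MAX, ST_END
} token_t;

/* Every cell is padded to the size of the largest member; this is the
   wasted space a true mixed-size byte stream would avoid. */
typedef union {
    token_t       token;
    const char   *text;
    void         *address;
    unsigned long number;
} cell_t;

unsigned long x;

/* Layout assumed by this sketch: ST_BEGIN, then (token, operand) pairs,
   then ST_END. */
const cell_t symbol_table[] = {
    { .token = ST_BEGIN },
    { .token = ST_IDENTIFER }, { .text = "x" },
    { .token = ST_AT },        { .address = &x },
    { .token = ST_MIN },       { .number = 0UL },
    { .token = ST_MAX },       { .number = ULONG_MAX },
    { .token = ST_END },
};

/* Return the operand cell following the first occurrence of `wanted`,
   or NULL if the stream is malformed or the token is absent. */
const cell_t *st_operand(const cell_t *stream, token_t wanted)
{
    assert(stream != NULL);
    if (stream[0].token != ST_BEGIN)
        return NULL;
    for (size_t i = 1; stream[i].token != ST_END; i += 2)
        if (stream[i].token == wanted)
            return &stream[i + 1];
    return NULL;
}
```

The initializer is accepted by the compiler, but a one-byte token costs as much storage as a pointer, which is the limitation the byte_stream idea is meant to remove.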





--
Mark Whitis   http://www.freelabs.com/~whitis/       NO SPAM
Author of many open source software packages. 
Coauthor: Linux Programming Unleashed (1st Edition)
Apr 20 2004
parent Dave Sieber <dsieber spamnot.sbcglobal.net> writes:
whitis freelabs.com wrote:

> - dumping data structures for debugging

With this, I could have automated the dumping of huge numbers of structs in a recent project. And this was not for debugging, it was for data comparisons across large numbers of files. Maintaining it all by hand was incredibly tedious and really brought home to me the fact that our languages are not as helpful as they could be -- especially because the compiler HAS the information! Even a simple API to access the symbol table/debug info would be helpful. Why insist that only an external application can access information which is right there and readily available?

-- dave
Apr 20 2004