
digitalmars.D - Void-safety (and related things)

reply bearophile <bearophileHUGS lycos.com> writes:
Found on the Lambda the Ultimate blog: void-safety in the Eiffel language,
another attempt at solving this problem:
http://docs.eiffel.com/sites/default/files/void-safe-eiffel.pdf


I think that to solve this problem a language like D can use three different
strategies at the same time. Three kinds of object references can be defined:
1) The default one (its syntax is the shortest; such references are defined
like the current ones) is the "non-nullable object reference". Many objects in
a program are like this. The type system assures the code is correct, so you
don't need to test such references for null. As in C#, the compiler keeps its
eyes open to avoid the use of uninitialized references of this kind. (This is
a situation where "good" is better than "perfect": C# seems to work well
enough in its ability to spot uninitialized variables.)
2) The second kind is the current one, the "unsafe nullable object reference".
It's faster, its syntax is a bit longer, and it's to be used only where
maximum performance is necessary.
3) The third kind is the "safe nullable object reference". You can define it
using syntax like "Foo? f;". It's a "fat" reference: besides the pointer, this
reference contains an integer number that represents the class. If your
program has 500 classes, you need 500 different values for it. On the other
hand, a specific reference in a program usually can't be of 500 different
classes, so the maximum number can be decreased, and you can keep at runtime
some conversion tables that convert subsets of such numbers into a full
pointer to the class info. Such tables are a bit slow to use (but they don't
need much memory); however, the program uses them only when a reference of the
third kind is null, so it's not a big problem. On 64-bit systems such a
numeric tag can be put into the most significant bits of the pointer itself
(so when such a pointer isn't null you just need a test and a mask; the shift
is required only in the uncommon case of null). This also means that the
maximum number of possible class instances decreases, but not by much (you can
have some conversion tables to reduce such a numeric tag to 2-5 bits in most
programs). When the code uses a method of a null reference of this kind, the
program may call the corresponding method of a "default" instance of that
class (or even a user-specified instance).
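A minimal C sketch of the 64-bit tagging idea (the 48-bit address assumption,
the field widths, and all names are illustrative, not an actual D ABI):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding of a "safe nullable reference" on a 64-bit
   system: the low 48 bits hold the object address, the high 16 bits a
   small class tag used to find a default instance when the address
   part is null. */

#define ADDR_MASK ((1ULL << 48) - 1)
#define TAG_SHIFT 48

typedef uint64_t SafeRef;

static SafeRef make_ref(void *obj, uint16_t class_tag) {
    return ((uint64_t)(uintptr_t)obj & ADDR_MASK)
         | ((uint64_t)class_tag << TAG_SHIFT);
}

static void *deref(SafeRef r, void *default_instances[]) {
    uint64_t addr = r & ADDR_MASK;      /* common case: one mask */
    if (addr != 0)
        return (void *)(uintptr_t)addr;
    /* uncommon null case: shift out the tag, consult the table */
    return default_instances[r >> TAG_SHIFT];
}
```

A null reference of class tag 1 would thus transparently resolve to
`default_instances[1]` instead of crashing.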

Do you like it? :-)

Bye,
bearophile
Aug 11 2009
parent reply Jason House <jason.james.house gmail.com> writes:
I've recently convinced myself that nullability should be the exception instead
of the norm. So much of the code I write in C#/D uses reference objects
assuming they're non-null. Only in certain special cases do I handle null
explicitly. The issue is that if any special case is missed/mishandled, it can
spread to other code.

I'm also too lazy to write non-null contracts in D. They also have far less
value since violations are not caught at compile time (or better yet, in my IDE
as I write code).

It may be as simple as having the following 3 types:
T // non-nullable
T? // nullable, safe
T* //  nullable, unsafe

I'd also like to remove all default initialization in favor of
uninitialized-variable errors. Default initialization in D is cute, but it is
not a solution for programmer oversight. Single-threaded code will
reproducibly do the wrong thing, which may make the bug harder to notice in
the first place. The very fact that the signaling-NaN change has made it into
D shows that people want this type of behavior!
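A small C analogue of the point (the function is made up for illustration):
zero-initialization keeps a buggy program compiling and reproducible, but
silent.

```c
#include <assert.h>

/* With D-style default initialization, forgetting to set `max`
   properly still compiles and runs reproducibly -- it just
   reproducibly returns the wrong answer for all-negative input.
   An uninitialized-variable error would have flagged the oversight
   at compile time instead. */
static int max_of(const int *a, int n) {
    int max = 0;            /* oversight: should start from a[0] */
    for (int i = 0; i < n; i++)
        if (a[i] > max)
            max = a[i];
    return max;             /* all-negative arrays yield 0, silently */
}
```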


Aug 11 2009
next sibling parent reply Ary Borenszweig <ary esperanto.org.ar> writes:
Jason House wrote:
 I'd also like to remove all default initialization in favor of use of
uninitialized variable errors. Default initialization in D is cute, but it is
not a solution for programmer oversight. Single-threaded code will reproducibly
do the wrong thing, but may be harder to notice in the first place. The very
fact that the signalling nan change has made it into D shows that people want
this type of behavior!

Yes. Default initialization is really weak against uninitialized-variable
errors. You notice the errors of the first one at runtime, and the errors of
the second one at compile time. But I don't see that changing anytime soon...
(I think it's because "it gets hard".)
Aug 11 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Ary Borenszweig:
(I think it's because "it gets hard").<

You can't ask a single person to be able to do everything. Are you able to
implement that thing? Probably I am not able. If someone here is able and
willing to do it, then I suggest that person ask Walter for permission to
implement it.

Bye,
bearophile
Aug 11 2009
parent reply Michiel Helvensteijn <m.helvensteijn.remove gmail.com> writes:
bearophile wrote:

 You can't ask a single person to be able to do everything. Are you able to
 implement that thing? Probably I am not able. If someone here is able and
 willing to do it then I suggest such person to ask Walter permission to
 implement it.

I doubt it's the direction D wants to go, because proving correctness at
compile time requires the holy grail, and testing correctness at runtime
requires extra space for each variable and extra time for each access.

-- 
Michiel Helvensteijn
Aug 11 2009
parent reply Ary Borenszweig <ary esperanto.org.ar> writes:
Michiel Helvensteijn wrote:
 bearophile wrote:
 
 You can't ask a single person to be able to do everything. Are you able to
 implement that thing? Probably I am not able. If someone here is able and
 willing to do it then I suggest such person to ask Walter permission to
 implement it.

I doubt it's the direction D wants to go. Because proving correctness at compile-time requires the holy grail, and testing correctness at runtime requires extra space for each variable and extra time for each access.

What do you mean by "holy grail"?
Aug 11 2009
parent reply Michiel Helvensteijn <m.helvensteijn.remove gmail.com> writes:
Ary Borenszweig wrote:

 I doubt it's the direction D wants to go. Because proving correctness at
 compile-time requires the holy grail, and testing correctness at runtime
 requires extra space for each variable and extra time for each access.

What do you mean by "holy grail"?

You missed that discussion, did you? Basically, if you want to know at
compile time whether a variable is initialized, there are several
possibilities:

* Be overly conservative: make sure every possible computational path has an
assignment to the variable, otherwise give an error. This would throw out the
baby with the bathwater; many valid programs would cause an error.

* Actually analyze the control flow: make sure that exactly all reachable
states have the variable initialized, otherwise give an error. Dubbed the
"holy grail", because this sort of analysis is still some time off, and would
allow some very cool correctness verification.

-- 
Michiel Helvensteijn
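A small C illustration of the gap between the two approaches (hypothetical
example): `x` is assigned on every path that later reads it, yet a
conservative analysis that treats the two ifs independently would report it as
possibly uninitialized and reject a valid program.

```c
#include <assert.h>

/* `x` is set on every path that reads it, because both branches test
   the same condition -- but a conservative definite-assignment
   analysis that ignores the correlation between the two ifs would
   complain that `x` may be used uninitialized. */
static int example(int cond) {
    int x;
    if (cond)
        x = 1;
    if (cond)
        return x;   /* only reachable when x was assigned above */
    return 0;
}
```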
Aug 11 2009
parent reply "Joel C. Salomon" <joelcsalomon gmail.com> writes:
Michiel Helvensteijn wrote:
 I doubt it's the direction D wants to go. Because proving correctness at
 compile-time requires the holy grail, and testing correctness at runtime
 requires extra space for each variable and extra time for each access.


Basically, if you want to know at compile-time whether a variable is
initialized, there are several possibilities:

* Be overly conservative: Make sure every possible computational path has an
assignment to the variable, otherwise give an error. This would throw out the
baby with the bathwater. Many valid programs would cause an error.

* Actually analyze the control flow: Make sure that exactly all reachable
states have the variable initialized, otherwise give an error. Dubbed "holy
grail", because this sort of analysis is still some time off, and would allow
some very cool correctness verification.

Third (stop-gap) option:

• Be conservative, but trust the programmer: Allow some sort of pragma to tell
the compiler that the programmer has done the flow analysis and the variable
really is set (or non-null, or…). It will be an unchecked error to lie to the
compiler -- until the holy grail is implemented, when it will become a checked
error.

This is a feature of the Plan 9 C compilers (cf. “The compile-time
environment” in <http://plan9.bell-labs.com/sys/doc/comp.html>).

“If you lie to the compiler, it will get its revenge.” —Henry Spencer

—Joel Salomon
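Plan 9 exposes this convention to the programmer through the SET and USED
macros; here is a self-contained approximation (the exact Plan 9 definitions
may differ, so treat these as a sketch of the idea rather than the real
library code):

```c
#include <assert.h>

/* Approximation of the Plan 9 convention: SET(x) silences a "used and
   not set" diagnostic with a dummy assignment the programmer vouches
   is never observed; USED(x) silences "set and not used". Lying to
   the compiler here is an unchecked error. */
#define SET(x)  ((x) = 0)
#define USED(x) if(x){}else{}

static int pick(int cond) {
    int x;
    SET(x);          /* programmer asserts: x is set before any use */
    if (cond)
        x = 10;
    else
        x = 20;
    USED(cond);      /* likewise: "yes, I meant not to use this further" */
    return x;
}
```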
Aug 21 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Joel C. Salomon:

http://plan9.bell-labs.com/sys/doc/comp.html<

Thank you for that link. I can see some interesting things in that very C-like language:
The #if directive was omitted because it greatly complicates the preprocessor,
is never necessary, and is usually abused. Conditional compilation in general
makes code hard to understand; the Plan 9 source uses it sparingly. Also,
because the compilers remove dead code, regular if statements with constant
conditions are more readable equivalents to many #ifs.<

Can the "static if" be removed from D then?

------------------

Variables inside functions can have any order; are D compilers doing this
too?
Unlike its counterpart on other systems, the Plan 9 loader rearranges data to
optimize access. This means the order of variables in the loaded program is
unrelated to its order in the source. Most programs dont care, but some assume
that, for example, the variables declared by

int a;
int b;

will appear at adjacent addresses in memory. On Plan 9, they won't.<

------------------

Plan 9 uses this strategy to solve endianness-induced troubles in integer
I/O:
Plan 9 is a heterogeneous environment, so programs must expect that external
files will be written by programs on machines of different architectures. The
compilers, for instance, must handle without confusion object files written by
other machines. The traditional approach to this problem is to pepper the
source with #ifdefs to turn byte-swapping on and off. Plan 9 takes a different
approach: of the handful of machine-dependent #ifdefs in all the source, almost
all are deep in the libraries. Instead programs read and write files in a
defined format, either (for low volume applications) as formatted text, or (for
high volume applications) as binary in a known byte order. If the external data
were written with the most significant byte first, the following code reads a
4-byte integer correctly regardless of the architecture of the executing
machine (assuming an unsigned long holds 4 bytes):

ulong
getlong(void)
{
	ulong l;

	l = (getchar()&0xFF)<<24;
	l |= (getchar()&0xFF)<<16;
	l |= (getchar()&0xFF)<<8;
	l |= (getchar()&0xFF)<<0;
	return l;
}

Note that this code does not swap the bytes; instead it just reads them in the
correct order. Variations of this code will handle any binary format and also
avoid problems involving how structures are padded, how words are aligned, and
other impediments to portability. Be aware, though, that extra care is needed
to handle floating point data.<

------------------

I don't fully understand this:
the declaration

extern register reg;

(this appearance of the register keyword is not ignored) allocates a global
register to hold the variable reg. External registers must be used carefully:
they need to be declared in all source files and libraries in the program to
guarantee the register is not allocated temporarily for other purposes.
Especially on machines with few registers, such as the i386, it is easy to
link accidentally with code that has already usurped the global registers, and
there is no diagnostic when this happens. Used wisely, though, external
registers are powerful. The Plan 9 operating system uses them to access
per-process and per-machine data structures on a multiprocessor. The storage
class they provide is hard to create in other ways.<

Bye,
bearophile
Aug 21 2009
parent "Joel C. Salomon" <joelcsalomon gmail.com> writes:
bearophile wrote, re. <http://plan9.bell-labs.com/sys/doc/comp.html>:
 I can see some interesting things in that very C-like language:
 
 The #if directive was omitted because it greatly complicates the preprocessor,
is never necessary, and is usually abused. Conditional compilation in general
makes code hard to understand; the Plan 9 source uses it sparingly. Also,
because the compilers remove dead code, regular if statements with constant
conditions are more readable equivalents to many #ifs.

Can the "static if" be removed from D then?

D uses "static if" for things other than versioning. But this attitude is relevant when considering “enhancements” to D’s version(foo).
 I don't fully understand this:
 
 the declaration
     extern register reg;
 (this appearance of the register keyword is not ignored) allocates a global
register to hold the variable reg. External registers must be used carefully:
they need to be declared in all source files and libraries in the program to
guarantee the register is not allocated temporarily for other purposes.
Especially on machines with few registers, such as the i386, it is easy to link
accidentally with code that has already usurped the global registers and there
is no diagnostic when this happens. Used wisely, though, external registers are
powerful. The Plan 9 operating system uses them to access per-process and
per-machine data structures on a multiprocessor. The storage class they provide
is hard to create in other ways.


Generally, the Plan 9 C compilers ignore the "register" keyword, preferring to
handle this sort of optimization themselves. The "extern register" declaration
is not for optimization, but to allocate a register as a global variable. This
register will never be used by the compiler as a temporary, or to pass
arguments, or whatever compilers use registers for; it has been completely
given over for the programmer’s use. Apparently, this was helpful in writing
the Plan 9 kernel.

—Joel Salomon
Aug 21 2009
prev sibling parent Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Fri, Aug 21, 2009 at 12:58 PM, bearophile<bearophileHUGS lycos.com> wrote:
 Joel C. Salomon:

http://plan9.bell-labs.com/sys/doc/comp.html<

Thank you for that link. I can see some interesting things in that very C-like language:
 The #if directive was omitted because it greatly complicates the
preprocessor, is never necessary, and is usually abused. Conditional
compilation in general makes code hard to understand; the Plan 9 source uses
it sparingly. Also, because the compilers remove dead code, regular if
statements with constant conditions are more readable equivalents to many
#ifs.<
 Can the "static if" be removed from D then?

No. Not only can 'static if' appear where 'if' can't (like at module scope and inside templates), it also does not create a scope unlike a normal 'if'. They're similar, but different enough to warrant being different constructs.
 Variables inside functions can have any order, are D compilers too doing
this?

None currently do, but I think it's allowed by the D spec. Please don't go beg the LDC developers for this as soon as you read this. They really do have better things to do.
Aug 21 2009