digitalmars.D - int nan
The following comes partially from a friend of mine. If you are busy you can
skip this post of musings.
From the docs:
http://www.digitalmars.com/d/1.0/faq.html#nan
> Because of the way CPUs are designed, there is no NaN value for integers, so D
> uses 0 instead. It doesn't have the advantages of error detection that NaN has,
> but at least errors resulting from unintended default initializations will be
> consistent and therefore more debuggable.
Given that abs(int.min) is problematic (its result is not representable as an
int) and that CPUs manage floating-point NaNs efficiently enough, it would be
nice for int.min to become the NaN of integers (and similarly for short, long,
and maybe tiny too). Such a NaN might also be useful for purposes similar to
the nullable integers of C#.
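A minimal sketch of the abs(int.min) problem, assuming D's wrapping two's-complement int arithmetic. The helper `naiveAbs` is hypothetical and only there for illustration:

```d
import std.stdio : writeln;

// Hypothetical helper for illustration: a hand-written absolute value
// using plain int arithmetic.
int naiveAbs(int x)
{
    return x < 0 ? -x : x;
}

void main()
{
    assert(naiveAbs(-5) == 5);

    // int.min has no positive counterpart in int, so negating it
    // wraps around and yields int.min again.
    assert(naiveAbs(int.min) == int.min);
    assert(naiveAbs(int.min) < 0);

    // Every other negative value has a positive partner.
    assert(naiveAbs(-int.max) == int.max);

    writeln(naiveAbs(int.min));
}
```

If int.min were reserved as a sentinel, the remaining range [-int.max, int.max] would be symmetric, which is the property the proposal is after.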
Bye,
bearophile
This is IMHO (at least at first glance) a reasonable idea in the very long run.
However, it isn't practical here and now for D2, because NaN behavior is
implemented partly in hardware, and mathematically undefined integer operations
throw hardware exceptions instead of returning int.nan on current hardware.
Interesting idea, but IMO using NaN as a default initializer is just a
crutch for not having a real system for detecting and preventing reads of
uninitialized variables at compile time (C#'s system for this works very
well in my experience). I.e., default-initing to NaN is certainly better than
default-initing to a commonly-used value, but it still isn't the right
long-term solution.
Barring that "correct" solution though, I do think it would make far more
sense for the default initializer to be something that isn't as commonly
used as 0. So yeah, either int.min, or 0x69696969 or 0xB00BB00B, etc., i.e.
something that will actually stand out and scream "Hey! Double-check this!
It might not be right!".
Nick Sabalausky:
> Ie, Default initing to NaN is certainly better than
> default-initing to a commonly-used value, but it still isn't the right
> long-term solution.
Having a NaN has other purposes besides initialization values. You can represent
missing values, like C# nullable ints (which are bigger in size, 8 bytes, I
think).
> So yea, either int.min, or 0x69696969 or 0xB00BB00B, etc, ie
> something that will actually stand out and scream "Hey! Double-check this!
> It might not be right!".
The good thing about using int.min (and short.min, etc.) is that the numbers
then become symmetric: you have a positive number for each negative one, and
abs() works in all cases.
Bye,
bearophile
"bearophile" <bearophileHUGS lycos.com> wrote in message
news:h250ve$1dvr$1 digitalmars.com...
> Nick Sabalausky:
>> Ie, Default initing to NaN is certainly better than
>> default-initing to a commonly-used value, but it still isn't the right
>> long-term solution.
> Having a nan has other purposes beside initialization values. You can
> represent missing values, like C# nullable ints (that are bigger in size,
> 8 bytes, I think).
Yes, I know. I only said that "default initing to nan" was a sub-optimal
approach, not having nans. But I may have misunderstood you, I thought
default init values was what you were talking about?
>> So yea, either int.min, or 0x69696969 or 0xB00BB00B, etc, ie
>> something that will actually stand out and scream "Hey! Double-check
>> this! It might not be right!".
> The good thing of using int.min (and short.min, etc) is that then the
> numbers become symmetric, you have a positive number for each negative
> one, and abs() works in all cases.
Good point.
> Having a nan has other purposes beside initialization values. You can
> represent missing values, like C# nullable ints (that are bigger in size, 8
> bytes, I think).
You're saying C# nullable ints require more memory than native ints, but
just how would you represent int.nan with 32 bits?
The correct solution would be to add nullable value types as additional
types. It'd be nice if we could have non-nullable object references at
the same time.
But figuring out and agreeing on a concrete design seems to be too
complicated, and D will never have it. "Stop dreaming."
grauzone:
> You're saying C# nullable ints require more memory than native ints, but
> just how would you represent int.nan with 32 bits?
Have you read my posts? I have said to use the value that currently is int.min
as null, and I've explained why.
I'll keep dreaming some more years,
bye,
bearophile
> Have you read my posts? I have said to use the value that currently is int.min
> as null, and I've explained why.
That wasn't very explicit. Anyway, we need int.min for, you know, doing
useful stuff. We can't just define a quite random number to be a special
value. Checking math operations for nullable integers would also be
quite expensive (you'd have to check both operands before the operation).
If you implement nullable ints as a tuple of a native int and
a bool signaling NaN, for most operations you only need to OR the
NaN bools of both operands and store that in the result. At least I
imagine that to be better, because you don't need additional jumps in
the generated asm code. And this implementation choice is clearly
superior, because it doesn't restrict the value range of the original
type. There's no int value that the nullable int type can't represent.
Now there's the space overhead, but if you need performance, you'd
restrict yourself to hardware-supported operations anyway.
Although it's pointless to discuss implementation details of a
feature that will never be implemented, what do you think?
PS: I'd prefer "checked" math operations (as in C#, I think) over
int.nan. Overflows or illegal operations would just trigger exceptions.
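A rough sketch of the int-plus-bool representation described above. The type name `NanInt` and its fields are hypothetical, chosen only for illustration, and whether the flag combination compiles to branch-free code depends on the compiler:

```d
// Hypothetical type for illustration: a native int paired with a NaN flag.
struct NanInt
{
    int value;
    bool isNan;

    // The result is NaN if either operand is NaN; the operands are not
    // inspected individually before the arithmetic is carried out.
    NanInt opBinary(string op)(NanInt rhs) const
        if (op == "+" || op == "-" || op == "*")
    {
        return NanInt(mixin("value " ~ op ~ " rhs.value"),
                      isNan || rhs.isNan);
    }
}

void main()
{
    auto a = NanInt(int.min);       // the full int range stays usable
    auto b = NanInt(2);
    auto missing = NanInt(0, true); // stands for "no value"

    assert(!(a + b).isNan);
    assert((a + b).value == int.min + 2);
    assert((b * missing).isNan);
    assert(((b + missing) - b).isNan); // the flag sticks through a chain
}
```

The cost is the extra storage per value; the benefit is that no int value has to be given up as a sentinel.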
grauzone:
> That wasn't very explicit. Anyway, we need int.min for, you know, doing
> useful stuff.
Like for what? Have you used a Lisp? Their tagged integers show that a smaller
range is fine. And I'm just talking about 1 value in 4 billion; I don't think
you will miss it much. And it's a value that has no symmetric positive.
> We can't just define a quite random number to be a special value.
It's not a random value, it's a specific one, and it's an asymmetric extremum too.
> Checking math operations for nullable integers would also be
> quite expensive (you had to check both operands before the operation).
I was talking about a hardware-managed NaN for ints, shorts, longs, tinys.
That's why I described the original post as musings.
> Although it's pointless to discuss about implementation details of a
> feature that will never be implemented, what do you think?
Inventions sometimes come from dreams too :-)
> PS: I'd prefer "checked" math operations (as in C#, I think) over
> int.nan. Overflows or illegal operations would just trigger exceptions.
I'll do my best to have them in LDC (LLVM supports them already!), it's
probably the only new feature I'll ask of the LDC developers. If necessary I may
even create a personal version of LDC that has this single extra feature.
Bye,
bearophile
I'm sorry but I think it would be an ugly feature.
What would be the NaN of uint?
What if you actually need 2^32 different values (such as in a linear
congruential random number generator)?
Besides, there would be no cheap way to ensure NaN propagation (no hardware
support).
Cheers.
ponce:
> What would be the NaN of uint ?
Having a NaN in just the signed integral types (of 1, 2, 4, 8, 16 bytes) looks
like enough to me, see below.
> What if you actually need 2^32 different values (such as in a linear
> congruential random number generator) ?
I agree that there are many situations where you want 2^32 different values, or
2^16, etc. In such situations you can use a utiny/ushort/uint/ulong/ucent, which
has no NaN (and once in a while you may even use a nullable uint like in C#).
But I think it's much less common to need 2^32 or 2^64 different signed
integers.
> Besides, there would be no cheap way to ensure NaN propagation (no hardware
> support).
I was talking about having hardware support, of course.
Bye,
bearophile
bearophile wrote:
> grauzone:
>> That wasn't very explicit. Anyway, we need int.min for, you know, doing
>> useful stuff.
> Like for what? Have you used a Lisp? Their tagged integers show that a smaller
> range is fine. And I'm just talking about 1 value in 4 billions, I don't think
> you will miss it much. And it's a value that has no symmetric positive.
It's fine for Lisp because any Lisp I've ever seen auto-upgrades out-of-range
integers to (heap-allocated) bigints.
>> PS: I'd prefer "checked" math operations (as in C#, I think) over
>> int.nan. Overflows or illegal operations would just trigger exceptions.
> I'll do my best to have them in LDC (LLVM supports them already!), it's
> probably the only new feature I'll ask to LDC developers. If necessary I may
> even create a personal version of LDC that has this single extra feature.
I'd like to point out you don't need a new built-in type (or changes to an
existing one) to use those LLVM intrinsics with LDC. Just import
ldc.intrinsics, define a struct MyInt and overload operators on it using
llvm_sadd_with_overflow and friends.
That doesn't work for external libraries of course, but those should be free to
handle overflow situations and undefined operations however they want without
having to worry about int.nan...
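A minimal sketch of such a wrapper struct. Note the substitution: instead of the LDC-specific intrinsics it uses druntime's `core.checkedint` (`adds`, `subs`, `muls`) as a portable stand-in, and throwing a plain `Exception` on overflow is an assumption made here for illustration, not something prescribed above:

```d
import core.checkedint : adds, subs, muls;
import std.exception : assertThrown;

// Hypothetical wrapper for illustration: an int whose +, - and *
// throw instead of silently wrapping around.
struct MyInt
{
    int value;

    MyInt opBinary(string op)(MyInt rhs) const
        if (op == "+" || op == "-" || op == "*")
    {
        bool overflow = false;
        static if (op == "+")
            immutable r = adds(value, rhs.value, overflow);
        else static if (op == "-")
            immutable r = subs(value, rhs.value, overflow);
        else
            immutable r = muls(value, rhs.value, overflow);

        if (overflow)
            throw new Exception("integer overflow in MyInt " ~ op);
        return MyInt(r);
    }
}

void main()
{
    assert((MyInt(2) + MyInt(3)).value == 5);
    assert((MyInt(6) * MyInt(7)).value == 42);

    // Operations whose result does not fit in an int are reported.
    assertThrown(MyInt(int.max) + MyInt(1));
    assertThrown(MyInt(int.min) - MyInt(1));
}
```

With LDC, the bodies of the three branches could presumably call the overflow intrinsics instead; the operator-overloading shell would stay the same.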
Frits van Bommel:
> It's fine for Lisp because any Lisp I've ever seen auto-upgrades out-of-range
> integers to (heap-allocated) bigints.
I think it can be fine even if you have just fixnums with that single value
missing from the signed integrals.
> I'd like to point out you don't need a new built-in type (or changes to a
> existing one) to use those LLVM intrinsics with LDC. Just import
> ldc.intrinsics, define a struct MyInt and overload operators on it using
> llvm_sadd_with_overflow and friends.
> That doesn't work for external libraries of course, but those should be free
> to handle overflow situations and undefined operations however they want
> without having to worry about int.nan...
Probably I did not express myself well in this part of my post, because here
I was not talking about a new int type or about int NaNs.
I was talking about int overflows. I'll explain better in #ldc.
Bye,
bearophile
Hello Nick,
> Interesting idea, but IMO using NaN as a default initializer is just a
> crutch for not having a real system of compile-time
> detecting/preventing of uninitialized variables from being read (C#'s
> system for this works very well in my experience).
I think you can prove that it is impossible to do this totally correctly:
int i;
for(int j = foo(); j > 0; j--) i = bar(j); // what if foo() returns -5?
BCS wrote:
>> Interesting idea, but IMO using NaN as a default initializer is just a
>> crutch for not having a real system of compile-time
>> detecting/preventing of uninitialized variables from being read (C#'s
>> system for this works very well in my experience).
> I think you can prove that it is impossible to do this totally correctly:
> int i;
> for(int j = foo(); j > 0; j--) i = bar(j); // what if foo() returns -5?
Complete static analysis of the flow of program control is the holy grail of
compiler construction. It would allow automatic proof of many program
properties (such as initialization). It may not be impossible, but it is
extremely complicated.
If nothing is known about the post-condition of 'foo', the sensible
conclusion would be that 'i' may not be initialized after the loop. If you
know that the return value of 'foo' is always positive under the given
conditions, then you know otherwise.
In the general case, however, you can't guarantee correct static analysis.
This leaves a language/compiler with two options, I believe:
* Do nothing about it. Let the programmer use int.min or set a bool to test
initialization at runtime.
* Add 'uninitialized' to the set of possible states of each type. Every time
a variable is read, assert that it is initialized first. Use the static
analysis techniques that *are* available (a set that will continue to grow)
to eliminate these tests (and the extended state) where possible.
The first method has the advantage of simplicity for the compiler and better
runtime performance in most cases.
The second method has the advantage of automatic detection of subtle bugs
and more simplicity for the programmer.
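A small sketch of the second option, done by hand in library code rather than by the compiler. The wrapper `Tracked` and the stand-in `bar` are hypothetical names used only for illustration:

```d
// Hypothetical wrapper for illustration: a value plus an
// "initialized" bit that is checked on every read.
struct Tracked(T)
{
    private T payload;
    private bool initialized = false;

    // Any assignment marks the variable as initialized.
    void opAssign(T v)
    {
        payload = v;
        initialized = true;
    }

    // Every read asserts that a write happened first.
    T get() const
    {
        assert(initialized, "read of uninitialized variable");
        return payload;
    }
}

// Stand-in for the bar() of the example above.
int bar(int j) { return j * 10; }

void main()
{
    Tracked!int i;
    for (int j = 3; j > 0; j--) i = bar(j);
    assert(i.get == 10); // last iteration assigned bar(1)

    Tracked!int never;
    for (int j = -5; j > 0; j--) never = bar(j); // body never runs
    // Calling never.get here would trip the assertion
    // (in a build where asserts are enabled).
}
```

A compiler implementing this option could drop the flag and the check wherever its analysis proves the variable is assigned before use.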
--
Michiel Helvensteijn
Michiel Helvensteijn Wrote:
> Complete static analysis of the flow of program control is the holy grail of
> compiler construction. It would allow automatic proof of many program
> properties (such as initialization). It may not be impossible, but it is
> extremely complicated.
extremely complicated? it's machine haltin' dood.
superdan wrote:
>> Complete static analysis of the flow of program control is the holy grail
>> of compiler construction. It would allow automatic proof of many program
>> properties (such as initialization). It may not be impossible, but it is
>> extremely complicated.
> extremely complicated? it's machine haltin' dood.
Ok, since 'complete static analysis' may include undecidable problems such
as halting, I agree that in the general case, it's impossible.
However, in many practical cases, it may not be. Additionally, the burden of
providing loop invariants and ranking functions (to prove termination)
could be given to the programmer instead of the compiler.
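To illustrate what programmer-supplied annotations might look like, here is a sketch that writes a loop invariant and a ranking function as ordinary runtime asserts. A verifying compiler would try to discharge them statically; the function name `sumDownFrom` and the particular loop are assumptions made only for this example:

```d
// Hypothetical example: sums n + (n-1) + ... + 1 with explicit annotations.
int sumDownFrom(int n)
{
    int sum = 0;
    int j = n;
    while (j > 0)
    {
        // Loop invariant: sum holds n + (n-1) + ... + (j+1).
        assert(sum == (n - j) * (n + j + 1) / 2);

        immutable rankBefore = j; // ranking function: the value of j

        sum += j;
        j--;

        // The rank strictly decreases and stays non-negative,
        // which is what proves termination.
        assert(0 <= j && j < rankBefore);
    }
    return sum;
}

void main()
{
    assert(sumDownFrom(5) == 15);
    assert(sumDownFrom(0) == 0);
}
```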
--
Michiel Helvensteijn
Michiel Helvensteijn wrote:
> * Add 'uninitialized' to the set of possible states of each type. Every time
> a variable is read, assert that it is initialized first. Use the static
> analysis techniques that *are* available (a set that will continue to grow)
> to eliminate these tests (and the extended state) where possible.
I believe this is what valgrind does by instrumenting each variable at
runtime.
Denis Koroskin wrote:
>> int i;
>> for(int j = foo(); j > 0; j--) i = bar(j); // what if foo() returns -5?
> This code doesn't compile in C# and fails with the following error at
> first attempt to use 'i':
> error CS0165: Use of unassigned local variable 'i'
Ah, so C# is overly conservative. That's another option, of course.
It has the advantage of always knowing at compile time that you're not
reading an uninitialized value. The disadvantage is that C# will often
throw out the baby with the bath water. The example program may be
perfectly valid if 'foo' always returns positive.
--
Michiel Helvensteijn
"Michiel Helvensteijn" <m.helvensteijn.remove gmail.com> wrote in message
news:h25fbk$28mg$1 digitalmars.com...
> Denis Koroskin wrote:
>>> int i;
>>> for(int j = foo(); j > 0; j--) i = bar(j); // what if foo() returns -5?
>> This code doesn't compile in C# and fails with the following error at
>> first attempt to use 'i':
>> error CS0165: Use of unassigned local variable 'i'
> Ah, so C# is overly conservative. That's another option, of course.
> It has the advantage of always knowing at compile time that you're not
> reading an uninitialized value. The disadvantage is that C# will often
> throw out the baby with the bath water. The example program may be
> perfectly valid if 'foo' always returns positive.
Yes, this approach is what I was getting at. In fact, I would (and already
have in the past) argue that this is *better* than the "holy grail"
approach, because it's based on very simple and easy-to-remember
rules. Conversely, the "holy grail" approach leads to difficult-to-predict
cases of small, seemingly-innocent changes in one place causing some other
code to suddenly switch back and forth between "compiles" and "doesn't
compile". Take this modified version of your example:
------------
// Imagine foo resides in a completely different package
int foo() { return 5; }
int i;
for(int j = foo(); j > 3; j--) i = j;
auto k = i; // Compiles at the moment...
------------
Now make a perfectly acceptable-looking change to foo:
------------
int foo() { return 2; }
------------
And all of a sudden non-local code starts flip-flopping between "compiles"
and "doesn't compile".
Additionally, even the "holy grail" approach still has to reduce itself to
being overly conservative in certain cases anyway:
------------
int foo()
{
    auto rnd = new RandomGenerator();
    rnd.seed(systemClock);
    return rnd.fromRange(1,10);
}
------------
So, we only have two initial choices:
- Overly conservative (C#-style or "holy grail")
- Overly permissive (current D approach)
And if we choose "overly conservative", then our next choice is:
- Overly conservative with simple, easy-to-use rules (C#-style)
- Overly conservative with complex rules that have seemingly-random
non-localized effects ("holy grail")
Additionally, in the C# approach (and this is speaking from personal
experience), anytime you do come across a provably-correct case that the
compiler rejects, not only is it always obvious to see why the compiler
rejected it, but it's also trivially easy to fix. So in practice, it's
really not much of a "baby with the bathwater" situation at all.
Nick Sabalausky wrote:
> Yes, this approach is what I was getting at. In fact, I would (and
> already have in the past) argue that this is *better* than the "holy
> grail" approach, because it's based on very simple and easy to
> remember rules. Conversely, the "holy grail" approach leads to
> difficult-to-predict cases of small, seemingly-innocent changes in one
> place causing some other code to suddenly switch back and forth between
> "compiles" and "doesn't compile". Take this modified version of your
> example:
> ------------
> // Imagine foo resides in a completely different package
> int foo() { return 5; }
> int i;
> for(int j = foo(); j > 3; j--) i = j;
> auto k = i; // Compiles at the moment...
> ------------
> Now make a perfectly acceptable-looking change to foo:
> ------------
> int foo() { return 2; }
> ------------
> And all of a sudden non-local code starts flip-flopping between
> "compiles" and "doesn't compile".
Better than a flipflop between "runs correctly" and "runs incorrectly",
wouldn't you agree? But of course, you're arguing on the other end of the
spectrum. Read on.
> Additionally, even the "holy grail" approach still has to reduce itself
> to being overly conservative in certain cases anyway:
> ------------
> int foo()
> {
>     auto rnd = new RandomGenerator();
>     rnd.seed(systemClock);
>     return rnd.fromRange(1,10);
> }
> ------------
I wouldn't call the "holy grail" overly conservative in this instance. The
post-condition of 'foo' would simply be (1 <= returnValue <= 10). With no
more information than that, the compiler would have to give an error,
since 'foo' *may return a value* that results in an uninitialized read
of 'i'. That's how it should work. No errors if and only if there is no
possible execution path that results in failure, be it uninitialized-read
failure, null-dereference failure or divide-by-zero failure.
> So, we only have two initial choices:
> - Overly conservative (C#-style or "holy grail")
> - Overly permissive (current D approach)
I tend to agree with BCS that the programmer should have the last say,
unless the compiler can absolutely prove that (s)he is wrong. Given the
choice between overly conservative and overly permissive, I would pick
overly permissive.
But the beauty of the holy grail is that it's neither.
> Additionally, in the C# approach (and this is speaking from personal
> experience), anytime you do come across a provably-correct case that the
> compiler rejects, not only is it always obvious to see why the compiler
> rejected it, but it's also trivially easy to fix. So in practice, it's
> really not much of a "baby with the bathwater" situation at all.
But what would the fix be in the case of our example? Surely you're not
suggesting initializing 'i' to 0? Then we'd be back in the old situation
where we might get unexpected runtime behavior if we were wrong
about 'foo'.
An acceptable solution would be:
int i;
assert(foo() > 3);
for(int j = foo(); j > 3; j--) i = j;
auto k = i; // Compiles at the moment...
Would C# swallow that?
--
Michiel Helvensteijn
Simen Kjaeraas wrote:
>> But the beauty of the holy grail is that it's neither.
> While the ugliness of it is that it's both.
Care to elaborate?
--
Michiel Helvensteijn
Simen Kjaeraas wrote:
>>>> But the beauty of the holy grail is that it's neither.
>>> While the ugliness of it is that it's both.
>> Care to elaborate?
> As has already been mentioned, one of the biggest problems with the holy
> grail is that it leads to capricious states of "possibly compilable".
> There are also bunches of examples in which it will not be able to
> deduce if it should compile or not, at least not without breaking
> modularity,
The modularity thing is a good point. I assume you're talking about
encapsulation. The designer of a function should make its definition
public. The stuff it requires and the stuff it guarantees. The stuff it
requires can be of the form of a logical precondition. The stuff it
guarantees could be, at the choice of the designer, the function body
itself or a logical postcondition (with access to the initial state of the
function). The postcondition is used if you want to encapsulate the
function implementation.
Remember that the definition should be known to the caller of the function
anyway, or why would he/she call it? Often this is in the form of
documentation, but ideally it would be in an assertion language the
compiler can understand.
> and even then, functions called from outside sources (dlls,
> SOs, OS functions, compiled libraries, etc) will break the system.
You're right. If nothing is known about them, they must automatically
receive the weakest possible postcondition: true. Pretty much anything can
happen if you call them. However, it's acceptable for either the designers
of those outside functions or other programmers to supply public contracts
for them. The correctness of the code on the calling side would then be
contingent upon the correctness of those contracts. An acceptable
compromise.
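As a sketch of what such a public contract can look like in D's own contract syntax, here is the 'foo' from earlier in the thread with its postcondition written out. D checks this at runtime in non-release builds rather than proving it statically, and the body using `std.random.uniform` is a stand-in chosen for illustration:

```d
// The contract is the public part: callers may rely on 1 <= result <= 10
// without seeing the body.
int foo()
out (r; 1 <= r && r <= 10)
{
    import std.random : uniform;
    return uniform(1, 11); // upper bound is exclusive
}

void main()
{
    int i;              // D default-initializes this to 0
    bool isSet = false;
    for (int j = foo(); j > 3; j--)
    {
        i = j;
        isSet = true;
    }

    // From the contract alone, foo() may return 1, 2 or 3, so a checker
    // could not prove that the loop body runs at least once.
    assert(isSet ? i == 4 : i == 0);
}
```

For an external function with no contract at all, the only postcondition a checker could assume is "true", which is the situation described above.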
> This means the system has to be either permissive or conservative when
> encountering a problem insoluble to its logic, and this fall-back
> mechanism will then work counter-intuitively to its normal working
> order, thus giving birth to the system's dualism of both
> conservativeness and permissiveness.
Well, it would still be either one or the other. Not both. Or perhaps I
still don't understand your point.
I do find this topic fascinating.
--
Michiel Helvensteijn
Michiel Helvensteijn wrote:
> An acceptable solution would be:
> int i;
> assert(foo() > 3);
> for(int j = foo(); j > 3; j--) i = j;
> auto k = i; // Compiles at the moment...
> Would C# swallow that?
Of course not:
int foo() {
    return rand() % 10;
}
Ary Borenszweig wrote:
>> int i;
>> assert(foo() > 3);
>> for(int j = foo(); j > 3; j--) i = j;
>> auto k = i; // Compiles at the moment...
>> Would C# swallow that?
> Of course not:
> int foo() {
>     return rand() % 10;
> }
My mistake. For some reason I was assuming 'foo' was pure.
int i;
int j = foo();
assert(j > 3);
for(; j > 3; j--) i = j;
auto k = i;
Would C# allow this?
--
Michiel Helvensteijn
"Michiel Helvensteijn" <m.helvensteijn.remove gmail.com> wrote in message
news:h2810s$hl1$1 digitalmars.com...
But what would the fix be in the case of our example? Surely you're not
suggesting initializing 'i' to 0? Then we'd be back in the old situation
where we might get unexpected runtime behavior if we were wrong
about 'foo'.
The fix would be one of the following, depending on what the code is
actually doing:
---------------
// Instead of knee-jerking i to 0, we default init it to
// whatever safe value we want it to be if the loop
// doesn't set it. This, of course, may or may not
// be zero, depending on the code, but regardless,
// there are times when this IS perfectly safe.
int i = contextDependentInitVal;
for(int j = foo(); j > 3; j--) i = j;
auto k = i;
---------------
---------------
int i;
bool isSet = false; // making i nullable would be better
for(int j = foo(); j > 3; j--) {
i = j;
isSet = true;
}
if(isSet) {
auto k = i;
} else { /* handle the problem */ }
---------------
Also, keep in mind that while, under this mechanism, it is certainly
possible for a coder to cause bugs by always knee-jerking the value to zero
whenever the compiler complains, that's also a possibility under the "holy
grail" approach.
"Nick Sabalausky" <a a.a> wrote in message
news:h28gqc$1duk$1 digitalmars.com...
"Michiel Helvensteijn" <m.helvensteijn.remove gmail.com> wrote in message
news:h2810s$hl1$1 digitalmars.com...
Additionally, in the C# approach (and this is speaking from personal
experience), anytime you do come across a provably-correct case that the
compiler rejects, not only is it always obvious to see why the compiler
rejected it, but it's also trivially easy to fix. So in practice, it's
really not much of a "baby with the bathwater" situation at all.
But what would the fix be in the case of our example? Surely you're not
suggesting initializing 'i' to 0? Then we'd be back in the old situation
where we might get unexpected runtime behavior if we were wrong
about 'foo'.
The fix would be one of the following, depending on what the code is
actually doing:
---------------
// Instead of knee-jerking i to 0, we default init it to
// whatever safe value we want it to be if the loop
// doesn't set it. This, of course, may or may not
// be zero, depending on the code, but regardless,
// there are times when this IS perfectly safe.
int i = contextDependentInitVal;
for(int j = foo(); j > 3; j--) i = j;
auto k = i;
---------------
---------------
int i;
bool isSet = false; // making i nullable would be better
for(int j = foo(); j > 3; j--) {
i = j;
isSet = true;
}
if(isSet) {
auto k = i;
} else { /* handle the problem */ }
---------------
Also, keep in mind that while, under this mechanism, it is certainly
possible for a coder to cause bugs by always knee-jerking the value to
zero whenever the compiler complains, that's also a possibility under the
"holy grail" approach.
I would also be perfectly ok with this compiling:
---------------
int foo()
out (result)
{
assert(result >= 5 && result <= 10);
}
body
{
auto rnd = new RandomGenerator();
rnd.seed(systemClock);
int ret = rnd.fromRange(5,10);
return ret;
}
int i;
for(int j = foo(); j > 3; j--) i = j;
auto k = i;
---------------
Ie, I can agree that the compiler should be able to take advantage of a
function's contract when determining whether or not to throw a "may not get
inited" error, but I strongly disagree that the contract used should be
implicitly defined by the actual behavior of the function.
IMO, In the sans-"out" versions of foo, the *only* post-condition contract
is that it returns an int. If foo's creator really does intend for foo's
result to always be within a certain subset of that, no matter what
revisions are eventually made to foo (without actually changing the whole
purpose of foo), then that should be put in a formal post-condition contract
anyway, such as above.
Nick Sabalausky wrote:
Ie, I can agree that the compiler should be able to take advantage of a
function's contract when determining whether or not to throw a "may not
get inited" error, but I strongly disagree that the contract used should
be implicitly defined by the actual behavior of the function.
Ah, we are starting to agree. :-)
However, in some cases, a function is so short and/or so simple that it
would be extremely redundant to provide a formal contract. Think about
setters, getters and the like. Functions whose implementations are
extremely unlikely to change.
So while I agree in general that the definition of a function should be its
contract - not its implementation - in simple cases, I would find it
acceptable for the creator of a function to explicitly indicate that it is
defined by its implementation.
--
Michiel Helvensteijn
Nick Sabalausky wrote:
The fix would be one of the following, depending on what the code is
actually doing:
---------------
// Instead of knee-jerking i to 0, we default init it to
// whatever safe value we want it to be if the loop
// doesn't set it. This, of course, may or may not
// be zero, depending on the code, but regardless,
// there are times when this IS perfectly safe.
int i = contextDependentInitVal;
for(int j = foo(); j > 3; j--) i = j;
auto k = i;
---------------
---------------
int i;
bool isSet = false; // making i nullable would be better
for(int j = foo(); j > 3; j--) {
i = j;
isSet = true;
}
if(isSet) {
auto k = i;
} else { /* handle the problem */ }
---------------
Keep in mind that we're talking about a situation in which we're sure 'i'
will always be set. If this is not so, the program is incorrect, and we
would want to see one error or another. Your first solution would be
misleading in that case. Any initial value you choose would be a hack to
silence the compiler. A variation on your second solution then:
int i;
bool isSet = false; // making i nullable would be better
for(int j = foo(); j > 3; j--) {
i = j;
isSet = true;
}
assert(isSet);
auto k = i;
This is the basic solution I would always choose in the absence of the
grail. As you say, ideally, the 'uninitialized' state should be part
of 'i', not a separate variable. Reading 'i' would then automatically
assert its initialization at runtime.
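A minimal sketch of that idea in D, assuming a hand-rolled wrapper (the name `Checked` is hypothetical and not part of any library): the "is it set?" flag travels with the value, and every read asserts it.

```d
import std.stdio;

// Hypothetical wrapper: carries the 'uninitialized' state with the value.
struct Checked(T)
{
    private T value;
    private bool isSet = false;

    // Any assignment marks the variable as initialized.
    void opAssign(T v)
    {
        value = v;
        isSet = true;
    }

    // Reading asserts initialization at run time.
    T get()
    {
        assert(isSet, "read of uninitialized variable");
        return value;
    }
}

int foo() { return 5; }

void main()
{
    Checked!int i;
    for (int j = foo(); j > 3; j--)
        i = j;
    auto k = i.get();   // the assert fails here if the loop never ran
    writeln(k);
}
```

Note that D strips asserts in -release builds, so the check costs nothing there, but it also protects nothing.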
I guess that brings us back to one of those scenarios I mentioned in
another subthread. As compilers become more sophisticated, they will be
able to remove the explicit initialization, the test and the extended state
in more complex situations.
Also, keep in mind that while, under this mechanism, it is certainly
possible for a coder to cause bugs by always knee-jerking the value to
zero whenever the compiler complains, that's also a possibility under the
"holy grail" approach.
That's true. But if we did have the grail, the compiler would also be able
to see that knee-jerking 'i' would not satisfy the contract of the outer
function.
Programmers would learn to say what they mean, not what the compiler wants
to hear.
--
Michiel Helvensteijn
"Michiel Helvensteijn" <m.helvensteijn.remove gmail.com> wrote in message
news:h28i61$1hl3$1 digitalmars.com...
Nick Sabalausky wrote:
The fix would be one of the following, depending on what the code is
actually doing:
---------------
// Instead of knee-jerking i to 0, we default init it to
// whatever safe value we want it to be if the loop
// doesn't set it. This, of course, may or may not
// be zero, depending on the code, but regardless,
// there are times when this IS perfectly safe.
int i = contextDependentInitVal;
for(int j = foo(); j > 3; j--) i = j;
auto k = i;
---------------
---------------
int i;
bool isSet = false; // making i nullable would be better
for(int j = foo(); j > 3; j--) {
i = j;
isSet = true;
}
if(isSet) {
auto k = i;
} else { /* handle the problem */ }
---------------
Keep in mind that we're talking about a situation in which we're sure 'i'
will always be set. If this is not so, the program is incorrect, and we
would want to see one error or another. Your first solution would be
misleading in that case. Any initial value you choose would be a hack to
silence the compiler.
It's a situation where we're *initially* sure 'i' will always be set. But
once we see that error from the compiler, we have to reassess that belief.
There are three possibilities when that happens:
1. It will always be set because of the function's contract. In this case,
we do the formal contract stuff I advocated earlier. And we can certainly
come up with ways to be minimally-verbose with this for trivial cases. So
this case gets eliminated.
2. It will always be set, but *only* because of the function's
implementation. This *should* cause a compiler error, because if it's
allowed by the function's formal contract, then that very fact means that we
*should* assume that this may very well flip-flop anytime that either foo or
anything foo may rely upon is changed.
3. We were, in fact, *mistaken* in thinking that what we were doing would
always leave 'i' inited (this does happen). This causes the programmer to
reassess their approach. Depending on what they're trying to do, the
solution might involve rewriting a loop that's basically fubared already, or
in some cases it may very well be as simple as adding a default init value
(this does happen).
A variation on your second solution then:
int i;
bool isSet = false; // making i nullable would be better
for(int j = foo(); j > 3; j--) {
i = j;
isSet = true;
}
assert(isSet);
auto k = i;
This is the basic solution I would always choose in the absence of the
grail. As you say, ideally, the 'uninitialized' state should be part
of 'i', not a separate variable. Reading 'i' would then automatically
assert its initialization at runtime.
Yea, that works too. It's effectively a sub-case of my "if(isSet) else {/*
handle this somehow*/ }", and more-or-less what I had in mind.
I guess that brings us back to one of those scenarios I mentioned in
another subthread. As compilers become more sophisticated, they will be
able to remove the explicit initialization, the test and the extended
state in more complex situations.
Agreed, but with the caveat that care should be taken to ensure these new
rules don't allow non-localized flip-flopping when something's[1]
implementation is changed, because then the programmer has to start
remembering and analyzing an increasingly complex set of rules.
[1] Side-trip to spell-check land again: Apparently OpenOffice doesn't think
"something" can be made possessive. (But then again, maybe it technically
can't in super-anal-grammar-police land, not like I would know ;) )
Also, keep in mind that while, under this mechanism, it is certainly
possible for a coder to cause bugs by always knee-jerking the value to
zero whenever the compiler complains, that's also a possibility under the
"holy grail" approach.
That's true. But if we did have the grail, the compiler would also be able
to see that knee-jerking 'i' would not satisfy the contract of the outer
function.
No, it wouldn't, because it would have no way of knowing that's a knee-jerk
fix for a "using uninited var" error. But maybe I misunderstand you?
Nick Sabalausky wrote:
Init, assert, spell-check land, etc.
We're agreed.
Also, keep in mind that while, under this mechanism, it is certainly
possible for a coder to cause bugs by always knee-jerking the value to
zero whenever the compiler complains, that's also a possibility under
the "holy grail" approach.
That's true. But if we did have the grail, the compiler would also be
able to see that knee-jerking 'i' would not satisfy the contract of the
outer function.
No, it wouldn't, because it would have no way of knowing that's a
knee-jerk fix for a "using uninited var" error. But maybe I misunderstand
you?
I mean that if the programmer has provided a postcondition of the outer
function (the function that contains the variable 'i'), a verifying
compiler will be able to give an error if knee-jerking 'i' results in a
subtle bug in the function; one that would invalidate the postcondition.
Of course, if no postcondition is supplied, the compiler can only assume you
meant for exactly that thing to happen. The bug becomes a feature. :-)
Anyway, the verifying compiler is a project I'm working on. I'm designing a
language based on the assumption that compilers will become more and more
sophisticated in the area of static analysis. Contracts are the most
important feature of this language and assertions even have their own
syntax (because they'll be used so much). Where the correctness of a piece
of code cannot be proved at compile-time, a managed runtime environment is
used. This offers the guarantee that the current state will always satisfy
the contract. Assertions cannot be 'caught' and discarded.
Many optimizations may also be based on contracts.
It's a really fun project.
--
Michiel Helvensteijn
Hello Nick,
Also, keep in mind that while, under this mechanism, it is certainly
possible for a coder to cause bugs by always knee-jerking the value to
zero whenever the compiler complains, that's also a possibility under
the "holy grail" approach.
How about letting the user signal that they know what they are doing by using:
int i = void;
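For reference, a small sketch of what `= void` does in D as it stands: it only skips the default initialization. Treating it as a signal to a "may be uninitialized" check is the proposal here, not existing behavior. The function name `lastAssigned` is made up for the example.

```d
import std.stdio;

// Hypothetical example function: returns the last value the loop assigned.
int lastAssigned(int start)
{
    int i = void;   // no default init; contents are garbage until assigned
    for (int j = start; j > 3; j--)
        i = j;
    return i;       // only meaningful if start > 3, otherwise garbage
}

void main()
{
    int a;                      // default-initialized to int.init, which is 0
    writeln(a);
    writeln(lastAssigned(5));   // loop runs for j = 5 and j = 4
}
```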
Hello Nick,
"Michiel Helvensteijn" <m.helvensteijn.remove gmail.com> wrote in
message news:h25fbk$28mg$1 digitalmars.com...
Ah, so C# is overly conservative. That's another option, of course.
It has the advantage of always knowing at compile time that you're
not reading an uninitialized value. The disadvantage is that C# will
often throw out the baby with the bath water. The example program may
be perfectly valid if 'foo' always returns positive.
Yes, this approach is what I was getting at. In fact, I would (and
already have in the past) argue that this is *better* than the "holy
grail" approach, because it's based on very simple and easy to
remember rules. Conversely, the "holy grail" approach leads to
difficult-to-predict cases of small, seemingly-innocent changes in one
place causing some other code to suddenly switch back and forth
between "compiles" and "doesn't compile".
Yes, trying to solve the problem for all cases won't work, but I think the
default should be to trust the programmer. If you can show for sure with a
trivial set of rules that I use a variable before setting it, give me an error.
If not, get the heck out of my way!
On Sat, 27 Jun 2009 17:50:11 +0400, BCS <none anon.com> wrote:
Hello Nick,
Interesting idea, but IMO using NaN as a default initializer is just a
crutch for not having a real system of compile-time
detecting/preventing of uninitialized variables from being read (C#'s
system for this works very well in my experience).
I think you can prove that it is impossible to do this totally correctly:
int i;
for(int j = foo(); j > 0; j--) i = bar(j); // what if foo() returns -5?
This code doesn't compile in C# and fails with the following error at
first attempt to use 'i':
error CS0165: Use of unassigned local variable 'i'
Hello Denis,
On Sat, 27 Jun 2009 17:50:11 +0400, BCS <none anon.com> wrote:
Hello Nick,
Interesting idea, but IMO using NaN as a default initializer is just
a crutch for not having a real system of compile-time
detecting/preventing of uninitialized variables from being read
(C#'s system for this works very well in my experience).
I think you can prove that it is impossible to do this totally correctly:
int i;
for(int j = foo(); j > 0; j--) i = bar(j); // what if foo() returns -5?
This code doesn't compile in C# and fails with the following error at
first attempt to use 'i':
error CS0165: Use of unassigned local variable 'i'
And if foo() is never <=0 then the error is valid, but incorrect. I like
the int.nan idea better. Not one unassigned local variable error I have ever
seen has pointed me at a bug.
Michiel Helvensteijn wrote:
But the beauty of the holy grail is that it's neither.
While the ugliness of it is that it's both.
--
Simen
Michiel Helvensteijn wrote:
Simen Kjaeraas wrote:
But the beauty of the holy grail is that it's neither.
While the ugliness of it is that it's both.
Care to elaborate?
As has already been mentioned, one of the biggest problems with the holy
grail is that it leads to capricious states of "possibly compilable".
There are also bunches of examples in which it will not be able to
deduce if it should compile or not, at least not without breaking
modularity, and even then, functions called from outside sources (dlls,
SOs, OS functions, compiled libraries, etc) will break the system.
This means the system has to be either permissive or conservative when
encountering a problem insoluble to its logic, and this fall-back
mechanism will then work counter-intuitively to its normal working
order, thus giving birth to the system's dualism of both
conservativeness and permissiveness.
--
Simen
On Sun, Jun 28, 2009 at 6:02 PM, bearophile<bearophileHUGS lycos.com> wrote:
Besides, there would be no cheap way to ensure NaN propagation (no hardware
support).<
I was talking about having hardware support, of course.
Let me know when x86 gets that.
"Jarrett Billingsley" <jarrett.billingsley gmail.com> wrote in message
news:mailman.315.1246226874.13405.digitalmars-d puremagic.com...
On Sun, Jun 28, 2009 at 6:02 PM, bearophile<bearophileHUGS lycos.com>
wrote:
Besides, there would be no cheap way to ensure NaN propagation (no
hardware support).<
I was talking about having hardware support, of course.
Let me know when x86 gets that.
Geez, it's a hypothetical discussion, for cryin out loud. Not everything has
to be immediately feasible to be worthy of debate.